KoMiniLM

🐣 Korean Lightweight Language Model

Overview

Current language models usually consist of hundreds of millions of parameters, which poses challenges for fine-tuning and online serving in real-world applications due to latency and capacity constraints. This project releases a lightweight Korean language model to address these shortcomings of existing language models.

Quick tour

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BM-K/KoMiniLM") # 23M model
model = AutoModel.from_pretrained("BM-K/KoMiniLM")

inputs = tokenizer("안녕 세상아!", return_tensors="pt")
outputs = model(**inputs)
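
The call above returns the encoder outputs; because the released config sets "return_dict": false (see Config below), outputs is a plain tuple whose first element is the last hidden state. Continuing from the variables above, one way to turn it into a fixed-size sentence vector is sketched here; the mean pooling is an assumption made for this example, not something the model prescribes.

# outputs[0] is the last hidden state: (batch, seq_len, hidden_size)
last_hidden = outputs[0]

# Mask-aware mean pooling over tokens gives one vector per sentence.
mask = inputs["attention_mask"].unsqueeze(-1).float()
sentence_vec = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_vec.shape)  # torch.Size([1, 384]) for the 23M checkpoint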

Update history

Updates on 2022.06.20

  • Release KoMiniLM-bert-68M

Updates on 2022.05.24

  • Release KoMiniLM-bert-23M

Pre-training

Teacher Model: KLUE-BERT(base)

Objective

Self-attention distributions and self-attention value-relations [Wang et al., 2020] are distilled from each layer of the teacher model into the student model. Wang et al. distill only the last transformer layer, whereas this project distills every layer.
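
As a rough illustration of this objective, the sketch below computes the two MiniLM-style losses for a single pair of teacher/student layers. The shapes, the KL-divergence formulation, and the softmax-normalized value-relation follow Wang et al.; how the 12 teacher layers are paired with the 6 student layers and the exact loss weighting used in this project are not specified here, so treat this as an assumption-laden sketch rather than the project's training code.

import torch
import torch.nn.functional as F

def minilm_layer_loss(t_attn, s_attn, t_value, s_value):
    """One layer of attention-distribution + value-relation distillation (MiniLM-style)."""
    # t_attn, s_attn:   (batch, heads, seq, seq) self-attention probabilities
    # t_value, s_value: (batch, heads, seq, head_dim) value vectors (head_dim may differ)
    attn_loss = F.kl_div(s_attn.clamp_min(1e-12).log(), t_attn, reduction="batchmean")

    # Value-relation: softmax(V V^T / sqrt(d_k)) is a (seq x seq) matrix per head,
    # so teacher and student head dimensions do not have to match.
    t_rel = F.softmax(t_value @ t_value.transpose(-1, -2) / t_value.size(-1) ** 0.5, dim=-1)
    s_rel = F.softmax(s_value @ s_value.transpose(-1, -2) / s_value.size(-1) ** 0.5, dim=-1)
    rel_loss = F.kl_div(s_rel.clamp_min(1e-12).log(), t_rel, reduction="batchmean")
    return attn_loss + rel_loss

# Toy check with random tensors: 12 heads on both sides, different head dims.
t_attn = torch.rand(2, 12, 16, 16).softmax(dim=-1)
s_attn = torch.rand(2, 12, 16, 16).softmax(dim=-1)
loss = minilm_layer_loss(t_attn, s_attn, torch.randn(2, 12, 16, 64), torch.randn(2, 12, 16, 32))
# Unlike Wang et al. (last layer only), this project applies such a loss at every distilled layer.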

Data sets

Data    News comments    News articles
Size    10 GB            10 GB

Note

  • Performance can be further improved by adding Wikipedia data to the training corpus.
  • The crawling and preprocessing code for the news articles is available here.

Config

KoMiniLM-23M:
{
  "architectures": [
    "BertForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "output_attentions": true,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "return_dict": false,
  "torch_dtype": "float32",
  "transformers_version": "4.13.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 32000
}

KoMiniLM-68M:
{
  "architectures": [
    "BertForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "output_attentions": true,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "return_dict": false,
  "torch_dtype": "float32",
  "transformers_version": "4.13.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 32000
}
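
The two configurations differ only in hidden_size and intermediate_size (384/1536 vs. 768/3072), which is what separates the 23M and 68M checkpoints. A quick, generic way to verify a checkpoint's parameter count after downloading it (nothing here is project-specific):

from transformers import AutoModel

# "BM-K/KoMiniLM" is the 23M checkpoint used in the quick tour above.
model = AutoModel.from_pretrained("BM-K/KoMiniLM")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # expected to be roughly 23M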

Performance on subtasks

  • The results of our fine-tuning experiments are an average of 3 runs for each task; a minimal single-task fine-tuning sketch is shown after the task list below.
cd KoMiniLM-Finetune
bash scripts/run_all_kominilm.sh
Model          #Param  Average  NSMC (Acc)   Naver NER (F1)  PAWS (Acc)   KorNLI (Acc)  KorSTS (Spearman)  Question Pair (Acc)  KorQuAD (Dev, EM/F1)
KoBERT(KLUE)   110M    86.84    90.20±0.07   87.11±0.05      81.36±0.21   81.06±0.33    82.47±0.14         95.03±0.44           84.43±0.18 / 93.05±0.04
KcBERT         108M    78.94    89.60±0.10   84.34±0.13      67.02±0.42   74.17±0.52    76.57±0.51         93.97±0.27           60.87±0.27 / 85.01±0.14
KoBERT(SKT)    92M     79.73    89.28±0.42   87.54±0.04      80.93±0.91   78.18±0.45    75.98±2.81         94.37±0.31           51.94±0.60 / 79.69±0.66
DistilKoBERT   28M     74.73    88.39±0.08   84.22±0.01      61.74±0.45   70.22±0.14    72.11±0.27         92.65±0.16           52.52±0.48 / 76.00±0.71
KoMiniLM       68M     85.90    89.84±0.02   85.98±0.09      80.78±0.30   79.28±0.17    81.00±0.07         94.89±0.37           83.27±0.08 / 92.08±0.06
KoMiniLM       23M     84.79    89.67±0.03   84.79±0.09      78.67±0.45   78.10±0.07    78.90±0.11         94.81±0.12           82.11±0.42 / 91.21±0.29
  • NSMC (Naver Sentiment Movie Corpus)
  • Naver NER (NER task on Naver NLP Challenge 2018)
  • PAWS (Korean Paraphrase Adversaries from Word Scrambling)
  • KorNLI/KorSTS (Korean Natural Language Understanding)
  • Question Pair (Paired Question)
  • KorQuAD (The Korean Question Answering Dataset)
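
The bundled scripts above are what produced the reported numbers. For fine-tuning a single task by hand, a rough sketch with the standard Hugging Face Trainer follows; the "nsmc" dataset id on the Hub, its column names, and all hyperparameters are illustrative assumptions rather than this project's actual settings.

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Illustrative only: dataset id, column names, and hyperparameters are assumptions.
dataset = load_dataset("nsmc")  # Naver Sentiment Movie Corpus on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("BM-K/KoMiniLM")
model = AutoModelForSequenceClassification.from_pretrained("BM-K/KoMiniLM", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["document"], truncation=True, max_length=128)

encoded = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="kominilm-nsmc",
    per_device_train_batch_size=32,
    num_train_epochs=3,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["test"],
    tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
)
trainer.train()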


Reference

[Wang et al., 2020] Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou. "MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers." NeurIPS 2020.