RWKV-v2-RNN trained on the full Pile (no dev/val/test split)

Overview

RWKV-v2-RNN-Pile

RWKV-v2-RNN trained on the full Pile (no dev/val/test split). See https://github.com/BlinkDL/RWKV-LM for details.

Join our Discord! https://discord.gg/bDSBUMeFpc :)

Colab for fine-tuning: https://colab.research.google.com/drive/1BwceyZczs5hQr1wefmCREonEWhY-zeST

NOTE: currently sample_logits() in run.py runs on the CPU, which is very slow (sometimes slower than the model itself!). Move it to CUDA for a significant speedup.

RWKV-4 models

Models: https://huggingface.co/BlinkDL now with 169M, 430M, and 1.5B parameters

Code: https://github.com/BlinkDL/RWKV-LM/tree/main/RWKV-v4

RWKV-v4-1.5B-Pile

RWKV-3 models

RWKV-3 code: https://github.com/BlinkDL/RWKV-v2-RNN-Pile/tree/main/RWKV-v3

1.5B model: https://huggingface.co/BlinkDL/rwkv-3-pile-1b5

430M model: https://huggingface.co/BlinkDL/rwkv-3-pile-430m

169M model: https://huggingface.co/BlinkDL/rwkv-3-pile-169m

Training log: https://wandb.ai/blinkdl/RWKV-v2-RNN-Pile

You can use the "GPT" mode to quickly build the hidden state for the "RNN" mode. (I am not doing this in run.py here, so the initial generation is slower than usual.)
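As a toy illustration of the idea (NOT the actual RWKV code; this uses a simple linear recurrence h_t = a*h_{t-1} + x_t as a stand-in), the "GPT" pass can compute the state over the whole prompt at once, after which generation continues one step at a time in "RNN" mode:

```python
# Toy sketch (NOT the real RWKV code): a linear recurrence
# h_t = a * h_{t-1} + x_t, used to illustrate building the RNN
# hidden state from a prompt in one parallel pass, then
# continuing step by step.
import numpy as np

def rnn_step(h, x, a=0.9):
    # one "RNN mode" step
    return a * h + x

def gpt_prefill(xs, a=0.9):
    # "GPT mode": compute the state after the whole prompt at once.
    # h_T = sum_t a^(T-1-t) * x_t  (closed form of the recurrence)
    T = len(xs)
    weights = a ** np.arange(T - 1, -1, -1)
    return float(np.dot(weights, xs))

prompt = [1.0, 2.0, 3.0, 4.0]

# stepwise reference ("RNN mode" over the prompt)
h = 0.0
for x in prompt:
    h = rnn_step(h, x)

h_fast = gpt_prefill(prompt)
assert abs(h - h_fast) < 1e-9  # both routes give the same state
```

The point is only that the same state can be reached by a parallel pass or by stepping; the real model replaces this scalar recurrence with the RWKV time-mix.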

I am training a 1.5B RWKV-3 (search "RWKV-3" in https://github.com/BlinkDL/RWKV-LM for details):

RWKV-v3-1.5B-Pile

Fine-tuning

Use prepare_data.py to tokenize your .txt into .npy, then run finetune.py to fine-tune the Pile model.

Reduce batch_sz if you see CUDA OOM (and change B_GROUP_FORWARD and B_GROUP_BACKWARD in src/model_train.py to make sure they can divide batch_sz).
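A quick sanity check for that divisibility constraint (the variable names B_GROUP_FORWARD and B_GROUP_BACKWARD mirror src/model_train.py; the values below are just examples):

```python
# Sketch: verify the group sizes evenly divide batch_sz before training,
# to catch the mismatch up front instead of mid-run.
batch_sz = 12
B_GROUP_FORWARD = 4   # must divide batch_sz
B_GROUP_BACKWARD = 2  # must divide batch_sz

assert batch_sz % B_GROUP_FORWARD == 0, "B_GROUP_FORWARD must divide batch_sz"
assert batch_sz % B_GROUP_BACKWARD == 0, "B_GROUP_BACKWARD must divide batch_sz"
```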

Improving CtxLen

UPDATE: We have a new CUDA kernel in RWKV-4. The model can now extrapolate to 3x its training ctxLen, and we can achieve infinite ctxLen using Transformer-XL-style training.
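The Transformer-XL-style idea can be sketched with a toy recurrence (again NOT the actual RWKV kernel): process a long sequence in fixed-size segments while carrying the hidden state across segment boundaries, so the effective context is unbounded:

```python
# Toy sketch of Transformer-XL-style segment training with a carried state.
# Uses a scalar recurrence h_t = a*h_{t-1} + x_t; NOT the real RWKV kernel.
def run_segment(h, segment, a=0.9):
    for x in segment:
        h = a * h + x
    return h

def run_chunked(xs, seg_len, a=0.9):
    h = 0.0
    for i in range(0, len(xs), seg_len):
        # in real training the carried state would be detached (no gradient
        # flows back into earlier segments)
        h = run_segment(h, xs[i:i + seg_len], a)
    return h

xs = [float(i) for i in range(10)]
full = run_segment(0.0, xs)           # one pass over the whole sequence
chunked = run_chunked(xs, seg_len=3)  # same sequence, segment by segment
assert abs(full - chunked) < 1e-9     # carrying the state preserves the result
```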

An example where we only trained using ctx1024:

RWKV-4-ctxLen

Old guide for RWKV-3:

You can set a longer ctxLen and the model can adapt (try this: 768 -> 768 * 2, train for some hours, then 768 * 2 -> 768 * 3, ...).
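In other words, the fine-tuning schedule just steps through multiples of the base ctxLen:

```python
# Sketch: the progressive ctxLen schedule 768 -> 1536 -> 2304 -> 3072 -> ...
base = 768
schedule = [base * k for k in range(1, 5)]
print(schedule)  # [768, 1536, 2304, 3072]
```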

The current models are trained with ctxLen 768, and the optimal ctxLen for RNN mode is around 1100. The positional loss goes up when ctxLen > 768 * 2. I am fine-tuning them to support longer ctxLen.

RWKV-2 trained with 768 ctxLen, and after 20 minutes of finetuning to 1536 ctxLen (1e-5 LR):

RWKV-ctxLen

Therefore RWKV-2 can quickly adapt to "infinite" ctxLen via N -> 2N -> 3N -> ... (or you can use better training methods from the start, such as 90% GPT + 10% RNN).

The only limiting factor is that, right now, K is clamped to e^60, which will create trouble for the model when ctxLen is very long. This can be fixed with a better CUDA kernel.
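To see why the clamp exists (a hedged toy, not the actual kernel): exp(k) overflows float32 once k exceeds roughly 88, so clamping the exponent at 60 keeps the sums finite, at the cost of capping the contribution of very large keys:

```python
# Toy illustration of why K is clamped: exp() overflows float32 for
# large arguments, so the kernel clamps the exponent (here at 60).
import numpy as np

k = np.array([10.0, 50.0, 90.0], dtype=np.float32)

with np.errstate(over="ignore"):
    raw = np.exp(k)            # exp(90) overflows float32 -> inf
assert np.isinf(raw[-1])

clamped = np.exp(np.minimum(k, 60.0).astype(np.float32))
assert np.isfinite(clamped).all()  # finite, but large keys are capped
```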

===================================================

RWKV-2 models

Model 20220615-10803 (see Releases, or https://huggingface.co/BlinkDL/rwkv-2-pile-430m/tree/main):

This is a L24-D1024 RWKV-v2-RNN trained on the Pile for 332B tokens.

!!! Change 1e-9 to 1e-8 in model.py and model_train.py (RWKV_K_EPS) for this model !!!

LAMBADA ppl 15.34 acc 42.42% (computed using https://github.com/EleutherAI/lm-evaluation-harness)

The Pile loss 2.349

===================================================

Model 20220524-4006 (see Releases):

This is a preview of a L24-D1024 RWKV-v2-RNN trained on the Pile for only 123B tokens.

LAMBADA ppl 15.88 acc 42.36% (computed using https://github.com/EleutherAI/lm-evaluation-harness)

The Pile loss 2.383

===================================================

Model 20220501-6548 (see Releases):

This is a preview of a L12-D768 RWKV-v2-RNN trained on the Pile for only 50B tokens.

Performance of the preview model:

LAMBADA ppl 52.45 acc 26.66% (computed using https://github.com/EleutherAI/lm-evaluation-harness)

The Pile loss 2.728

Releases
  • 20220615-10803(Jun 16, 2022)

    This is a L24-D1024 RWKV-v2-RNN trained on the Pile for 332B tokens.

    !!! Change 1e-9 to 1e-8 in model.py and model_train.py (RWKV_K_EPS) for this model !!!

    ctx_len = 768 (actually works for longer ctx_len because it's an RNN and I find it can extrapolate), n_layer = 24, n_embd = 1024

    LAMBADA ppl 15.34 acc 42.42% (computed using https://github.com/EleutherAI/lm-evaluation-harness) The Pile loss 2.349

    Source code(tar.gz)
    Source code(zip)
    20220615-10803.zip(1521.68 MB)
  • 20220605-7663(Jun 6, 2022)

    This is a preview of a L24-D1024 RWKV-v2-RNN trained on the Pile for 235B tokens.

    It is NOT indicative of the final performance (which requires 300B tokens).

    !!! Change 1e-9 to 1e-8 in model.py and model_train.py (RWKV_K_EPS) for this model !!!

    ctx_len = 768 (actually works for longer ctx_len because it's an RNN and I find it can extrapolate), n_layer = 24, n_embd = 1024

    Performance of the preview model:

    LAMBADA ppl 15.3 acc 42.62% (computed using https://github.com/EleutherAI/lm-evaluation-harness)

    The Pile loss 2.361

    Source code(tar.gz)
    Source code(zip)
    RWKV-v2-RNN-Pile-20220605-7663.zip(1521.64 MB)
  • 20220524-4006(May 24, 2022)

    This is a preview of a L24-D1024 RWKV-v2-RNN trained on the Pile for only 123B tokens.

    It is NOT indicative of the final performance (which requires 300B tokens).

    ctx_len = 768 (actually works for longer ctx_len because it's an RNN and I find it can extrapolate), n_layer = 24, n_embd = 1024

    Performance of the preview model:

    LAMBADA ppl 15.88 acc 42.36% (computed using https://github.com/EleutherAI/lm-evaluation-harness)

    The Pile loss 2.383

    Source code(tar.gz)
    Source code(zip)
    RWKV-v2-RNN-Pile-20220524-4006.zip(1521.57 MB)
  • 20220515-1853(May 15, 2022)

    This is a preview of a L24-D1024 RWKV-v2-RNN trained on the Pile for only 57B tokens.

    It is NOT indicative of the final performance (which requires 300B tokens).

    ctx_len = 768 (actually works for longer ctx_len because it's an RNN and I find it can extrapolate), n_layer = 24, n_embd = 1024

    Performance of the preview model:

    LAMBADA ppl 18.63 acc 39.61% (computed using https://github.com/EleutherAI/lm-evaluation-harness)

    The Pile loss 2.417

    Source code(tar.gz)
    Source code(zip)
    RWKV-v2-RNN-Pile-20220515-1853.zip(1523.21 MB)
  • 20220501-6548(May 1, 2022)

    This is a preview of RWKV-v2-RNN trained on the Pile for only 50B tokens. It is NOT indicative of the final performance (which requires 300B tokens).

    Performance of the preview model: LAMBADA ppl 52.45 acc 26.66% (computed using https://github.com/EleutherAI/lm-evaluation-harness) The Pile loss 2.728

    Source code(tar.gz)
    Source code(zip)
    model-20220501-6548.zip(600.00 MB)
Owner
PENG Bo
http://zhihu.com/people/bopengbopeng