🐜Say hello to MinimalGPT! 👋

Abhas_Kumar · May 5, 2023, 8:38am

While open-source generative models such as LLaMA, GPT4All, and FreedomGPT have paved the way for running GPT-style models locally on ordinary CPUs, they are still far from the idea of ‘Minimalist’ tiny GPT models.

GPT-4 can take up to 32k tokens of input to predict the next most probable output token, and it was trained on over 600 GB of data for months on supercomputing infrastructure. We approach the opposite question: what are the minimal resources required to train a GPT model?

With the MinimalGPT framework, creating a GPT model (including vectorization), saving it, and loading it back from a saved backup for retraining, fine-tuning, or inference each becomes a single command-line invocation.
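As a rough illustration of what "vectorization" plus saving and loading involves, here is a minimal pure-Python sketch. It assumes simple whitespace tokenization and a JSON vocabulary file; MinimalGPT's actual tokenizer and its `.mgt` file format may well differ, and every function name below is hypothetical.

```python
import json
import os
import tempfile
from collections import Counter


def build_vocab(text, max_size=7000):
    """Map the most frequent whitespace-separated words to integer ids.

    Id 0 is reserved for out-of-vocabulary words (a hypothetical convention).
    """
    counts = Counter(text.lower().split())
    vocab = {"[UNK]": 0}
    for word, _ in counts.most_common(max_size - 1):
        vocab[word] = len(vocab)
    return vocab


def vectorize(text, vocab):
    """Convert text to token ids, using 0 for unknown words."""
    return [vocab.get(word, 0) for word in text.lower().split()]


def save_vocab(vocab, path):
    """Persist the vocabulary as JSON (not MinimalGPT's .mgt format)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f)


def load_vocab(path):
    """Read back a vocabulary written by save_vocab."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    vocab = build_vocab("the cat sat on the mat")
    path = os.path.join(tempfile.gettempdir(), "demo_vocab.json")
    save_vocab(vocab, path)
    print(vectorize("the dog sat on the mat", load_vocab(path)))
```

Saving the vocabulary alongside the weights matters because a reloaded model is only meaningful if text is mapped to the same ids it was trained on.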

Example: creating a model, training it, and saving it to disk:

MinimalGPT.py -d './dataset/output_dataset.txt' -l 0.001 -ol 200 -e 4 -b 512 -s 10 -dm 128 -p 8 -ds 1 -ts 0 -te 40000 -vs 0 -ve 200000 -sd -st './models/tokenizer.mgt' -sw './models/weights.mgw'
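A GPT model learns to predict the next token from a fixed-length window of preceding tokens. The sketch below shows one common way to turn a stream of token ids into (input, target) training pairs. It assumes that `-s 10` sets the input window length (matching "GPT_input: 10 tokens" in the parameter specs), and `make_windows` is a hypothetical helper, not part of MinimalGPT.

```python
def make_windows(token_ids, window=10, stride=1):
    """Split a token stream into (input, target) pairs for next-token prediction.

    Each target is the input shifted one position to the left, so position t
    of the target is the token that follows position t of the input.
    """
    inputs, targets = [], []
    for start in range(0, len(token_ids) - window, stride):
        inputs.append(token_ids[start:start + window])
        targets.append(token_ids[start + 1:start + window + 1])
    return inputs, targets


if __name__ == "__main__":
    ids = list(range(13))  # stand-in for a vectorized corpus
    x, y = make_windows(ids, window=10)
    for inp, tgt in zip(x, y):
        print(inp, "->", tgt)
```

Shifting the whole window (rather than keeping only the single next token) lets a causally masked decoder learn from every position in each sample at once.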

Loading a saved model and fine-tuning it:

MinimalGPT.py -d './dataset/output_dataset.txt' -l 0.00005 -ol 200 -e 1 -b 512 -s 10 -dm 128 -p 8 -ds 1 -ts 80000 -te 120000 -sd -st './models/tokenizer2.mgt' -sw './models/weights2.mgw' -lt './models/tokenizer.mgt' -lw './models/weights.mgw'
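Fine-tuning follows a general pattern: reload saved weights, then continue training on new data with a much smaller learning rate (0.00005 here versus 0.001 for the initial run), so the model adapts without drastically overwriting what it already learned. The NumPy sketch below illustrates that pattern on a toy bigram next-token model trained with plain gradient descent. It is not MinimalGPT's model or its `.mgw` weight format; the function and variable names are hypothetical, and an in-memory buffer stands in for a weights file.

```python
import io

import numpy as np


def train(weights, pairs, lr, epochs):
    """Full-batch gradient descent on a toy bigram next-token model.

    weights[i, j] is the logit for token j following token i; returns a new array.
    """
    w = weights.copy()
    prev = np.array([p for p, _ in pairs])
    nxt = np.array([n for _, n in pairs])
    for _ in range(epochs):
        logits = w[prev]
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        grad_logits = probs.copy()
        grad_logits[np.arange(len(nxt)), nxt] -= 1.0   # softmax cross-entropy gradient
        grad = np.zeros_like(w)
        np.add.at(grad, prev, grad_logits / len(nxt))  # accumulate per previous token
        w -= lr * grad
    return w


rng = np.random.default_rng(0)
w0 = rng.normal(scale=0.01, size=(5, 5))

# Initial training run (learning rate and epochs mirror -l 0.001 -e 4).
w_base = train(w0, [(0, 1), (1, 2), (2, 3), (3, 4)], lr=0.001, epochs=4)

# "Save" and reload the weights; a real run would write to a file instead.
buffer = io.BytesIO()
np.save(buffer, w_base)
buffer.seek(0)
w_loaded = np.load(buffer)

# Fine-tune on a new slice of data with a much smaller learning rate (-l 0.00005 -e 1).
w_tuned = train(w_loaded, [(4, 0), (0, 1)], lr=0.00005, epochs=1)

print("change during training:   ", np.abs(w_base - w0).max())
print("change during fine-tuning:", np.abs(w_tuned - w_loaded).max())
```

Because the fine-tuning run starts from the reloaded weights rather than a fresh initialization, each update only nudges the existing model.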

Or load a previously saved model (even one trained on over a million samples on a GPU) and generate output of a specified length:

MinimalGPT.py -i -ol 500 -e 6 -b 512 -s 10 -dm 128 -p 8 -ds 1 -lt './models/tokenizer2.mgt' -lw './models/weights2.mgw'

All of this takes anywhere from a few minutes to a few hours, locally, on a single CPU core!
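Generating output of a given length (`-ol 500`) works autoregressively: the model predicts one token, appends it, and predicts again from the most recent window of tokens. The pure-Python sketch below shows that loop with greedy decoding. `next_token_scores` is a hypothetical stand-in for a trained model (here a trivial rule over a 5-token vocabulary), and the 10-token window assumes the GPT input size listed in the parameter specs; MinimalGPT's own decoding strategy may differ.

```python
def next_token_scores(context):
    """Hypothetical stand-in for a trained model: one score per vocabulary id.

    This toy version only looks at the last token and prefers (last + 1) % 5.
    """
    vocab_size = 5
    last = context[-1]
    return [1.0 if tok == (last + 1) % vocab_size else 0.0 for tok in range(vocab_size)]


def generate(prompt_ids, output_length, window=10):
    """Greedy autoregressive generation with a fixed-size context window."""
    tokens = list(prompt_ids)
    for _ in range(output_length):
        context = tokens[-window:]  # the model only ever sees the last `window` tokens
        scores = next_token_scores(context)
        best = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(best)
    return tokens[len(prompt_ids):]


print(generate([0, 1], output_length=7))  # [2, 3, 4, 0, 1, 2, 3]
```

Greedy decoding is the simplest choice; sampling from a softmax over the scores is a common alternative that gives more varied text.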

Parameter specs:

Training_data: 40k + 40k fine-tuning
GPT_input: 10 tokens
Embedding_dims: 128
Stack: 1 decoder
Multi-head: 8
Vocab_size: 7k (generated and processed automatically)
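To make these specs concrete, here is a NumPy sketch of a forward pass through a single decoder block with those sizes: 10 input tokens, 128-dimensional embeddings, 8 attention heads, and a 7k vocabulary. It uses random, untrained weights and omits layer normalization and dropout; it illustrates the shape of the computation rather than MinimalGPT's Keras implementation, and the 512-unit feed-forward width is an assumption.

```python
import numpy as np

SEQ_LEN, D_MODEL, N_HEADS, VOCAB = 10, 128, 8, 7000
HEAD_DIM = D_MODEL // N_HEADS  # 16 dimensions per head
D_FF = 512                     # assumed feed-forward width

rng = np.random.default_rng(0)


def init(*shape):
    return rng.normal(scale=0.02, size=shape)


params = {
    "tok_emb": init(VOCAB, D_MODEL),
    "pos_emb": init(SEQ_LEN, D_MODEL),
    "w_qkv": init(D_MODEL, 3 * D_MODEL),
    "w_out": init(D_MODEL, D_MODEL),
    "w_ff1": init(D_MODEL, D_FF),
    "w_ff2": init(D_FF, D_MODEL),
    "w_vocab": init(D_MODEL, VOCAB),
}


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def causal_self_attention(x, p):
    """Multi-head self-attention where position t attends only to positions <= t."""
    t = x.shape[0]
    q, k, v = np.split(x @ p["w_qkv"], 3, axis=-1)  # each (t, D_MODEL)
    # Reshape each to (heads, t, head_dim).
    q, k, v = [m.reshape(t, N_HEADS, HEAD_DIM).transpose(1, 0, 2) for m in (q, k, v)]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(HEAD_DIM)  # (heads, t, t)
    future = np.triu(np.ones((t, t), dtype=bool), k=1)     # True where key is in the future
    scores = np.where(future, -np.inf, scores)
    out = softmax(scores) @ v                               # (heads, t, head_dim)
    out = out.transpose(1, 0, 2).reshape(t, D_MODEL)        # concatenate heads
    return out @ p["w_out"]


def forward(token_ids, p):
    """Return next-token logits of shape (len(token_ids), VOCAB)."""
    x = p["tok_emb"][token_ids] + p["pos_emb"][: len(token_ids)]
    x = x + causal_self_attention(x, p)                   # residual connection
    x = x + np.maximum(0.0, x @ p["w_ff1"]) @ p["w_ff2"]  # ReLU feed-forward + residual
    return x @ p["w_vocab"]


tokens = rng.integers(0, VOCAB, size=SEQ_LEN)
logits = forward(tokens, params)
print(logits.shape)  # (10, 7000)
```

The causal mask is what makes this a decoder: each position's logits depend only on the tokens up to and including it, so changing the last token leaves the earlier rows unchanged.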

MinimalGPT also lets you import trained GPT models into a Python script or your next project!

Check out the project and example tutorial files to learn more about MinimalGPT!
Papers With Code: https://paperswithcode.com/paper/improving-language-understanding-by

Built on Keras💟