MLEM + Modal + nanoGPT

If you haven't heard all the recent fuss about the ChatGPT model from OpenAI, you must have been living under a rock. And you might even have seen this video from Andrej Karpathy on how those GPT models work. In this post, I will show how easy it is to train your own GPT model and share it with your friends via a nice Streamlit app in the cloud (see this one as an example!). All you need is an idea of what you want to generate and a couple of bucks for renting a GPU if you don’t have access to your own.

  • Mike Sveshnikov
  • February 08, 2023
  • 2 min read

Writing docs with nanoGPT

Preparing data

To kick off the process, you just need a single text file that you want your model to be trained on. For example, I often struggle with writing docs for the MLEM framework, so I will try to generate those. Here you can find my code that clones the repo, compiles every .md file from the docs directory into a single text file, and then creates a train set using the same code as the example Shakespeare dataset. I also prepended each file’s content with the path to that file, so I can condition the generation on a specific file.

Of course, for your own experiments, you can provide different data and train a GPT model for a different task.
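To make the preparation step more concrete, here is a minimal sketch of what it could look like. It is not the actual script from my fork: the function names (`build_corpus`, `build_char_vocab`, `encode`, `decode`, `train_val_split`) and the 90/10 split are assumptions made for illustration, and it stays in plain Python lists instead of writing binary files.

```python
from pathlib import Path


def build_corpus(docs_dir):
    """Concatenate every .md file under docs_dir, each prefixed with its relative path."""
    root = Path(docs_dir)
    parts = []
    for md_file in sorted(root.rglob("*.md")):
        rel_path = md_file.relative_to(root).as_posix()
        content = md_file.read_text(encoding="utf-8")
        # The path header is what later lets us prompt the model with a file name.
        parts.append(f"{rel_path}\n{content}\n")
    return "\n".join(parts)


def build_char_vocab(text):
    """Map every distinct character to an integer id, and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    """Turn a string into a list of character ids."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    """Turn a list of character ids back into a string."""
    return "".join(itos[i] for i in ids)


def train_val_split(ids, train_fraction=0.9):
    """Split the encoded corpus into a training part and a validation part."""
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]
```

With a character-level vocabulary like this, any text file works as training data: swap the docs directory for your own corpus and the rest stays the same.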

Training the model

Thanks to Andrej’s original repo, it’s as easy as cloning and running a couple of commands. My fork has some additional stuff to make it even easier.

$ git clone && cd nanoGPT/ && git checkout -b mlem origin/mlem
$ pip install -r requirements-mlem.txt

# Prepare mlem docs dataset
# Alternatively, you can compile your own training data for a different task
$ python data/mlem-docs/ char

If you don’t have access to a GPU, you can use Modal to train your model without any infrastructure configuration. Just register there, wait for approval, and run this script to start the training and download the resulting model checkpoint.

$ modal token new  # approve in browser
$ python  # you can edit paths or other parameters

Or, if you are already working on a machine with a GPU, just run the training locally:

# train model
$ python config/ --device=cuda --dtype=float32 --max_iters=3000 --init_from=scratch

After training, your model will be saved at out-mlemai-char/ and you can sample from it with:

# sample model
$ python --out_dir=out-mlemai-char --dtype=float32
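If you are curious what sampling does under the hood, here is a rough pure-Python sketch of the loop: the model scores every possible next character, the scores are rescaled by a temperature and optionally cut down to the top k candidates, and one character is drawn at random. The names `sample_next_id`, `generate` and `next_logits` are hypothetical stand-ins, not the actual nanoGPT code, and the default values are arbitrary.

```python
import math
import random


def sample_next_id(logits, temperature=0.8, top_k=None, rng=random):
    """Pick the next token id from raw scores using temperature and optional top-k."""
    # Lower temperature sharpens the distribution, higher temperature flattens it.
    scaled = [x / temperature for x in logits]
    if top_k is not None and top_k < len(scaled):
        # Keep only the k best candidates (ties at the cutoff are kept too).
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]
    # Numerically stable softmax weights; masked entries get weight 0.
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def generate(next_logits, prompt_ids, max_new_tokens, **sampling_kwargs):
    """Extend prompt_ids one token at a time.

    next_logits is a hypothetical callable standing in for the trained model:
    it takes the ids generated so far and returns one score per vocabulary entry.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        ids.append(sample_next_id(next_logits(ids), **sampling_kwargs))
    return ids
```

Since every training file was prefixed with its path, using an encoded file path as `prompt_ids` is how you would nudge the model towards generating that particular doc.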

Deploying your model

Now, to show off your model to friends and colleagues, we will deploy it as a Streamlit application to Fly.io. It’s very easy with the MLEM Streamlit extension. First, we need to save the model as an MLEM model. Here is the script for that:

$ python out-mlemai-char mlem_char

Now set up and log in to Fly.io, then run the mlem deploy command. I also prepared a custom Streamlit application template you can use to give it more of a ChatGPT feel:

# setup flyio
$ flyctl auth login

$ mlem deploy run flyio app -m mlem_char \
	--app_name mlem-nanogpt --scale_memory 1024 \
	--server streamlit  --server.ui_port 8080 \
	--server.server_port 8081 --server.template

After the command finishes, just go to https://<app_name> (mlem-nanogpt in my case) and start chatting.


Well, I guess if this is what generated docs look like, I still have a job! 🤣

But just for lulz, I re-generated the whole MLEM documentation with this model - you can check it out here.


Nowadays it’s really easy to recreate someone else’s work thanks to open source software. And thanks to folks like Andrej and companies like Modal and Fly, it is now much faster to build and deploy ML models. We are happy to be part of this, with tools like MLEM, DVC, CML and others. Long live open source!
