How long would it take if I were to build a model like GPT-3?

Zubayer_Hossain · October 28, 2022, 4:46am

Let’s say I wanted to build a model like GPT-3. That shouldn’t be possible on my own, since I’m pretty sure GPT-3 was trained on the whole internet. But if I trained a model on a single topic, health for example, how long would it take? Also, is it possible to build something that automatically scrapes content from certain websites and uses it to train the model? It’s like a machine training a machine.

This is one of those stupid questions people ask but I found no appropriate answer anywhere. That’s why I’m asking.

marcelo_schamber · October 31, 2022, 4:06am

Figure 1: Training time for GPT-3 models as a function of # GPUs. As we scale up GPU count, we see near-linear strong scaling in the time-to-train. For example, training the 1.3B model on 1B tokens takes 7.1 hr on 1 node (8xA100), and 0.5 hr on 16 nodes (128xA100), which is a 14.4x speedup on 16x GPUs.
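To make the caption's arithmetic concrete, here is a minimal Python sketch that recomputes the speedup and the scaling efficiency from the two quoted times. The function names (`speedup`, `scaling_efficiency`, `estimate_hours`) are hypothetical, and `estimate_hours` assumes training time grows linearly with token count at a fixed model size and node count, which is an assumption rather than something the caption states. With the rounded times the speedup comes out near 14.2x; the quoted 14.4x presumably uses unrounded timings.

```python
# Figures quoted in the caption (rounded as published).
HOURS_1_NODE = 7.1     # 1.3B model, 1B tokens, 1 node (8xA100)
HOURS_16_NODES = 0.5   # same job on 16 nodes (128xA100)
GPU_RATIO = 128 / 8    # 16x more GPUs


def speedup(t_base, t_scaled):
    """How many times faster the scaled-up run finishes."""
    return t_base / t_scaled


def scaling_efficiency(speedup_value, resource_ratio):
    """1.0 means perfectly linear scaling; lower means overhead."""
    return speedup_value / resource_ratio


def estimate_hours(tokens_billions, hours_per_billion=HOURS_1_NODE):
    """ASSUMPTION: time is linear in token count on the same hardware."""
    return tokens_billions * hours_per_billion


s = speedup(HOURS_1_NODE, HOURS_16_NODES)
eff = scaling_efficiency(s, GPU_RATIO)
print(f"speedup: {s:.1f}x, efficiency: {eff:.0%}")
print(f"10B tokens on 1 node: about {estimate_hours(10):.0f} hours")
```

Under that linear assumption, even a small 1.3B model on a modest 10B-token dataset would occupy a full 8xA100 node for roughly three days.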

Zubayer_Hossain · October 31, 2022, 8:53am

I don’t have a good computer at all, so I was thinking of using Google Colab. Someone told me that they started with Colab, built something, and then moved the whole process to their personal server, but Colab is all I’ve got. Now I’m wondering if it’s even possible with Colab or not.

lgusm · October 31, 2022, 11:11am

No, it’s not possible to train a GPT-3 with Colab.
You’ll need way more power than what’s available there, even if you pay for it.
Training these kinds of models from scratch requires proper server/cloud infrastructure.

What you can do is use its original weights and specialize the model on the domain you want. That would take WAY fewer resources. This is the idea behind fine-tuning.

On the other part, automatically scraping for more data: this is of course doable, but it’s not something on the modeling side; it’s more on the system development side. One drawback of scraping during training is that the model would be much slower to train, since every step would wait on internet latency. You also wouldn’t be able to prepare a good dataset, because the model would consume raw text from the web, which can be both dangerous and low quality. The best idea is to have a good, cleaned dataset before training starts. That’s the hard part.
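As a rough illustration of "clean first, train later", here is a minimal Python sketch that turns already-downloaded HTML pages into a small text dataset offline. It uses only the standard library; the names `html_to_text` and `build_dataset`, the `min_words` threshold, and the sample pages are all hypothetical, and real pipelines need far more filtering (language, quality, safety) than this. Fetching the pages is deliberately left out so that no network access happens during this step.

```python
import hashlib
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Strip tags and collapse runs of whitespace into single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def build_dataset(raw_pages, min_words=5):
    """Drop very short pages and exact (case-insensitive) duplicates."""
    seen = set()
    dataset = []
    for html in raw_pages:
        text = html_to_text(html)
        if len(text.split()) < min_words:
            continue  # too short to be useful (navigation stubs, etc.)
        digest = hashlib.sha256(text.lower().encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same text already kept
        seen.add(digest)
        dataset.append(text)
    return dataset


# Hypothetical sample pages standing in for previously scraped HTML.
raw_pages = [
    "<html><body><p>Regular exercise supports heart health in adults.</p></body></html>",
    "<html><body><p>Regular   exercise supports heart health in adults.</p>"
    "<script>track()</script></body></html>",
    "<p>Click here</p>",
]
clean = build_dataset(raw_pages)
print(len(clean), "document(s) kept")
```

Here the second page is dropped as a duplicate of the first once whitespace and the script are removed, and the third is dropped as too short.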

Zubayer_Hossain · October 31, 2022, 3:32pm

So I would have to scrape and create a clean dataset manually on a specific topic, and then use the weights from GPT-3 to train on my full set? I’m basically trying to get a model that can generate text on almost any topic related to the main niche. Please let me know if I understood correctly.

lgusm · November 1, 2022, 1:44pm

Yes, that’s the idea: you would make GPT-3 (or any related model) learn about a specific topic.

The problem is, I don’t know if there’s an open version of this model available, or if they offer this kind of feature on their side (ignorance on my side, sorry).

Zubayer_Hossain · November 1, 2022, 2:05pm

GPT-3 isn’t free, but BERT is. The problem is I don’t know how effective fine-tuning BERT will be.

lgusm · November 1, 2022, 2:22pm

I don’t know either. I’ve never done this task myself starting from a BERT model, but I don’t think it would be easy.

Nathan · February 6, 2023, 4:03am

Newb alert here

Can something like this be done (from scratch or not) via distributed computing of sorts?
Let’s say I wanted to do something similar, perhaps even within the same niche, so we combine our computing resources.
@marcelo_schamber @lgusm

lgusm · February 6, 2023, 11:20am

Hi Nathan, welcome to the TF Forum

It HAS to be done with distributed computing (for large models, of course).
Using multiple GPUs/TPUs across multiple machines is what AI companies do to train their LLMs.
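One common form of this is data parallelism: each device computes gradients on its own shard of the batch, the gradients are averaged, and every device applies the same update. Below is a toy pure-Python simulation of that idea on a one-variable linear model. It is only a sketch: the "workers" run one after another in a single process, `all_reduce_mean` is a hypothetical stand-in for the collective operation a real framework would provide, and the data is synthetic.

```python
def gradient(w, b, batch):
    """Mean-squared-error gradient of y ~ w*x + b over one batch."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def all_reduce_mean(grads):
    """Average per-worker gradients (stand-in for a real all-reduce)."""
    k = len(grads)
    return tuple(sum(g[i] for g in grads) / k for i in range(len(grads[0])))


def data_parallel_step(w, b, shards, lr):
    # On real hardware each shard's gradient is computed on its own device.
    local = [gradient(w, b, shard) for shard in shards]
    gw, gb = all_reduce_mean(local)
    return w - lr * gw, b - lr * gb


# Synthetic data from y = 3x + 1, split evenly across 4 simulated workers.
data = [(float(x), 3.0 * x + 1.0) for x in range(8)]
shards = [data[i::4] for i in range(4)]

w, b = 0.0, 0.0
for _ in range(5000):
    w, b = data_parallel_step(w, b, shards, lr=0.01)
print(f"w = {w:.4f}, b = {b:.4f}")
```

Because the shards are equal in size, the averaged gradient matches the gradient over the full batch, so the result is the same as single-device training; the benefit on real hardware is that the per-shard work happens at the same time.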