How Do Text Generation Models Work Technically?

We all know that text generation models like GPT-3, BLOOM, LaMDA, and PaLM have very complex architectures when it comes to analyzing prompts, filtering, understanding context, and generating text. They are trained on a large corpus of data using a cluster of supercomputers, and then fine-tuned for various tasks. That is how it works in general.
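At inference time, such models generate text autoregressively: given the tokens so far, they predict a probability distribution over the next token and sample from it, one token at a time. A toy sketch of that loop (with a made-up `toy_next_token_probs` standing in for a real trained model, and a tiny hypothetical vocabulary) might look like:

```python
import numpy as np

# Tiny hypothetical vocabulary; real models use tens of thousands of subword tokens.
VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def toy_next_token_probs(context):
    """Stand-in for a trained model: returns a distribution over VOCAB.

    A real model would run the context through its network; here we just
    derive a deterministic pseudo-random distribution from the context length.
    """
    rng = np.random.default_rng(len(context))
    logits = rng.normal(size=len(VOCAB))
    probs = np.exp(logits - logits.max())   # softmax over toy logits
    return probs / probs.sum()

def generate(prompt, n_tokens=4):
    """Autoregressive sampling loop: append one sampled token at a time."""
    tokens = prompt.split()
    rng = np.random.default_rng(0)
    for _ in range(n_tokens):
        probs = toy_next_token_probs(tokens)
        tokens.append(str(rng.choice(VOCAB, p=probs)))
    return " ".join(tokens)

print(generate("the cat"))  # prompt plus 4 sampled toy tokens
```

The output is nonsense here because the "model" is random; the point is only the shape of the loop, which is the same one GPT-style models use.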

But what is actually used in the various parts of these architectures? Do they use two adversarial convolutional neural networks (a Generative Adversarial Network), a Recurrent Neural Network, or an LSTM network?
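For context, the models listed above (GPT-3, BLOOM, LaMDA, PaLM) are all decoder-only Transformers rather than GANs, RNNs, or LSTMs; their core building block is causally masked self-attention. A minimal NumPy sketch of that one operation (toy dimensions, no learned projection weights, single head) might look like:

```python
import numpy as np

def causal_self_attention(x):
    """Scaled dot-product self-attention with a causal mask.

    x: (seq_len, d_model) token embeddings.
    In a real Transformer, Q, K, V come from learned linear projections
    and there are many heads and stacked layers; here Q = K = V = x.
    """
    seq_len, d = x.shape
    scores = x @ x.T / np.sqrt(d)                    # pairwise similarities
    mask = np.triu(np.ones((seq_len, seq_len)), k=1).astype(bool)
    scores[mask] = -np.inf                           # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ x                               # weighted sum of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
out = causal_self_attention(x)
print(out.shape)  # (5, 8)
```

Note the effect of the causal mask: the first token can only attend to itself, which is what lets the model be trained to predict each next token from the tokens before it.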

I am trying to build my own version of such an LLM, but I need a lot more information on this.

Thanks in advance for any replies.