Hyperparameters for the GPT model.
Maximum context length of the attention window.
Embedding dimension (width).
Number of attention heads.
Number of transformer layers (depth).
Hyperparameters for the GPT model.