Create and randomly initialize a state dict (all model parameters) for a GPT with the given configuration and vocabulary size.
Total number of unique tokens (including the BOS token).
Model hyperparameters.
Seeded random number generator.
Standard deviation for weight initialization (default 0.08).
An object containing stateDict and a flat params array.
stateDict
params
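As a rough illustration of what such an initializer might look like, here is a minimal sketch. It is not the documented implementation: the names `GPTConfig`, `Param`, `mulberry32`, `gauss`, and `initStateDict`, the config fields, and the particular set of weight matrices are all assumptions made for the example. It only shows the general idea of drawing each weight from a zero-mean Gaussian with the given standard deviation, using a seeded generator, and exposing the same parameter objects both by name and as a flat array.

```typescript
// Hypothetical sketch: every name and the matrix layout below are assumptions.
interface GPTConfig {
  nLayer: number;    // number of transformer blocks
  nEmbd: number;     // embedding width
  blockSize: number; // maximum sequence length
}

interface Param {
  data: number; // current weight value
  grad: number; // gradient slot, zero at initialization
}

type Matrix = Param[][];

// Small seeded PRNG (mulberry32) returning uniform floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Box-Muller transform: one zero-mean Gaussian sample with standard deviation std.
function gauss(rand: () => number, std: number): number {
  const u1 = 1 - rand(); // shift to (0, 1] so Math.log never receives 0
  const u2 = rand();
  return std * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

function initStateDict(
  vocabSize: number,
  config: GPTConfig,
  rand: () => number,
  std: number = 0.08
): { stateDict: Record<string, Matrix>; params: Param[] } {
  // Build an nOut x nIn matrix of freshly sampled parameters.
  const matrix = (nOut: number, nIn: number): Matrix => {
    const rows: Matrix = [];
    for (let r = 0; r < nOut; r++) {
      const row: Param[] = [];
      for (let c = 0; c < nIn; c++) {
        row.push({ data: gauss(rand, std), grad: 0 });
      }
      rows.push(row);
    }
    return rows;
  };

  const n = config.nEmbd;
  const stateDict: Record<string, Matrix> = {
    wte: matrix(vocabSize, n),        // token embeddings
    wpe: matrix(config.blockSize, n), // position embeddings
    lmHead: matrix(vocabSize, n),     // output projection
  };
  for (let i = 0; i < config.nLayer; i++) {
    stateDict[`layer${i}.attnWq`] = matrix(n, n);
    stateDict[`layer${i}.attnWk`] = matrix(n, n);
    stateDict[`layer${i}.attnWv`] = matrix(n, n);
    stateDict[`layer${i}.attnWo`] = matrix(n, n);
    stateDict[`layer${i}.mlpFc1`] = matrix(4 * n, n);
    stateDict[`layer${i}.mlpFc2`] = matrix(n, 4 * n);
  }

  // Flatten into one array that shares the same Param objects as stateDict.
  const params: Param[] = [];
  for (const key of Object.keys(stateDict)) {
    for (const row of stateDict[key]) {
      for (const p of row) params.push(p);
    }
  }
  return { stateDict, params };
}
```

Under these assumptions, a call such as `initStateDict(10, { nLayer: 1, nEmbd: 4, blockSize: 8 }, mulberry32(42))` returns the same weights on every run, because the generator is seeded. Since `params` holds references rather than copies, an optimizer that updates entries of `params` also updates the matrices in `stateDict`.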