Tokenize a document string into an array of token ids, with the BOS token id at both the start and the end.
The document string.
Sorted unique characters (from buildTokenizer).
The BOS token id.
Array of token ids: [BOS, ...charIds, BOS].
[BOS, ...charIds, BOS]
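The behavior described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the function name `tokenize`, the parameter names `doc`, `chars`, and `bos`, the rule that a character's id is its index in the sorted character list, and the choice to throw on an unknown character are all assumptions.

```javascript
// Hypothetical sketch; the name, parameters, and id scheme are assumptions.
function tokenize(doc, chars, bos) {
  // Assumption: a character's token id is its index in the sorted unique list.
  const charToId = new Map(chars.map((ch, i) => [ch, i]));

  const ids = [bos]; // leading BOS
  for (const ch of doc) {
    const id = charToId.get(ch);
    if (id === undefined) {
      // Assumption: characters outside the vocabulary are treated as an error.
      throw new Error(`Character not in vocabulary: ${JSON.stringify(ch)}`);
    }
    ids.push(id);
  }
  ids.push(bos); // trailing BOS
  return ids;
}

const chars = ["a", "b", "c"];
const BOS = chars.length; // assumption: BOS takes the first id after the characters
console.log(tokenize("cab", chars, BOS)); // [3, 2, 0, 1, 3]
```

An empty document yields `[BOS, BOS]` under this sketch, since only the two delimiters are emitted.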