Build a character-level tokenizer from a list of documents.
Array of document strings.
uchars (sorted unique characters across the documents), BOS (beginning-of-sequence token id), and vocabSize (vocabulary size).
uchars
BOS
vocabSize
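A minimal sketch of how such a tokenizer could be built, assuming JavaScript. The function name `buildTokenizer`, the choice to place BOS immediately after the character ids, and the definition of `vocabSize` as the character count plus one are assumptions made for illustration; the description above does not specify them.

```javascript
// Hypothetical sketch: the name buildTokenizer and the id layout are assumptions.
function buildTokenizer(docs) {
  // Collect every distinct character (by Unicode code point) across all documents,
  // then sort so that ids are stable regardless of document order.
  const uchars = [...new Set(docs.join(""))].sort();
  // Assumption: BOS takes the first id after the character ids.
  const BOS = uchars.length;
  // Assumption: the vocabulary is every character plus the single BOS token.
  const vocabSize = uchars.length + 1;
  return { uchars, BOS, vocabSize };
}

const tok = buildTokenizer(["emma", "olivia"]);
console.log(tok.uchars, tok.BOS, tok.vocabSize);
```

Under these assumptions a character's token id is its index in `uchars`, so a document could be encoded by looking up each character with `tok.uchars.indexOf(ch)` and prepending `tok.BOS`.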