This is a very small uncased tokenizer for the non-ASCII version of TinyStories, which is based on the original TinyStories dataset. It is a WordPiece tokenizer with a vocabulary of 4096 tokens.

The tokenizer was fitted only to this dataset and probably won't work well outside of children's stories.
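
To illustrate why a small, domain-specific WordPiece vocabulary struggles outside its domain, here is a minimal pure-Python sketch of greedy longest-match WordPiece splitting. It is not this tokenizer's actual implementation: the function `wordpiece_tokenize`, the `##` continuation prefix, the `[UNK]` token, and the toy vocabulary `TOY_VOCAB` are all assumptions made for illustration, and the real vocabulary has 4096 entries.

```python
def wordpiece_tokenize(text, vocab, unk_token="[UNK]", prefix="##"):
    """Greedy longest-match-first WordPiece splitting of uncased text.

    Illustrative sketch only: words are split on whitespace and
    punctuation is not handled.
    """
    tokens = []
    # "Uncased" means the text is lowercased before any vocabulary lookup.
    for word in text.lower().split():
        pieces = []
        start = 0
        while start < len(word):
            end = len(word)
            match = None
            # Try the longest remaining substring first, then shrink it.
            while end > start:
                candidate = word[start:end]
                if start > 0:
                    # Pieces that continue a word carry the prefix.
                    candidate = prefix + candidate
                if candidate in vocab:
                    match = candidate
                    break
                end -= 1
            if match is None:
                # No piece fits here: the whole word becomes unknown.
                pieces = [unk_token]
                break
            pieces.append(match)
            start = end
        tokens.extend(pieces)
    return tokens


# Hypothetical toy vocabulary standing in for the real 4096-entry one.
TOY_VOCAB = {"the", "little", "girl", "play", "##ed", "##s", "happy", "[UNK]"}

# In-domain text splits into known pieces.
story_tokens = wordpiece_tokenize("The little girl played", TOY_VOCAB)

# An out-of-domain word has no matching pieces and collapses to [UNK].
legal_tokens = wordpiece_tokenize("The plaintiff played", TOY_VOCAB)

print(story_tokens)
print(legal_tokens)
```

With this toy vocabulary, the word "played" splits into `play` and `##ed`, while "plaintiff" maps to `[UNK]`. A vocabulary fitted only to children's stories is likely to show the same behavior on other kinds of text, either through unknown tokens or through very long sequences of short pieces.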
