This is a very small uncased tokenizer for the non-ASCII version of TinyStories, which is based on the original TinyStories dataset. I use a WordPiece tokenizer with a vocabulary size of 4096 tokens.
The tokenizer is fitted strictly to this dataset and probably won't work well on text outside of children's stories.
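To illustrate what "uncased WordPiece" means in practice, here is a minimal pure-Python sketch of greedy longest-match-first WordPiece splitting. It is not this tokenizer's actual implementation: the function name `wordpiece_tokenize`, the `TOY_VOCAB` set, the `##` continuation prefix, and the `[UNK]` token are assumptions chosen for illustration, and the real vocabulary has 4096 entries learned from TinyStories. Punctuation handling is omitted for brevity.

```python
def wordpiece_tokenize(text, vocab, unk_token="[UNK]", prefix="##"):
    """Greedy longest-match-first WordPiece split of lowercased, whitespace-separated text."""
    tokens = []
    # "Uncased": lowercase everything before matching against the vocabulary.
    for word in text.lower().split():
        pieces = []
        start = 0
        while start < len(word):
            end = len(word)
            match = None
            # Try the longest remaining substring first, then shrink it.
            while end > start:
                candidate = word[start:end]
                if start > 0:
                    # Non-initial pieces carry the continuation prefix.
                    candidate = prefix + candidate
                if candidate in vocab:
                    match = candidate
                    break
                end -= 1
            if match is None:
                # No piece fits: the whole word becomes the unknown token.
                pieces = [unk_token]
                break
            pieces.append(match)
            start = end
        tokens.extend(pieces)
    return tokens


# Hypothetical toy vocabulary, NOT the real 4096-entry one.
TOY_VOCAB = {
    "[UNK]", "once", "upon", "a", "time",
    "the", "girl", "play", "##ed", "##ing",
}

print(wordpiece_tokenize("Once upon a time", TOY_VOCAB))
print(wordpiece_tokenize("The girl played", TOY_VOCAB))
print(wordpiece_tokenize("dragon", TOY_VOCAB))
```

With a vocabulary this small, words outside the training domain tend to fall apart into many short pieces or collapse to the unknown token, which is why the tokenizer is unlikely to transfer well beyond children's stories.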