Scene Text Recognition with Permuted Autoregressive Sequence Models
Paper: arXiv:2207.06966
GGUF conversions of PARSeq (ECCV 2022) for use with CrispEmbed.
PARSeq is a scene text recognition model that reads text from natural images (signs, labels, documents). It recognizes 94 printable ASCII characters (digits, letters, punctuation).
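The 94-character count matches Python's standard printable ASCII classes: 10 digits, 52 letters, and 32 punctuation marks. The sketch below rebuilds that set and flags characters outside it. The helper name `unsupported_chars` is hypothetical, and the order of characters in the model's own vocabulary is not confirmed here. The space character is not part of the 94.

```python
import string

# Assumed charset: digits + letters + punctuation (10 + 52 + 32 = 94).
# The order used inside the GGUF vocabulary may differ.
CHARSET = string.digits + string.ascii_letters + string.punctuation


def unsupported_chars(text: str) -> list[str]:
    """Return the sorted, de-duplicated characters of `text` outside CHARSET."""
    return sorted(set(text) - set(CHARSET))


print(len(CHARSET))
print(unsupported_chars("EXIT-24"))
```

A check like this can help when comparing OCR output against ground-truth labels that contain spaces or accented letters.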
| File | Variant | Params | Size | Notes |
|---|---|---|---|---|
| `parseq-f32.gguf` | Base | 24M | 91 MB | Full precision |
| `parseq-q8_0.gguf` | Base | 24M | 24 MB | Best quantized |
| `parseq-q4_k.gguf` | Base | 24M | 13 MB | Smallest base |
| `parseq-tiny-f16.gguf` | Tiny | 6M | 12 MB | Half precision |
| `parseq-tiny-q8_0.gguf` | Tiny | 6M | 6 MB | Smallest overall |
All quantization levels produce identical output on test images.
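The listed sizes follow roughly from parameter count times bits per weight. The sketch below is a back-of-the-envelope estimate, not a description of the converter. It assumes the usual ggml block layouts (Q8_0 at 8.5 bits per weight, Q4_K at 4.5), assumes the rounded parameter counts from the table, and ignores metadata and any tensors kept at higher precision. The function name `estimate_size_mib` is hypothetical.

```python
# Effective bits per weight, including per-block scales for quantized types.
BITS_PER_WEIGHT = {"f32": 32.0, "f16": 16.0, "q8_0": 8.5, "q4_k": 4.5}


def estimate_size_mib(n_params: float, quant: str) -> float:
    """Approximate tensor-data size in MiB for `n_params` weights."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 2**20


for name, n_params, quant in [
    ("parseq-f32", 24e6, "f32"),
    ("parseq-q8_0", 24e6, "q8_0"),
    ("parseq-q4_k", 24e6, "q4_k"),
    ("parseq-tiny-f16", 6e6, "f16"),
    ("parseq-tiny-q8_0", 6e6, "q8_0"),
]:
    print(f"{name}: ~{estimate_size_mib(n_params, quant):.1f} MiB")
```

Under these assumptions the estimates land within about 1 MB of the listed sizes, which suggests the table reports binary megabytes.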
# CLI
crispembed -m parseq-q8_0.gguf --ocr image.png
# Auto-download
crispembed -m parseq --auto-download --ocr image.png
# Python
from crispembed import CrispMathOcr
ocr = CrispMathOcr("parseq-q8_0.gguf")
text = ocr.recognize("sign.png")
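For scripts that cannot import the Python bindings, the CLI can be wrapped with `subprocess`. This is a minimal sketch that uses only the flags shown above. The helper names `build_ocr_command` and `run_ocr` are hypothetical, and it assumes the recognized text is printed to standard output, which this page does not state.

```python
import subprocess


def build_ocr_command(model: str, image: str, auto_download: bool = False) -> list[str]:
    """Assemble the crispembed invocation using the flags shown above."""
    cmd = ["crispembed", "-m", model]
    if auto_download:
        cmd.append("--auto-download")
    cmd += ["--ocr", image]
    return cmd


def run_ocr(model: str, image: str, auto_download: bool = False) -> str:
    """Run the CLI and return its stdout (assumed to hold the recognized text)."""
    result = subprocess.run(
        build_ocr_command(model, image, auto_download),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


# Example call (requires crispembed on PATH):
# text = run_ocr("parseq-q8_0.gguf", "image.png")
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.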
| Dataset | Accuracy |
|---|---|
| IIIT5k | 99.1% |
| SVT | 97.9% |
| IC13-1015 | 98.1% |
| IC15-2077 | 89.2% |
| SVTP | 96.9% |
| CUTE80 | 98.6% |
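A single summary figure can be derived by weighting each benchmark by its test-set size. The accuracies below are copied from the table above. The sample counts are assumptions based on the commonly used evaluation splits: the IC13 and IC15 counts are taken from the dataset names, and the others are the usual sizes for these benchmarks. This page does not state them.

```python
# Word accuracy (%) per benchmark, copied from the table above.
ACCURACY = {
    "IIIT5k": 99.1,
    "SVT": 97.9,
    "IC13-1015": 98.1,
    "IC15-2077": 89.2,
    "SVTP": 96.9,
    "CUTE80": 98.6,
}

# Assumed test-set sizes (not stated in this document).
NUM_SAMPLES = {
    "IIIT5k": 3000,
    "SVT": 647,
    "IC13-1015": 1015,
    "IC15-2077": 2077,
    "SVTP": 645,
    "CUTE80": 288,
}


def weighted_accuracy(accuracy: dict[str, float], num_samples: dict[str, int]) -> float:
    """Sample-weighted mean accuracy in percent."""
    total = sum(num_samples[name] for name in accuracy)
    correct = sum(accuracy[name] / 100 * num_samples[name] for name in accuracy)
    return 100 * correct / total


overall = weighted_accuracy(ACCURACY, NUM_SAMPLES)
print(f"Weighted accuracy: {overall:.2f}%")
```

Because IC15-2077 is both the hardest and one of the largest sets, the weighted figure sits below the unweighted mean of the six rows.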
Converted with `models/convert-parseq-to-gguf.py` from CrispEmbed.