Chinese very bad
Hi, the released Florence-2 models are English-only.
Would you consider adding Chinese support? After all, it is a "Foundation Vision Model".
I don't think Chinese is the foundation. You are wrong to think that every model must support Chinese. By default they are in English; that is the foundation. You should not expect Chinese. Please consider English, not Chinese.
You are wrong to think that all models must be in English.
You are wrong, this is clear. All foundation vision models must be in Slovene. That way it is fair to both English and Chinese.
Yes, I am training a multilingual Florence-2 now. So far so good, but it does not include Slovene, sorry.
Although it is currently only roughly tuned, it could get better when trained on more data.
I don't see any CJK characters in the original vocab.json of Florence-2-large, so I guess you must have extended the vocabulary before fine-tuning on the Chinese OCR task?
And I still don't understand why it could output Chinese characters in your first post. Had you already extended the vocab before inference?
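For anyone who wants to repeat the check described above, here is a minimal sketch that scans a vocabulary for tokens containing CJK ideographs. It assumes the vocabulary is a plain token-to-id dict, as you would get from `json.load` on a vocab.json file. The helper names and the tiny sample vocab are made up for illustration and are not taken from Florence-2.

```python
def has_cjk(text: str) -> bool:
    """Return True if text contains a CJK Unified Ideograph (U+4E00..U+9FFF)."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def cjk_tokens(vocab: dict) -> list:
    """Return every token string in vocab that contains a CJK ideograph."""
    return [token for token in vocab if has_cjk(token)]


# Hypothetical miniature vocab; a real one would come from
# json.load(open("vocab.json", encoding="utf-8")).
sample_vocab = {"Ġthe": 0, "hello": 1, "ä¸Ń": 2}

found = cjk_tokens(sample_vocab)
print(f"{len(found)} of {len(sample_vocab)} tokens contain CJK ideographs")
```

Note that this only looks at the literal token strings, so an empty result does not by itself prove the tokenizer cannot represent Chinese text.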
Oh, yes, you were right.
I rechecked the vocab, and it doesn't have CJK.
Very strange....
Do you mean that you fine-tuned Florence-2 on Chinese OCR training data without extending the vocab, and still got a pretty decent result?
Yes, yes, but as you can see, in the first example the raw Florence-2 can also print Chinese.
I haven't tried it, but I think it might be able to encode a Chinese character to ids and decode it back.
Maybe we should try to explore the logic behind this. I am also curious, and I am trying to understand why it can output CJK.
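One possible explanation, offered as an assumption rather than something verified in this thread: if the Florence-2 tokenizer is a byte-level BPE of the GPT-2/BART kind, then every UTF-8 byte is stored in the vocab as a printable stand-in symbol. A Chinese character would then never appear literally in vocab.json, yet it could still be encoded as a sequence of byte symbols and decoded back. The sketch below reproduces the byte-to-symbol table used by GPT-2 style tokenizers in pure Python; `to_byte_symbols` and `from_byte_symbols` are hypothetical helper names, and the merge step of BPE is left out.

```python
def bytes_to_unicode() -> dict:
    """Map each of the 256 byte values to a printable character (GPT-2 scheme)."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted to code points from 256 upward.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return dict(zip(byte_values, map(chr, code_points)))


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def to_byte_symbols(text: str) -> str:
    """Encode text as UTF-8 and replace each byte with its stand-in symbol."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))


def from_byte_symbols(symbols: str) -> str:
    """Invert to_byte_symbols: map the symbols back to bytes and decode UTF-8."""
    return bytes(CHAR_TO_BYTE[c] for c in symbols).decode("utf-8")


symbols = to_byte_symbols("中文")
print(symbols)                     # six stand-in symbols, three per character
print(from_byte_symbols(symbols))  # round-trips to the original text
```

If this assumption holds, it would explain both observations: the vocab shows no CJK characters because it only contains byte symbols and their merges, and the raw model can still print Chinese because any UTF-8 string is reachable through those byte symbols. Chinese text would simply cost more tokens per character than English.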


