How to quantize bloom with 4-bit

#268

by char-1ee - opened Oct 14, 2023

Hi, I noticed that bloom-int8 and bloom-fp16 models already exist. Does anyone know where I can find a bloom-int4 model, or how I can quantize the model to 4-bit locally?

lysandre

Oct 16, 2023

cc @ybelkada

ybelkada

BigScience Workshop org Oct 16, 2023

Hi @char-1ee

If you have enough CPU RAM to load the entire BLOOM model, you can quantize it on the fly to 4-bit using bitsandbytes and the latest transformers release.

pip install -U bitsandbytes transformers accelerate

Then pass load_in_4bit=True when calling from_pretrained; the model is quantized to 4-bit precision as it loads.
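For reference, here is a minimal sketch of what that call might look like. It makes several assumptions that are not part of the reply above: a CUDA GPU is available, the accelerate package is installed alongside bitsandbytes, and the small `bigscience/bloom-560m` checkpoint is used as a default so the snippet can be tried cheaply (swap in `bigscience/bloom` for the full model). The helper name `load_bloom_4bit` is made up for illustration. The flag is passed through `BitsAndBytesConfig`, which is intended to have the same effect as passing `load_in_4bit=True` directly to `from_pretrained`; newer transformers releases prefer the config object.

```python
def load_bloom_4bit(model_id: str = "bigscience/bloom-560m"):
    """Load a BLOOM checkpoint with its weights quantized to 4-bit at load time.

    `load_bloom_4bit` is a hypothetical helper for illustration; the default
    model id is an assumption chosen so the example is cheap to try.
    """
    # Imported lazily so the function can be defined even before the
    # libraries are installed.
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    # Ask bitsandbytes to quantize the weights to 4-bit while loading.
    # float16 as the compute dtype is an assumption, not a requirement.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place the layers on available devices
    )
    return model, tokenizer


def demo():
    """Generate a few tokens with the quantized model (downloads weights)."""
    model, tokenizer = load_bloom_4bit()
    inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Calling `demo()` downloads the checkpoint and runs a short generation. If loading succeeds, `model.get_memory_footprint()` should report a noticeably smaller size than the fp16 model.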

Let me know how that goes for you!
