Hi everyone,
I'm facing an issue while fine-tuning the LLaVA model using LoRA on a machine with limited GPU resources. To fit the small GPU, I've been experimenting with 4-bit precision. However, I consistently hit the following error:
```
RuntimeError: expected scalar type BFloat16 but found Float
```
The error is raised inside the vision tower, during a LayerNorm operation in the forward pass.
Key Configuration:
- Model: `liuhaotian/llava-v1.6-vicuna-7b`
- Vision Tower: `openai/clip-vit-large-patch14-336`
- LoRA: enabled with `lora_r=128`, `lora_alpha=256`
- Precision: 4-bit (`bits=4`)
- Other settings: `bf16=True`, `gradient_checkpointing=True`
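To make the setup concrete, the configuration above corresponds to something like the following (a minimal sketch using the `transformers` + `peft` + `bitsandbytes` stack rather than LLaVA's own training script; the HF-converted checkpoint name `llava-hf/llava-v1.6-vicuna-7b-hf` and the `target_modules` list are illustrative choices, not necessarily what I actually run):

```python
import torch
from transformers import BitsAndBytesConfig, LlavaNextForConditionalGeneration
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization with bf16 as the compute dtype (mirrors bits=4, bf16=True).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-vicuna-7b-hf",  # HF-converted checkpoint (illustrative)
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Freezes the quantized base, enables gradient checkpointing and input grads,
# and upcasts some non-quantized params (e.g. LayerNorm) to float32 for stability.
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

lora_config = LoraConfig(
    r=128,
    lora_alpha=256,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # illustrative subset
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

My actual run goes through LLaVA's own training script, so this snippet is only meant to make the configuration concrete.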
Problem:
I'm hitting a data type mismatch: some layers (e.g., LayerNorm) expect BFloat16 inputs but receive Float32, which triggers the error above. Inspecting the model, I find a mix of data types across the layers:
- 166 layers in float32
- 744 layers in bfloat16
- 369 layers in uint8
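A tally like the one above can be reproduced with a quick loop such as this (sketch only, assuming `model` is the already-loaded model; it counts parameters per dtype rather than "layers" in a strict sense):

```python
from collections import Counter

# Count parameters grouped by dtype (packed 4-bit weights show up as uint8).
dtype_counts = Counter(p.dtype for p in model.parameters())
for dtype, count in dtype_counts.items():
    print(f"{count} parameters in {dtype}")
```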
My Situation:
I'm trying to modify LLaVA for my own use case and need to run it in a "debug mode" to test and tweak the code. Since I have limited GPU resources, I'm using low precision (4-bit) to make debugging feasible. However, this data type mismatch is proving to be a roadblock.
My Questions:
- How can I debug or fine-tune LLaVA with LoRA on a small GPU without running into these precision-related errors?
- Should I be manually converting specific layers to avoid the mismatch between bfloat16 and float32? (See the sketch after these questions for the kind of conversion I mean.)
- Is there a general approach to running LoRA fine-tuning in a lightweight "debug mode" for code experimentation, without worrying about outputs or precision mismatches?
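For the second question, this is the kind of manual conversion I have in mind (a hypothetical, untested sketch; the `"vision"` substring filter is just an assumption about how the vision tower's modules are named in my model):

```python
import torch

# Hypothetical sketch: force every LayerNorm inside the vision tower to one dtype
# so its weights match the dtype of the activations flowing through it.
for name, module in model.named_modules():
    if "vision" in name and isinstance(module, torch.nn.LayerNorm):
        module.to(torch.bfloat16)  # or torch.float32, if inputs are kept in fp32
```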
Any guidance or suggestions would be greatly appreciated!
Thanks in advance!