ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models
Paper: arXiv:2503.00564
TD-Llama-OP (ToolDial-Llama, Overall Performance) is the model used for the Overall Performance task in the ToolDial paper. We encourage you to use this model to reproduce our results. Please refer to the Experiments section of our GitHub page to see how the evaluation was conducted.
[Model Summary]
[How to load the model]
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
device = "cuda:0"
quant_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type='nf4',
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=True,
)
## 1. Load the base model (Meta-Llama-3-8B-Instruct) with the quantization config above.
base_model = AutoModelForCausalLM.from_pretrained(
"meta-llama/Meta-Llama-3-8B-Instruct",
quantization_config=quant_config,
device_map={"": device},
)
tokenizer = AutoTokenizer.from_pretrained("HOLILAB/td-llama-op")
tokenizer.pad_token_id = tokenizer.eos_token_id
## 2. Load the LoRA adapter with PeftModel.
model = PeftModel.from_pretrained(base_model, "HOLILAB/td-llama-op")
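Once the adapter is attached, you can run a quick generation check. The sketch below is only an illustration: the exact dialogue prompt format used for the Overall Performance task follows the Experiments code on our GitHub page, and the system/user messages here are placeholder assumptions.
## 3. Quick generation check (placeholder prompt; use the prompt format from the Experiments code for real evaluation).
model.eval()
prompt = [
    {"role": "system", "content": "You are a tool-augmented dialogue agent."},
    {"role": "user", "content": "I want to check today's weather in Seoul."},
]
input_ids = tokenizer.apply_chat_template(
    prompt, add_generation_prompt=True, return_tensors="pt"
).to(device)
with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=256,
        pad_token_id=tokenizer.eos_token_id,
    )
## Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))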