# phi2-memory-deeptalks

Based on the method from [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685).
A LoRA adapter for the Phi-2 language model, fine-tuned on short conversational snippets to provide short-term memory in dialogue. The adapter lets your assistant recall and leverage the last few user/assistant turns without full fine-tuning of the 2.7B-parameter base model.
Live Demo on Hugging Face Spaces: responses take a while to generate because the Space runs on the free CPU tier.
phi2-memory-deeptalks injects lightweight, low-rank corrections into the attention and MLP layers of microsoft/phi-2.
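Concretely, LoRA leaves each frozen weight matrix untouched and learns a low-rank update that is added to the layer's output, scaled by `lora_alpha / r`. A minimal numpy sketch of the idea (the rank and hidden size here are illustrative, not this adapter's actual values):

```python
import numpy as np

d, r = 2560, 16           # Phi-2 hidden size; illustrative rank
alpha = 32                # lora_alpha from the configuration below

W = np.random.randn(d, d)          # frozen pretrained weight (not trained)
A = np.random.randn(r, d) * 0.01   # trainable low-rank "down" projection
B = np.zeros((d, r))               # trainable "up" projection, initialized to zero

x = np.random.randn(d)
h = W @ x + (alpha / r) * (B @ (A @ x))  # base output + low-rank correction
```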
The adapter uses the following configuration:

| Setting | Value |
|---|---|
| Base model | microsoft/phi-2 (causal LM) |
| Target modules | q_proj, k_proj, v_proj, dense, fc1, fc2 |
| lora_alpha | 32 |
| lora_dropout | 0.05 |

Training examples follow this prompt template:

```
### Human:
<user message>

### Assistant:
<assistant response>
```

Training uses the standard causal-LM objective (`labels = input_ids`) with `per_device_train_batch_size=1` and `gradient_accumulation_steps=8` (effective batch size 8). The resulting LoRA weights are stored in `adapter_model.safetensors`.
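For reference, the training-time setup above can be approximated with `peft`. This is a sketch, not the released training script: the rank `r` is an assumption (the card lists only `lora_alpha` and `lora_dropout`), and the `Trainer` loop is omitted.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,                                  # assumed rank, not stated in the card
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"],
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA matrices are trainable
```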
Load the adapter into your Phi-2 model with just a few lines:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1) Load the base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

# 2) Apply the LoRA adapter (its config is loaded automatically)
model = PeftModel.from_pretrained(model, "sourize/phi2-memory-deeptalks")

# 3) (Optional) Resize embeddings if tokens were added to the tokenizer
model.base_model.resize_token_embeddings(len(tokenizer))

# 4) Generate
prompt = "### Human:\nHello, how are you?\n\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
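Since the adapter was tuned to track the last few turns, you will usually want recent history in the prompt. Continuing from the snippet above, a minimal sketch (the `build_prompt` helper and `history` format are illustrative, not part of this repository):

```python
def build_prompt(history, user_message, max_turns=3):
    """Format the last few (human, assistant) turns plus the new message."""
    parts = []
    for human, assistant in history[-max_turns:]:
        parts.append(f"### Human:\n{human}\n\n### Assistant:\n{assistant}\n")
    parts.append(f"### Human:\n{user_message}\n\n### Assistant:")
    return "\n".join(parts)

history = [("Hi!", "Hello! How can I help you today?")]
prompt = build_prompt(history, "What did I just say to you?")
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```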
You can also query the hosted Inference API directly:

```bash
curl -X POST \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  https://api-inference.huggingface.co/pipeline/text-generation/sourize/phi2-memory-deeptalks \
  -d '{
    "inputs": "Hello, how are you?",
    "parameters": {
      "max_new_tokens": 64,
      "do_sample": true,
      "temperature": 0.7,
      "top_p": 0.9,
      "return_full_text": false
    }
  }'
```
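The same endpoint can also be called from Python via `huggingface_hub.InferenceClient`; a minimal sketch, assuming the serverless Inference API is enabled for this adapter:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(model="sourize/phi2-memory-deeptalks", token="hf_...")  # your HF token

response = client.text_generation(
    "### Human:\nHello, how are you?\n\n### Assistant:",
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    return_full_text=False,
)
print(response)
```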
To cite this adapter:

```bibtex
@misc{sourize_phi2_memory_deeptalks,
  title        = {phi2-memory-lora: LoRA adapter for Phi-2 with short-term conversational memory},
  author       = {Sourish},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/sourize/phi2-memory-deeptalks}},
  license      = {MIT}
}
```
Questions or feedback? Please open an issue on the repository.