Hello everyone. I'm using the IBM Granite 20B model for a code-generation task. It works pretty well, but when I make my prompt and the few-shot examples in it longer, generation becomes very slow. Can anyone tell me how to make it faster with longer prompts? I have already applied quantization and similar optimizations.
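For context, here is a rough back-of-envelope sketch of why the slowdown grows faster than the prompt does: the prefill pass has a self-attention component whose cost scales roughly quadratically with prompt length. The hidden size and layer count below are placeholder assumptions for illustration, not Granite 20B's actual dimensions:

```python
def prefill_attention_flops(prompt_tokens: int, hidden: int = 6144, layers: int = 52) -> float:
    """Very rough FLOP estimate for the attention part of prefill.

    Scores (n^2 * d) plus the weighted sum over values (n^2 * d),
    summed over all layers. Ignores MLP blocks, heads, and constants.
    """
    return 2.0 * prompt_tokens ** 2 * hidden * layers

short = prefill_attention_flops(500)
long = prefill_attention_flops(2000)
print(long / short)  # 4x the prompt tokens -> ~16x the attention FLOPs
```

So quantization alone won't help much here, since it shrinks weights but not this quadratic attention cost; things like prompt/prefix caching (reusing the KV cache for the fixed few-shot examples) or a fused attention kernel target this part directly.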