Hey,
Did you try quantization?
There is an example for the Pegasus model here. I tried it and it performed pretty well for summarization, cutting inference time by a factor of 2 to 3.
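In case it helps, here is a rough sketch of what 8-bit quantization does to a model's weights. It is plain Python with made-up values, not the code from the example mentioned above. `quantize_int8` and `dequantize` are hypothetical helper names, and I'm assuming simple symmetric per-tensor scaling. Real toolkits apply this per layer and run optimized integer kernels, which is where the speedup comes from.

```python
# Symmetric int8 quantization sketch: floats are mapped to integers in [-127, 127]
# using a single scale factor, then mapped back to approximate floats.

def quantize_int8(weights):
    """Return (int8 values, scale) for a list of float weights."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor so we never divide by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in quantized]


weights = [0.5, -1.27, 0.03, 1.0]  # made-up example values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The weights take a quarter of the memory of float32 and the matrix multiplications can run in integer arithmetic, at the cost of a small rounding error per weight, so it is worth checking summary quality after quantizing.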