Update README.md
README.md CHANGED
@@ -7,7 +7,7 @@ license: cc-by-nc-4.0

# Model Information

-We introduce **UltraLong-8B**, a series of ultra-long context language models designed to process extensive sequences of text (up to 1M, 2M, and 4M tokens) while maintaining competitive performance on standard benchmarks. Built on Llama-3.1, UltraLong-8B leverages a systematic training recipe that combines efficient continued pretraining with instruction tuning to enhance long-context understanding and instruction-following capabilities. This approach enables our models to efficiently scale their context windows without sacrificing general performance.
+We introduce **Nemotron-UltraLong-8B**, a series of ultra-long context language models designed to process extensive sequences of text (up to 1M, 2M, and 4M tokens) while maintaining competitive performance on standard benchmarks. Built on Llama-3.1, UltraLong-8B leverages a systematic training recipe that combines efficient continued pretraining with instruction tuning to enhance long-context understanding and instruction-following capabilities. This approach enables our models to efficiently scale their context windows without sacrificing general performance.

## The UltraLong Models
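
As a rough illustration of how a checkpoint from this series might be run with the Hugging Face `transformers` library, the sketch below loads a model and generates from a long prompt. The repository ID, dtype, and generation settings are placeholder assumptions for illustration, not values specified in this README.

```python
# Minimal usage sketch. Assumptions: the repo ID below is a placeholder for the
# published 1M/2M/4M-context checkpoint; prompt and generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/UltraLong-8B"  # hypothetical checkpoint name; substitute the actual variant

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to keep very long contexts within memory
    device_map="auto",           # spread layers across available GPUs
)

# A long input document would go here; the series is described as handling up to
# 1M, 2M, or 4M tokens depending on the checkpoint.
prompt = "Summarize the following document:\n\n" + "<document text>"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

The bfloat16 dtype and `device_map="auto"` are chosen only to make multi-million-token contexts fit on typical multi-GPU hardware; adjust them to the target setup.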