https://huggingface.co/cerebras/btlm-3b-8k-base/discussions/25
Context length schedule and performance
#25
by baffo32 - opened less than a minute ago
Discussion
> Hey,
>
> I’m looking at your chart showing an incredible performance improvement from
> greatly extending the context length with a smaller number of training tokens.
From the model card's "Long Sequence Lengths" section:

> To enable long sequence applications, we use ALiBi position embeddings and trained on 470B tokens at the context length of 2,048 followed by 157B of tokens trained at 8,192 context length. To assess BTLM's long sequence capability, we evaluate it on SlimPajama test set with 32,768 context length.
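As I understand it, the ALiBi bias is just a head-specific linear penalty on attention scores based on relative distance, which is why it can in principle be applied at lengths beyond the training context. A minimal sketch following the ALiBi paper (Press et al.), not this repo's actual modeling code:

```python
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Head-specific slopes from the ALiBi paper: a geometric sequence
    # starting at 2^(-8/n_heads) (assumes n_heads is a power of two).
    start = 2.0 ** (-8.0 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Bias added to attention scores before softmax: -slope * distance to
    # each earlier token. It depends only on relative position, so the same
    # formula applies to sequences longer than the training context.
    slopes = alibi_slopes(n_heads)                          # (heads,)
    pos = torch.arange(seq_len)
    distance = (pos[None, :] - pos[:, None]).clamp(max=0)   # (query, key), <= 0 for past tokens
    return slopes[:, None, None] * distance[None, :, :]     # (heads, query, key)

# Small example; the same call works at any sequence length.
bias = alibi_bias(8, 16)   # shape (8, 16, 16)
```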
I'm wondering whether this would be useful for hobby finetuning. It's only an
8k context length (some models offer 128k now), although ALiBi is purported to
extrapolate to longer context lengths than it was trained on.
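If I do try it, a quick sanity check would be comparing the language-model loss at and beyond 8,192 tokens. A rough sketch, assuming the standard transformers API and the loading options shown on the model card; `long_document` is a placeholder for any sufficiently long text, and I haven't checked whether the custom modeling code caps the usable sequence length:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "cerebras/btlm-3b-8k-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, trust_remote_code=True, torch_dtype="auto"
)
model.eval()

def loss_at_length(text: str, seq_len: int) -> float:
    # Take the first `seq_len` tokens of a long document and measure the
    # causal-LM loss; a sharp rise past 8,192 would suggest the ALiBi
    # extrapolation isn't holding up for this checkpoint.
    ids = tokenizer(text, return_tensors="pt").input_ids[:, :seq_len]
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return loss.item()

# e.g. compare loss_at_length(long_document, 8192) with loss_at_length(long_document, 16384)
```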
https://huggingface.co/papers/2309.11568
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model