Re: [ot] 3B / 3GB quantized edge language model

2023-09-22 Thread Undescribed Horrific Abuse, One Victim & Survivor of Many
https://huggingface.co/cerebras/btlm-3b-8k-base/discussions/25 Context length schedule and performance #25 by baffo32 > Hey, > > I’m looking at your chart showing incredible performance improvement greatly extending the context length with a smaller

Re: [ot] 3B / 3GB quantized edge language model

2023-09-22 Thread Undescribed Horrific Abuse, One Victim & Survivor of Many
Long Sequence Lengths: To enable long-sequence applications, we use ALiBi position embeddings and train on 470B tokens at a context length of 2,048, followed by 157B tokens at a context length of 8,192. To assess BTLM’s long-sequence capability, we evaluate it on the SlimPajama test set with 3
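
Since the preview names ALiBi as what carries the 2,048 -> 8,192 extension, here is a minimal sketch of how the ALiBi attention bias is built (Press et al.); the head count and sequence length below are illustrative assumptions, not BTLM's actual configuration.

import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    # Geometric sequence of per-head slopes: 2^(-8/n), 2^(-16/n), ...
    start = 2.0 ** (-8.0 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Linear penalty proportional to query-key distance, one slope per head.
    positions = torch.arange(seq_len)
    distance = positions[None, :] - positions[:, None]  # j - i, negative for past tokens
    slopes = alibi_slopes(num_heads)
    # Shape (heads, seq, seq); added to the attention logits before softmax.
    return slopes[:, None, None] * distance[None, :, :]

# Illustrative sizes only: because the bias depends solely on relative distance,
# the same slopes apply at 2,048 or 8,192 tokens, which is what lets the second
# training stage extend the context window without new position embeddings.
bias = alibi_bias(num_heads=32, seq_len=8192)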

[ot] 3B / 3GB quantized edge language model

2023-09-22 Thread Undescribed Horrific Abuse, One Victim & Survivor of Many
I’m wondering whether this is useful for hobby finetuning. Only 8k context length though (some models have 128k now), although ALiBi is purported to extend to longer context lengths than it was trained on. https://huggingface.co/papers/2309.11568 BTLM-3B-8K: 7B Parameter Performance in a
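
For the hobby-finetuning angle, a hedged sketch of pulling the base checkpoint with Hugging Face transformers; the model id is from the link above, while the dtype choice and the trust_remote_code flag are assumptions based on how custom-architecture model cards are typically loaded, not instructions from the card itself.

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "cerebras/btlm-3b-8k-base"  # from the model card linked above
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # assumption: the card ships custom modeling code
    torch_dtype="auto",      # pick up the checkpoint's native dtype
)

# Quick smoke test before any finetuning run
inputs = tokenizer("ALiBi lets a model read past its training length because", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))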