Re: [agi] Re: Lexical model learning for LLMs

2023-11-27 Thread Matt Mahoney
I separated the effects of XML parsing and article reordering in cmix for the Hutter prize. Recall that I first found a baseline for enwik9 compression using generic compression models without specializations for text.
Baseline enwik9 178,051,852 enwik9.zpaq -ms7ci1.1.1.1.2am (context mixing) 18
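
A minimal sketch of how such a baseline can be measured, assuming only that enwik9 is the standard 1,000,000,000-byte test file and that the compressed output is a file whose size can be read; the file names are placeholders, not the exact commands from the experiment:

    import os

    def compression_ratio(original_path, compressed_path):
        """Ratio of compressed size to original size (smaller is better)."""
        original = os.path.getsize(original_path)      # enwik9 is 10**9 bytes
        compressed = os.path.getsize(compressed_path)  # e.g. the .zpaq archive
        return compressed / original

    # Hypothetical usage with the baseline files named above:
    # print(compression_ratio("enwik9", "enwik9.zpaq"))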

Re: [agi] Re: Lexical model learning for LLMs

2023-11-23 Thread Matt Mahoney
Shannon estimated the entropy of written English to be between 0.6 and 1.3 bits per character. The best compression ratio for enwik9 is 0.1072, which is 0.86 bpc. I think there is room for improvement. Maybe 0.08 is possible. cmix is based on PAQ, which uses 24 bytes to store each set of context statistic
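
To make the arithmetic explicit (a worked check, not part of the original message): a compression ratio converts to bits per character by multiplying by 8, since each input character is one byte.

    # Ratio-to-bpc check for the figures quoted above.
    shannon_low, shannon_high = 0.6, 1.3     # Shannon's estimate, bits per character
    for ratio in (0.1072, 0.08):             # current best and speculated target
        bpc = 8 * ratio
        print(ratio, bpc, shannon_low <= bpc <= shannon_high)
    # 0.1072 -> 0.8576 bpc, inside Shannon's 0.6-1.3 range
    # 0.08   -> 0.64 bpc, still inside, near the low end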

Re: [agi] Re: Lexical model learning for LLMs

2023-11-23 Thread John Rose
A compression ratio of 0.1072 seems like there is plenty of room still. What is the estimated best achievable ratio, something like 0.08 to 0.04? Though 0.04 might be impossibly tight... even at 0.05 the resource consumption has got to grow exponentially out of control unless there are overlooked discoveries y

Re: [agi] Re: Lexical model learning for LLMs

2023-11-23 Thread Matt Mahoney
I'm assuming 1 bit per character compression, so 1 GB of input text is 1B bits, so 1B parameters. enwik9 compression is actually a little better. A neural network with m neurons and n connections can implement roughly 2^n / m! distinct functions, allowing the m neurons to be permuted to equivalent n
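
A back-of-the-envelope sketch of what that count implies (my arithmetic, with hypothetical network sizes): roughly 2^n / m! distinct functions corresponds to log2(2^n / m!) = n - log2(m!) bits of capacity, so the permutation symmetry costs only about log2(m!) bits out of the n connection parameters.

    import math

    def capacity_bits(m_neurons, n_connections):
        """Bits implied by ~2**n / m! distinct functions: n - log2(m!)."""
        log2_m_factorial = math.lgamma(m_neurons + 1) / math.log(2)
        return n_connections - log2_m_factorial

    # Hypothetical sizes: 1e6 neurons, 1e9 connections.
    # log2(1e6!) is roughly 1.8e7, small next to 1e9, so capacity stays
    # close to one bit per connection (parameter).
    print(capacity_bits(10**6, 10**9))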

Re: [agi] Re: Lexical model learning for LLMs

2023-11-22 Thread James Bowery
Matt wrote: > I am doing experiments on learning the rules for tokenization. Back in > 2000 I experimented with finding word boundaries in text without spaces. > These occur where there is low mutual information across boundaries. > Possibly relevant is the sub-answer "Variable-length tokens" to th
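
A rough sketch of the boundary idea as described in the quote (not the original 2000 code): estimate the pointwise mutual information between adjacent characters and propose a boundary wherever it is low; the threshold below is an arbitrary placeholder.

    import math
    from collections import Counter

    def boundary_scores(text):
        """PMI of each adjacent character pair; low PMI suggests a word
        boundary (low mutual information across the boundary)."""
        n = len(text)
        unigrams = Counter(text)
        bigrams = Counter(text[i:i + 2] for i in range(n - 1))
        scores = []
        for i in range(n - 1):
            p_ab = bigrams[text[i:i + 2]] / (n - 1)
            p_a = unigrams[text[i]] / n
            p_b = unigrams[text[i + 1]] / n
            scores.append(math.log2(p_ab / (p_a * p_b)))
        return scores

    # Hypothetical usage: cut where the score drops below a chosen threshold.
    text = "thequickbrownfoxjumpsoverthelazydog"
    cuts = [i + 1 for i, s in enumerate(boundary_scores(text)) if s < 0.0]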

Re: [agi] Re: Lexical model learning for LLMs

2023-11-22 Thread James Bowery
I'm asking because when you say "ideally" this evokes a *recurrent* neural network that approximates what I've called the NiNOR complexity of the corpus: the "ideal" "compresse

Re: [agi] Re: Lexical model learning for LLMs

2023-11-21 Thread Matt Mahoney
On Tue, Nov 21, 2023, 8:45 PM James Bowery wrote: > Please elucidate: > > > Ideally a neural network should use one parameter per bit of compressed > training data, or 1 billion > > Approximately, from information theory. The capacity of a Hopfield associative memory is 0.3 bits per parameter. Also I'm n
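
To make the comparison concrete (my arithmetic, using the figures quoted above): at 0.3 bits per parameter a Hopfield-style memory would need roughly three parameters per bit of compressed training data, versus one parameter per bit under the stated ideal.

    # Parameter budget implied by the two storage densities in the thread.
    compressed_bits = 10**9            # ~1 GB of text at ~1 bit per character
    print(compressed_bits / 1.0)       # ideal: ~1e9 parameters
    print(compressed_bits / 0.3)       # Hopfield memory: ~3.3e9 parameters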

[agi] Re: Lexical model learning for LLMs

2023-11-21 Thread James Bowery
Please elucidate: > > Ideally a neural network should use one parameter per bit of compressed > training data, or 1 billion. > -- Artificial General Intelligence List: AGI Permalink: https://agi.topicbox.com/groups/agi/Tdc371ce11a040352-M92adabc6ed264bcd