A simple (greedy compression) probabilistic inference algorithm for 
determining context-relevant mutual information.  It requires O(n log n) 
connections, and the mutual information calculations have a similar time 
complexity, where n is the length of the phrase.
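
For readers unfamiliar with the term, here is a minimal sketch of a mutual 
information calculation over a phrase.  It is not the algorithm described 
above: it assumes pointwise mutual information over adjacent word pairs, 
estimated from raw counts, and the function name and the sample phrase are 
made up for illustration.

```python
import math
from collections import Counter

def pointwise_mutual_information(tokens):
    """Return PMI in bits for every adjacent token pair (illustrative only)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1
    pmi = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bigrams          # joint probability of the pair
        p_a = unigrams[a] / n_unigrams    # marginal probability of each word
        p_b = unigrams[b] / n_unigrams
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi

# Hypothetical sample phrase, not taken from any real data set.
tokens = "the cat sat on the mat the cat ran".split()
scores = pointwise_mutual_information(tokens)
```

In this sample the pair ("sat", "on") scores higher than ("the", "cat"), 
because "the" is frequent and so is less informative about what follows.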

It's just an inference algorithm, nothing close to anything human with 
goals, etc.  But it was useful for understanding sensor-network streams 
and their relation to human language stories.  This is a short paper on 
the topic:


The idea of combining commonsense knowledge (at the level of language 
stories) with sensors is developed more fully in my master's thesis, which 
is also on that page, but again, it's just an inference algorithm.


On Tue, 15 May 2007, Matt Mahoney wrote:

) --- Richard Loosemore <[EMAIL PROTECTED]> wrote:
) > Matt Mahoney wrote:
) > > --- Tom McCabe <[EMAIL PROTECTED]> wrote:
) > >> --- Matt Mahoney <[EMAIL PROTECTED]> wrote:
) > >>> Personally, I would experiment with
) > >>> neural language models that I can't currently
) > >>> implement because I lack the
) > >>> computing power.
) > >> Could you please describe these models?
) > > 
) > > Essentially models in which neurons (with time delays) respond to
) > > increasingly abstract language concepts: letters, syllables, words,
) > > grammatical roles, phrases, and sentence structures.  This is not really
) > > new.  Models like these have been proposed in the 1980's but were never
) > > fully implemented due to lack of computing power.  These constraints
) > > resulted in connectionist systems in which each concept mapped to a
) > > single neuron.  Such models can't learn well.  There is no mechanism for
) > > adding to the vocabulary, for instance.  I believe you need at least
) > > hundreds of neurons per concept, where each neuron may correlate weakly
) > > with hundreds of different concepts.  Exactly how many, I don't know.
) > > That is why I need to experiment.
) > > 
) > > One problem that bothers me is the disconnect between the information
) > > theoretic estimates of the size of a language model, about 10^9 bits, and
) > > models based on neuroanatomy, perhaps 10^14 bits.  Experiments might tell
) > > us what's wrong with our neural models.  But how to do such experiments?
) > > A fully connected network of 10^9 connections trained on 10^9 bits of
) > > data would require about 10^18 operations, about a year on a PC.  There
) > > are optimizations I could do, such as activating only a small fraction of
) > > the neurons at one time, but if the model fails, is it because of these
) > > optimizations or because you really do need 10^14 connections, or the
) > > training data is bad, or something else?
) > 
) > I was building connectionist models of language in the late 80s, early 
) > 90s, and your characterizations are a little bit off, here.
) > 
) > We used distributed models in which single neurons certainly did not 
) > correspond to single concepts.  They learned well, and there was no 
) > problem getting new vocabulary items into them.  I was writing C code on 
) > an early model Macintosh computer that was about 1/1000th the power of the 
) > ones available today.  You don't really need hundreds of neurons per 
) > concept:  a few hundred was the biggest net I ever built, and it could 
) > cope with about 200 vocabulary items, IIRC.
) > 
) > The *real* problems are: (1) encoding the structural aspects of sentences 
) > in abstract ways, (2) encoding layered concepts (in which a concept 
) > learned today can be the basis for new concepts learned tomorrow) and 
) > (3) solving the type-token problem in such a way that the system can 
) > represent more than one instance of a concept at once.
) > 
) > In essence, my research since then has been all about finding a good way 
) > to solve these issues whilst retaining the immense learning power of 
) > those early connectionist systems.
) > 
) > It's doable.  Just have to absorb ten tons of research material and then 
) > spit it out in the right way whilst thinking outside the box.  All in a 
) > day's work.  ;-)
) > 
) > 
) > 
) > Richard Loosemore.
)
) I doubt you could model sentence structure usefully with a neural network
) capable of only a 200 word vocabulary.  By the time children learn to use
) complete sentences they already know thousands of words after exposure to
) hundreds of megabytes of language.  The problem seems to be about O(n^2).  As
) you double the training set size, you also need to double the number of
) connections to represent what you learned.
) -- Matt Mahoney, [EMAIL PROTECTED]
) -----
) This list is sponsored by AGIRI: http://www.agiri.org/email
) To unsubscribe or change your options, please go to:
) http://v2.listbox.com/member/?&;
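
The figures quoted above (10^9 connections trained on 10^9 bits, about 
10^18 operations, about a year on a PC) and the O(n^2) remark can be checked 
with a little arithmetic.  This is only a sketch: it assumes one operation 
per connection per training bit, which is a reading of the quoted numbers 
and is not stated outright in the thread.  The function names are made up 
for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds

def training_operations(connections, training_bits):
    """Assumed cost model: one operation per connection per training bit."""
    return connections * training_bits

def implied_ops_per_second(operations, years=1.0):
    """Sustained rate needed to finish `operations` in the given time."""
    return operations / (years * SECONDS_PER_YEAR)

# The quoted case: 10^9 connections, 10^9 bits of training data.
ops = training_operations(10**9, 10**9)   # 10^18 operations
rate = implied_ops_per_second(ops)        # roughly 3.2e10 operations/second

# The O(n^2) remark: doubling the training set also doubles the connections,
# so the total work goes up by a factor of four.
ops_doubled = training_operations(2 * 10**9, 2 * 10**9)
ratio = ops_doubled / ops                 # 4.0
```

Under that assumption, "about a year" corresponds to a sustained rate of 
roughly 3 x 10^10 operations per second.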
