Matt Mahoney wrote:
--- Tom McCabe <[EMAIL PROTECTED]> wrote:
--- Matt Mahoney <[EMAIL PROTECTED]> wrote:
Personally, I would experiment with neural language models that I can't currently implement because I lack the computing power.

Could you please describe these models?

Essentially, models in which neurons (with time delays) respond to increasingly abstract language concepts: letters, syllables, words, grammatical roles, phrases, and sentence structures.  This is not really new.  Models like these were proposed in the 1980s but were never fully implemented for lack of computing power.  Those constraints resulted in connectionist systems in which each concept mapped to a single neuron.  Such models can't learn well; there is no mechanism for adding to the vocabulary, for instance.  I believe you need at least hundreds of neurons per concept, where each neuron may correlate weakly with hundreds of different concepts.  Exactly how many, I don't know.  That is why I need to experiment.
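A minimal sketch of that kind of sparse distributed code, in Python; the pool size, the units-per-concept figure, and the random-subset encoding are illustrative assumptions on my part, not proposed values or any particular model:

import random

N_UNITS = 100_000        # total units in the pool (illustrative assumption)
UNITS_PER_CONCEPT = 300  # "hundreds of neurons per concept" (assumption)

_codes = {}

def encode(concept):
    """Give each concept a fixed random set of active units (a distributed code)."""
    if concept not in _codes:
        rng = random.Random(concept)  # deterministic code per concept
        _codes[concept] = frozenset(rng.sample(range(N_UNITS), UNITS_PER_CONCEPT))
    return _codes[concept]

def overlap(a, b):
    """Fraction of active units shared by two concept codes."""
    return len(encode(a) & encode(b)) / UNITS_PER_CONCEPT

# A new vocabulary item is just a new random code over the same units, so the
# vocabulary can grow without adding dedicated neurons, unlike
# one-concept-per-neuron models.  Each unit takes part, weakly, in many codes.
print(overlap("dog", "dog"))  # 1.0
print(overlap("dog", "cat"))  # small, roughly UNITS_PER_CONCEPT / N_UNITS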

One problem that bothers me is the disconnect between information-theoretic estimates of the size of a language model, about 10^9 bits, and estimates based on neuroanatomy, perhaps 10^14 bits.  Experiments might tell us what's wrong with our neural models, but how would we do such experiments?  A fully connected network of 10^9 connections trained on 10^9 bits of data would require about 10^18 operations, roughly a year on a PC.  There are optimizations I could make, such as activating only a small fraction of the neurons at a time, but if the model fails, is it because of those optimizations, because you really do need 10^14 connections, because the training data is bad, or something else?
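For what it's worth, here is the arithmetic behind the "roughly a year on a PC" figure, assuming a sustained rate of about 3 x 10^10 operations per second (an assumed figure, not a benchmark):

# Back-of-envelope check of the training-cost estimate above.
connections    = 1e9     # fully connected network, ~10^9 weights
training_bits  = 1e9     # ~10^9 bits of language data
total_ops = connections * training_bits   # one weight touch per weight per bit
ops_per_second = 3e10    # assumed sustained rate on a single PC
seconds_per_year = 3.15e7
print(total_ops / ops_per_second / seconds_per_year)  # about 1 year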

I was building connectionist models of language in the late '80s and early '90s, and your characterizations are a little off here.

We used distributed models in which single neurons certainly did not correspond to single concepts.  They learned well, and there was no problem getting new vocabulary items into them.  I was writing C code on an early-model Macintosh with about one-thousandth the power of the machines available today.  You don't really need hundreds of neurons per concept: a few hundred neurons was the biggest net I ever built, and it could cope with about 200 vocabulary items, IIRC.

The *real* problems are: (1) encoding the structural aspects of sentences in abstract ways, (2) encoding layered concepts (in which a concept learned today can be the basis for new concepts learned tomorrow), and (3) solving the type-token problem in such a way that the system can represent more than one instance of a concept at once.
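A toy illustration of point (3), the type-token problem: if a sentence is represented only by superposing fixed distributed codes, two tokens of the same type collapse into one.  The vocabulary, code sizes, and superposition-by-union scheme below are illustrative assumptions, not a description of any of the systems discussed here:

import random

def encode(concept, n_units=1000, k=50):
    """Fixed random code per concept, as in the sketch earlier in the thread."""
    rng = random.Random(concept)
    return frozenset(rng.sample(range(n_units), k))

def superpose(concepts):
    """Represent a sentence as the union of its concepts' codes."""
    code = set()
    for c in concepts:
        code |= encode(c)
    return frozenset(code)

# "dog chases dog" and "dog chases" get identical representations, so the
# second instance of "dog" is lost:
print(superpose(["dog", "chases", "dog"]) == superpose(["dog", "chases"]))  # True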

In essence, my research since then has been all about finding a good way to solve these issues whilst retaining the immense learning power of those early connectionist systems.

It's doable. Just have to absorb ten tons of research material and then spit it out in the right way whilst thinking outside the box. All in a day's work. ;-)



Richard Loosemore.

