That's AMAZING!
I was just thinking about using Neo4j to store some extracted n-grams. I
previously did it with a SQLite database, but with a graph an
application could perhaps traverse between nodes more efficiently.
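
To make the idea of moving between nodes concrete, here is a minimal sketch
in plain Python. It does not use Neo4j at all: a dictionary of dictionaries
stands in for the graph, and the class name `NGramGraph`, its methods, and
the counts in the usage example are all made up for illustration. Words are
nodes, and a weighted edge from one word to the next records how often that
pair was seen.

```python
from collections import defaultdict


class NGramGraph:
    """Toy in-memory stand-in for a graph store (hypothetical, not Neo4j).

    Words are nodes; a weighted edge w1 -> w2 records how often w2
    followed w1 in the inserted n-grams.
    """

    def __init__(self):
        # node -> {successor node -> accumulated count}
        self.edges = defaultdict(lambda: defaultdict(int))

    def add_ngram(self, words, count=1):
        # Each adjacent pair in the n-gram creates or reinforces an edge.
        for first, second in zip(words, words[1:]):
            self.edges[first][second] += count

    def predict(self, word, k=3):
        # Follow the outgoing edges of `word` and return the k successors
        # with the highest counts (ties broken alphabetically).
        successors = self.edges.get(word, {})
        ranked = sorted(successors.items(), key=lambda item: (-item[1], item[0]))
        return [w for w, _ in ranked[:k]]


# Illustrative counts only, not taken from any real corpus.
graph = NGramGraph()
graph.add_ngram(["guten", "morgen"], count=120)
graph.add_ngram(["guten", "tag"], count=300)
graph.add_ngram(["guten", "abend"], count=80)
print(graph.predict("guten", k=2))  # ['tag', 'morgen']
```

Looking up the successors of a word only touches that node's outgoing
edges, which is the kind of local traversal I would hope a graph database
handles well compared to joining rows in a relational table.
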
One question: is it possible to download the Google n-gram corpus release
(or at least some part of it) for free (and legally, of course)? I've
found just this page
(http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13), but
it seems I would have to pay.
Cheers,
Jacopo Farina


2011/11/28 Peter Neubauer <peter.neuba...@neotechnology.com>

> Seriously cool stuff René!
>
> I would love to hear more as the project progresses! Also, maybe the
> dataset could be added to the example dataset collection for playing around
> with neo4j? WDYT?
>
> Cheers,
>
> /peter neubauer
>
> GTalk:      neubauer.peter
> Skype       peter.neubauer
> Phone       +46 704 106975
> LinkedIn   http://www.linkedin.com/in/neubauer
> Twitter      http://twitter.com/peterneubauer
>
> http://www.neo4j.org              - NOSQL for the Enterprise.
> http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
>
>
> 2011/11/27 René Pickhardt <r.pickha...@googlemail.com>
>
> > Hey Everyone,
> >
> > I am currently advising two high school students on a programming project
> > for a German student competition.
> >
> > They have inserted the German Google n-gram data set (several GB of
> > natural language) into a Neo4j database and used it for sentence
> > prediction to improve typing speed.
> >
> > The entire project is far from complete, but there is some code
> > available showing how we modelled n-grams in Neo4j and what we used for
> > prediction.
> >
> > Both approaches are very basic and work as you would expect. Still, they
> > already work decently, showing again the power of Neo4j.
> >
> > We would be happy to get some feedback, thoughts and suggestions for further
> > improvement. You can find more info in my blog post:
> >
> > http://www.rene-pickhardt.de/download-google-n-gram-data-set-and-neo4j-source-code-for-storing-it/
> >
> > or in the source code:
> >
> > http://code.google.com/p/complet/source/browse/trunk/Completion_DataCollector/src/completion_datacollector/Main.java?spec=svn64&r=64
> >
> > By the way, even though the code is just hacked together, it uses hash maps to
> > store nodes in memory and speed up insertion, and builds the Lucene
> > index afterwards. Of course, it would be even better to use the batch inserter.
> >
> > best regards René
> > --
> > mobile: +49 (0)176 6433 2481
> >
> > Skype: +49 (0)6131 / 4958926
> >
> > Skype: rene.pickhardt
> >
> > www.rene-pickhardt.de
> >  <http://www.beijing-china-blog.com>
> > _______________________________________________
> > Neo4j mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
> >
