Hey Everyone,

I am currently advising two high school students on a programming project
for a German student competition.

They have inserted the German Google n-gram data set (several GB of natural
language) into a Neo4j database and used it for sentence prediction to
improve typing speed.

The entire project is far from complete, but there is already some code
available showing how we modelled n-grams in Neo4j and what we used for
prediction.

Both approaches are very basic and pretty much what you would expect. Still,
they already work decently, showing once again the power of Neo4j.
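
To give the discussion something concrete, here is a rough sketch of how I read
the bigram idea, not the students' actual schema from the repository: one node
per word, a FOLLOWED_BY relationship per observed bigram with the frequency
stored as a relationship property, and prediction by sorting the outgoing
relationships of the current word node by that count. Class name, property
names, and the Neo4j 1.x embedded API usage are my assumptions.

import java.util.*;
import org.neo4j.graphdb.*;

public class BigramPredictionSketch {

    // One relationship type per bigram edge between word nodes.
    enum RelTypes implements RelationshipType { FOLLOWED_BY }

    // Rank candidate next words by the stored bigram counts.
    static List<String> predictNext(Node wordNode, int limit) {
        List<Relationship> rels = new ArrayList<Relationship>();
        for (Relationship r : wordNode.getRelationships(Direction.OUTGOING,
                RelTypes.FOLLOWED_BY)) {
            rels.add(r);
        }
        // Sort descending by the "count" property on the relationship.
        Collections.sort(rels, new Comparator<Relationship>() {
            public int compare(Relationship a, Relationship b) {
                return ((Long) b.getProperty("count"))
                        .compareTo((Long) a.getProperty("count"));
            }
        });
        List<String> suggestions = new ArrayList<String>();
        for (Relationship r : rels.subList(0, Math.min(limit, rels.size()))) {
            suggestions.add((String) r.getEndNode().getProperty("word"));
        }
        return suggestions;
    }
}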

We would be happy about any feedback, thoughts, and suggestions for further
improvement. You can find more info in my blog post:
http://www.rene-pickhardt.de/download-google-n-gram-data-set-and-neo4j-source-code-for-storing-it/

or in the source code:
http://code.google.com/p/complet/source/browse/trunk/Completion_DataCollector/src/completion_datacollector/Main.java?spec=svn64&r=64

By the way: even though the code is just hacked together, it uses hash maps to
keep nodes in memory and speed up insertion, and it builds the Lucene index
afterwards. Of course it would be even better to use the batch inserter.
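
For the batch inserter variant, a minimal sketch could look like the code
below. The store path, the "word"/"count" property names, and the use of the
org.neo4j.unsafe.batchinsert API (available in the more recent 1.x releases)
are my assumptions; the point is simply to cache word -> node id in a HashMap
so every word node is created only once, and to build the Lucene index over
the "word" property afterwards.

import java.util.HashMap;
import java.util.Map;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class NgramBatchInsertSketch {

    public static void main(String[] args) {
        BatchInserter inserter = BatchInserters.inserter("target/ngram.db");
        Map<String, Long> nodeCache = new HashMap<String, Long>();
        try {
            // Example bigram; in the real importer this would come from
            // parsing the n-gram files line by line.
            long from = getOrCreate(inserter, nodeCache, "guten");
            long to   = getOrCreate(inserter, nodeCache, "morgen");
            Map<String, Object> relProps = new HashMap<String, Object>();
            relProps.put("count", 42L); // frequency from the n-gram data
            inserter.createRelationship(from, to,
                    DynamicRelationshipType.withName("FOLLOWED_BY"), relProps);
            // The Lucene index on the "word" property would be built after
            // the import, as mentioned above.
        } finally {
            inserter.shutdown(); // flushes everything to disk
        }
    }

    // Cache node ids in memory so each word node is created only once.
    static long getOrCreate(BatchInserter inserter, Map<String, Long> cache,
            String word) {
        Long id = cache.get(word);
        if (id == null) {
            Map<String, Object> props = new HashMap<String, Object>();
            props.put("word", word);
            id = inserter.createNode(props);
            cache.put(word, id);
        }
        return id;
    }
}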

Best regards,
René
--
mobile: +49 (0)176 6433 2481

Skype: +49 (0)6131 / 4958926

Skype: rene.pickhardt

www.rene-pickhardt.de