Hello, I'm looking for tips on how to recreate something like Google's Ngram viewer <https://books.google.com/ngrams> with elasticsearch. I have a text corpus of under 500 MB for which this kind of tool would be very valuable.
I've had some success with the shingle token filter <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-shingle-tokenfilter.html> and the date histogram aggregation <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html>, but the results are not ideal: I'd like a histogram of word/phrase frequencies, not a histogram of how many documents each word/phrase occurs in.

It looks like what I need is some combination of shingles, term vectors <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-termvectors.html> and the date histogram aggregation, but I'm not sure how to proceed. I can improve my current approach by splitting the corpus into smaller pieces, i.e. making my documents paragraphs instead of chapters. But what I really want is a "shingle frequency date histogram".

Is this something that can be accomplished with elasticsearch?

Jari
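To make the difference between the two histograms concrete, here is a minimal client-side sketch, not an elasticsearch feature. It assumes you have already fetched the term vectors of the shingled field for each document (the term vectors API response carries a `term_freq` for each term under `term_vectors` → field → `terms`) and paired each response with the year taken from the document's date field. The function name `phrase_histograms`, the `(year, response)` pairing and the field name `text` are hypothetical. Summing `term_freq` per year gives the frequency histogram; counting matching documents per year reproduces what a date histogram over matching documents already returns.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def phrase_histograms(
    docs: Iterable[Tuple[int, dict]], field: str, phrase: str
) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Bucket one shingle by year from term-vector responses (hypothetical helper).

    docs: (year, response) pairs; each response is assumed to look like
    response["term_vectors"][field]["terms"][term]["term_freq"].
    Returns (occurrences per year, matching documents per year).
    """
    occurrences: Dict[int, int] = defaultdict(int)
    documents: Dict[int, int] = defaultdict(int)
    for year, response in docs:
        terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
        stats = terms.get(phrase)
        if stats is None:
            continue  # the shingle does not occur in this document
        occurrences[year] += stats["term_freq"]  # how often it occurs in this doc
        documents[year] += 1  # what a doc-count date histogram would record
    return dict(occurrences), dict(documents)


if __name__ == "__main__":
    # Hand-written, hypothetical responses for three documents.
    sample = [
        (1850, {"term_vectors": {"text": {"terms": {"steam engine": {"term_freq": 4}}}}}),
        (1850, {"term_vectors": {"text": {"terms": {"steam engine": {"term_freq": 1}}}}}),
        (1900, {"term_vectors": {"text": {"terms": {"motor car": {"term_freq": 2}}}}}),
    ]
    occ, matching = phrase_histograms(sample, "text", "steam engine")
    print(occ)       # {1850: 5}
    print(matching)  # {1850: 2}
```

In the sample, "steam engine" occurs 5 times in 1850 but only in 2 documents, which is exactly the gap between the two histograms. Doing the summation on the client needs one term-vector fetch per document, which may be acceptable for a corpus of this size but is a design trade-off rather than a built-in aggregation.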