There are many uses for shingles.
I've used them to find common phrases in text, which, as I understand
it, is what you are trying to achieve. It works rather well, is a
very simple solution, and is easy on resources compared to real semantic
analysis.
You'll be getting a lot of shingles such as "the
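To make the shingle idea concrete, here is a minimal sketch in plain Python (not the Lucene contrib/analyzers shingle filter itself). It builds word shingles of a fixed size and keeps those that occur at least a given number of times. The function names, the whitespace tokenization, and the `min_count` threshold are assumptions made for illustration only.

```python
from collections import Counter


def shingles(tokens, size=2):
    """Return every run of `size` adjacent tokens, joined by a space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def frequent_shingles(sentences, size=2, min_count=2):
    """Count shingles across sentences and keep those seen at least min_count times.

    Tokenization here is a naive lowercase whitespace split (an assumption);
    a real analyzer would also handle punctuation and stop words.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        counts.update(shingles(tokens, size))
    return {s: c for s, c in counts.items() if c >= min_count}
```

With a large enough corpus, word pairs that co-occur often (such as "social media") rise above the threshold, while incidental pairs stay below it. Pairs containing very common words will also score high, so some stop-word filtering is usually needed on top of this.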
Hi Erick,
If you want to query, you need to know the phrase already, right? But I want to
discover the phrases: which words occur together so often in natural
usage that we treat them as a phrase.
On Tue, Oct 6, 2009 at 8:12 PM, Erick Erickson wrote:
> Maybe I'm missing the problem entirely, but can you
Hi Karl,
I think shingles are designed to make phrase search faster. They'll generate
a lot of phrase-like terms based on position only, completely disregarding
meaning. That's not good enough.
Regards,
Andrew
On Tue, Oct 6, 2009 at 11:51 PM, Karl Wettin wrote:
> Hi Andrew,
>
> I think you are looki
Right, Vasu, I think NLP is a good approach. I should take some time to look at that.
Thanks.
On Tue, Oct 6, 2009 at 8:10 PM, Vasudevan Comandur wrote:
> Hi,
>
> Take the NLP route and use modules like POS tagger and NP chunker.
>
> OpenNLP has a stack for English language. Try to use them.
>
> Regards
Hi Andrew,
I think you are looking for the shingle package in contrib/analyzers.
karl
On 6 Oct 2009, at 13:42, Andrew Zhang wrote:
Hi guys,
The requirement is very simple here, e.g. for this sentence, 'The NBA
formally announced its new *social media* guidelines Wednesday', I
want to
t
Maybe I'm missing the problem entirely, but can you use phrase queries? Or
one of the Span* queries with a slop of 0 when searching?
Best
Erick
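For readers unfamiliar with the term, a slop of 0 means the phrase words must appear adjacent and in order. The sketch below illustrates that matching rule over a plain token list in Python. It is not Lucene's PhraseQuery or Span* implementation, and the function name `phrase_positions` is hypothetical.

```python
def phrase_positions(tokens, phrase):
    """Return the start indexes where `phrase` occurs as adjacent, in-order tokens.

    This mirrors the idea of an exact phrase match (slop 0): no gaps and no
    reordering are allowed between the phrase words.
    """
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]
```

Note that this kind of lookup only works when the phrase is already known, which is a different problem from discovering phrases in the first place.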
On Tue, Oct 6, 2009 at 7:42 AM, Andrew Zhang wrote:
> Hi guys,
>
> The requirement is very simple here, e.g. for this sentence, 'The NBA
> formally anno
Hi,
Take the NLP route and use modules like a POS tagger and an NP chunker.
OpenNLP has a stack for the English language. Try using it.
Regards
Vasu
On Tue, Oct 6, 2009 at 5:12 PM, Andrew Zhang wrote:
> Hi guys,
>
> The requirement is very simple here, e.g. for this sentence, 'The NBA
> form
Hi guys,
The requirement is very simple here, e.g. for this sentence, 'The NBA
formally announced its new *social media* guidelines Wednesday', I want to
treat '*social media*' as a whole phrase term. The default English analyzers
that come with Lucene all deal with single words, so if you want to get t