Phase Extraction, mainly for English

2009-10-06 Thread Andrew Zhang
Hi guys, The requirement is very simple: for a sentence such as 'The NBA formally announced its new *social media* guidelines Wednesday', I want to treat '*social media*' as a single phrase term. The default English analyzers that come with Lucene all deal with single words, so if you want to get t

Re: Phase Extraction, mainly for English

2009-10-06 Thread Vasudevan Comandur
Hi, Take the NLP route and use modules such as a POS tagger and an NP chunker. OpenNLP has a stack for the English language; try using it. Regards, Vasu On Tue, Oct 6, 2009 at 5:12 PM, Andrew Zhang wrote: > Hi guys, > > The requirement is very simple here, e.g. for this sentence, 'The NBA > form

Re: Phase Extraction, mainly for English

2009-10-06 Thread Erick Erickson
Maybe I'm missing the problem entirely, but can you use phrase queries, or one of the Span* queries with a slop of 0, when searching? Best, Erick On Tue, Oct 6, 2009 at 7:42 AM, Andrew Zhang wrote: > Hi guys, > > The requirement is very simple here, e.g. for this sentence, 'The NBA > formally anno
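
To make the "slop of 0" idea concrete, here is a minimal sketch in plain Python of exact-phrase matching over a positional index. It is not Lucene's API: `build_positional_index` and `phrase_match` are hypothetical helper names, and tokenization is assumed to be lowercase whitespace splitting.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} using lowercase whitespace tokens."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, token in enumerate(text.lower().split()):
            index[token][doc_id].append(pos)
    return index

def phrase_match(index, phrase):
    """Return sorted doc ids where the phrase terms sit at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    hits = []
    for doc_id, starts in index[terms[0]].items():
        for start in starts:
            # Slop 0: term i must appear exactly i positions after the first term.
            if all(start + i in index[t].get(doc_id, ()) for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)

docs = [
    "The NBA formally announced its new social media guidelines Wednesday",
    "media coverage of social events",
]
index = build_positional_index(docs)
print(phrase_match(index, "social media"))
```

Only the first document matches, because the second contains both words but not adjacent and in order. Note that this only finds a phrase you already know; it does not discover phrases.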

Re: Phase Extraction, mainly for English

2009-10-06 Thread Karl Wettin
Hi Andrew, I think you are looking for the shingle package in contrib/analyzers. karl On 6 Oct 2009 at 13:42, Andrew Zhang wrote: Hi guys, The requirement is very simple here, e.g. for this sentence, 'The NBA formally announced its new *social media* guidelines Wednesday', I want to t
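
For readers unfamiliar with the term, a shingle is a word n-gram built from adjacent tokens. The sketch below illustrates the idea in plain Python; it is not the Lucene shingle filter itself, and the function name `shingles` and its parameters are assumptions made for this example.

```python
def shingles(tokens, min_size=2, max_size=2):
    """Return every run of min_size..max_size adjacent tokens, joined by a space."""
    out = []
    for size in range(min_size, max_size + 1):
        for i in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[i:i + size]))
    return out

sentence = "The NBA formally announced its new social media guidelines Wednesday"
tokens = sentence.split()
bigrams = shingles(tokens)
print(bigrams)
```

The output contains "social media", but also every other adjacent pair such as "its new", so shingling alone produces candidates rather than a final phrase list.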

Re: Phase Extraction, mainly for English

2009-10-06 Thread Andrew Zhang
Right, Vasu, I think the NLP approach is good, and I should take some time to look at it. Thanks. On Tue, Oct 6, 2009 at 8:10 PM, Vasudevan Comandur wrote: > Hi, > > Take the NLP route and use modules like POS tagger and NP chunker. > > OpenNLP has a stack for English language. Try to use them. > > Regards

Re: Phase Extraction, mainly for English

2009-10-06 Thread Andrew Zhang
Hi Karl, I think shingles are designed to make phrase search faster. They will generate a lot of phrase look-alikes based on position only, completely disregarding meaning, and that's not good enough. Regards, Andrew On Tue, Oct 6, 2009 at 11:51 PM, Karl Wettin wrote: > Hi Andrew, > > I think you are looki

Re: Phase Extraction, mainly for English

2009-10-06 Thread Andrew Zhang
Hi Erick, To run a query you already need to know the "phrase", right? But I want to discover the phrases: when words occur together often and naturally, we treat them as a phrase. On Tue, Oct 6, 2009 at 8:12 PM, Erick Erickson wrote: > Maybe I'm missing the problem entirely, but can you
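
One common way to score "words that occur together often" is pointwise mutual information (PMI) over adjacent word pairs. Nobody in the thread proposes PMI; it is swapped in here purely as an illustration, and the function name `pmi_bigrams`, the `min_count` threshold, and the tiny corpus are all assumptions of this sketch.

```python
import math
from collections import Counter

def pmi_bigrams(sentences, min_count=2):
    """Rank adjacent word pairs by PMI, skipping pairs seen fewer than min_count times."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        # PMI is high when the pair occurs more often than its words' frequencies predict.
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy corpus, invented for this example.
corpus = [
    "the nba announced new social media guidelines",
    "the team posted on social media",
    "social media changed the way the fans follow the game",
    "the nba fined the coach",
]
ranked = pmi_bigrams(corpus)
for pair, score in ranked:
    print(pair, round(score, 2))
```

On this corpus "social media" ranks above "the nba", because "the" appears in many other contexts and so its pairing is less surprising. On real text the threshold and the scoring function would need tuning.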

Re: Phase Extraction, mainly for English

2009-10-06 Thread Karl Wettin
There are many uses for shingles. I've used them to find common phrases in text, which is my understanding of what you are trying to achieve. It works rather well, is a very simple solution, and is easy on resources compared to real semantic analysis. You'll be getting a lot of shingles such as "the
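
A rough sketch of finding common phrases with shingles: count in how many documents each two-word shingle appears and keep the frequent ones. The stop-word filter, the `STOP_WORDS` list, the function name `common_shingles`, and the sample documents are assumptions of this example, not details given in the thread.

```python
from collections import Counter

# Hypothetical stop-word list, kept tiny for the example.
STOP_WORDS = {"the", "a", "an", "of", "on", "its", "to", "in"}

def common_shingles(docs, size=2, min_docs=2):
    """Return (shingle, document count) pairs for shingles found in at least min_docs docs."""
    doc_freq = Counter()
    for text in docs:
        tokens = text.lower().split()
        seen = set()
        for i in range(len(tokens) - size + 1):
            gram = tokens[i:i + size]
            # Drop shingles that start or end with a stop word.
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            seen.add(" ".join(gram))
        doc_freq.update(seen)  # count each shingle once per document
    return [(gram, n) for gram, n in doc_freq.most_common() if n >= min_docs]

docs = [
    "the nba announced its new social media guidelines",
    "the team posted on social media",
    "fans follow the team on social media",
]
result = common_shingles(docs)
print(result)
```

Here "the team" occurs in two documents but is dropped by the stop-word filter, leaving "social media" as the only common shingle.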