Re: Storing Part of Speech information in Lucene Indices

2006-07-12 Thread Grant Ingersoll
I think Mark's idea is better for this, although I seem to recall there being some caveats with multiple tokens at the same position; I don't remember the details. I _think_ term vectors don't like it, so if you need them, you might have trouble. Perhaps a search of the mailing lists
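To make "multiple tokens at the same position" concrete, here is a minimal sketch in plain Java with no Lucene dependency. It assumes the approach under discussion emits the POS tag as an extra token with a position increment of 0, so that it shares a position with its word. `StackedPosTokens`, `Token`, `stack` and `positions` are hypothetical names for illustration, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a token stream; not a Lucene class.
final class StackedPosTokens {

    // A term plus its distance from the previous token's position.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Emits each word with increment 1, followed by its POS tag with
    // increment 0, so the tag lands on the same position as the word.
    static List<Token> stack(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            out.add(new Token(words.get(i), 1));
            out.add(new Token(tags.get(i), 0));
        }
        return out;
    }

    // Resolves increments into absolute positions; in this sketch the
    // first token with increment 1 ends up at position 0.
    static List<Integer> positions(List<Token> tokens) {
        List<Integer> out = new ArrayList<>();
        int pos = -1;
        for (Token t : tokens) {
            pos += t.positionIncrement;
            out.add(pos);
        }
        return out;
    }
}
```

With words `quick`, `fox` and tags `ADJ`, `NN`, the word and its tag share a position, which is the situation the term vector caveat above refers to.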

Re: Storing Part of Speech information in Lucene Indices

2006-07-12 Thread Amit Kumar
We need to be able to search by word and POS and also have POS available for each occurrence. Appending POS to the terms will create a post-processing nightmare when retrieving term frequencies, right? (I would have to add all the foo_NN and

Re: Storing Part of Speech information in Lucene Indices

2006-07-12 Thread mark harwood
From: Amit Kumar <[EMAIL PROTECTED]>; To: java-user@lucene.apache.org; Sent: Wednesday, 12 July 2006, 4:15:34 PM; Subject: Re: Storing Part of Speech information in Lucene Indices. We need to be able to search by word and POS and also have POS available for each occurrence. Appe

Re: Storing Part of Speech information in Lucene Indices

2006-07-12 Thread Amit Kumar
We need to be able to search by word and POS and also have POS available for each occurrence. Appending POS to the terms will create a post-processing nightmare when retrieving term frequencies, right? (I would have to add up the counts for foo_NN, foo_ADJ, etc.) I can store the POS in a parallel field
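The post-processing concern can be sketched as follows: if terms are indexed as word_POS, the frequency of a bare word has to be rebuilt by summing over its tagged variants. This is plain Java over an in-memory map, not a Lucene call; `PosFrequencyAggregator` and `totalsByWord` are hypothetical names, and the `_` separator is assumed from the foo_NN example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; operates on counts already read out of an index.
final class PosFrequencyAggregator {

    // Sums frequencies of tagged terms (e.g. foo_NN, foo_ADJ) per bare word.
    // The text after the last '_' is treated as the POS tag.
    static Map<String, Integer> totalsByWord(Map<String, Integer> freqByTaggedTerm) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> e : freqByTaggedTerm.entrySet()) {
            String term = e.getKey();
            int cut = term.lastIndexOf('_');
            String word = cut < 0 ? term : term.substring(0, cut);
            totals.merge(word, e.getValue(), Integer::sum);
        }
        return totals;
    }
}
```

The extra pass is cheap for a handful of terms, but it has to run for every word whose overall frequency is needed, which is the cost being weighed against a parallel field.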

Re: Storing Part of Speech information in Lucene Indices

2006-07-12 Thread mark harwood
From: Amit Kumar <[EMAIL PROTECTED]>; To: java-user@lucene.apache.org; Cc: Amit Kumar <[EMAIL PROTECTED]>; Sent: Wednesday, 12 July 2006, 6:36:24 AM; Subject: Storing Part of Speech information in Lucene Indices. Hi, A new project that I am investigating Lucene for needs the P

Re: Storing Part of Speech information in Lucene Indices

2006-07-12 Thread Grant Ingersoll
Hi Amit, This is definitely something you can do. What are your goals for it? Do you want to search by word and POS, or do you just want POS available for post-processing? You could just append the POS tag onto the end of your token as it gets indexed, something like foo_NN or foo_ADJ.
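A minimal sketch of the append-the-tag idea, in plain Java and independent of any Lucene analyzer: each word is joined to its tag before indexing, and split apart again afterwards. `PosTagAppender` and its methods are hypothetical names, and the sketch assumes the tagger returns one tag per word.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class PosTagAppender {
    static final char SEP = '_';

    // Builds the term to index, e.g. ("foo", "NN") becomes "foo_NN".
    static String tag(String word, String pos) {
        return word + SEP + pos;
    }

    // Tags a whole sentence; words and tags must line up one to one.
    static List<String> tagAll(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        List<String> out = new ArrayList<>(words.size());
        for (int i = 0; i < words.size(); i++) {
            out.add(tag(words.get(i), tags.get(i)));
        }
        return out;
    }

    // Recovers the bare word; splits on the last separator so that
    // words containing '_' themselves survive the round trip.
    static String wordOf(String term) {
        int cut = term.lastIndexOf(SEP);
        return cut < 0 ? term : term.substring(0, cut);
    }

    // Recovers the POS tag, or an empty string if the term is untagged.
    static String posOf(String term) {
        int cut = term.lastIndexOf(SEP);
        return cut < 0 ? "" : term.substring(cut + 1);
    }
}
```

Searching for a word with a specific POS then becomes an ordinary term query on the joined form, while searching for the word regardless of POS needs a prefix-style match or a second field.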

Storing Part of Speech information in Lucene Indices

2006-07-11 Thread Amit Kumar
Hi, A new project that I am investigating Lucene for needs part-of-speech information for the tokens. I can get that information using NLP techniques (GATE etc.) by preprocessing the documents, but I would like to store that information in the index. Something along the lines of