Re: How to handle plural?

2008-06-16 Thread Sengly Heng
. Regards, Sengly On Mon, Jun 16, 2008 at 9:14 PM, Grant Ingersoll [EMAIL PROTECTED] wrote: What do your documents look like? Can you share more about the problem? Is there some kind of structure that lets you count this information? -Grant On Jun 15, 2008, at 5:08 AM, Sengly Heng wrote

Re: Seeking suggestion on results re-ranking methodology

2008-06-15 Thread Sengly Heng
://wunderwood.org/most_casual_observer/2007/04/progressive_reranking.html Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Sengly Heng [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Friday, June 13, 2008 11:47:26 AM Subject: Seeking

How to handle plural?

2008-06-15 Thread Sengly Heng
Hello all, I am facing a problem when dealing with a query such as "find all the documents that write about at least 5 animals". How should I handle it? Do you have any ideas? Thank you. Best regards, Sengly
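One way to read the question above is as a counting problem: a document qualifies when it mentions at least N distinct terms from some category. The following is only a minimal sketch in plain Java under that reading, not something proposed in the thread. The `ANIMALS` set, the class name, and the naive trailing-"s" plural handling are all assumptions made for illustration; a real category list would have to come from a lexicon or taxonomy.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class DistinctCategoryCounter {
    // Hypothetical category list, for illustration only.
    static final Set<String> ANIMALS =
            Set.of("cat", "dog", "horse", "cow", "sheep", "goat", "pig");

    // Lower-case the token and strip a trailing "s" when that yields a known term.
    // This is a deliberately naive stand-in for real stemming.
    static String normalize(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        if (!ANIMALS.contains(t) && t.endsWith("s")
                && ANIMALS.contains(t.substring(0, t.length() - 1))) {
            return t.substring(0, t.length() - 1);
        }
        return t;
    }

    // Number of distinct category terms that appear in the text.
    static int countDistinct(String text) {
        Set<String> seen = new HashSet<>();
        for (String token : text.split("[^\\p{L}]+")) {
            String t = normalize(token);
            if (ANIMALS.contains(t)) {
                seen.add(t);
            }
        }
        return seen.size();
    }

    // True when the text mentions at least `minimum` distinct category terms.
    static boolean mentionsAtLeast(String text, int minimum) {
        return countDistinct(text) >= minimum;
    }
}
```

If the count were computed at indexing time, it could be stored with the document, so that the "at least 5" condition becomes a simple numeric filter at query time.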

Seeking suggestion on results re-ranking methodology

2008-06-13 Thread Sengly Heng
Dear all, I would like to seek your suggestions on a re-ranking methodology. My problem is that I have a set of documents resulting from a query, each with a matching score, along with a list of relatedness scores between each pair of them. I would like to re-rank my resulting documents by

Get the total term frequency vector of a specific field from the hit results

2007-04-10 Thread Sengly Heng
Hello all, I would like to extract the term frequency vector from the hit results as a single total vector, not per document. I have searched the mailing list and found that many people have discussed this issue, but I still could not find the right solution. Everyone just suggested to look at
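The "total vector" asked for above can be pictured as the element-wise sum of the per-document term frequency vectors of the hits. The sketch below shows only that summing step in plain Java. It assumes the per-hit frequencies for the field have already been read into maps (how they are obtained from the index is left out), and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TotalTermFrequency {
    // Sum per-document term frequencies into one total vector.
    // Each map in `perDocument` is assumed to hold term -> frequency for one hit.
    static Map<String, Integer> sum(List<Map<String, Integer>> perDocument) {
        Map<String, Integer> total = new TreeMap<>(); // sorted by term for stable output
        for (Map<String, Integer> vector : perDocument) {
            for (Map.Entry<String, Integer> entry : vector.entrySet()) {
                total.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return total;
    }
}
```

The cost grows with the number of hits and the number of distinct terms per hit, so for large result sets it may be worth summing only over the top-ranked hits.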

Re: Get the total term frequency vector of a specific field from the hit results

2007-04-10 Thread Sengly Heng
have any. Your help is highly appreciated. Best, Sengly Sengly Heng wrote: Hello all, I would like to extract the term frequency vector from the hit results as a single total vector, not per document. I have searched the mailing list and found that many people have discussed this issue, but I still could not find

Re: Get the total term frequency vector of a specific field from the hit results

2007-04-10 Thread Sengly Heng
Dear Karl, Thank you for taking the time to look at my problem. We don't really know what your problem is. Explaining that rather than the solution you have thought of might render a couple of alternate solutions. Perhaps something could be precalculated and stored in the documents. Perhaps

Re: Get the total term frequency vector of a specific field from the hit results

2007-04-10 Thread Sengly Heng
Once again, thank you for your help. We don't really know what your problem is. Explaining that rather than the solution you have thought of might render a couple of alternate solutions. Perhaps something could be precalculated and stored in the documents. Perhaps feature selection

Indexing multiple instances of the same field and counting their frequency afterward

2007-04-04 Thread Sengly Heng
Dear all, My problem is a little bit strange. Instead of passing the content of the document to the indexer, I am adding the values one by one. Here is a piece of my code: Document doc = new Document(); doc.add(Field.Text("Features", "blue")); doc.add(Field.Text("Features", "beautiful"));

Re: Indexing multiple instances of the same field and counting their frequency afterward

2007-04-04 Thread Sengly Heng
fields without knowing in advance what tokens we have. Once again, thank you very much for your reply. Best regards, Sengly On 4/4/07, Erick Erickson [EMAIL PROTECTED] wrote: See below On 4/4/07, Sengly Heng [EMAIL PROTECTED] wrote: Dear all, My problem is a little bit strange

Re: Indexing multiple instances of the same field and counting their frequency afterward

2007-04-04 Thread Sengly Heng
); TermEnum te = ISer.terms(new Term("Features", "blue")); Term te1 = te.term(); System.out.println("Frequency of blue " + ISer.docFreq(te1)); regards, -LM On 4/4/07, Sengly Heng [EMAIL PROTECTED] wrote: Dear all, My problem is a little bit strange. Instead of passing the content
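Note that docFreq reports the number of documents containing a term, not how many times the term was added. The plain-Java sketch below illustrates the difference with in-memory lists standing in for the index. It does not use the Lucene API, and the class and method names are hypothetical.

```java
import java.util.List;

final class FieldValueCounts {
    // Number of documents whose field contains the term at least once.
    // Each inner list holds the values added to one document's field.
    static int documentFrequency(List<List<String>> docs, String term) {
        int count = 0;
        for (List<String> values : docs) {
            if (values.contains(term)) {
                count++;
            }
        }
        return count;
    }

    // Total number of times the term was added, across all documents.
    static int totalOccurrences(List<List<String>> docs, String term) {
        int count = 0;
        for (List<String> values : docs) {
            for (String value : values) {
                if (value.equals(term)) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

With the values "blue", "beautiful", "blue" in one document and "blue" in another, the document frequency of "blue" is 2 while its total occurrence count is 3.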

TF-IDF API

2007-03-28 Thread Sengly Heng
Hello Luceners, I have a collection of vectors of terms (tokens) that I extracted from files. I am looking for ways to calculate the TF/IDF of each term. I wanted to use Lucene to do this, but Lucene is made for collections of files, and in my case I have already extracted those files into vector of

Re: TF-IDF API

2007-03-28 Thread Sengly Heng
. For the calculation of the idf, you can use the provided formula from the DefaultSimilarity. To get the document frequency, which is necessary to calculate the idf, you can call: reader.docFreq(term) Hope this helps... Thomas Sengly Heng wrote: Hello Luceners, I have a collection of vectors
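For term vectors that are already extracted, the same calculation can be sketched without an index. The code below is plain Java and only an illustration: the tf and idf formulas (square root of the frequency, and ln(numDocs / (docFreq + 1)) + 1) are written as I understand the classic DefaultSimilarity to define them and should be checked against the Lucene version in use. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

final class TfIdfSketch {
    // Term-frequency factor: square root of the raw count (assumed formula).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: ln(numDocs / (docFreq + 1)) + 1 (assumed formula).
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Count, for each term, how many documents contain it at least once.
    static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // TF-IDF weight of every term in one document.
    static Map<String, Double> weights(List<String> doc, Map<String, Integer> df, int numDocs) {
        Map<String, Integer> counts = new HashMap<>();
        for (String term : doc) {
            counts.merge(term, 1, Integer::sum);
        }
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            int docFreq = df.getOrDefault(entry.getKey(), 0);
            result.put(entry.getKey(), tf(entry.getValue()) * idf(docFreq, numDocs));
        }
        return result;
    }
}
```

A caller would first build the document frequencies once over the whole collection, then call weights for each term vector with numDocs set to the collection size.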

Re: TF-IDF API

2007-03-28 Thread Sengly Heng
fit-in this case. Thanks once again everyone. Best regards, Sengly On 3/28/07, karl wettin [EMAIL PROTECTED] wrote: On 28 Mar 2007 at 10:36, Sengly Heng wrote: Does any of you know of a Java API that directly handles this problem, or do I have to implement it from scratch? You can also try

Re: TF-IDF API

2007-03-28 Thread Sengly Heng
. Thanks once again everyone. Best regards, Sengly On 3/28/07, karl wettin [EMAIL PROTECTED] wrote: On 28 Mar 2007 at 10:36, Sengly Heng wrote: Does any of you know of a Java API that directly handles this problem, or do I have to implement it from scratch? You can also try