Re: How to handle plural?

2008-06-16 Thread Sengly Heng
elcome. Thanks. Regards, Sengly On Mon, Jun 16, 2008 at 9:14 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > What do your documents look like? Can you share more about the problem? > Is there some kind of structure that lets you count this information? > > -Grant > > >

How to handle plural?

2008-06-15 Thread Sengly Heng
Hello all, I am facing a problem when dealing with a query such as "Find all the documents that write about at least 5 animals". How can I handle it? Do you have any ideas? Thank you. Best regards, Sengly

Re: Seeking suggestion on results re-ranking methodology

2008-06-14 Thread Sengly Heng
re: > > > http://wunderwood.org/most_casual_observer/2007/04/progressive_reranking.html > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > ----- Original Message > > From: Sengly Heng <[EMAIL PROTECTED]> > > To: java-user@lu

Seeking suggestion on results re-ranking methodology

2008-06-13 Thread Sengly Heng
Dear all, I would like to seek your suggestions on a re-ranking methodology. My problem is that I have a set of result documents for a query, each with a matching score, and also a list of relatedness scores between each pair of them. I would like to re-rank my resulting documents by do

Keyword expansion

2008-06-11 Thread Sengly Heng
Dear all, To improve the search, I will have to do keyword expansion. I am looking for a library that would help me get a list of synonyms of a term with some similarity score. Is there any library that can handle this? It would be great if it were in Python. I have searched the web and foun

Re: Get the total term frequency vector of a specific field from the hit results

2007-04-10 Thread Sengly Heng
Once again, thank you for your help. >> We don't really know what your problem is. Explaining that rather >> than the solution you have thought of might render a couple of >> alternate solutions. Perhaps something could be precalculated and >> stored in the documents. Perhaps feature selection

Re: Get the total term frequency vector of a specific field from the hit results

2007-04-10 Thread Sengly Heng
Dear Karl, Thank you for taking the time to look at my problem. We don't really know what your problem is. Explaining that rather than the solution you have thought of might render a couple of alternate solutions. Perhaps something could be precalculated and stored in the documents. Perhaps feature

Re: Get the total term frequency vector of a specific field from the hit results

2007-04-10 Thread Sengly Heng
ons. Please do contribute if you have any. Your help is highly appreciated. Best, Sengly Sengly Heng wrote: > Hello all, > > I would like to extract the term freq vector from the hit results as a > total > vector not by document. > > I have searched the mailing list and I fou

Get the total term frequency vector of a specific field from the hit results

2007-04-10 Thread Sengly Heng
Hello all, I would like to extract the term freq vector from the hit results as a single total vector, not one per document. I have searched the mailing list and found that many have talked about this issue, but I still could not find the right solution. Everyone just suggested looking at getTermFreqVe
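One way to picture the "total vector" asked for above is to sum the per-document term frequencies of the hits into a single map. The following is only a minimal sketch in plain Java: the class name `TotalTermFreqSketch`, the method `total`, and the use of a `Map<String, Integer>` as a stand-in for each hit's term frequency vector are assumptions made for illustration, not part of the original message or of the Lucene API.

```java
import java.util.*;

// Hypothetical helper: merges per-document term frequencies into one total vector.
final class TotalTermFreqSketch {

    // Each map is assumed to hold term -> frequency for one hit document.
    static Map<String, Integer> total(List<Map<String, Integer>> perDocument) {
        Map<String, Integer> total = new TreeMap<>(); // sorted by term for stable output
        for (Map<String, Integer> vector : perDocument) {
            for (Map.Entry<String, Integer> e : vector.entrySet()) {
                // Add this document's frequency to the running total for the term.
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new HashMap<>();
        first.put("blue", 2);
        first.put("sky", 1);
        Map<String, Integer> second = new HashMap<>();
        second.put("blue", 1);
        second.put("sea", 3);
        System.out.println(total(Arrays.asList(first, second)));
    }
}
```

With a real index, the per-document maps would be filled from whatever term vector the field was indexed with; the merging step itself does not depend on where the counts come from.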

Re: Indexing multiple instances of the same field and counting their frequency afterward

2007-04-04 Thread Sengly Heng
:/Testindex"); TermEnum te=ISer.terms(new Term("Features","blue")); Term te1= te.term(); System.out.println("Frequency of blue "+ISer.docFreq(te1)); regards, -LM On 4/4/07, Sengly Heng <[EMAIL PROTECTED]> wrote: > > Dear all, >

Re: Indexing multiple instances of the same field and counting their frequency afterward

2007-04-04 Thread Sengly Heng
fields without knowing in advance what are the tokens that we have. Once again, thank you very much for your reply. Best regards, Sengly On 4/4/07, Erick Erickson <[EMAIL PROTECTED]> wrote: See below On 4/4/07, Sengly Heng <[EMAIL PROTECTED]> wrote: > > Dear all, > > My

Indexing multiple instances of the same field and counting their frequency afterward

2007-04-04 Thread Sengly Heng
Dear all, My problem is a little bit strange. Instead of passing the content of the document to the indexer, I am adding the terms one by one. Here is a piece of my code: Document doc = new Document(); doc.add(Field.Text("Features", "blue")); doc.add(Field.Text("Features", "beautiful")); doc.add(Field.Text(

Re: TF-IDF API

2007-03-28 Thread Sengly Heng
Best regards, Sengly On 3/28/07, karl wettin <[EMAIL PROTECTED]> wrote: 28 mar 2007 kl. 15.24 skrev Sengly Heng: > Thank you but I still have no clue of how to do that by using > Weka > after taking a look at its API. Let me reformulate my problem: > > I have

Re: TF-IDF API

2007-03-28 Thread Sengly Heng
fit-in this case. Thanks once again everyone. Best regards, Sengly On 3/28/07, karl wettin <[EMAIL PROTECTED]> wrote: 28 mar 2007 kl. 10.36 skrev Sengly Heng: > Do any of you know a Java API that directly handles this > problem, > or do I have to implement it from scratch? Y

Re: TF-IDF API

2007-03-28 Thread Sengly Heng
ithin the current document. For the calculation of the idf, you can use the provided formula from the "DefaultSimilarity". To get the document frequency, which is necessary to calculate the idf, you can call: reader.docFreq(term) Hope this helps... Thomas Sengly Heng wrote: > Hell
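The calculation described in the reply above can be sketched without an index, for the case where the terms have already been extracted into lists. This is a minimal illustration only: the class `TfIdfSketch` and its methods are hypothetical names, the document frequency is counted directly from the lists instead of calling reader.docFreq(term), and the formulas are the ones I understand the classic DefaultSimilarity to use (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))), so treat them as an assumption to check against the Lucene version in use.

```java
import java.util.*;

// Hypothetical sketch: TF-IDF weights over already-extracted term lists.
final class TfIdfSketch {

    // Assumed classic DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Assumed classic DefaultSimilarity tf: square root of the raw frequency.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Number of documents that contain the term at least once
    // (stands in for reader.docFreq(term) when no index is available).
    static int docFreq(List<List<String>> docs, String term) {
        int count = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) {
                count++;
            }
        }
        return count;
    }

    // tf * idf for every distinct term of one document.
    static Map<String, Double> tfIdf(List<String> doc, List<List<String>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (String term : doc) {
            counts.merge(term, 1, Integer::sum);
        }
        Map<String, Double> weights = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            double weight = tf(e.getValue()) * idf(docFreq(docs, e.getKey()), docs.size());
            weights.put(e.getKey(), weight);
        }
        return weights;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("blue", "blue", "sky"),
                Arrays.asList("blue", "sea"),
                Arrays.asList("green"));
        System.out.println(tfIdf(docs.get(0), docs));
    }
}
```

Note that this weights each term as tf * idf; Lucene's own scoring combines these factors with further normalization, so the numbers here are not Lucene scores.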

TF-IDF API

2007-03-28 Thread Sengly Heng
Hello Luceners, I have a collection of term (token) vectors that I extracted from files. I am looking for ways to calculate the TF/IDF of each term. I wanted to use Lucene to do this, but Lucene is made for collections of files and in my case I have already extracted those files into vector of te