Re: Boolean Search Query is not working

2015-01-23 Thread parnab kumar
Hi, While indexing, a norm value is calculated for each field and stored in the index. This norm value acts as a field-level boost, which is multiplied with other factors such as tf-idf and the query-level boost that you specify with setBoost. So you see, setting a boost is one of the

Re: How best to compare two sentences

2014-12-04 Thread parnab kumar
Hi, If you are comparing two song titles, which are usually very short, you are better off using a custom set of several features rather than any single one of cosine, Levenshtein, or Jaccard. You may use a combination of the following: 1. cosine similarity score 2. Jaccard overlap coefficient 3. how many words in
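A minimal sketch of the first two features listed above, assuming whitespace tokenization and lowercasing (both are assumptions, not something the message specifies). The function names `cosine_sim`, `jaccard`, and `title_features` are hypothetical, and the third feature is cut off in the preview, so it is not reconstructed here.

```python
import math
from collections import Counter


def tokens(title):
    # Assumed tokenization: lowercase, split on whitespace.
    return title.lower().split()


def cosine_sim(a, b):
    # Cosine similarity over raw term-count vectors of the two titles.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    # Jaccard overlap: shared distinct words over all distinct words.
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def title_features(a, b):
    # Collect the individual scores so they can be combined later.
    return {"cosine": cosine_sim(a, b), "jaccard": jaccard(a, b)}
```

Returning the scores as separate features, rather than collapsing them into one number, leaves the choice of how to weight them open.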

Re: Document Term matrix

2014-11-11 Thread parnab kumar
Hi, While indexing the documents, store the term vectors for the content field. Each document will then have an array of terms and their corresponding frequencies in the document. Using the IndexReader you can retrieve these term vectors. Similarity between two documents can be computed
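As a rough illustration of the idea, the sketch below builds a document-term matrix from per-document term-frequency maps. It assumes the term vectors have already been read out of the index into plain dictionaries; it does not call Lucene. The preview is cut off before it names a similarity measure, so the cosine measure used here is an assumption, and `build_matrix` and `cosine` are hypothetical names.

```python
import math


def build_matrix(term_vectors):
    # term_vectors: {doc_id: {term: frequency}}, assumed to be already
    # extracted from the index. Returns the sorted vocabulary and one
    # frequency row per document, aligned to that vocabulary.
    vocab = sorted({t for tv in term_vectors.values() for t in tv})
    matrix = {
        doc: [tv.get(t, 0) for t in vocab]
        for doc, tv in term_vectors.items()
    }
    return vocab, matrix


def cosine(u, v):
    # Cosine similarity between two equal-length frequency rows.
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

For example, `build_matrix({"d1": {"lucene": 2, "index": 1}, "d2": {"lucene": 1, "search": 1}})` yields the vocabulary `["index", "lucene", "search"]` with rows `[1, 2, 0]` and `[0, 1, 1]`.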

free text suggester

2014-08-22 Thread parnab kumar
Hi, I am using Lucene 4.8. I already have an index. I want to use the free text suggester feature when a user queries the index. I am not sure how to start with this. A sample code snippet or a pointer to one would be really helpful. Thanks, Parnab

Re: Lucene newbie in need of a hint

2014-08-14 Thread parnab kumar
Have a look at this article if you have not already gone through it. http://blog.mikemccandless.com/2011/06/lucenes-near-real-time-search-is-fast.html On Thu, Aug 14, 2014 at 11:16 PM, Michael Jennings mike.c.jenni...@gmail.com wrote: Hi everyone, I'm a bit of a Lucene newb, but a fairly

Re: bigram problem

2014-07-02 Thread parnab kumar
TF is straightforward: you can simply count the number of occurrences in the doc by simple string matching. For IDF you need to know the total number of docs in the collection and the number of docs containing the bigram. reader.maxDoc() will give you the total number of docs in the collection. To calculate the number
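A minimal sketch of the TF and IDF computation described above, working on plain strings rather than a Lucene index. It matches adjacent whitespace-separated words instead of raw substrings, and it uses the 1 + log(N / (df + 1)) form of IDF; both choices are assumptions, as are the names `bigram_tf` and `bigram_idf`.

```python
import math


def bigram_tf(text, bigram):
    # Count occurrences of the two-word phrase among adjacent word pairs.
    words = text.lower().split()
    target = tuple(bigram.lower().split())
    return sum(1 for pair in zip(words, words[1:]) if pair == target)


def bigram_idf(docs, bigram):
    # n_docs plays the role of the collection size; df is the number of
    # documents that contain the bigram at least once.
    n_docs = len(docs)
    df = sum(1 for d in docs if bigram_tf(d, bigram) > 0)
    # Assumed IDF variant: 1 + log(N / (df + 1)).
    return 1.0 + math.log(n_docs / (df + 1))
```

With the three documents "new york is big", "i love new york new york", and "paris is nice", the bigram "new york" has a TF of 2 in the second document and a document frequency of 2.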

Re: Batch wise Indexing Structured Documents

2014-06-26 Thread parnab kumar
Download the Lucene source code and check the demo source files that ship with it. You should find a sample indexing file. On Thu, Jun 26, 2014 at 9:27 PM, Venkata krishna venkat1...@gmail.com wrote: Hi, I have to index millions of files, that's why I am thinking batch wise

Re: please help me

2013-09-30 Thread parnab kumar
Just add the Lucene jar files to the build path of the project. On Sat, Sep 28, 2013 at 5:04 PM, sajad naderi sajad_nader...@yahoo.com wrote: Hi, I want to run the sample code from the Lucene in Action book in Eclipse. Please tell me how to configure Eclipse to run that code

Re: Delete documents base on more than one condition?

2012-12-06 Thread parnab kumar
Hi Rajashekhar, yes, it is possible. You can form a BooleanQuery that matches the documents per your required conditions. Then you can delete by the respective document ids by instantiating an IndexReader. You can refer to the book Lucene in Action, 2nd Edition, for more details. Thanks,

Re: Help for multi-language support

2012-12-04 Thread parnab kumar
Hi Deepak, Lucene already has multi-language support. For any language you just need to write a custom Analyzer for that language. While indexing you can configure the indexer to use the custom analyzer as and when needed. The same applies during searching. You just need to provide

Re: Variable term weighting while indexing

2012-10-01 Thread parnab kumar
Erick On Sun, Sep 30, 2012 at 8:02 AM, parnab kumar parnab.2...@gmail.com wrote: Hi Erick, Can you please share your thoughts on the following: Since Lucene by default does vector space scoring, the weight component for a term from the document is nothing

Re: Variable term weighting while indexing

2012-09-30 Thread parnab kumar
, otherwise the words are indistinguishable. Best Erick On Sat, Sep 29, 2012 at 12:23 PM, parnab kumar parnab.2...@gmail.com wrote: Hi All, I have an algorithm by which I measure the importance of a term in a document. While indexing I want to store weight with respect

Re: Lucene Index File Format

2012-09-30 Thread parnab kumar
Hi, Use IndexReader instead. You can loop through the index and read one document at a time. Thanks, Parnab On Mon, Oct 1, 2012 at 10:33 AM, Selvakumar vvekselva...@gmail.com wrote: Hi, I'm new to Lucene and I am reading the docs on Lucene. I read through the Lucene Index