Re: Boolean Search Query is not working

2015-01-23 Thread parnab kumar
Hi, while indexing, a norm value is calculated for each field and stored in the index. This norm acts as a field-level boost and is multiplied with other factors such as tf-idf and the query-level boost you specify with setBoost. So you see, setting boosting is one of the s
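
As a rough illustration of how these factors combine, here is a minimal Java sketch that assumes the formulas of Lucene 4.x's classic DefaultSimilarity: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), norm = field boost / sqrt(field length). It leaves out coord(), queryNorm() and the lossy one-byte encoding of the stored norm, and the class and method names are hypothetical, not Lucene API.

```java
// Hypothetical helper mirroring (as assumed here) the per-term factors of
// Lucene 4.x's classic DefaultSimilarity. coord(), queryNorm() and the
// one-byte norm encoding are deliberately left out.
final class ClassicScoreSketch {

    // tf(freq) = sqrt(freq): repeated occurrences help, with diminishing returns.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // idf = 1 + ln(numDocs / (docFreq + 1)): rarer terms weigh more.
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Index-time norm = field boost * 1 / sqrt(number of terms in the field).
    static double norm(double fieldBoost, int fieldLength) {
        return fieldBoost / Math.sqrt(fieldLength);
    }

    // One query term's contribution: tf * idf^2 * queryBoost * norm.
    static double termScore(int freq, long docFreq, long numDocs,
                            double queryBoost, double fieldBoost, int fieldLength) {
        double i = idf(docFreq, numDocs);
        return tf(freq) * i * i * queryBoost * norm(fieldBoost, fieldLength);
    }
}
```

Because the query boost and the index-time norm enter the product as plain multipliers, doubling either one doubles this term's contribution while tf and idf stay unchanged.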

Re: How best to compare two sentences

2014-12-04 Thread parnab kumar
Hi, if you are comparing two song titles, which are usually very short, you are better off using a custom set of several features rather than just one of cosine, Levenshtein or Jaccard. You may use a combination of the following: 1. cosine similarity score 2. Jaccard overlap coefficient 3. how many words in th
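
A minimal Java sketch of such a feature set, under stated assumptions: titles are lowercased and split on non-alphanumeric characters, and the features are cosine over word counts, Jaccard over distinct words, and a normalized character-level Levenshtein similarity (one of the measures the reply mentions). The third item in the reply's list is cut off in the archive, so it is not reproduced here; TitleFeatures is a hypothetical name, and how to weight or learn a combination of the features is left open.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: several similarity features for two short titles.
final class TitleFeatures {

    // Lowercase and split on runs of non-letter/non-digit characters (an assumed tokenizer).
    static Map<String, Integer> termCounts(String s) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tok : s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!tok.isEmpty()) counts.merge(tok, 1, Integer::sum);
        }
        return counts;
    }

    // Cosine similarity of the two word-count vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) dot += e.getValue().doubleValue() * other.doubleValue();
        }
        for (int v : a.values()) na += (double) v * v;
        for (int v : b.values()) nb += (double) v * v;
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    // Jaccard coefficient: size of intersection / size of union of distinct words.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) return 0.0; // two empty titles: treated as no evidence
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        return inter.size() / (double) union.size();
    }

    // Classic edit distance with two rolling rows.
    static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] cur = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) prev[j] = j;
        for (int i = 1; i <= s.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[t.length()];
    }

    // Feature vector: {cosine, jaccard, 1 - editDistance / maxLength}.
    static double[] features(String x, String y) {
        Map<String, Integer> cx = termCounts(x), cy = termCounts(y);
        String lx = x.toLowerCase(Locale.ROOT), ly = y.toLowerCase(Locale.ROOT);
        int maxLen = Math.max(lx.length(), ly.length());
        double levSim = maxLen == 0 ? 1.0 : 1.0 - levenshtein(lx, ly) / (double) maxLen;
        return new double[] { cosine(cx, cy), jaccard(cx.keySet(), cy.keySet()), levSim };
    }
}
```

Keeping the features separate, rather than collapsing them into one score, lets a short title that differs by a single word still score well on one measure even when another measure penalizes it.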

Re: Document Term matrix

2014-11-11 Thread parnab kumar
Hi, while indexing the documents, store the term vectors for the content field. Each document will then have an array of terms and their corresponding frequencies in that document. You can retrieve these term vectors using the IndexReader. Similarity between two documents can be computed as
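
A minimal sketch of the document-term matrix itself, assuming each document's term vector has already been read into a Map<String, Integer> of term to frequency (in Lucene 4.x the raw data is available through IndexReader.getTermVector(docId, field)). The reply is cut off before naming its similarity measure; cosine similarity between rows is used here as one common choice, and the DocTermMatrix class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: lays per-document term frequencies out as a matrix.
final class DocTermMatrix {
    final List<String> vocabulary; // column labels, sorted alphabetically
    final double[][] rows;         // rows[d][t] = frequency of vocabulary term t in doc d

    DocTermMatrix(List<Map<String, Integer>> termFreqsPerDoc) {
        // Shared vocabulary = union of all terms seen in any document.
        TreeSet<String> vocab = new TreeSet<>();
        for (Map<String, Integer> tf : termFreqsPerDoc) vocab.addAll(tf.keySet());
        vocabulary = new ArrayList<>(vocab);
        rows = new double[termFreqsPerDoc.size()][vocabulary.size()];
        for (int d = 0; d < termFreqsPerDoc.size(); d++) {
            Map<String, Integer> tf = termFreqsPerDoc.get(d);
            for (int t = 0; t < vocabulary.size(); t++) {
                rows[d][t] = tf.getOrDefault(vocabulary.get(t), 0); // 0 if term absent
            }
        }
    }

    // Cosine similarity between the rows of documents d1 and d2.
    double cosine(int d1, int d2) {
        double dot = 0, n1 = 0, n2 = 0;
        for (int t = 0; t < vocabulary.size(); t++) {
            dot += rows[d1][t] * rows[d2][t];
            n1 += rows[d1][t] * rows[d1][t];
            n2 += rows[d2][t] * rows[d2][t];
        }
        return (n1 == 0 || n2 == 0) ? 0.0 : dot / Math.sqrt(n1 * n2);
    }
}
```

Row d of rows is document d's term vector laid out against the shared vocabulary, so the whole array is the document-term matrix the thread asks about.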

free text suggester

2014-08-22 Thread parnab kumar
Hi, I am using Lucene 4.8 and already have an index. I want to use the free text suggester feature when a user queries the index, but I am not sure how to start. A sample code snippet, or a pointer to one, would be really helpful. Thanks, Parnab

Re: Lucene newbie in need of a hint

2014-08-14 Thread parnab kumar
Have a look at this article if you have not already gone through it: http://blog.mikemccandless.com/2011/06/lucenes-near-real-time-search-is-fast.html On Thu, Aug 14, 2014 at 11:16 PM, Michael Jennings <mike.c.jenni...@gmail.com> wrote: > Hi everyone, > > I'm a bit of a Lucene newb, but a fairl

Re: bigram problem

2014-07-02 Thread parnab kumar
TF is straightforward: you can simply count the number of occurrences in the document by simple string matching. For IDF you need to know the total number of documents in the collection and the number of documents containing the bigram. reader.maxDoc() will give you the total number of documents in the collection (it also counts deleted documents that have not yet been merged away; reader.numDocs() excludes them). To calculate the number o
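
The two counts can be made concrete with a small Java sketch over an in-memory list of document texts. Assumptions: text is lowercased and split on non-alphanumeric characters, numDocs stands in for reader.maxDoc(), and idf uses the textbook form ln(numDocs / df). The reply is cut off before explaining how to get the document frequency from the index, so this sketch simply scans every document; BigramTfIdf is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: bigram tf and idf over an in-memory list of documents.
final class BigramTfIdf {

    // Lowercase and split on runs of non-letter/non-digit characters (assumed tokenizer).
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // tf: how many times w1 is immediately followed by w2 in the text.
    // Matching whole tokens avoids counting "new york" inside "new yorker".
    static int tf(String text, String w1, String w2) {
        String a = w1.toLowerCase(Locale.ROOT), b = w2.toLowerCase(Locale.ROOT);
        List<String> toks = tokens(text);
        int count = 0;
        for (int i = 0; i + 1 < toks.size(); i++) {
            if (toks.get(i).equals(a) && toks.get(i + 1).equals(b)) count++;
        }
        return count;
    }

    // df: number of documents containing the bigram at least once.
    static int docFreq(List<String> docs, String w1, String w2) {
        int df = 0;
        for (String d : docs) {
            if (tf(d, w1, w2) > 0) df++;
        }
        return df;
    }

    // idf = ln(numDocs / df); numDocs plays the role of reader.maxDoc().
    // Returns 0 for an unseen bigram instead of dividing by zero.
    static double idf(int numDocs, int docFreq) {
        return docFreq == 0 ? 0.0 : Math.log((double) numDocs / docFreq);
    }
}
```

Scanning every stored document like this is only practical for small collections; the sketch is meant to make the two quantities concrete, not to replace an index lookup.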

Re: Batch wise Indexing Structured Documents

2014-06-26 Thread parnab kumar
Download the Lucene source code and check the demo source files that are shipped with it; you should find a sample indexing file there. On Thu, Jun 26, 2014 at 9:27 PM, Venkata krishna wrote: > Hi, > > I have to index millions of files, that's why I am thinking batch-wise > indexing is good.

Re: please help me

2013-09-30 Thread parnab kumar
Just add the Lucene jar files to the build path of the project. On Sat, Sep 28, 2013 at 5:04 PM, sajad naderi wrote: > Hi, > I want to run the code samples of the "Lucene in Action" book in Eclipse. > Please tell me how to configure Eclipse to run that code.

Re: Delete documents based on more than one condition?

2012-12-06 Thread parnab kumar
Hi Rajashekhar, yes, it is possible. You can form a BooleanQuery that matches the documents satisfying your required conditions. Then you can delete the matching documents by their document ids through an IndexReader. You can refer to the book Lucene in Action, 2nd Edition, for more details. Thanks, Parn

Re: Help for multi-language support

2012-12-04 Thread parnab kumar
Hi Deepak, Lucene already has multi-language support. For any language you just need to write a custom Analyzer for that language (or reuse one of the language-specific analyzers Lucene already ships). While indexing, you can configure the indexer to use the custom analyzer as and when needed. The same applies during searching. You just need to provide the

Re: Variable term weighting while indexing

2012-10-01 Thread parnab kumar
t > Erick > > On Sun, Sep 30, 2012 at 8:02 AM, parnab kumar > wrote: > > Hi Erick, > > Can you please share your thoughts on the following: > > Since Lucene by default does vector space scoring, the > > weight component for a term from

Re: Lucene Index File Format

2012-09-30 Thread parnab kumar
Hi, use IndexReader instead. You can loop through the index and read one document at a time. Thanks, Parnab On Mon, Oct 1, 2012 at 10:33 AM, Selvakumar wrote: > Hi, > > I'm new to Lucene and I am reading the docs on Lucene. > > > I read through the Lucene Index File Format, so to e

Re: Variable term weighting while indexing

2012-09-30 Thread parnab kumar
are > indistinguishable. > > Best > Erick > > On Sat, Sep 29, 2012 at 12:23 PM, parnab kumar > wrote: > > Hi All, > > > > I have an algorithm by which I measure the importance of a > term > > in a document. While indexing I want to store weig