date:20111027

index bigger than it should be?

2011-10-27 Thread v . sevel

Hi, I have an application that has an index with 30 million docs in it. Every day, I add around 1 million docs and remove the oldest 1 million, to keep it stable at 30 million. For the most part, doc fields are indexed and stored. Each doc weighs from a few KB to 1 MB (a few MB in so
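One factor worth ruling out here: in Lucene, deleting a document only marks it as deleted, and its bytes stay in the segment files until that segment is merged away. With about 1 million deletes a day, a noticeable share of the on-disk index can be deleted but not yet reclaimed documents. The toy model below is plain Java, not Lucene code; the class names and all numbers are hypothetical. It estimates that share, assuming deleted and live documents have the same average size.

```java
import java.util.Arrays;
import java.util.List;

// Toy model, not Lucene code: one index segment with its document counts and on-disk size.
final class SegmentStats {
    final int maxDoc;      // documents physically in the segment, live or deleted
    final int deletedDocs; // documents only marked as deleted, not yet reclaimed
    final long sizeBytes;  // on-disk size, which still includes the deleted documents

    SegmentStats(int maxDoc, int deletedDocs, long sizeBytes) {
        this.maxDoc = maxDoc;
        this.deletedDocs = deletedDocs;
        this.sizeBytes = sizeBytes;
    }
}

final class DeletedSpace {
    // Estimates bytes held by deleted docs, assuming deleted and live docs have the same average size.
    static long estimateWastedBytes(List<SegmentStats> segments) {
        double wasted = 0.0;
        for (SegmentStats s : segments) {
            if (s.maxDoc == 0) continue; // empty segment: nothing to estimate
            wasted += (double) s.sizeBytes * s.deletedDocs / s.maxDoc;
        }
        return Math.round(wasted);
    }

    public static void main(String[] args) {
        // Hypothetical segments: an old one with many deletions and a fresh one with none.
        List<SegmentStats> segments = Arrays.asList(
            new SegmentStats(1_000_000, 400_000, 50_000_000_000L),
            new SegmentStats(200_000, 0, 10_000_000_000L));
        System.out.println("Estimated bytes held by deleted docs: " + estimateWastedBytes(segments));
    }
}
```

In practice the per-segment counts would come from the index itself, for example from the assorted per-segment stats that CheckIndex reports.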

Re: idf calculation in Lucene ?

2011-10-27 Thread Robert Muir

On Thu, Oct 20, 2011 at 3:11 PM, David Ryan wrote: > > However, in some cases, when I search o'reilly, I see > > * 44.0865 = idf(title: o''reilli=4 o=1488 reilli=14 oreilli=4)* > > In this case, how is IDF calculated? > That's a phrase or multiphrase query. In this case it sums up the idf of
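To make the arithmetic concrete: assuming Lucene 3.x's DefaultSimilarity, each term's idf is `ln(maxDoc / (docFreq + 1)) + 1`, and for a phrase the per-term values are added up, which is what the `term=docFreq` list in the explain output reflects. The plain-Java sketch below reproduces that sum; the class and method names are illustrative. The thread does not say how many documents the index holds, so `maxDoc` in the example is a made-up value and the printed number need not match 44.0865.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PhraseIdf {
    // Per-term idf as in Lucene 3.x DefaultSimilarity: ln(maxDoc / (docFreq + 1)) + 1.
    static double idf(int docFreq, int maxDoc) {
        return Math.log(maxDoc / (double) (docFreq + 1)) + 1.0;
    }

    // Phrase idf: the sum of the idf of every term in the phrase.
    static double phraseIdf(Map<String, Integer> docFreqs, int maxDoc) {
        double sum = 0.0;
        for (int docFreq : docFreqs.values()) {
            sum += idf(docFreq, maxDoc);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Document frequencies copied from the explain output in the thread.
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("o''reilli", 4);
        docFreqs.put("o", 1488);
        docFreqs.put("reilli", 14);
        docFreqs.put("oreilli", 4);
        int maxDoc = 1_000_000; // hypothetical index size; not given in the thread
        System.out.printf("phrase idf = %.4f%n", phraseIdf(docFreqs, maxDoc));
    }
}
```

Rare terms such as `o''reilli` (docFreq 4) dominate the sum, while the common `o` (docFreq 1488) contributes much less.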

Re: Lucene 3.1 search paralelism per segment doubt

2011-10-27 Thread Robert Muir

On Mon, Oct 10, 2011 at 7:02 AM, Marc Sturlese wrote: > I've read in another thread > (http://lucene.472066.n3.nabble.com/Indexing-slower-in-trunk-td3059836.html#a3062991) > /Since Lucene 2.9, Lucene works on a per segment basis when searching. Since > Lucene 3.1 it can even parallelize on multipl
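As a rough illustration of per-segment parallelism (a sketch of the idea, not Lucene's implementation): each segment is searched by its own task, local doc ids are shifted by the segment's doc base into index-wide ids, and the per-segment top hits are merged into one ranking. In Lucene 3.1 this is enabled through an `IndexSearcher` constructor that accepts an `ExecutorService`. In the plain-Java sketch below, a segment is modeled as a hypothetical array of precomputed scores, with 0 meaning "no match".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelSegmentSearch {
    // One search hit: index-wide doc id plus score.
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Collects the top k hits of one segment; local doc ids are offset by docBase.
    static List<Hit> searchSegment(float[] scores, int docBase, int k) {
        PriorityQueue<Hit> top = new PriorityQueue<Hit>((a, b) -> Float.compare(a.score, b.score)); // min-heap on score
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] <= 0f) continue; // 0 means the doc does not match
            top.add(new Hit(docBase + i, scores[i]));
            if (top.size() > k) top.poll(); // drop the current weakest hit
        }
        return new ArrayList<>(top);
    }

    // Searches every segment in its own task, then merges the per-segment results.
    static List<Hit> search(List<float[]> segments, int k, ExecutorService pool) {
        List<Future<List<Hit>>> futures = new ArrayList<>();
        int docBase = 0;
        for (float[] segment : segments) {
            final int base = docBase;
            futures.add(pool.submit(() -> searchSegment(segment, base, k)));
            docBase += segment.length;
        }
        List<Hit> merged = new ArrayList<>();
        try {
            for (Future<List<Hit>> f : futures) merged.addAll(f.get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        merged.sort((a, b) -> Float.compare(b.score, a.score)); // highest score first
        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<float[]> segments = new ArrayList<>();
        segments.add(new float[] {0.5f, 0f, 2.0f}); // segment 1: docs 0..2
        segments.add(new float[] {1.0f, 3.0f});     // segment 2: docs 3..4
        for (Hit h : search(segments, 2, pool)) System.out.println("doc " + h.doc + " score " + h.score);
        pool.shutdown();
    }
}
```

Keeping only k hits per segment is enough for a correct global top k, because any document in the global top k is also in the top k of its own segment.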

Re: IndexWriter loops trying to merge using ConcurrentMergeScheduler

2011-10-27 Thread Michael McCandless

It looks like you are using BalancedSegmentMergePolicy, right? And somehow it gets stuck in a state where it keeps merging the same single segment into a new segment, which is odd. Likely this is a bug in BSMP. Do you see the same looping with, e.g., LogByteSizeMergePolicy? Note that newer versions

Re: index bigger than it should be?

2011-10-27 Thread Ian Lea

There's org.apache.lucene.index.CheckIndex, which will report assorted stats about the index, as well as checking it for correctness. It can fix it too, but you don't need that, I hope. It will take quite a while to run on a large index. What version of Lucene? Does a before/after (or large/small) d

Re: Lucene 3.1 search paralelism per segment doubt

2011-10-27 Thread Simon Willnauer

On Thu, Oct 27, 2011 at 2:50 PM, Robert Muir wrote: > On Mon, Oct 10, 2011 at 7:02 AM, Marc Sturlese > wrote: >> I've read in another thread >> (http://lucene.472066.n3.nabble.com/Indexing-slower-in-trunk-td3059836.html#a3062991) >> /Since Lucene 2.9, Lucene works on a per segment basis when sea

Re: performance question - number of documents

2011-10-27 Thread Felipe Hummel

Hi, there are two types of query processing in document retrieval: document-at-a-time and term-at-a-time. Lucene uses document-at-a-time processing. That means the posting lists (the lists of documents a word appears in) are sorted by document ID. This type of processing is usually better for l
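To illustrate document-at-a-time processing over doc-id-sorted posting lists, here is a plain-Java sketch of a conjunctive (AND) query: all lists are walked in parallel, and each document is fully decided before the cursors move on. It is a simplification for illustration; Lucene's own scorers are more elaborate (they use skip data and score documents as they match), and the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class DocAtATime {
    // Intersects posting lists sorted by ascending doc id, deciding one document at a time.
    static List<Integer> intersect(int[][] postings) {
        List<Integer> result = new ArrayList<>();
        if (postings.length == 0) return result;
        int[] pos = new int[postings.length]; // one cursor per posting list
        while (true) {
            // The candidate is the largest doc id currently under any cursor.
            int candidate = Integer.MIN_VALUE;
            for (int i = 0; i < postings.length; i++) {
                if (pos[i] >= postings[i].length) return result; // a list is exhausted: no more matches
                candidate = Math.max(candidate, postings[i][pos[i]]);
            }
            boolean allMatch = true;
            for (int i = 0; i < postings.length; i++) {
                // Advance this cursor to the first doc id >= candidate.
                while (pos[i] < postings[i].length && postings[i][pos[i]] < candidate) pos[i]++;
                if (pos[i] >= postings[i].length) return result;
                if (postings[i][pos[i]] != candidate) allMatch = false;
            }
            if (allMatch) {
                result.add(candidate); // every list contains this doc: it matches the AND query
                for (int i = 0; i < postings.length; i++) pos[i]++;
            }
        }
    }

    public static void main(String[] args) {
        int[][] postings = {{1, 3, 5, 7}, {3, 4, 7, 9}, {2, 3, 7}};
        System.out.println(intersect(postings)); // prints [3, 7]
    }
}
```

Because every list is sorted by doc id, each cursor only moves forward, so the whole intersection is a single pass over the posting lists.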

Re: using lucene to find neighbouring points in an n-dimensional space

2011-10-27 Thread Felipe Hummel

For the indexing part, you can 'insert' the term multiple times (term-weight times) constructing the document String manually. That is not very typical, you would normally feed Lucene with the original documents for it to parse and index. The query processing could be done similar as you said. Jus
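The indexing trick described above can be sketched in plain Java: repeat each term as many times as its integer weight, so that its term frequency in the field equals the weight. The weight map and the dimension-style term names are hypothetical; pick term strings that your analyzer keeps as single tokens. Note that, assuming Lucene 3.x's DefaultSimilarity, tf is scored as sqrt(freq), so a weight of 4 counts roughly like 2 rather than 4, and the length norm also shrinks as more tokens are repeated.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class WeightedDocBuilder {
    // Builds a field value in which each term appears `weight` times,
    // so that the term's frequency in the indexed field equals its weight.
    static String build(Map<String, Integer> termWeights) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, Integer> e : termWeights.entrySet()) {
            if (e.getValue() < 0) throw new IllegalArgumentException("negative weight for " + e.getKey());
            for (int i = 0; i < e.getValue(); i++) joiner.add(e.getKey());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("dim3", 3); // hypothetical term for dimension 3, weight 3
        weights.put("dim7", 1); // hypothetical term for dimension 7, weight 1
        System.out.println(build(weights)); // prints "dim3 dim3 dim3 dim7"
    }
}
```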

Re: IndexWriter loops trying to merge using ConcurrentMergeScheduler

2011-10-27 Thread alfredhong

Hi Mike, thanks for your analysis. You are correct that BalancedSegmentMergePolicy is used. We previously used LogByteSizeMergePolicy but may have run into some other issues that I was involved in, so we weren't using it. Re: TieredMergePolicy, we'll definitely check that out when we update

Re: using lucene to find neighbouring points in an n-dimensional space

2011-10-27 Thread prasenjit mukherjee

Thanks for responding. On Fri, Oct 28, 2011 at 1:12 AM, Felipe Hummel wrote: > For the indexing part, you can 'insert' the term multiple times (term-weight > times) constructing the document String manually. That is not very typical, > you would normally feed Lucene with the original documents fo

Finding Term Positions in the original document

2011-10-27 Thread Vidya Kanigiluppai Sivasubramanian

Hi, I am using Lucene 2.4.1 in my project. I need to display the search results for a particular term, and when an item on the result page is selected, display the document where the term was found, with the matched terms highlighted. For this I need to know the match
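In Lucene itself the usual routes are storing term vectors with positions and offsets on the field, or using the contrib Highlighter. As a library-independent illustration of what knowing the match offsets means, the plain-Java sketch below finds whole-word, case-insensitive occurrences of a query term, records their [start, end) character offsets, and wraps them in `<b>` tags. It does not run an analyzer, so stemmed or otherwise normalized matches would be missed; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchOffsets {
    // Returns [start, end) character offsets of whole-word, case-insensitive matches of term in text.
    static List<int[]> find(String text, String term) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(text);
        List<int[]> offsets = new ArrayList<>();
        while (m.find()) offsets.add(new int[] {m.start(), m.end()});
        return offsets;
    }

    // Wraps each span given by the (ascending, non-overlapping) offsets in <b>...</b>.
    static String highlight(String text, List<int[]> offsets) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (int[] o : offsets) {
            sb.append(text, last, o[0]).append("<b>").append(text, o[0], o[1]).append("</b>");
            last = o[1];
        }
        sb.append(text.substring(last));
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Lucene in Action: lucene basics";
        List<int[]> offsets = find(text, "lucene");
        System.out.println(highlight(text, offsets)); // prints "<b>Lucene</b> in Action: <b>lucene</b> basics"
    }
}
```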
