Term vector Lucene 4.2

2013-04-02 Thread andi rexha
Hi, I have a problem while trying to extract term vector attributes (e.g. the positions of the terms). What I have done was: Terms termVector = indexReader.getTermVector(docId, fieldName); TermsEnum reuse = null; TermsEnum iterator = termVector.iterator(reuse); PositionIncr

Re: Term vector Lucene 4.2

2013-04-02 Thread Adrien Grand
Hi Andi, Here is how you could retrieve positions from your document: Terms termVector = indexReader.getTermVector(docId, fieldName); TermsEnum reuse = null; TermsEnum iterator = termVector.iterator(reuse); BytesRef ref = null; DocsAndPositionsEnum docsAndPositions = null;
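Adrien's preview is cut off; a minimal sketch of the full pattern for Lucene 4.2 (the field must have been indexed with term vectors and positions enabled; `docId` and `fieldName` are placeholders) might look like:

```java
import java.io.IOException;

import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

public class TermVectorPositions {
    // Prints every term of a document's term vector together with its positions.
    // Assumes the field was indexed with term vectors + positions; otherwise
    // getTermVector() or docsAndPositions() returns null.
    static void printPositions(IndexReader reader, int docId, String fieldName)
            throws IOException {
        Terms termVector = reader.getTermVector(docId, fieldName);
        if (termVector == null) return; // no term vector stored for this field

        TermsEnum iterator = termVector.iterator(null);
        DocsAndPositionsEnum positions = null;
        BytesRef term;
        while ((term = iterator.next()) != null) {
            positions = iterator.docsAndPositions(null, positions);
            if (positions == null) continue; // positions were not indexed
            positions.nextDoc();             // a term vector holds a single doc
            final int freq = positions.freq();
            for (int i = 0; i < freq; i++) {
                System.out.println(term.utf8ToString() + " @ " + positions.nextPosition());
            }
        }
    }
}
```

This is a sketch against the 4.x API only; the `reuse` arguments (`null` here) let Lucene recycle enumerator instances across calls.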

RE: Term vector Lucene 4.2

2013-04-02 Thread andi rexha
Hi Adrien, Thank you very much for the reply. I have two other small questions about this: 1) Is "final int freq = docsAndPositions.freq();" the same as "iterator.totalTermFreq()"? In my tests it returns the same result, and from the documentation it seems that the result should be the same.

How to use concurrency efficiently

2013-04-02 Thread Igor Shalyminov
Hello! I have a ~20GB index and am trying to make search over it concurrent. The index has 16 segments, and I run SpanQuery.getSpans() on each segment concurrently. I see only a really small performance improvement from searching concurrently. I suppose the reason is that the sizes of the segments are very non-

Re: Term vector Lucene 4.2

2013-04-02 Thread Adrien Grand
On Tue, Apr 2, 2013 at 12:45 PM, andi rexha wrote: > Hi Adrien, > Thank you very much for the reply. > > I have two other small question about this: > 1) Is "final int freq = docsAndPositions.freq();" the same with > "iterator.totalTermFreq()" ? In my tests it returns the same result and from >

Segment readers in Lucene 4.2

2013-04-02 Thread andi rexha
Hi, I have a question about the Index Readers in Lucene. As far as I understand from the documentation, with Lucene 4 we can create an IndexReader from DirectoryReader.open(directory); From the code of DirectoryReader, I have seen that it uses the SegmentReader to create the reader.

RE: Segment readers in Lucene 4.2

2013-04-02 Thread Uwe Schindler
Hi, this is all not exposed publicly in the code because it is also subject to change! With Lucene 4.x, you can assume: directoryReader.leaves().get(i) corresponds to segmentInfos.info(i) WARNING: But this is only true if: - the reader is instanceof DirectoryReader - the segmentInfos were opened on the e
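The supported way to reach the per-segment readers that Uwe alludes to is `leaves()` rather than SegmentInfos; a minimal Lucene 4.x sketch (class and method names from the 4.x API) might be:

```java
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.DirectoryReader;

public class SegmentWalk {
    // Iterates the per-segment (atomic) readers of a composite reader.
    // In Lucene 4.x each leaf wraps one SegmentReader internally, but
    // client code should rely only on the AtomicReader API, not on the
    // concrete class, since that mapping is an implementation detail.
    static void listSegments(DirectoryReader reader) {
        for (AtomicReaderContext leaf : reader.leaves()) {
            System.out.println("segment docBase=" + leaf.docBase
                + " maxDoc=" + leaf.reader().maxDoc());
        }
    }
}
```

`docBase` is the offset you add to a leaf-local document id to get the top-level id.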

Re: How to use concurrency efficiently

2013-04-02 Thread Adrien Grand
On Tue, Apr 2, 2013 at 2:29 PM, Igor Shalyminov wrote: > Hello! Hi Igor, > I have a ~20GB index and try to make a concurrent search over it. > The index has 16 segments, I run SpanQuery.getSpans() on each segment > concurrently. > I see really small performance improvement of searching concurre

Re: Indexing Term Frequency Vectors

2013-04-02 Thread Sharon W Tam
Thanks for your help, Adrien. But unfortunately, my term frequencies will be partial counts, so they won't be integers. And finding a common denominator and scaling the rest of the frequencies accordingly will affect the relative lengths of the documents, which will affect the Lucene scoring becaus

RE: Segment readers in Lucene 4.2

2013-04-02 Thread andi rexha
Hi, Thanks for the reply ;) > > this is all not exposed publicly in the code because it is also subject to change! > > With Lucene 4.x, you can assume: > directoryReader.leaves().get(i) corresponds to segmentInfos.info(i) > > WARNING: But this is only true if: > - the reader is instanceof DirectoryR

Re: How to use concurrency efficiently

2013-04-02 Thread Igor Shalyminov
Yes, the number of documents is not too large (about 90,000), but the queries are very hard. Although they're just boolean, a typical query can produce a result with tens of millions of hits. Run single-threaded, such a query takes ~20 seconds, which is too slow. Therefore, multithreading is vital f

Re: How to use concurrency efficiently

2013-04-02 Thread Adrien Grand
On Tue, Apr 2, 2013 at 4:39 PM, Igor Shalyminov wrote: > Yes, the number of documents is not too large (about 90 000), but the queries > are very hard. Although they're just boolean, a typical query can produce a > result with tens of millions of hits. How can there be tens of millions of hits

Re: Indexing Term Frequency Vectors

2013-04-02 Thread Adrien Grand
On Tue, Apr 2, 2013 at 4:10 PM, Sharon W Tam wrote: > Are there any other ideas? Since scoring seems to be what you are interested in, you could have a look at payloads: they can store arbitrary data and can be used to score matches. -- Adrien

Re: Scoring function in LMDirichletSimilarity Class

2013-04-02 Thread Zeynep P.
Hi, I have the same question related to the LMJelinekMercerSimilarity class.

protected float score(BasicStats stats, float freq, float docLen) {
  return stats.getTotalBoost() *
      (float) Math.log(1 +
          ((1 - lambda) * freq / docLen)
              / (lambda * ((LMStats) stats).getCollectionProbability()));
}
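For reference, the code above implements Jelinek-Mercer smoothing: the smoothed term probability mixes the document model with the collection model,

\[
p(w \mid d) \;=\; (1-\lambda)\,\frac{\mathrm{tf}(w,d)}{|d|} \;+\; \lambda\, p(w \mid C),
\]

and the per-term score contribution computed by `score()` is (up to the boost factor)

\[
\log\!\left(1 \;+\; \frac{(1-\lambda)\,\mathrm{tf}(w,d)/|d|}{\lambda\, p(w \mid C)}\right),
\]

where $\mathrm{tf}(w,d)$ is `freq`, $|d|$ is `docLen`, and $p(w \mid C)$ is `getCollectionProbability()`. Dropping the constant $\log \lambda\,p(w\mid C)$ term this way keeps scores non-negative for matching terms.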

Re: How to use concurrency efficiently

2013-04-02 Thread Igor Shalyminov
These are not document hits but text hits (to be more specific, spans). For the search result it is necessary to have the precise number of document and text hits and a relatively small number of matched text snippets. I've tried several approaches to optimize the search algorithm, but they didn't

When should I commit IndexWriter and TaxonomyWriter if I use NRT readers?

2013-04-02 Thread crocket
Since I use NRT readers for the index and the taxonomy index, I don't have to commit to see the changes. Now, I don't know if the indexes are ever committed. If they aren't committed automatically, I'd have to do it on a regular basis. What should I do about committing?

Re: When should I commit IndexWriter and TaxonomyWriter if I use NRT readers?

2013-04-02 Thread Apostolis Xekoukoulotakis
Maybe consider the data saved only after you have committed it, and acknowledge new data in batches after a commit? 2013/4/3 crocket > Since I use NRT readers for Index and TaxonomyIndex, I don't have to commit > to see the changes. > > Now, I don't know if indexes are ever committed. > > If they
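The batching idea above is often implemented as a background task that commits on a fixed interval, since NRT readers give visibility but only commits give durability. A self-contained sketch (the `commit` Runnable stands in for the real work; with Lucene facets you would commit the TaxonomyWriter before the IndexWriter so every facet ordinal referenced by committed documents exists):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicCommitter implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    // Runs `commit` repeatedly, waiting `intervalMillis` between the end of
    // one run and the start of the next (fixed delay, so a slow commit never
    // causes overlapping commits).
    PeriodicCommitter(Runnable commit, long intervalMillis) {
        scheduler.scheduleWithFixedDelay(
            commit, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Stops scheduling further commits; a final explicit commit on shutdown
    // is still the caller's responsibility.
    @Override public void close() {
        scheduler.shutdown();
    }
}
```

The interval trades durability (how much acknowledged-but-uncommitted work a crash can lose) against commit overhead; seconds to minutes is a common range.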

RE: How to use concurrency efficiently

2013-04-02 Thread Uwe Schindler
If you are using MMapDirectory (the default on 64-bit platforms), then the files are already in the filesystem cache and directly accessible, like RAM, to the IndexReader. No need to cache separately. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Or

Re: How to use concurrency efficiently

2013-04-02 Thread Paul
Hi, I've experimented a bit with MultiFieldQueryParser (http://lucene.apache.org/core/4_2_0/queryparser/org/apache/lucene/queryparser/classic/MultiFieldQueryParser.html) But it seems to search for each of a query's terms in each field specified in the constructor. So, as the doc says, if you q