date:20140124

Re: Help using ShingleFilter/NGramTokenizer: Could not find implementing class for org.apache.lucene.analysis.tokenattributes.OffsetAttribute

2014-01-24 Thread Koji Sekiguchi

Hi Russell, It seems the error message says that the implementing class for OffsetAttribute cannot be found on your classpath in the (Pig?) environment. There seem to be implementing classes OffsetAttributeImpl and Token, according to the Javadoc: http://lucene.apache.org/core/4_6_0/core/org/a

Re: Performance testing Lucene

2014-01-24 Thread Michael McCandless

Oh that's good to hear. Lucene's unit tests are quite stressful on a new Directory impl... Mike McCandless http://blog.mikemccandless.com On Thu, Jan 23, 2014 at 8:40 PM, Scott Schneider wrote: > Thanks! I ran this Directory subclass through the Lucene unit tests (and > found 3 race conditi

RE: Performance testing Lucene

2014-01-24 Thread Uwe Schindler

Hi Scott, the unit tests are also a good performance test. But to compare your directory with another one, be sure to: - use a defined directory instance to compare. The most performant Lucene one is: -Dtests.directory=MMapDirectory - so compare your results with that one. If you don't define a

Building term frequency matrix over 6 million documents...

2014-01-24 Thread Witdouck, Xavier

Hi all, We have over 6 million documents in our index, and would like to construct a term frequency matrix over all 6 million documents as quickly as possible. Each document has a numeric date field, so we would like to build a time series which contains values which are the sum of all frequen
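The aggregation described above (summing term frequencies per date value) could be sketched roughly as follows. This is a minimal, Lucene-independent sketch under stated assumptions: the class `TermTimeSeries` and its methods are hypothetical names, and the per-document term frequencies are assumed to have been read from the index already (for example via term vectors); how they are read is not shown.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical accumulator: date value -> (term -> summed frequency). */
final class TermTimeSeries {
    // TreeMap keeps the date buckets in chronological order.
    private final TreeMap<Long, Map<String, Long>> series = new TreeMap<>();

    /** Adds one document's term frequencies to the bucket for its date. */
    void add(long date, Map<String, Integer> termFreqs) {
        Map<String, Long> bucket = series.computeIfAbsent(date, d -> new TreeMap<>());
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            bucket.merge(e.getKey(), e.getValue().longValue(), Long::sum);
        }
    }

    /** Returns the summed frequency of a term on a date, or 0 if none was recorded. */
    long get(long date, String term) {
        Map<String, Long> bucket = series.get(date);
        return bucket == null ? 0L : bucket.getOrDefault(term, 0L);
    }

    /** Number of distinct date buckets seen so far. */
    int dates() {
        return series.size();
    }
}
```

With 6 million documents the in-memory maps may become large; whether that is acceptable depends on the vocabulary size, which the original message does not state.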

Re: Building term frequency matrix over 6 million documents...

2014-01-24 Thread Marcio Napoli

Hi! I believe the approach below can help you. http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/misc/src/java/org/apache/lucene/misc/HighFreqTerms.java Marcio http://numere.stela.org.br Go beyond Lucene™ features with Numere® 2014/1/24 Witdouck, Xavier > Hi all, > > We have over 6 m

(DocIds satisfying a query) -> (branch of the boolean query as a tree)

2014-01-24 Thread Olivier Binda

Hello. While searching a query, I guess that Lucene traverses a Field->Term->DocId structure, filters the docIds that satisfy the query, scores them, and then sorts them. Given a resulting docId, I would like a way to find at least a valid path (or the first valid path or all valid paths) that ma

exporting a query to String with default operator = AND ?

2014-01-24 Thread Olivier Binda

Hello. I would like to serialize a query into a string (A) and then deserialize it back into a query (B). I guess that one solution is A) query.toString() B) StandardQueryParser().parse(query,"") It is suboptimal for me, though, because my app already has a custom query parser (with leadingWi

Re: Problems with Lucene and Solr

2014-01-24 Thread Doug Turnbull

Hey Vishnu, I'm trying to understand what you're trying to accomplish (cc'ing Lucene user group to solicit additional advice) Are you trying to extract all the terms for a given document? If so, you might just want to enable term vectors to analyze the index terms for the document. -Doug On Fri

SnapshotDeletionPolicy API changes

2014-01-24 Thread Vitaly Funstein

I see that SnapshotDeletionPolicy no longer supports snapshotting by an app-supplied string id, as of Lucene 4.4. However, my use case relies on the policy's ability to maintain multiple snapshots simultaneously to provide index versioning semantics, of sorts. What is the new recommended way of doi

Re: exporting a query to String with default operator = AND ?

2014-01-24 Thread Erick Erickson

First of all, query.toString is not idempotent. You cannot count on feeding the results of query.toString back into a query parser and getting the same thing, so that's out. Not quite sure what the right solution is, though. Best, Erick On Fri, Jan 24, 2014 at 11:29 AM, Olivier Binda wrote: > Hello. >

Re: SnapshotDeletionPolicy API changes

2014-01-24 Thread Michael McCandless

It added complexity, for Lucene to track the app-provided ID. And, it's something you can easily add back on top of the new API, if necessary. But, maintaining multiple snapshots is certainly still allowed: multiple snapshots referencing the same IndexCommit is fine. There is a ref count increme
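Adding app-supplied IDs back on top of the new API, as suggested above, might look roughly like the sketch below. It is not Lucene code: `SnapshotRegistry` and its methods are hypothetical names, and the type parameter `C` stands in for whatever commit handle the policy returns when a snapshot is taken. Taking and releasing the snapshot on the policy itself is assumed to be done by the caller and is not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical app-side registry mapping string ids to snapshot handles. */
final class SnapshotRegistry<C> {
    private final Map<String, C> byId = new HashMap<>();

    /** Records a handle under an app-chosen id; duplicate ids are rejected. */
    synchronized void register(String id, C commit) {
        if (byId.containsKey(id)) {
            throw new IllegalArgumentException("snapshot id already in use: " + id);
        }
        byId.put(id, commit);
    }

    /** Looks up the handle for an id, if one is registered. */
    synchronized Optional<C> lookup(String id) {
        return Optional.ofNullable(byId.get(id));
    }

    /** Removes and returns the handle so the caller can release it on the policy. */
    synchronized C unregister(String id) {
        C commit = byId.remove(id);
        if (commit == null) {
            throw new IllegalArgumentException("unknown snapshot id: " + id);
        }
        return commit;
    }

    /** Number of ids currently registered. */
    synchronized int size() {
        return byId.size();
    }
}
```

Two ids may map to the same handle here, which matches the remark above that multiple snapshots referencing the same IndexCommit are fine. The map is in memory only; persisting it across restarts would be a separate design choice.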

Lucene performance

2014-01-24 Thread Hamed Ghavamnia

Hello, I searched a lot about Lucene's limits and performance, but I still don't know how much I can count on it. I'm storing logs and indexing them with Lucene. The event rate is 2000 per second. The format of each log is generally 'fieldname' : 'fieldvalue'. What search performance should I expect
