Re: Replicator: how to use it?

2014-03-20 Thread Shai Erera
Even if the commit is called just before the close, the close triggers a last commit. That seems wrong. If you do writer.commit() and then immediately writer.close(), and there are no changes to the writer in between (i.e. no thread comes in and adds/updates/deletes a document), then close()
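A minimal sketch of the sequence being discussed, assuming Lucene 4.x and an illustrative RAMDirectory setup (none of this code is from the thread itself):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class CommitThenClose {
  public static void main(String[] args) throws Exception {
    Directory dir = new RAMDirectory();
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_47,
        new StandardAnalyzer(Version.LUCENE_47));
    IndexWriter writer = new IndexWriter(dir, iwc);

    // ... add/update/delete documents here ...

    writer.commit(); // flushes and commits all pending changes
    // no further changes are made between commit() and close()
    writer.close();  // the question in the thread: does this trigger yet another commit?
  }
}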

Please help me in migrating Apache Lucene 2.9 to 4.7.0

2014-03-20 Thread NarasimhaRao DPNV
Hi, I started migrating my Lucene search application from version 2.9 to 4.7.0. Please suggest the best way and best practices for this. There are many files to rewrite. Thank you, Narasimha.

Re: Please help me in migrating Apache Lucene 2.9 to 4.7.0

2014-03-20 Thread Doug Turnbull
Are you able to reindex the data from source? Typical practice around search indexes is to treat them as secondary stores for full-text search that mirror a primary database or data store. -Doug On Thu, Mar 20, 2014 at 12:52 PM, NarasimhaRao DPNV narasimha.jav...@gmail.com wrote: Hi I

Possible issue with Tokenizer in lucene-analyzers-common-4.6.1

2014-03-20 Thread Joe Wong
Hi, We're planning to upgrade lucene-analyzers-common from 4.3.0 to 4.6.1. While running our unit test with 4.6.1, it fails at org.apache.lucene.analysis.Tokenizer, line 88 (the setReader method). There it checks if input != ILLEGAL_STATE_READER and then throws an IllegalStateException. Should it not be if

RE: Possible issue with Tokenizer in lucene-analyzers-common-4.6.1

2014-03-20 Thread Uwe Schindler
Hi Joe, in Lucene 4.6 the TokenStream/Tokenizer APIs gained some additional state-machine checks to ensure that consumers and subclasses of those abstract interfaces are implemented correctly - they are not easy to understand, because they are implemented that way to ensure they don't

Problem with numeric range query syntax in lucene 4.4.0

2014-03-20 Thread Matthew Petersen
Hi, I'm trying to submit a Lucene query string to my index to return data based on a numeric range. I'm using the syntax provided in the Query Parser Syntax document, but the results I get indicate that the query is not working correctly. Below is a unit test that proves that the range query

Dimension mismatch exception

2014-03-20 Thread Stefy D.
Dear all, I am trying to compute the cosine similarity between several documents. I have an indexed directory A made using 1 files and another indexed directory B made using 2 files. All the indexed documents from both directories have the same length (100 sentences). I want to get the

RE: Problem with numeric range query syntax in lucene 4.4.0

2014-03-20 Thread Uwe Schindler
Hi, Lucene is schema-less. Because of that, QueryParser cannot handle numeric fields out of the box: it cannot know which fields are numeric, so it just creates a TermRangeQuery, which will never hit any documents of a field indexed as LongField. You can directly use
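A hedged sketch of the programmatic alternative being pointed to here, assuming the field was indexed as a LongField; the field name "price" and the bounds are made up for illustration:

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public class NumericRangeExample {
  // Builds a numeric range query directly instead of going through QueryParser,
  // which would otherwise produce a TermRangeQuery that never matches a LongField.
  static TopDocs searchPriceRange(IndexSearcher searcher, long min, long max) throws IOException {
    Query q = NumericRangeQuery.newLongRange("price", min, max, true, true); // both ends inclusive
    return searcher.search(q, 10);
  }
}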

Re: Problem with numeric range query syntax in lucene 4.4.0

2014-03-20 Thread Matthew Petersen
Thanks for the response. I had seen references to this explanation elsewhere, but for older versions, and was hoping it had changed in 4.4. I guess since it's schema-less it really can't be fixed for the masses and must be fixed through customization. Thanks again, Matt On Thu, Mar 20, 2014 at

Re: Dimension mismatch exception

2014-03-20 Thread Herb Roitblat
If you want to compute the cosines between pairs of documents (each document from A compared with each from B), then the dimension is 100, the size of each document. If you want to compare the whole index, then you will need to make them the same length (number of elements) by padding the shorter with zeroes.
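An illustrative sketch (not from the thread) of the zero-padding point: both vectors must have the same number of elements before the cosine can be computed:

import java.util.Arrays;

public class Cosine {
  // Zero-pads the shorter vector so both have the same dimension, then
  // returns the cosine of the angle between them.
  static double cosine(double[] a, double[] b) {
    int n = Math.max(a.length, b.length);
    double[] x = Arrays.copyOf(a, n); // missing positions become 0.0
    double[] y = Arrays.copyOf(b, n);
    double dot = 0, normX = 0, normY = 0;
    for (int i = 0; i < n; i++) {
      dot += x[i] * y[i];
      normX += x[i] * x[i];
      normY += y[i] * y[i];
    }
    return dot / (Math.sqrt(normX) * Math.sqrt(normY));
  }
}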

Re: Possible issue with Tokenizer in lucene-analyzers-common-4.6.1

2014-03-20 Thread Joe Wong
Hi Uwe, Thanks for the reply. I'm not familiar with Lucene usage, so any help would be appreciated. In our test we are executing several consecutive stemming operations (the exception is thrown when the second stemmer.stem() method is called). In the code (see below), it does call reset()

RE: Possible issue with Tokenizer in lucene-analyzers-common-4.6.1

2014-03-20 Thread Uwe Schindler
Hi, the IllegalStateException tells you what's wrong: "TokenStream contract violation: close() call missing". Analyzer internally reuses TokenStreams, so if you call Analyzer.tokenStream() a second time it will return the same instance of your TokenStream. On that second call the state machine
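A hedged sketch of the consumer workflow Uwe is describing, assuming the Lucene 4.6 TokenStream API; the analyzer, field name, and text are placeholders:

import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ConsumeTokens {
  // Consumes a TokenStream following the reset/incrementToken/end/close contract.
  // close() must be called so the reused Tokenizer accepts a new reader next time.
  static String analyze(Analyzer analyzer, String field, String text) throws IOException {
    StringBuilder builder = new StringBuilder();
    TokenStream ts = analyzer.tokenStream(field, text);
    CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
    try {
      ts.reset();                      // mandatory before the first incrementToken()
      while (ts.incrementToken()) {
        builder.append(termAtt.toString()).append(' ');
      }
      ts.end();                        // consume end-of-stream attributes
    } finally {
      ts.close();                      // required, otherwise the next tokenStream() call fails
    }
    return builder.toString().trim();
  }
}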

Re: Possible issue with Tokenizer in lucene-analyzers-common-4.6.1

2014-03-20 Thread Joe Wong
Thanks Uwe. It worked. On Thu, Mar 20, 2014 at 3:28 PM, Uwe Schindler u...@thetaphi.de wrote: Hi, the IllegalStateException tells you what's wrong: TokenStream contract violation: close() call missing Analyzer internally reuses TokenStreams, so if you call Analyzer.tokenStream() a

RE: Possible issue with Tokenizer in lucene-analyzers-common-4.6.1

2014-03-20 Thread Uwe Schindler
Hi, I am glad that I was able to help you! One more optimization for your consumer: CharTermAttribute implements CharSequence, so you can append it directly to the StringBuilder; there is no need to call toString() (see http://goo.gl/Ffg9tW): builder.append(termAttribute); This will save additional
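A tiny illustrative helper for that optimization (names follow the thread; the surrounding consumer loop is assumed):

import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AppendTerm {
  // CharTermAttribute implements CharSequence, so StringBuilder.append(CharSequence)
  // copies the term's characters directly and skips the String that toString()
  // would allocate for every token.
  static void appendTerm(StringBuilder builder, CharTermAttribute termAttribute) {
    builder.append(termAttribute).append(' ');
  }
}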

RE: Please help me in migrating Apache Lucene 2.9 to 4.7.0

2014-03-20 Thread Uwe Schindler
Hi, I would recommend first upgrading to Lucene 3.6. The changes needed for that version are much easier to make, because you can follow these migration steps: remove any deprecated code from your classes (so carefully resolve every deprecation warning when compiling your code against 2.9 - after that it

RE: Dimension mismatch exception

2014-03-20 Thread Uwe Schindler
Hi Stefy, the stack trace you posted has nothing to do with Apache Lucene. It looks like you are using some commons-lang3 classes here, but no Lucene code at all. So I think your question might be better asked on the commons-math mailing list, unless you have some Lucene code around, too. If

Segments reusable across commits?

2014-03-20 Thread Vitaly Funstein
I have a usage pattern where I need to package up and store away all files from an index referenced by multiple commit points. To that end, I basically call IndexWriter.commit(), followed by SnapshotDeletionPolicy.snapshot(), followed by something like this: List<String> files = new
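A hedged sketch of that pattern, assuming Lucene 4.x where SnapshotDeletionPolicy.snapshot() returns an IndexCommit and the policy was installed on the writer's IndexWriterConfig; names are illustrative:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.SnapshotDeletionPolicy;

public class SnapshotFiles {
  // Commits, snapshots the resulting commit point, and collects the file names it
  // references; the snapshot keeps those files from being deleted while they are copied.
  static List<String> snapshotFiles(IndexWriter writer, SnapshotDeletionPolicy sdp) throws IOException {
    writer.commit();
    IndexCommit commit = sdp.snapshot();   // pins the files of this commit point
    try {
      List<String> files = new ArrayList<String>(commit.getFileNames());
      // ... package up / copy the files here while the snapshot is held ...
      return files;
    } finally {
      sdp.release(commit);                 // allows those files to be deleted again
    }
  }
}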