Re: Replicator: how to use it?

2014-03-20 Thread Shai Erera
Even if the commit is called just before the close, the close triggers a last commit. That seems wrong. If you do writer.commit() and then immediately writer.close(), and there are no changes to the writer in between (i.e. no thread comes in and adds/updates/deletes a document), then close()
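A minimal sketch of the sequence being discussed, assuming Lucene 4.x and an illustrative RAMDirectory setup (none of this code is from the thread itself):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class CommitThenClose {
  public static void main(String[] args) throws Exception {
    Directory dir = new RAMDirectory();
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_47,
        new StandardAnalyzer(Version.LUCENE_47));
    IndexWriter writer = new IndexWriter(dir, iwc);

    // ... add/update/delete documents here ...

    writer.commit(); // flushes and commits all pending changes
    // no further changes are made between commit() and close()
    writer.close();  // the question in the thread: does this trigger yet another commit?
  }
}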

Please help me in migrating Apache Lucene 2.9 to 4.7.0

2014-03-20 Thread NarasimhaRao DPNV
Hi, I started migrating my Lucene search application from version 2.9 to 4.7.0. Please suggest the best way and best practices for this. There are many files to rewrite. Thank you, Narasimha.

Re: Please help me in migrating Apache Lucene 2.9 to 4.7.0

2014-03-20 Thread Doug Turnbull
Are you able to reindex the data from source? Typical practice around search indexes is to treat them as secondary stores for full-text search that mirror a primary database or data store. -Doug On Thu, Mar 20, 2014 at 12:52 PM, NarasimhaRao DPNV narasimha.jav...@gmail.com wrote: Hi I

Possible issue with Tokenizer in lucene-analyzers-common-4.6.1

2014-03-20 Thread Joe Wong
Hi, We're planning to upgrade lucene-analyzers-common from 4.3.0 to 4.6.1. While running our unit test with 4.6.1, it fails at org.apache.lucene.analysis.Tokenizer, line 88 (the setReader method). There it checks if input != ILLEGAL_STATE_READER and then throws an IllegalStateException. Should it not be if

RE: Possible issue with Tokenizer in lucene-analyzers-common-4.6.1

2014-03-20 Thread Uwe Schindler
Hi Joe, in Lucene 4.6 the TokenStream/Tokenizer APIs gained some additional state-machine checks to ensure that consumers and subclasses of those abstract interfaces are implemented correctly - they are not easy to understand, because they are implemented that way to ensure they don't

Problem with numeric range query syntax in lucene 4.4.0

2014-03-20 Thread Matthew Petersen
Hi, I'm trying to submit a Lucene query string to my index to return data based on a numeric range. I'm using the syntax provided in the Query Parser Syntax document, but the results I get indicate that the query is not working correctly. Below is a unit test that proves that the range query

Dimension mismatch exception

2014-03-20 Thread Stefy D.
Dear all, I am trying to compute the cosine similarity between several documents. I have an indexed directory A made using 1 files and another indexed directory B made using 2 files. All the indexed documents from both directories have the same length (100 sentences). I want to get the

RE: Problem with numeric range query syntax in lucene 4.4.0

2014-03-20 Thread Uwe Schindler
Hi, Lucene is schema-less. Because of that, QueryParser cannot handle numeric fields out of the box: it cannot know which fields are numeric, so it just creates a TermRangeQuery, which will never hit any documents of a field indexed as LongField. You can directly use
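A hedged sketch of the programmatic alternative being pointed to here, assuming the field was indexed as a LongField; the field name "price" and the bounds are made up for illustration:

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public class NumericRangeExample {
  // Builds a numeric range query directly instead of going through QueryParser,
  // which would otherwise produce a TermRangeQuery that never matches a LongField.
  static TopDocs searchPriceRange(IndexSearcher searcher, long min, long max) throws IOException {
    Query q = NumericRangeQuery.newLongRange("price", min, max, true, true); // both ends inclusive
    return searcher.search(q, 10);
  }
}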

Re: Problem with numeric range query syntax in lucene 4.4.0

2014-03-20 Thread Matthew Petersen
Thanks for the response. I had seen references to this explanation elsewhere, but for older versions, and was hoping it had changed in 4.4. I guess since it's schema-less it really can't be fixed for the masses and must be fixed through customization. Thanks again, Matt On Thu, Mar 20, 2014 at

Re: Dimension mismatch exception

2014-03-20 Thread Herb Roitblat
If you want to compute the cosines between pairs of documents (each document from A compared with each from B), then the dimension is 100, the size of each document. If you want to compare the whole index, then you will need to make them the same length (number of elements) by padding the shorter with zeroes.
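An illustrative sketch (not from the thread) of the zero-padding point: both vectors must have the same number of elements before the cosine can be computed:

import java.util.Arrays;

public class Cosine {
  // Zero-pads the shorter vector so both have the same dimension, then
  // returns the cosine of the angle between them.
  static double cosine(double[] a, double[] b) {
    int n = Math.max(a.length, b.length);
    double[] x = Arrays.copyOf(a, n); // missing positions become 0.0
    double[] y = Arrays.copyOf(b, n);
    double dot = 0, normX = 0, normY = 0;
    for (int i = 0; i < n; i++) {
      dot += x[i] * y[i];
      normX += x[i] * x[i];
      normY += y[i] * y[i];
    }
    return dot / (Math.sqrt(normX) * Math.sqrt(normY));
  }
}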

Re: Possible issue with Tokenizer in lucene-analyzers-common-4.6.1

2014-03-20 Thread Joe Wong
Hi Uwe, Thanks for the reply. I'm not familiar with Lucene usage, so any help would be appreciated. In our test we are executing several consecutive stemming operations (the exception is thrown when the second stemmer.stem() method is called). In the code (see below), it does call reset()

RE: Possible issue with Tokenizer in lucene-analyzers-common-4.6.1

2014-03-20 Thread Uwe Schindler
Hi, the IllegalStateException tells you what's wrong: "TokenStream contract violation: close() call missing". Analyzer internally reuses TokenStreams, so if you call Analyzer.tokenStream() a second time it will return the same instance of your TokenStream. On that second call the state machine
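A hedged sketch of the consumer workflow Uwe is describing, assuming the Lucene 4.6 TokenStream API; the analyzer, field name, and text are placeholders:

import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ConsumeTokens {
  // Consumes a TokenStream following the reset/incrementToken/end/close contract.
  // close() must be called so the reused Tokenizer accepts a new reader next time.
  static String analyze(Analyzer analyzer, String field, String text) throws IOException {
    StringBuilder builder = new StringBuilder();
    TokenStream ts = analyzer.tokenStream(field, text);
    CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
    try {
      ts.reset();                      // mandatory before the first incrementToken()
      while (ts.incrementToken()) {
        builder.append(termAtt.toString()).append(' ');
      }
      ts.end();                        // consume end-of-stream attributes
    } finally {
      ts.close();                      // required, otherwise the next tokenStream() call fails
    }
    return builder.toString().trim();
  }
}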

Re: Possible issue with Tokenizer in lucene-analyzers-common-4.6.1

2014-03-20 Thread Joe Wong
Thanks Uwe. It worked. On Thu, Mar 20, 2014 at 3:28 PM, Uwe Schindler u...@thetaphi.de wrote: Hi, the IllegalStateException tells you what's wrong: TokenStream contract violation: close() call missing Analyzer internally reuses TokenStreams, so if you call Analyzer.tokenStream() a

RE: Possible issue with Tokenizer in lucene-analyzers-common-4.6.1

2014-03-20 Thread Uwe Schindler
Hi, I am glad that I was able to help you! One more optimization for your consumer: CharTermAttribute implements CharSequence, so you can append it directly to the StringBuilder; there is no need to call toString() (see http://goo.gl/Ffg9tW): builder.append(termAttribute); This will save additional
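A tiny illustrative helper for that optimization (names follow the thread; the surrounding consumer loop is assumed):

import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AppendTerm {
  // CharTermAttribute implements CharSequence, so StringBuilder.append(CharSequence)
  // copies the term's characters directly and skips the String that toString()
  // would allocate for every token.
  static void appendTerm(StringBuilder builder, CharTermAttribute termAttribute) {
    builder.append(termAttribute).append(' ');
  }
}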

RE: Please help me in migrating Apache Lucene 2.9 to 4.7.0

2014-03-20 Thread Uwe Schindler
Hi, I would recommend first upgrading to Lucene 3.6. The changes needed for that version are much easier to make, because you can follow these migration steps: remove any deprecated code from your classes (so carefully resolve every deprecation warning when compiling your code against 2.9 - after that it

RE: Dimension mismatch exception

2014-03-20 Thread Uwe Schindler
Hi Stefy, the stack trace you posted has nothing to do with Apache Lucene. It looks like you are using some commons-lang3 classes here, but no Lucene code at all. So I think your question might be better asked on the commons-math mailing list, unless you have some Lucene code around, too. If

Segments reusable across commits?

2014-03-20 Thread Vitaly Funstein
I have a usage pattern where I need to package up and store away all files from an index referenced by multiple commit points. To that end, I basically call IndexWriter.commit(), followed by SnapshotDeletionPolicy.snapshot(), followed by something like this: List<String> files = new
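A hedged sketch of that pattern, assuming Lucene 4.x where SnapshotDeletionPolicy.snapshot() returns an IndexCommit and the policy was installed on the writer's IndexWriterConfig; names are illustrative:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.SnapshotDeletionPolicy;

public class SnapshotFiles {
  // Commits, snapshots the resulting commit point, and collects the file names it
  // references; the snapshot keeps those files from being deleted while they are copied.
  static List<String> snapshotFiles(IndexWriter writer, SnapshotDeletionPolicy sdp) throws IOException {
    writer.commit();
    IndexCommit commit = sdp.snapshot();   // pins the files of this commit point
    try {
      List<String> files = new ArrayList<String>(commit.getFileNames());
      // ... package up / copy the files here while the snapshot is held ...
      return files;
    } finally {
      sdp.release(commit);                 // allows those files to be deleted again
    }
  }
}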