Re: codec mismatch

2014-02-17 Thread Jason Wee
Hi Mike, Thank you. The exception makes it pretty clear that Lucene hit an NPE while executing readInternal(...) on _0.cfs. The root cause is that the object being read, FileBlock, is null. As far as I can tell, it happens only when reading _0.cfs, not on the index files that were

Re: codec mismatch

2014-02-17 Thread Michael McCandless
That NPE is happening inside Cassandra's sources; I think you need to trace what's happening there and why its FileBlock can be null. It looks like a bug in how CassandraDirectory handles compound files (e.g. _0.cfs), which are somewhat tricky because a compound file acts itself like a

Re: Actual min and max-value of NumericField during codec flush

2014-02-17 Thread Ravikumar Govindarajan
Well, this will change your scores? MultiReader will sum up all term statistics across all SegmentReaders up front, and then scoring per segment will use those top-level weights. Our app needs to do only matching and sorting. In fact, it would be fully OK to bypass scoring. But I feel

Re: codec mismatch

2014-02-17 Thread Jack Krupansky
Are you using or aware of Solandra? See: https://github.com/tjake/Solandra Solandra has been superseded by a commercial product, DataStax Enterprise, which combines Solr/Lucene and Cassandra. Solr/Lucene indexing of Cassandra data is supported, but the actual Lucene indexes are stored in the

Lucene doubt

2014-02-17 Thread Pedro Cardoso
Good afternoon, I am using Lucene in developing a project; however, I was faced with a doubt. I wonder whether, in a multi-threaded system, it is possible to write to the index concurrently? Cumprimentos/ Best Regards *Pedro Cardoso*

Re: Lucene doubt

2014-02-17 Thread Michael McCandless
In general, both indexing and searching are highly concurrent in Lucene. Mike McCandless http://blog.mikemccandless.com On Mon, Feb 17, 2014 at 9:54 AM, Pedro Cardoso pmcardoso@gmail.com wrote: Good afternoon, I am using Lucene in developing a project, however I was faced with a

Re: Lucene doubt

2014-02-17 Thread Adrien Grand
Hi Pedro, Lucene indeed supports indexing data from several threads into a single IndexWriter instance, and it will make use of all your I/O and CPU. You can learn more about how it works at http://blog.trifork.com/2011/05/03/lucene-indexing-gains-concurrency/ On Mon, Feb 17, 2014 at 3:54 PM,
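What Mike and Adrien describe can be sketched as follows. This is a minimal illustration, not code from the thread, written against a 4.x-era Lucene API (current when this thread ran; the `Version`-based constructors and `RAMDirectory` shown here changed or were removed in later releases). Several worker threads share a single `IndexWriter`, which is thread-safe, so no external locking is needed around `addDocument()`:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class ConcurrentIndexing {

    // Index n docs from 4 threads into one shared IndexWriter and
    // return the resulting doc count. IndexWriter is thread-safe,
    // so each thread calls addDocument() directly.
    public static int indexConcurrently(int n) throws Exception {
        Directory dir = new RAMDirectory();
        final IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_47,
                        new StandardAnalyzer(Version.LUCENE_47)));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < n; i++) {
            final int id = i;
            pool.submit(new Runnable() {
                @Override public void run() {
                    try {
                        Document doc = new Document();
                        doc.add(new StringField("id", Integer.toString(id), Field.Store.YES));
                        writer.addDocument(doc); // safe from any thread
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        writer.close();
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(indexConcurrently(100)); // prints 100
    }
}
```

As the linked blog post explains, concurrency here comes from each indexing thread writing into its own in-memory segment, which is flushed independently.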

Re: Actual min and max-value of NumericField during codec flush

2014-02-17 Thread Michael McCandless
On Mon, Feb 17, 2014 at 8:33 AM, Ravikumar Govindarajan ravikumar.govindara...@gmail.com wrote: Well, this will change your scores? MultiReader will sum up all term statistics across all SegmentReaders up front, and then scoring per segment will use those top-level weights. Our app needs

Re: Collector is collecting more than the specified hits

2014-02-17 Thread saisantoshi
The collector is collecting all the documents. Let's say I have 50k documents and I want the collector to give me results based on a start offset and maxHits. Can we get this functionality from Lucene? For example, the very first time I want to collect 0-100, and the next time I want to collect from

RE: Reverse Matching

2014-02-17 Thread Siraj Haider
Thanks for your great advice, Ahmet. Do you know if I could use the luwak libraries in my Lucene project directly? Or do I have to use Solr? Currently, we use the core Lucene libraries in our system and have built our own framework around them. regards -Siraj -Original Message- From: Ahmet Arslan

Re: Reverse Matching

2014-02-17 Thread Alan Woodward
Hi Siraj, At the moment luwak is based on a fork of lucene (https://github.com/flaxsearch/lucene-solr-intervals, itself based on work done in LUCENE-2878), which we use to report exact match positions. I'm hoping to get it working with the main lucene classes soon, though. Alan Woodward

Re: Extending StandardTokenizer Jflex to not split on '/'

2014-02-17 Thread Diego Fernandez
Hey Steve, thanks for the quick reply. I didn't have a chance to test again until today. In our Lucene build, we had already made some customizations to the JFlex file, and it regenerates the java file whenever we build our project. Unfortunately, it is still not working for me. I diffed the

Re: Collector is collecting more than the specified hits

2014-02-17 Thread Michael McCandless
This is exactly what searchAfter is for (deep paging). Mike McCandless http://blog.mikemccandless.com On Mon, Feb 17, 2014 at 3:12 PM, saisantoshi saisantosh...@gmail.com wrote: The collector is collecting all the documents. Let's say I have 50k documents and I want the collector to give me
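A minimal sketch of the `searchAfter` approach Mike points to, again written against a 4.x-era Lucene API (the field names and the 250-doc setup are hypothetical, just for illustration). Each call to `searchAfter` resumes from the last `ScoreDoc` of the previous page, so Lucene never has to re-collect hits 0..start for deep pages:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class DeepPaging {

    // Walk all hits for a query one page at a time with searchAfter,
    // returning how many hits were seen in total.
    public static int countAllHits(IndexSearcher searcher, Query query, int pageSize) throws Exception {
        int seen = 0;
        ScoreDoc after = null;
        while (true) {
            TopDocs page = (after == null)
                    ? searcher.search(query, pageSize)
                    : searcher.searchAfter(after, query, pageSize);
            if (page.scoreDocs.length == 0) {
                break; // no more hits
            }
            seen += page.scoreDocs.length;
            after = page.scoreDocs[page.scoreDocs.length - 1]; // resume point
        }
        return seen;
    }

    public static int demo() throws Exception {
        Directory dir = new RAMDirectory();
        try (IndexWriter w = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_47, new StandardAnalyzer(Version.LUCENE_47)))) {
            for (int i = 0; i < 250; i++) {
                Document doc = new Document();
                doc.add(new TextField("body", "lucene rocks", Field.Store.NO));
                w.addDocument(doc);
            }
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return countAllHits(new IndexSearcher(reader),
                    new TermQuery(new Term("body", "lucene")), 100);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints 250: pages of 100 + 100 + 50
    }
}
```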

Re: Collector is collecting more than the specified hits

2014-02-17 Thread saisantoshi
Could you please elaborate on the above? I am not sure if the collector is already doing it, or whether I need to call another API? Thanks, Sai.

Re: Extending StandardTokenizer Jflex to not split on '/'

2014-02-17 Thread Steve Rowe
Sorry, Diego, the generated scanner diff doesn't tell me anything. Since I was able to successfully make changes to the open source and get the desired behavior, I'm guessing you're: a) not using the same (versions of) tools as me; b) not using the same (version of the) source as me; or c) not

Re: Collector is collecting more than the specified hits

2014-02-17 Thread saisantoshi
As I mentioned in my original post, I am calling it like this: MyCollector collector; TopScoreDocCollector topScore = TopScoreDocCollector.create(firstIndex + numHits, true); IndexSearcher searcher = new IndexSearcher(reader); try {
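For reference, a minimal self-contained sketch of the collector-based paging pattern the snippet above is using, written against the same 4.x-era `TopScoreDocCollector.create(numHits, docsScoredInOrder)` signature (the field names and 250-doc setup are hypothetical). The collector is sized to `firstIndex + numHits` and `topDocs(start, howMany)` slices out just the requested page; note this still collects everything up to the page boundary each time, which is why `searchAfter` is the better fit for deep paging:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class CollectorPaging {

    // Collect the top firstIndex + numHits docs, then slice out just
    // the requested page with topDocs(start, howMany).
    public static int pageSize(IndexSearcher searcher, int firstIndex, int numHits) throws Exception {
        TopScoreDocCollector collector = TopScoreDocCollector.create(firstIndex + numHits, true);
        searcher.search(new TermQuery(new Term("body", "lucene")), collector);
        TopDocs page = collector.topDocs(firstIndex, numHits); // hits [firstIndex, firstIndex + numHits)
        return page.scoreDocs.length;
    }

    public static int demo() throws Exception {
        Directory dir = new RAMDirectory();
        try (IndexWriter w = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_47, new StandardAnalyzer(Version.LUCENE_47)))) {
            for (int i = 0; i < 250; i++) {
                Document doc = new Document();
                doc.add(new TextField("body", "lucene rocks", Field.Store.NO));
                w.addDocument(doc);
            }
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            // second page of 100: hits 100..199 out of 250 matches
            return pageSize(new IndexSearcher(reader), 100, 100);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints 100
    }
}
```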