RE: Indexed numeric fields return indexed() == false

2013-04-26 Thread Neil Ireson
Hi Uwe, Thank you for the clarification, knowing that I will have to shuffle my code accordingly. N

Re: Optimizing NRT search

2013-04-26 Thread Aleksey
Thanks for the response, Mike. Yes, I've come across your blog before; it's very helpful. I tried bigger batches, and the highest throughput I can get seems to be roughly 250 docs a second. From your blog, you updated your index at about 1MB per second, with 1K documents, which is 1000/s, but you had 24

Re: Big number of values for facets

2013-04-26 Thread Shai Erera
You can also try to use a different IntEncoder which compresses the values better. Try FourFlags and the like. Perhaps it will allow you to index more facets per document and it will be enough... though I should add "for the time being" b/c according to your scenario, you could easily hit more than

Re: Big number of values for facets

2013-04-26 Thread Nicola Buso
Hi Mike, no, it's not an error in our application; I have some entries with these peculiarities :-) Perhaps these cases can be mapped in different ways? Thinking of the ER world, it's not difficult to have an (n to m) relation between two tables where one of the tables is a categorization of some

Re: Big number of values for facets

2013-04-26 Thread Michael McCandless
This means a single document requires more than 32 KB to store all of its ordinals ... so that document must have like at least 6K facets? Are you sure this isn't a bug in your app? That's an insanely high number of facets for one document ... Mike McCandless http://blog.mikemccandless.com On
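A rough check of the estimate above, as a sketch only. It assumes each facet ordinal is written as a 7-bits-per-byte variable-length int costing at most 5 bytes; that is an assumption, since the thread does not say which encoder is in use. The class and method names are hypothetical. Under that assumption the 32766-byte limit from the exception leaves room for roughly 6.5K ordinals in the worst case, and more when the ordinals are small.

```java
public class OrdinalBudget {
    // Limit quoted in the exception message from the thread.
    static final int MAX_BYTES = 32766;

    // Bytes needed to write a non-negative int as a variable-length value
    // that stores 7 payload bits per byte (assumed encoding, see lead-in).
    static int vIntBytes(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    // How many ordinals fit in the budget if each one costs bytesPerOrdinal.
    static int maxOrdinals(int bytesPerOrdinal) {
        return MAX_BYTES / bytesPerOrdinal;
    }

    public static void main(String[] args) {
        // Worst case: every ordinal needs the full 5 bytes.
        System.out.println(maxOrdinals(vIntBytes(Integer.MAX_VALUE))); // prints 6553
        // Smaller ordinals (below 16384) need only 2 bytes each.
        System.out.println(maxOrdinals(vIntBytes(10000)));             // prints 16383
    }
}
```

An encoder that stores gaps between sorted ordinals, or packs several values per byte, would change these numbers, which is why a different encoder can raise the per-document ceiling.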

RE: Indexed numeric fields return indexed() == false

2013-04-26 Thread Uwe Schindler
Hi Neil, the issue here is an API problem still present in Lucene 4.x. In Lucene 5 you cannot do that anymore (reindex a document by passing IndexReader's return value into IndexWriter), so we will not fix the issue for 4.x. In 5.0 the world looks like that and then it's no longer a problem (bec

Re: Big number of values for facets

2013-04-26 Thread Shai Erera
Unfortunately partitions are enabled globally and not per document. And you cannot activate them as you go. It's a setting you need to enable before you index. At least, that's how they currently work - we can think of better ways to do it. Also, partitions were not designed to handle that limitat

Re: Indexed numeric fields return indexed() == false

2013-04-26 Thread Ian Lea
It doesn't work because Lucene doesn't store all the necessary info in the index. It may work for StringField because there isn't really any other info for that field type - it's just a string stored as is - but other fields have tokenization, precision, whatever, which may not be stored, and evid

Re: Big number of values for facets

2013-04-26 Thread Nicola Buso
Hi Shai, I can't say now how many of these entries I have, I need to trace them, but I expect they are exceptions, around 10 entries at most. Can I enable partitions document by document? Should I activate partitions if I reach a threshold just for these exceptions? Nicola. On Fri, 2013-04-26 a

Re: Big number of values for facets

2013-04-26 Thread Shai Erera
Hi Nicola, I think this limit denotes the number of bytes you can write in a single DV value. So it actually translates to a much smaller number of facets you can index. Do you know how many categories are indexed for that one document? Also, do you expect to index a large number of facets for most documents, or

Big number of values for facets

2013-04-26 Thread Nicola Buso
Hi all, I'm encountering a problem indexing a document with a large number of values for one facet. Caused by: java.lang.IllegalArgumentException: DocValuesField "$facets" is too large, must be <= 32766 at org.apache.lucene.index.BinaryDocValuesWriter.addValue(BinaryDocValuesWriter.java:5

Re: Indexed numeric fields return indexed() == false

2013-04-26 Thread Neil Ireson
Thanks for the reply. I don't understand why I cannot "read an existing document,... and add it to an existing or new index". I understand this wouldn't work for fields which are not stored, I also understand that I am responsible for making sure the tokenizers and analyzers are the same,

Re: Indexed numeric fields return indexed() == false

2013-04-26 Thread Ian Lea
Unfortunately you can't read an existing document, modify it and add it to an existing or new index. You'll have to create a new Document, populate it with fields of the relevant types, using values from the source index if they are stored, then add the new Document to the new index. If there are
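The advice above can be illustrated with a plain-Java model. This is a sketch, not Lucene code: it uses no Lucene classes, and every name in it (`Kind`, `TypedField`, `rebuild`) is hypothetical. The assumption it illustrates is that stored values come back without their index-time type, so the caller has to supply that information again from its own schema when building the new document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RebuildDocumentSketch {
    // Hypothetical field kinds; a real index has more (text, long, double, ...).
    enum Kind { STRING, INT }

    // Hypothetical typed field: a name, its kind, and the parsed value.
    static final class TypedField {
        final String name;
        final Kind kind;
        final Object value;

        TypedField(String name, Kind kind, Object value) {
            this.name = name;
            this.kind = kind;
            this.value = value;
        }
    }

    // Build a fresh list of typed fields from stored string values.
    // The kind of each field comes from the caller's schema, not from the stored data.
    static List<TypedField> rebuild(Map<String, String> stored, Map<String, Kind> schema) {
        List<TypedField> fresh = new ArrayList<>();
        for (Map.Entry<String, String> e : stored.entrySet()) {
            Kind kind = schema.get(e.getKey());
            if (kind == null) {
                throw new IllegalArgumentException("No schema entry for field: " + e.getKey());
            }
            Object value = (kind == Kind.INT) ? Integer.valueOf(e.getValue()) : e.getValue();
            fresh.add(new TypedField(e.getKey(), kind, value));
        }
        return fresh;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("id", "doc-1");
        stored.put("year", "2013");

        Map<String, Kind> schema = new LinkedHashMap<>();
        schema.put("id", Kind.STRING);
        schema.put("year", Kind.INT);

        for (TypedField f : rebuild(stored, schema)) {
            System.out.println(f.name + " " + f.kind + " " + f.value);
        }
    }
}
```

In a real copy loop the same shape applies: read the stored values from the source index, create a new document, and add a field of the correct type for each value before writing it to the target index.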

Indexed numeric fields return indexed() == false

2013-04-26 Thread Neil Ireson
Hi all, I am copying documents from a source index to another (and adding more fields), all the fields are indexed and stored. I'm basically doing... for (int docNum = 0; docNum < maxDoc; docNum++) { Document doc = indexReader.document(docNum); doc.add(new Field1...); doc.add(new Field

Re: Optimizing NRT search

2013-04-26 Thread Michael McCandless
Batching the updates really ought to improve overall throughput. Have you tried with even bigger batches (100, 1000 docs)? But, how large is each update? Are you changing any IndexWriter settings, e.g. ramBufferSizeMB? Using threads should help too, at least a separate thread doing indexing from