Re: DiskDocValuesFormat

2013-04-14 Thread Wei Wang
Strange. That's all I got from the log besides the first line, which I wrote to mark the start of merging with a time stamp. On Sun, Apr 14, 2013 at 4:58 PM, Robert Muir wrote: > Your stack trace is incomplete: it doesn't even show where the OOM > occurred. > > On Sun, Apr 14, 201

Re: DiskDocValuesFormat

2013-04-14 Thread Wei Wang
t much memory consumption. But that does not seem to be the case. On Sun, Apr 14, 2013 at 4:13 PM, Wei Wang wrote: > That makes sense. > > BTW, I checked the jar file. Exactly as you pointed out, the services > files only contain info from lucene-core, without the codecs from > lucene-codecs. Aft

Re: DiskDocValuesFormat

2013-04-14 Thread Wei Wang
JAR file with a ZIP > > > program and check that all files in META-INF/services contain all > > > entries merged from all Lucene JARs. > > > > > > Uwe > > > > > > - > > > Uwe Schindler > > > H.-H.-Meier-Allee 63, D-28213 Brem

Re: DiskDocValuesFormat

2013-04-14 Thread Wei Wang
ith a ZIP program > and check that all files in META-INF/services contain all entries merged > from all Lucene JARs. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Orig
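The check Uwe describes (open the combined JAR and look at every file under META-INF/services) can also be scripted. What follows is only a minimal sketch using the JDK's java.util.jar classes; the class name ServicesLister and the method listServices are hypothetical names chosen for illustration and are not part of Lucene.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper (not a Lucene class): lists the provider names found in
// each META-INF/services file of a JAR.
class ServicesLister {
    private static final String PREFIX = "META-INF/services/";

    /** Returns a map from service file name to the provider class names it lists. */
    static Map<String, List<String>> listServices(String jarPath) throws IOException {
        Map<String, List<String>> result = new TreeMap<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith(PREFIX)) {
                    continue;
                }
                List<String> providers = new ArrayList<>();
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        // Service files allow '#' comments and blank lines; skip both.
                        int hash = line.indexOf('#');
                        if (hash >= 0) {
                            line = line.substring(0, hash);
                        }
                        line = line.trim();
                        if (!line.isEmpty()) {
                            providers.add(line);
                        }
                    }
                }
                result.put(name.substring(PREFIX.length()), providers);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ServicesLister <path-to-jar>");
            return;
        }
        for (Map.Entry<String, List<String>> e : listServices(args[0]).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Run against the single combined JAR, the output should show, for each service file, the entries from every Lucene JAR that was merged in. If a file such as org.apache.lucene.codecs.DocValuesFormat lists only lucene-core classes, the entries from lucene-codecs were probably overwritten rather than merged when the JAR was built.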

Re: DiskDocValuesFormat

2013-04-14 Thread Wei Wang
3, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -----Original Message- > > From: Wei Wang [mailto:welshw...@gmail.com] > > Sent: Sunday, April 14, 2013 11:30 PM > > To: java-user@lucene.apache.org > > Subject: Re: DiskDocValuesFormat >

Re: DiskDocValuesFormat

2013-04-14 Thread Wei Wang
ve created a single jar file that has all necessary dependencies, such as lucene-codecs-4.2.0.jar. And I assume the indexing step works well, so Lucene already knows the format with name 'Disk'. Thanks. On Sat, Apr 13, 2013 at 4:25 AM, Adrien Grand wrote: > Hi Wei, > > On Sat,

Re: DiskDocValuesFormat

2013-04-13 Thread Wei Wang
Hi Adrien, Thanks for your example. Really helpful! Wei On Sat, Apr 13, 2013 at 4:25 AM, Adrien Grand wrote: > Hi Wei, > > On Sat, Apr 13, 2013 at 7:44 AM, Wei Wang wrote: > > I am trying to use DiskDocValuesFormat for a particular > > BinaryDocValuesField. It seems ther

DiskDocValuesFormat

2013-04-12 Thread Wei Wang
I am trying to use DiskDocValuesFormat for a particular BinaryDocValuesField. It seems there are no good examples showing how to do this. The only hint I got from various docs and forums is to set some codec in IndexWriter. Could someone give a few lines of code and show how to set DiskDocValue

Re: Forcemerge running out of memory

2013-04-11 Thread Wei Wang
m, its unrelated to merging: it means you don't > have enough RAM to support all the stuff you are putting in these > binarydocvalues fields with an in-RAM implementation. I'd use "Disk" for > this instead. > > On Thu, Apr 11, 2013 at 12:57 PM, Wei Wang wrote: >

Forcemerge running out of memory

2013-04-11 Thread Wei Wang
Hi, After finishing indexing, we tried to consolidate all segments using forceMerge, but we keep getting an out-of-memory error even after increasing the memory to 4GB. Exception in thread "main" java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot complete forceMerge

Re: IntField question

2013-04-10 Thread Wei Wang
Thanks for the clarification. Very helpful. On Wed, Apr 10, 2013 at 8:19 AM, Adrien Grand wrote: > Hi, > > On Wed, Apr 10, 2013 at 4:59 PM, Wei Wang wrote: > > Okay. Since there is no ByteField, setByteValue will never be used. It > > seems like a dead function. > >

Re: IntField question

2013-04-10 Thread Wei Wang
Hi, On Wed, Apr 10, 2013 at 2:45 AM, Adrien Grand wrote: > Hi, > > On Wed, Apr 10, 2013 at 9:34 AM, Wei Wang wrote: > > IntField inherits from Field class a function called setByteValue(). > > However, if we call it, it gives an error message: > > > > java.lang

IntField question

2013-04-10 Thread Wei Wang
IntField inherits from the Field class a function called setByteValue(). However, if we call it, it gives an error message: java.lang.IllegalArgumentException: cannot change value type from Integer to Byte 1. If this is not allowed for IntField, and there is no ByteField, how will function setByteValue(

Re: DocValues space usage

2013-04-09 Thread Wei Wang
Adrien and Robert, thanks a lot for the hints. Will try a few options and see how it goes. On Tue, Apr 9, 2013 at 9:25 AM, Robert Muir wrote: > On Tue, Apr 9, 2013 at 9:11 AM, Adrien Grand wrote: > > > The default codec stores numeric doc values by blocks of 4096 values > > that have independent

Re: DocValues space usage

2013-04-09 Thread Wei Wang
a from the comments. On Tue, Apr 9, 2013 at 8:51 AM, Robert Muir wrote: > On Tue, Apr 9, 2013 at 8:22 AM, Wei Wang wrote: > > > DocValues makes fast per doc value lookup possible, which is nice. But it > > brings other interesting issues. > > > > Assume there are 100M d

DocValues space usage

2013-04-09 Thread Wei Wang
DocValues makes fast per-doc value lookup possible, which is nice. But it brings other interesting issues. Assume there are 100M docs and 200 NumericDocValuesFields; this ends up with huge disk and memory usage, even if there are just thousands of values for each field. I guess this is b
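To put rough numbers on the scenario above (100M docs, 200 numeric fields, a few thousand distinct values per field), a back-of-the-envelope estimate can help. The sketch below is plain arithmetic, not Lucene code: it assumes every document stores one fixed-width value per field and ignores any block-level or per-codec overhead, and the names DocValuesSizeEstimate, bitsRequired and estimateBytes are hypothetical.

```java
// Hypothetical helper (not a Lucene class): rough size estimate for dense,
// fixed-width numeric values, ignoring codec metadata and block structure.
class DocValuesSizeEstimate {

    /** Number of bits needed to represent any value in [0, maxValue]; maxValue must be >= 0. */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Total bytes if every document stores bitsPerValue bits for each field. */
    static long estimateBytes(long numDocs, int numFields, int bitsPerValue) {
        return numDocs * numFields * bitsPerValue / 8;
    }

    public static void main(String[] args) {
        long numDocs = 100_000_000L;
        int numFields = 200;
        // Assumption: about 4096 distinct values per field, stored as ordinals 0..4095.
        int packedBits = bitsRequired(4095);
        System.out.println("packed (" + packedBits + " bits/value): "
                + estimateBytes(numDocs, numFields, packedBits) + " bytes");
        System.out.println("raw 64-bit longs: "
                + estimateBytes(numDocs, numFields, 64) + " bytes");
    }
}
```

Under these assumptions, even a tightly packed 12 bits per value comes to about 30 GB across all fields, against 160 GB for raw 64-bit longs, so the sheer number of documents and fields dominates regardless of how compact each value is.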

Re: Reuse Document

2013-04-07 Thread Wei Wang
today ... but, > likely this wouldn't really buy you much performance if it did vs just > creating a new Document when the fields changed. > > Mike McCandless > > http://blog.mikemccandless.com > > On Sun, Apr 7, 2013 at 2:41 AM, Wei Wang wrote: > > Lucene encourages to

Reuse Document

2013-04-06 Thread Wei Wang
Lucene encourages re-using a Document by setting new values for the Fields contained within the Document object. This assumes there is no change to the number and types of Fields contained in a Document object during indexing. If the number and types of Fields contained in a Document object change from

Re: DocValues questions

2013-04-04 Thread Wei Wang
error: Exception in thread "main" java.lang.IllegalArgumentException: cannot change value type from Long to Integer Do we need to use setLongValue() all the time? Thanks. On Thu, Apr 4, 2013 at 3:58 PM, Wei Wang wrote: > Thanks! Good to know the codec uses variable length encod

Re: DocValues questions

2013-04-04 Thread Wei Wang
Thanks! Good to know the codec uses a variable-length encoding mechanism here. On Thu, Apr 4, 2013 at 3:36 PM, Adrien Grand wrote: > On Thu, Apr 4, 2013 at 11:03 PM, Wei Wang wrote: > > Given the new Lucene 4.2 DocValues API, it seems no matter it is byte, > > short, int, or lon

Re: DocValues questions

2013-04-04 Thread Wei Wang
ed to give some hint to NumericDocValuesField to save space? On Thu, Apr 4, 2013 at 11:53 AM, Wei Wang wrote: > Hi Adrien, > > Thanks for the clarification. It is very helpful. Will try Lucene 4.2 and > AtomicReader API. > > Wei > > > On Thu, Apr 4, 2013 at 11:22 AM, Adrie

Re: DocValues questions

2013-04-04 Thread Wei Wang
Hi Adrien, Thanks for the clarification. It is very helpful. Will try Lucene 4.2 and AtomicReader API. Wei On Thu, Apr 4, 2013 at 11:22 AM, Adrien Grand wrote: > Hi, > > On Thu, Apr 4, 2013 at 10:30 AM, Wei Wang wrote: > > A few quick questions about DocValues: > >

DocValues questions

2013-04-04 Thread Wei Wang
A few quick questions about DocValues: 1. If only a small number of documents have a ShortDocValueField defined, should each document in the index have this field filled with some value? The add() function of Document does not seem to enforce that a DocValues field is always added to each document. 2. Is there

Re: Filter based on the sum of values of two fields

2013-03-27 Thread Wei Wang
Hi Yann-Erwan, Thank you for the detailed reply. Your idea seems reasonable. I will give it a try for our environment settings. Wei On Tue, Mar 26, 2013 at 5:22 PM, Yann-Erwan Perio wrote: > On Sun, Mar 24, 2013 at 10:46 AM, Wei Wang wrote: > > Hi, > >> For example, assume

Re: Filter based on the sum of values of two fields

2013-03-26 Thread Wei Wang
Can someone give a hint on this? Or is this a tough problem? Thanks in advance. On Sun, Mar 24, 2013 at 2:46 AM, Wei Wang wrote: > Hello, > > We have documents with many numerical fields. In some search scenario, > we would like to create a filter based on the sum of the v

Filter based on the sum of values of two fields

2013-03-24 Thread Wei Wang
Hello, We have documents with many numerical fields. In some search scenario, we would like to create a filter based on the sum of the values of two fields. For example, assume we have fields F1 and F2, we would like to find all documents with condition F1+F2 > 5.0. This filter may be combined wi
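The condition itself (F1+F2 > 5.0) is easy to state over per-document values. The sketch below is not Lucene's Filter API: it assumes the values of F1 and F2 have already been loaded into two arrays indexed by document ID (for example from a field cache or doc values), and the names SumFilterSketch and docsWithSumAbove are hypothetical.

```java
import java.util.BitSet;

// Hypothetical sketch (not Lucene's Filter API): marks the documents whose
// two field values sum to more than a threshold.
class SumFilterSketch {

    /**
     * f1[doc] and f2[doc] hold the values of fields F1 and F2 for document doc.
     * Returns a bit set with one bit per accepted document.
     */
    static BitSet docsWithSumAbove(double[] f1, double[] f2, double threshold) {
        if (f1.length != f2.length) {
            throw new IllegalArgumentException("both fields must cover the same documents");
        }
        BitSet accepted = new BitSet(f1.length);
        for (int doc = 0; doc < f1.length; doc++) {
            if (f1[doc] + f2[doc] > threshold) {
                accepted.set(doc);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        double[] f1 = {1.0, 3.0, 5.0};
        double[] f2 = {2.0, 2.5, 0.0};
        // Sums are 3.0, 5.5 and 5.0, so only document 1 passes a strict "> 5.0" test.
        System.out.println(docsWithSumAbove(f1, f2, 5.0));
    }
}
```

Because the result is a bit set over document IDs, it can be intersected with the bit sets of other filters (BitSet.and), which matches the requirement that this filter be combinable with others.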

Re: BlockJoinQuery: delete documents

2013-03-05 Thread Wei Wang
rnal field together with the docID of the parent doc to remove the whole doc block. Here we assume the parent doc is given a doc ID first during the indexing time. Wei On Sun, Mar 3, 2013 at 11:54 AM, Wei Wang wrote: > I see. Probably assigning blockID is the most efficient way. Thanks. >

Re: BlockJoinQuery: delete documents

2013-03-03 Thread Wei Wang
n't join to > anything. > > Mike McCandless > > http://blog.mikemccandless.com > > > On Sat, Mar 2, 2013 at 11:34 PM, Wei Wang wrote: >> Hello, >> >> I understand BlockJoinQuery can be used to index nested documents with >> some internal structure. And

BlockJoinQuery: delete documents

2013-03-02 Thread Wei Wang
Hello, I understand BlockJoinQuery can be used to index nested documents with some internal structure. And at indexing time, addDocuments is used to create document blocks. In case we would like to update some data fields, we have to delete the old document block and add the updated block. How ca

Re: Lucene filter questions

2013-02-25 Thread Wei Wang
Thank you, Mike. I will try it out. On Mon, Feb 25, 2013 at 4:01 PM, Michael McCandless wrote: > On Mon, Feb 25, 2013 at 2:19 PM, Wei Wang wrote: >> Cool. Thanks, Ian. >> >> I will try FieldCacheTermsFilter. >> >> A related question. Occasionally, we would like

Re: Lucene filter questions

2013-02-25 Thread Wei Wang
> FieldCache - you might get better performance from > FieldCacheTermsFilter than from TermsFilter. See also > CachingWrapperFilter and QueryWrapperFilter. > > > -- > Ian. > > > On Mon, Feb 25, 2013 at 1:16 AM, Wei Wang wrote: > > Hi, > > > > I am a

Lucene filter questions

2013-02-24 Thread Wei Wang
Hi, I am a Lucene user and I have a few questions about Lucene filters. I would appreciate it if someone could shed light on them. 1. Are Lucene filters such as TermsFilter thread-safe in general? The semantics of a Filter are fixed, unless a filter maintains some private state information, theoretical