oom on FastTaxonomyFacetsCounts

2016-12-27 Thread Sheng
This is probably not the fault of Lucene, as the OOM happened on this line: values = new int[taxoReader.getSize()]; So taxoReader.getSize() is probably too big. My question is: is there a more memory-friendly way (also without a significant performance penalty) to get FacetResult for a particular

Re: SortingMergePolicy moved to solr ?

2016-09-14 Thread Sheng
owse/LUCENE-6766 has all the gory > details. > > Mike McCandless > > http://blog.mikemccandless.com > > On Wed, Sep 14, 2016 at 10:56 AM, Sheng <sheng...@gmail.com> > wrote: > > Before 6.2, it is in Lucene-misc, now I can only find it in

SortingMergePolicy moved to solr ?

2016-09-14 Thread Sheng
Before 6.2, it was in Lucene-misc; now I can only find it in Solr. I understand it might have something to do with an issue I reported earlier, that SortingMergePolicy cannot handle point fields properly, but my expectation at the time was that this would be addressed in a later version instead of

Re: dv field is too large

2016-07-06 Thread Sheng
; chop it off before sending it to Lucene. > > Best, > Erick > > On Wed, Jul 6, 2016 at 3:53 PM, Sheng <sheng...@gmail.com> > wrote: > > You misunderstand. I have many fields, and unfortunately a few of them > are > > quite big, i.e

Re: dv field is too large

2016-07-06 Thread Sheng
s searchable and sortable > independently. But from what you've described, putting the entire > thing into a single DV field isn't useful. > > Best, > Erick > > > > On Wed, Jul 6, 2016 at 3:10 PM, Sheng <sheng...@gmail.com> > wrote:

Re: dv field is too large

2016-07-06 Thread Sheng
ndless > > http://blog.mikemccandless.com > > On Wed, Jul 6, 2016 at 5:55 PM, Sheng <sheng...@gmail.com> > wrote: > > > Hi Eric, > > > > I am refactoring a legacy system. One of the most annoying things is I > have > > to keep the ol

Re: dv field is too large

2016-07-06 Thread Sheng
To be clear, the "field" is indeed tokenized, and it is accompanied by a SortedDocValuesField so that it is sortable too. Am I making the wrong assumption here? On Wednesday, July 6, 2016, Sheng <sheng...@gmail.com> wrote: > Hi Eric, > > I am refactoring a legac

Re: dv field is too large

2016-07-06 Thread Sheng
may have a perfectly valid reason, but > it's > not obvious what use-case you're serving from this thread so far > > Nobody has yet put forth a compelling use-case for such large fields, > perhaps > this would be one. > > Best, > Erick > > On Wed, Jul 6, 2016 at 2

Re: dv field is too large

2016-07-06 Thread Sheng
be larger than 32K bytes. > > Mike McCandless > > http://blog.mikemccandless.com > > On Wed, Jul 6, 2016 at 10:31 AM, Sheng <sheng...@gmail.com> > wrote: > > > Hi, > > > > I am getting an IAE indicating one of the Sorted

dv field is too large

2016-07-06 Thread Sheng
Hi, I am getting an IAE indicating that one of the SortedDocValuesField values is too large (> 32k). I googled a bit, and it seems like LUCENE-4583 has addressed this issue in 4.5 and 6.0, while I am currently using Lucene 6.1. Am I missing or misunderstanding anything? Thanks,

Re: SortingMergePolicy in Lucene 6

2016-06-10 Thread Sheng
lue in parent and child documents), and > secondarily by "blockID" where blockID is a unique long doc value indexed > on each document in the block. That should preserve your blocks? > > Mike McCandless > > http://blog.mikemccandless.com > > On Wed, May 25, 2016 at 8:26

Re: SortingMergePolicy in Lucene 6

2016-05-25 Thread Sheng
Maybe you could test Lucene's current master > and confirm points and index-time sorting work correctly for you? > > Mike McCandless > > http://blog.mikemccandless.com > > On Wed, May 25, 2016 at 1:10 PM, Sheng <sheng...@gmail.com>

SortingMergePolicy in Lucene 6

2016-05-25 Thread Sheng
It makes a call to SlowCompositeReaderWrapper in line 103, which checks whether the field hasPointValues in line 68. If yes, it throws an exception "cannot wrap points". Does this essentially mean SortingMergePolicy cannot be used for an index that has point values? If yes, what is the rationale behind it?

Re: 500 millions document for loop.

2016-04-21 Thread Sheng
If you don't care about search, why not just use the reader to traverse? Run a for loop from 0 to reader.maxDoc() - 1, and filter the documents using MultiFields. You can even bucket this procedure, and run your statistics calculation in parallel. On Thursday, November 12, 2015, Valentin Popov
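The bucketing idea in the reply above can be sketched in plain Java. This is only an illustration under stated assumptions: DocIdBuckets, split, and countMatching are hypothetical names, no Lucene classes are used, and the IntPredicate stands in for whatever per-document check (deleted docs, field values) the real traversal would perform against the reader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper, not part of Lucene: splits the doc ID range [0, maxDoc)
// into contiguous buckets so each bucket can be processed by its own task.
final class DocIdBuckets {

    /** Returns at most bucketCount ranges, each as {startInclusive, endExclusive}. */
    static List<int[]> split(int maxDoc, int bucketCount) {
        List<int[]> buckets = new ArrayList<>();
        if (maxDoc <= 0 || bucketCount <= 0) {
            return buckets;
        }
        // Ceiling division in long arithmetic to avoid int overflow near Integer.MAX_VALUE.
        long size = ((long) maxDoc + bucketCount - 1) / bucketCount;
        for (long start = 0; start < maxDoc; start += size) {
            long end = Math.min(start + size, maxDoc);
            buckets.add(new int[] {(int) start, (int) end});
        }
        return buckets;
    }

    /** Counts the doc IDs accepted by the filter, processing buckets in parallel. */
    static long countMatching(int maxDoc, int bucketCount, IntPredicate filter) {
        return split(maxDoc, bucketCount).parallelStream()
                .mapToLong(range -> {
                    long matches = 0;
                    for (int docId = range[0]; docId < range[1]; docId++) {
                        if (filter.test(docId)) {
                            matches++;
                        }
                    }
                    return matches;
                })
                .sum();
    }
}
```

In a real index the filter would read from the reader (for example, skipping deleted documents), and the per-bucket work would compute the statistics instead of a simple count.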

Re: What is the propper replacement for Filters working in DocValue fields?

2016-03-23 Thread Sheng
One possible workaround I can think of is to make use of CustomScoreQuery to do a posteriori scoring: let documents not matching your criteria have score 0, and use a PositiveScoresOnlyCollector to harvest the search result. The problem with using CustomScoreQuery is that FieldCache is deprecated too, but

Weird Lucene 5 filter behavior

2016-02-10 Thread Sheng
The question is asked on SO: http://stackoverflow.com/questions/35320661/weird-filter-behavior-in-lucene-5 I am behind the company proxy, which forces me to type this on my phone to send it to the mailing list. I apologize in advance for any inconvenience in reading!

Re: Weird Lucene 5 filter behavior

2016-02-10 Thread Sheng
instead? And is it guaranteed that the behavior would be the same as that written in Filter? On Wednesday, February 10, 2016, Sheng <sheng...@gmail.com> wrote: > question is asked on SO, > > > http://stackoverflow.com/questions/35320661/weird-filter-behavior-in-lucene-5 > > I

Re: ConjunctionScorer access

2015-10-22 Thread Sheng
to leverage this. On Thu, Oct 22, 2015 at 9:22 AM, Alan Woodward <a...@flax.co.uk> wrote: > You should be able to use a FilterScorer that wraps a ConjunctionScorer > and overrides score(). > > Alan Woodward > www.flax.co.uk > > > On 22 Oct 2015, at 13:43, Sheng wro

Re: ConjunctionScorer access

2015-10-22 Thread Sheng
nals > are still private - and that's good. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Sheng [mailto:sheng...@gmail.com] >

Re: ConjunctionScorer access

2015-10-22 Thread Sheng
is that documents that have both term A and term B in > "payload_field" will not necessarily have term A in "excluded_field" -- > only the ones that you don't want to see in the result set. > > Regards, > András > > On Thu, Oct 22, 2015 at 4:06 PM, Sheng <sh

Re: ConjunctionScorer access

2015-10-22 Thread Sheng
emen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Sheng [mailto:sheng...@gmail.com] > > Sent: Thursday, October 22, 2015 4:06 PM > > To: java-user@lucene.apache.org > > Subject: Re: ConjunctionScorer access >

Re: ConjunctionScorer access

2015-10-22 Thread Sheng
f hacking BooleanWeight, you should use a version of > SpanPayloadCheckQuery? There isn't anything that combines checking and > scoring for payloads at the moment, but I don't think it would be too > difficult to write one. > > Alan Woodward > www.flax.co.uk > > > On 22 Oct 20

ConjunctionScorer access

2015-10-21 Thread Sheng
It's a bummer Lucene makes the constructor of ConjunctionScorer non-public. I wanted to extend from this class in order to tweak its behavior for my use case. Is it possible to change it to protected in future releases ?

similarity per query

2015-10-08 Thread Sheng
Let's say I have a boolean query "a AND b". Is it possible to run the search for this boolean query with similarity "Sa" set for query "a", and similarity "Sb" set for query "b"?

Facet label index exception

2015-07-30 Thread Sheng
This is the first time I have come across an error like this: label already exists: Facet label: ..., prev ordinal: ... It shows the error happened at line 131 of CompactLabelToOrdinal.java. Any idea what could have gone wrong? I am using Lucene 4.10.2. Thanks!

drilldown query with null base query

2015-07-24 Thread Sheng
This is what I am trying to achieve: running a drill-down query with baseQuery = null / MatchAllDocsQuery(), and expecting the index to return all the documents that match the drill-down path(s). It returns nothing back to me; however, as long as I make the base query search a specific term

Re: drilldown query with null base query

2015-07-24 Thread Sheng
Just found out more: a drill-down query with MatchAllDocsQuery as the base query works if only one path is added, and starts to return empty results if more than one path is added. This is very strange... On Fri, Jul 24, 2015 at 12:12 PM, Sheng sheng...@gmail.com wrote: This is what I am going

Re: Using lucene queries to search StringFields

2015-06-19 Thread Sheng
1. What analyzer are you using for indexing? 2. You cannot fuzzy match a field name; that will surely throw an exception. 3. I would start from a simple, deterministic query object to rule out all unlikely possibilities first, before resorting to a parser to generate it for you. On Fri, Jun

Re: Exception while updating a lucene document

2015-04-25 Thread Sheng
seems like you forgot to do facetsConfig.setMultiValued(`field`, true) too . On Sat, Apr 25, 2015 at 7:37 AM, Gimantha Bandara giman...@wso2.com wrote: Hi, I was able to fix the problem.. the issue was with my wrong usage of FacetConfig class. I was creating Document using facetConfig.build

Customscorequery and payload

2015-02-11 Thread Sheng
as at the document level during search. I am using latest 4.10.x Lucene. Thanks, Sheng

Re: IndexSearcher creation policy question

2014-08-22 Thread Sheng
Your best bet is to use a searcher manager to manage the searcher instance, and only refresh the manager if writes are committed. This way the same searcher instances can be shared by multiple threads. For the paging, if you want to have a guaranteed consistent view, you have to keep around the

Re: WhiteSpaceTokenizer

2014-08-15 Thread Sheng
-4148 I actually filed a Jira for this already. No action so far, but PLEASE feel free to comment on it: https://issues.apache.org/jira/browse/LUCENE-5785 -- Jack Krupansky -Original Message- From: Sheng Sent: Thursday, August 14, 2014 11:38 PM To: java-user@lucene.apache.org

Re: Lucene newbie in need of a hint

2014-08-14 Thread Sheng
On a side note, there is a race condition in your code: what if a search on the old reader is in progress while you call reader.close()? You need to call incRef on the reader (it should be tryIncRef, as you need to consider what happens if the reader is closed at the moment you call incRef on it) and decRef
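The reference-counting pattern described above can be sketched without Lucene. This is a minimal illustration under stated assumptions: RefCountedResource and ReaderGuard are hypothetical stand-ins that only mimic the tryIncRef/decRef idea; they are not Lucene's IndexReader, and the real release logic would close the underlying files.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a ref-counted reader; not a Lucene class.
final class RefCountedResource {
    // Starts at 1: the reference held by whoever opened the resource.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    /** Takes a reference only if the resource is still alive; never revives a closed one. */
    boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) {
                return true;
            }
        }
        return false;
    }

    /** Drops a reference; the last one out releases the resource. */
    void decRef() {
        int count = refCount.decrementAndGet();
        if (count == 0) {
            closed = true; // a real reader would release its files here
        } else if (count < 0) {
            throw new IllegalStateException("decRef called too many times");
        }
    }

    boolean isClosed() {
        return closed;
    }
}

final class ReaderGuard {
    /** Runs the search only while holding a reference; returns false if the reader is gone. */
    static boolean searchSafely(RefCountedResource reader, Runnable search) {
        if (!reader.tryIncRef()) {
            return false; // already closed: the caller should fetch the current reader and retry
        }
        try {
            search.run();
            return true;
        } finally {
            reader.decRef(); // always give the reference back, even if the search throws
        }
    }
}
```

With this shape, the thread that swaps readers calls decRef on the old one instead of closing it outright, so an in-flight search keeps the old reader alive until its own decRef runs.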

WhiteSpaceTokenizer

2014-08-14 Thread Sheng
The length of token has to be shorter than 255, otherwise there will be unpredictable behaviors for this tokenizer. I see 255 is set as a private final in the src code, but there is no documentation to explicitly address that. Can we either make that number configurable (if not an option, I'd like

Re: Questions for facets search

2014-08-13 Thread Sheng
like a map is quite similar to how we store the payload :) We use an integer as payload for each token, and store more complicated information in another Lucene index with the integer payload as the key for each document. Sheng On Wednesday, August 13, 2014, Shai Erera ser...@gmail.com wrote

Problem of calling indexWriterConfig.clone()

2014-08-12 Thread Sheng
(Version.LUCENE_47, null); // set whatever you need on this instance . IndexWriter writer = new IndexWriter(directory, masterCfg.clone()); Wouldn't this just work? If not, could you paste the stack trace of the exception you're getting? On Mon, Aug 11, 2014 at 9:01 PM, Sheng sheng

Questions for facets search

2014-08-12 Thread Sheng
lucene cache, since they are separated? We have a dynamic list of faceted fields, being able to quickly rebuild the whole facet lucene cache would be quite desirable. Again, I am using lucene 4.7, thanks in advance to your answers! Sheng

Problem of calling indexWriterConfig.clone()

2014-08-11 Thread Sheng
I tried to create a clone of the IndexWriterConfig with indexWriterConfig.clone() for re-creating a new IndexWriter, but then I got this very annoying IllegalStateException: clone this object before it is used. Why does this exception happen, and how can I get around it? Thanks!

Re: Problem of calling indexWriterConfig.clone()

2014-08-11 Thread Sheng
be called with .clone() at all? On Mon, Aug 11, 2014 at 9:52 PM, Vitaly Funstein vfunst...@gmail.com wrote: Looks like you have to clone it prior to using with any IndexWriter instances. On Mon, Aug 11, 2014 at 2:49 PM, Sheng sheng...@gmail.com wrote: I tried to create a clone

Re: Problem of calling indexWriterConfig.clone()

2014-08-11 Thread Sheng
. On Mon, Aug 11, 2014 at 7:12 PM, Sheng sheng...@gmail.com wrote: So the indexWriterConfig.clone() failed at this step: clone.indexerThreadPool = indexerThreadPool http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/4.7.0/org/apache/lucene/index