Re: splitting docIds from a search by segment [SEC=UNOFFICIAL]

2013-11-04 Thread Michael McCandless
On Sun, Nov 3, 2013 at 7:59 PM, Stephen GRAY wrote: > UNOFFICIAL > > Hi Mike, > > I ran it again and this time the two methods came out about the same: 168 - > 288 ms to process 173,000 documents for the walking method and 160 - 205 ms > for the MultiDocValues method. I don't know what was happ

Lucene Empty Non-empty Fields

2013-11-04 Thread manoj raj
I did some experiments for finding empty fields, but I want to know whether there is a better method. I have to reduce hard disk space. Method 1: Add "NULL String" in empty fields. We can search with the null string for empty and non-empty columns. Observations: - Index size will grow.

Re: Lucene Empty Non-empty Fields

2013-11-04 Thread Michael McCandless
You can also use FieldCache.getDocsWithField? Mike McCandless http://blog.mikemccandless.com On Mon, Nov 4, 2013 at 7:33 AM, manoj raj wrote: > I did some experiments for finding empty fields, But i want to know whether > there is any other better method. Have to reduce hard disk space. > > >

SpanNearQuery behaviour?

2013-11-04 Thread Yu Zhou
Hi, We use SpanNearQueries intensively for proximity searching. However, we are confused by two different ways to use them. Could anybody explain in detail what we can expect from nested and flattened SpanNearQueries? We used to build nested SpanNearQueries. However, we found that using nested S

Re: JLemmaGen project

2013-11-04 Thread Lance Norskog
This is very cool! Lemmatization is an important tool for making search work better. Would you consider changing the licensing to the Apache 2.0 license? On 10/23/2013 08:17 AM, Michal Hlavac wrote: Hi, I rewrote the lemmatizer project LemmaGen (http://lemmatise.ijs.si/) in Java. Originally it's

RE: Lucene Empty Non-empty Fields

2013-11-04 Thread Vitaly Funstein
Or FieldValueFilter - that's probably easier to use. > -----Original Message----- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Monday, November 04, 2013 4:37 AM > To: Lucene Users > Subject: Re: Lucene Empty Non-empty Fields > > You can also use FieldCache.getDocsWithFiel

RE: splitting docIds from a search by segment [SEC=UNOFFICIAL]

2013-11-04 Thread Stephen GRAY
UNOFFICIAL Hi Mike, The hits do seem to come back in docId order. I don't know if they do that every time though. Might be best to sort them. Compiling statistics in the collector sounds like a good idea. I might do that. Thanks, Steve -----Original Message----- From: Michael McCandless [mail
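
Editorial sketch for the thread above: a rough, library-free illustration of splitting a set of global docIds by segment after sorting them, as the message suggests. It assumes each segment is described only by its starting global docId (the value Lucene 4.x exposes as `docBase` on each `AtomicReaderContext`). The class name `SegmentSplitter` and the method `splitBySegment` are hypothetical and do not come from the thread or from Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SegmentSplitter {

    /**
     * Splits global docIds into per-segment lists of segment-local docIds.
     *
     * Assumption: docBases is ascending, docBases[0] == 0, and docBases[i]
     * is the first global docId that belongs to segment i.
     */
    public static List<List<Integer>> splitBySegment(int[] globalDocIds, int[] docBases) {
        // Sort a copy so the result does not depend on the order hits arrived in.
        int[] sorted = globalDocIds.clone();
        Arrays.sort(sorted);

        List<List<Integer>> perSegment = new ArrayList<>();
        for (int i = 0; i < docBases.length; i++) {
            perSegment.add(new ArrayList<>());
        }

        int seg = 0;
        for (int docId : sorted) {
            // Because docIds are sorted, the segment index only ever moves forward.
            while (seg + 1 < docBases.length && docId >= docBases[seg + 1]) {
                seg++;
            }
            // Convert the global docId to a segment-local one.
            perSegment.get(seg).add(docId - docBases[seg]);
        }
        return perSegment;
    }
}
```

Because the input is sorted first, the whole split is a single forward pass over the segments rather than a lookup per hit.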

Re: JLemmaGen project

2013-11-04 Thread Dawid Weiss
Hi Michal, Pretty cool. Your work reminds me of what Leo Galambos did a while back: http://link.springer.com/chapter/10.1007/978-3-540-39985-8_22 I believe his implementation is still available in the Egothor search engine project. Dawid On Wed, Oct 23, 2013 at 5:17 PM, Michal Hlavac wrote:

Twitter analyser

2013-11-04 Thread Stéphane Nicoll
Hi, I am building an application that indexes tweets and offers some basic search facilities on them. I am trying to find a combination where the following would work: * foo matches the foo word, a mention (@foo) or the hashtag (#foo) * @foo only matches the mention * #foo matches only the hashtag
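
Editorial sketch for the thread above: one possible way to get the matching behaviour listed in the message is to index each mention or hashtag twice, once with its prefix and once without. The plain-Java code below only illustrates that idea and is not a Lucene analyzer. The class `TweetTokens` and its methods are hypothetical, and the tokenization rule (split on anything that is not a letter, digit, underscore, `@` or `#`) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TweetTokens {

    /** Produces the terms that would be indexed for one tweet. */
    public static List<String> indexTokens(String tweet) {
        List<String> tokens = new ArrayList<>();
        // Assumed rule: keep '@' and '#' inside tokens, split on everything else.
        for (String raw : tweet.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}_@#]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            tokens.add(raw); // "@foo" or "#foo" is kept as-is
            char first = raw.charAt(0);
            if ((first == '@' || first == '#') && raw.length() > 1) {
                tokens.add(raw.substring(1)); // also index the bare word "foo"
            }
        }
        return tokens;
    }

    /** A single-term query matches when the exact term was indexed. */
    public static boolean matches(String tweet, String queryTerm) {
        return indexTokens(tweet).contains(queryTerm.toLowerCase(Locale.ROOT));
    }
}
```

With this scheme a query for foo finds the plain word, the mention and the hashtag, while @foo and #foo each match only their own prefixed term. The cost is one extra indexed term per mention or hashtag.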