Re: Deleting from Index by URL field: is it safe?

2008-11-28 Thread German Kondolf
It works exactly as it does when you search for that term. Check your index creation: if you store it without analyzing it (Index.UN_TOKENIZED), it will only match that document when you have an exact URL. It's possible that the URL is not unique enough in your domain, there is no other uniqu

Re: Controlled Indexing -New Feature

2008-11-28 Thread German Kondolf
You could use a "reverse" stop-word filter. The standard "StopFilter" removes the keywords that match a given Set of words; you could reverse that logic and remove ALL keywords that don't match that Set. Take a look at StopFilter and StandardAnalyzer ;) On Fri, Nov 28, 2
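A minimal sketch of the "reverse" stop-word idea, written against plain Java collections rather than Lucene's TokenStream API. The class and method names (KeepWordsSketch, keepOnly) are hypothetical and not part of Lucene; a real implementation would apply the same test inside a custom TokenFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene: keeps only tokens found in an allow-list.
class KeepWordsSketch {

    // keepWords is assumed to contain lower-case entries.
    static List<String> keepOnly(List<String> tokens, Set<String> keepWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            // A stop filter drops a token when it IS in the set;
            // here the test is inverted and the token is dropped when it is NOT.
            if (keepWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> keepWords = new HashSet<>(Arrays.asList("lucene", "index"));
        List<String> tokens = Arrays.asList("the", "Lucene", "index", "is", "fast");
        System.out.println(keepOnly(tokens, keepWords)); // prints [Lucene, index]
    }
}
```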

Re: Time of processing hits.doc()

2007-11-19 Thread German Kondolf
You should never use the hits for anything other than retrieving a group of results (usually a page of 10-20-30 docs). You could look at Apache Solr's implementation of faceted search. I've used that code as a guide to group & count different facets (or conditions, or fields, whatever you want to call them), it's pretty fast

Re: Time of processing hits.doc()

2007-11-19 Thread German Kondolf
Why do you need the doc's info? If you're grouping you may not need detail on each group condition. Here is a sample of faceted (grouped) search: http://listados.deremate.com.ar/mp3 (Sorry, it's in Spanish) I simply collect every facet's bitset and intersect it against the result's bitset (keywo

Re: Time of processing hits.doc()

2007-11-19 Thread German Kondolf
I have already defined a Lucene Filter for every "id" of "ubicacion". I just create the bitset for every value, and count it against the result. One possible optimization is to read the terms of the field you're trying to "group"; that's the optimization we'll be working on soon in our app. I never
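The bitset counting described in this thread can be sketched with java.util.BitSet alone. This is an illustration under assumptions, not the poster's actual code: the class name FacetCountSketch, the facet labels, and the document ids are invented, and in a real application the bitsets would come from Lucene Filters over an IndexReader.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one precomputed bitset per facet value, intersected with the result bitset.
class FacetCountSketch {

    // For each facet, count how many documents in the result set also match that facet.
    static Map<String, Integer> countFacets(BitSet results, Map<String, BitSet> facets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : facets.entrySet()) {
            // Clone first so the cached facet bitset is not modified by and().
            BitSet intersection = (BitSet) entry.getValue().clone();
            intersection.and(results);
            counts.put(entry.getKey(), intersection.cardinality());
        }
        return counts;
    }

    public static void main(String[] args) {
        BitSet results = new BitSet();           // docs matching the keyword query
        results.set(0); results.set(2); results.set(3);

        BitSet ubicacion1 = new BitSet();        // docs with the first "ubicacion" id
        ubicacion1.set(0); ubicacion1.set(1); ubicacion1.set(2);
        BitSet ubicacion2 = new BitSet();        // docs with the second "ubicacion" id
        ubicacion2.set(3); ubicacion2.set(4);

        Map<String, BitSet> facets = new LinkedHashMap<>();
        facets.put("ubicacion:1", ubicacion1);
        facets.put("ubicacion:2", ubicacion2);

        System.out.println(countFacets(results, facets)); // prints {ubicacion:1=2, ubicacion:2=1}
    }
}
```

No stored document is loaded at any point, which is why this kind of counting avoids the cost of calling hits.doc() for every match.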

Re: Time of processing hits.doc()

2007-11-19 Thread German Kondolf
A facet is a group condition; it could be a single value of the doc or a set of filters. On Nov 19, 2007 1:09 PM, Haroldo Nascimento <[EMAIL PROTECTED]> wrote: > German, > > When You said: > "I collect every facet's bitset ... " > what is a facet ? Is there the each option of filter of your site ?

Re: RAMDirectory vs FSDirectory

2007-11-27 Thread German Kondolf
There is a constructor in the RAMDirectory that already does that. http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/store/RAMDirectory.html I don't think it's worth modifying Lucene's internal code to gain an extra bit of performance... What would you do on the next version? Modify it agai

Re: Closing index searchers ...

2007-11-29 Thread German Kondolf
I had the same issue, and ended up doing my own reference counting using an "acquire/release" strategy. I used a single instance per searcher: every "acquire" counts +1 and every "release" counts -1. When an index is switched it receives a "dispose" signal, then the release checks if there are processing
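A minimal sketch of such an acquire/release wrapper. The message is cut off before it says what happens after "dispose", so the behavior here is an assumption: the resource is closed once it has been disposed and the last in-flight user has released it, and no new acquires are granted after dispose. RefCountedResource is a hypothetical name, and the Runnable stands in for a call such as searcher.close().

```java
// Hypothetical sketch of reference counting around a shared, closeable resource.
class RefCountedResource {
    private final Runnable closeAction; // stand-in for closing the real searcher
    private int refCount = 0;
    private boolean disposed = false;
    private boolean closed = false;

    RefCountedResource(Runnable closeAction) {
        this.closeAction = closeAction;
    }

    // Returns false once dispose() has been called (assumption: no new users after a switch).
    synchronized boolean acquire() {
        if (disposed) {
            return false;
        }
        refCount++;
        return true;
    }

    // Every successful acquire() must be paired with exactly one release().
    synchronized void release() {
        if (refCount <= 0) {
            throw new IllegalStateException("release() without matching acquire()");
        }
        refCount--;
        closeIfIdle();
    }

    // Signals that the index was switched; closes now if idle, otherwise on the last release().
    synchronized void dispose() {
        disposed = true;
        closeIfIdle();
    }

    synchronized boolean isClosed() {
        return closed;
    }

    // Called only from synchronized methods, so the fields are accessed under the lock.
    private void closeIfIdle() {
        if (disposed && refCount == 0 && !closed) {
            closed = true;
            closeAction.run();
        }
    }
}
```

Callers would typically wrap the search in try/finally so that release() runs even when the search throws.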

Re: how to kill IndexSearcher object after every search

2007-11-29 Thread German Kondolf
Yes, you just call the "close()" method. But why would you want to do that? The performance tips recommend exactly the opposite: keeping it alive as long as possible favors Lucene's internal caching of terms, queries and other internal objects. On Nov 29, 2007 11:14 AM, Sebastin <[EMAIL PROTECTED]> wrot

Re: Searching for null (empty) fields, how to use -field:[* TO *]

2008-03-11 Thread German Kondolf
Hi, I was looking for the same functionality. After a short round of googling I didn't find a solution; I assume one must exist, but I finally decided to "fill" those empty fields with a representative "null value", "__null__". This is possible only if you know ALL the fields beforehand. I'd like to know if t
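A small sketch of the placeholder approach, applied to a document represented as a plain Map before it is handed to the indexer. The names NullPlaceholderSketch and fillMissing and the sample field names are hypothetical; only the "__null__" marker comes from the message.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-indexing step: every known field gets a value, empty ones get a marker.
class NullPlaceholderSketch {
    static final String NULL_VALUE = "__null__";

    // allFields must list every field up front, which is the limitation noted in the message.
    static Map<String, String> fillMissing(Map<String, String> doc, List<String> allFields) {
        Map<String, String> filled = new LinkedHashMap<>();
        for (String field : allFields) {
            String value = doc.get(field);
            filled.put(field, (value == null || value.isEmpty()) ? NULL_VALUE : value);
        }
        return filled;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Lucene in Action");
        doc.put("author", "");
        List<String> allFields = Arrays.asList("title", "author", "isbn");
        System.out.println(fillMissing(doc, allFields));
        // prints {title=Lucene in Action, author=__null__, isbn=__null__}
    }
}
```

Documents with an empty field can then be found with an ordinary term query on the marker, assuming the field is indexed so that "__null__" survives as a single term.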

Re: Searching for null (empty) fields, how to use -field:[* TO *]

2008-03-11 Thread German Kondolf
Is *:* parsed as a MatchAllDocsQuery? I've had some performance issues in Lucene 2.2 because MatchAllDocsQuery asks for "isDeleted()" for every document; I haven't tried it in 2.3. On Tue, Mar 11, 2008 at 11:34 AM, Mark Miller <[EMAIL PROTECTED]> wrote: > You cannot have a purely negative query l

Re: IndexSearcher thread safety

2008-03-11 Thread German Kondolf
As Michael said, you can share it, and you should share it: this will improve performance and reuse the internal cache associated with the IndexSearcher (term cache, filters cache, etc). On Tue, Mar 11, 2008 at 7:31 AM, J B <[EMAIL PROTECTED]> wrote: > Hi, > > Are instances of IndexSearcher thread

Re: Searching for null (empty) fields, how to use -field:[* TO *]

2008-03-11 Thread German Kondolf
8 at 11:54 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On Tue, Mar 11, 2008 at 10:41 AM, German Kondolf > <[EMAIL PROTECTED]> wrote: > > *:* is parsed as a MatchAllDocsQuery? > > > > I've got some preformance issues in Lucene 2.2 because > > Mat

Re: FilteredQuery

2008-08-25 Thread German Kondolf
Exactly as Otis says, you should use MatchAllDocs as the query, but it has a performance drawback: it checks every single document's deletion state. I solved the issue by writing my own EnhancedMatchAllDocs query that is optimized not to check this document state. Perhaps the SegmentReader shoul