Re: Boolean Query search performance

2008-03-05 Thread Chris Hostetter
: If I do a query.toString(), both queries give different results, which
: is probably a clue (additional paren's with the BooleanQuery)
:
: Query.toString the old way using queryParser:
: +(id:1^2.0 id:2 ... ) +type:CORE
:
: Query.toString the new way using BooleanQuery:
: +((id:1^2.0)

Re: storing position - keyword

2008-03-05 Thread 1world1love
First off Karl, thanks for your reply and your time.

karl wettin-3 wrote:
> One could also say you are classifying your data based on keywords in
> the text?

I probably didn't explain myself very well or more specifically provide a good example. In my case, there really isn't any relatio

Re: Boolean Query search performance

2008-03-05 Thread Karl Wettin
Beard, Brian wrote:
> I'm using lucene 2.2.0. I'm in the process of re-writing some queries to
> build BooleanQueries instead of using query parser. Bypassing query parser
> provides almost an order of magnitude improvement for very large queries,
> but then the search performance takes 20-30% longer.

Re: storing position - keyword

2008-03-05 Thread Karl Wettin
1world1love wrote:
> Greetings all. I am indexing a set of documents where I am extracting
> terms and mapping them to a controlled vocabulary and then placing the
> matched vocabulary in a keyword field.

One could also say you are classifying your data based on keywords in the text?

> What I want to

Re: changing scoring formula

2008-03-05 Thread Michael Stoppelman
Sumit, The class you'll end up subclassing from would be:
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html
or
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/DefaultSimilarity.html
On an IndexSearcher
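
A minimal sketch of what that subclass might look like on Lucene 2.2/2.3 (the class name and the choice to flatten tf are only illustrative; override whichever factors you actually need to change, then install it on the searcher):

import org.apache.lucene.search.DefaultSimilarity;

// Illustrative only: override just the factors you want to change.
public class MySimilarity extends DefaultSimilarity {
    // Example: ignore raw term frequency entirely.
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}

// Then, before querying:
//   indexSearcher.setSimilarity(new MySimilarity());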

Re: More IndexDeletionPolicy questions

2008-03-05 Thread Tim Brennan
Ha, you know it never occurred to me that the driver might do this for me... I'll test it out. Thanks, --tim

- "Michael McCandless" <[EMAIL PROTECTED]> wrote:
> Oh then I don't think you need a custom deletion policy.
>
> A single NFS client emulates delete-on-last-close semantics. Ie, i

changing scoring formula

2008-03-05 Thread sumittyagi
Is there any way to change the score of the documents? Actually I want to modify the scores of the documents dynamically, so that every time, for a given query, the results will be sorted according to "lucene scoring formula + an equation". How can I do that... I saw the lucene scoring page but I am not gett

Boolean Query search performance

2008-03-05 Thread Beard, Brian
I'm using lucene 2.2.0. I'm in the process of re-writing some queries to build BooleanQueries instead of using query parser. Bypassing query parser provides almost an order of magnitude improvement for very large queries, but then the search performance takes 20-30% longer. I'm adding boost valu
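
For reference, building the query from the earlier toString() output programmatically might look roughly like this on 2.2 (the field and term values are just the ones from the example; for very large queries you may also need BooleanQuery.setMaxClauseCount):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

// +(id:1^2.0 id:2 ...) +type:CORE built without QueryParser
BooleanQuery ids = new BooleanQuery();
TermQuery id1 = new TermQuery(new Term("id", "1"));
id1.setBoost(2.0f);                                  // per-clause boost
ids.add(id1, BooleanClause.Occur.SHOULD);
ids.add(new TermQuery(new Term("id", "2")), BooleanClause.Occur.SHOULD);

BooleanQuery query = new BooleanQuery();
query.add(ids, BooleanClause.Occur.MUST);
query.add(new TermQuery(new Term("type", "CORE")), BooleanClause.Occur.MUST);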

storing position - keyword

2008-03-05 Thread 1world1love
Greetings all. I am indexing a set of documents where I am extracting terms and mapping them to a controlled vocabulary and then placing the matched vocabulary in a keyword field. What I want to know is if there is a way to store the original term location with the keyword field? Example Text: "T
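
One possible approach (not from this thread, just a sketch against the 2.3 API): feed the field a custom TokenStream whose tokens are the vocabulary concepts but whose offsets point back at the matched span of the original text. VocabMatch and VocabTokenStream below are hypothetical names.

import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

// Hypothetical holder for one vocabulary match found in the source text.
class VocabMatch {
    String concept;      // controlled-vocabulary term
    int start, end;      // character offsets of the matched original text
    VocabMatch(String concept, int start, int end) {
        this.concept = concept; this.start = start; this.end = end;
    }
}

// Emits each concept as a token carrying the offsets of the text it was
// mapped from.
class VocabTokenStream extends TokenStream {
    private final Iterator matches;
    VocabTokenStream(List vocabMatches) { this.matches = vocabMatches.iterator(); }
    public Token next() throws IOException {
        if (!matches.hasNext()) return null;
        VocabMatch m = (VocabMatch) matches.next();
        return new Token(m.concept, m.start, m.end);
    }
}

// Usage (assuming 'doc' and 'matches' exist):
//   doc.add(new Field("keyword", new VocabTokenStream(matches)));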

Re: applying patch in Eclipse to get SpanHighlighter functionality

2008-03-05 Thread Donna L Gresh
Thanks Mark- I'm very much a newbie in all this patching stuff, but I don't think I'm using anything other than built-in Eclipse functionality using team->apply patch. And it's clearly not working well. I took a look at TortoiseSVN but I think it's way overkill for me-- oh well; maybe I'll just

Re: Reusing same IndexSearcher

2008-03-05 Thread Mindaugas Žakšauskas
Hi, Thanks for your reply. I can't think of any way to ensure fair file descriptor usage when there are many active instances of IndexSearcher (all containing IndexReader) running. Our project installations tend to run on heavily loaded sites, where a lot of information is read and written at the

Re: Reusing same IndexSearcher

2008-03-05 Thread Michael McCandless
Actually you do need to make a new IndexSearcher every time you reopen a new IndexReader. However, that should not lead to leaking file descriptors. All open files are held by IndexReader (not IndexSearcher), so as long as you are properly closing your IndexReader's you shouldn't use up
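
A sketch of that pattern, assuming IndexReader.reopen() from 2.3 (simplified: closing the old reader immediately is only safe if no search is still running against it; the class and method names here are made up):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

// Illustrative holder: refresh() reopens the reader and only swaps in a new
// IndexSearcher when the index actually changed.
public class SearcherHolder {
    private IndexReader reader;
    private IndexSearcher searcher;

    public SearcherHolder(IndexReader reader) {
        this.reader = reader;
        this.searcher = new IndexSearcher(reader);
    }

    public synchronized IndexSearcher refresh() throws IOException {
        IndexReader newReader = reader.reopen();   // no-op if nothing changed
        if (newReader != reader) {
            reader.close();          // releases the old reader's files
            reader = newReader;
            searcher = new IndexSearcher(reader);
        }
        return searcher;
    }
}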

Reusing same IndexSearcher

2008-03-05 Thread Mindaugas Žakšauskas
Hi, Another newbie here... using Lucene 2.3.1 on Linux. Hopefully someone can advise me on /subj/. Both the IndexSearcher Javadoc and the Lucene FAQ say the IndexSearcher should be reused as it's thread safe. That's OK. Now if the index has changed, I need to reopen the IndexReader that is associated wit

Re: applying patch in Eclipse to get SpanHighlighter functionality

2008-03-05 Thread Mark Miller
Are you using subclipse to apply the patch? It's not very good at it. I use TortoiseSVN for patching, as it's much smarter about these things. With TortoiseSVN, you just patch from the root dir and it knows you are referring to the contrib folder that's under the root directory (the directory you

applying patch in Eclipse to get SpanHighlighter functionality

2008-03-05 Thread Donna L Gresh
I have downloaded the Lucene (core, 2.3.1) code and created a project using Eclipse (pointing to src/java) to use it. That works fine, along with the contrib highlighter jar file from the standard distribution. I have also successfully added an additional Eclipse project for the (standard) High

RE: Using a thesaurus/ontology

2008-03-05 Thread Duan, Nick
Nutch has an ontology plugin based on Jena. http://wiki.apache.org/nutch/OntologyPlugin I haven't used it. Just by looking at the source code, it seems it's just an OWL parser. So apparently it only works with sources defined in OWL format, not others such as RDF. I think you need to extend the sou

Re: Using a thesaurus/ontology

2008-03-05 Thread Mathieu Lecarme
Borgman, Lennart wrote:
> Is there any possibility to use a thesaurus or an ontology when
> indexing/searching with Lucene?

Yes. The WordNet contrib does that. And with a token filter, it's easy to use your own. What do you want to do? M.
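
A rough idea of such a token filter against the 2.3 analysis API (the map-based lookup is just a stand-in for a real thesaurus or ontology source):

import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Minimal synonym-injecting filter: for every input token, also emit its
// thesaurus entries at the same position (positionIncrement == 0).
public class ThesaurusFilter extends TokenFilter {
    private final Map synonyms;                     // String -> List of String
    private final LinkedList pending = new LinkedList();

    public ThesaurusFilter(TokenStream input, Map synonyms) {
        super(input);
        this.synonyms = synonyms;
    }

    public Token next() throws IOException {
        if (!pending.isEmpty()) return (Token) pending.removeFirst();
        Token t = input.next();
        if (t == null) return null;
        List syns = (List) synonyms.get(t.termText());
        if (syns != null) {
            for (int i = 0; i < syns.size(); i++) {
                Token syn = new Token((String) syns.get(i),
                                      t.startOffset(), t.endOffset());
                syn.setPositionIncrement(0);   // stack on the original token
                pending.add(syn);
            }
        }
        return t;
    }
}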

Using a thesaurus/ontology

2008-03-05 Thread Borgman, Lennart
Is there any possibility to use a thesaurus or an ontology when indexing/searching with Lucene?

RE: C++ as token in StandardAnalyzer?

2008-03-05 Thread Tom Conlon
Hi Donna - See previous post below that may help. Tom

Hi, In case this is of help to others:
Crux of problem: I wanted numbers and characters such as # and + to be considered.
Solution: implement a LowercaseWhitespaceAnalyzer and a Lowerc
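
The analyzer mentioned above is small enough to sketch in full against the 2.3 API; it keeps '#' and '+' because it only splits on whitespace:

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

// Whitespace-only tokenization plus lowercasing, so "c++", "c#" and numbers
// come through as single tokens (unlike StandardAnalyzer).
public class LowercaseWhitespaceAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new LowerCaseFilter(new WhitespaceTokenizer(reader));
    }
}

Remember to use the same analyzer at both index and query time, or searches for c++ won't match what was indexed.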

Re: Why indexing database is necessary? (RE: indexing database)

2008-03-05 Thread Shalin Shekhar Mangar
Hi, We have built a data import tool which can read from Databases and add them to Solr. We found that making content available for full text search and faceted search was a common use case and usually everyone ends up writing a custom ETL based tool for this task. Therefore we're contributing thi

Re: More IndexDeletionPolicy questions

2008-03-05 Thread Michael McCandless
Oh then I don't think you need a custom deletion policy. A single NFS client emulates delete-on-last-close semantics. Ie, if the deletion of the file, and the held-open file handles, happen through a single NFS client, then it emulates delete-on-last-close semantics for you, by creating t

Re: Incorrect Token Offset when using multiple fieldable instance

2008-03-05 Thread Michael McCandless
Well, first off, sometimes the thing being indexed isn't a string, so you have no stringValue to get its length. It could be a Reader or a TokenStream. Second off, it's conceivable that an analyzer computes its own "interesting" offsets that are not in fact simple indices into the stri

Re: More IndexDeletionPolicy questions

2008-03-05 Thread Tim Brennan
No, I have multiple readers in the same VM. I track open readers within my VM and save those readers' commit points until the readers are gone. --tim

- "Michael McCandless" <[EMAIL PROTECTED]> wrote:
> OK got it. Yes, sharing an index over NFS requires your own
> DeletionPolicy.

Re: NO_NORM and TOKENIZED

2008-03-05 Thread Michael McCandless
NO_NORMS means "do index the field as a single token (ie, do not tokenize the field), and do not store norms for it". Mike

On Mar 5, 2008, at 5:20 AM, <[EMAIL PROTECTED]> wrote:
> Hm, what exactly does NO_NORM mean? Thank you
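
In 2.3 terms, that corresponds to creating the field like this (fragment, assuming a Document named doc):

import org.apache.lucene.document.Field;

// Indexed as one single, untokenized token; no norms are stored.
doc.add(new Field("id", "12345", Field.Store.YES, Field.Index.NO_NORMS));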

RE: NO_NORM and TOKENIZED

2008-03-05 Thread spring
Hm, what exactly does NO_NORM mean? Thank you

Re: Security filtering from external DB

2008-03-05 Thread Gabriel Landais
Jake Mannix wrote:
> Gabriel, You can make this search much more efficient as follows: say that
> you have a method
>
>   public BooleanQuery createQuery(Collection allowedUUIDs);
>
> that works as you describe. Then you can easily create a useful reusable
> filter as follows:
>
>   Filter filter = new Ca
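
The quoted snippet is cut off at "new Ca"; one plausible shape for the rest, assuming it wraps the allowed-UUID query in a QueryWrapperFilter inside a CachingWrapperFilter (this completion is a guess, not confirmed by the thread):

import java.util.Collection;

import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.QueryWrapperFilter;

// Hypothetical completion: cache the security bit set per IndexReader so the
// allowed-UUID query is not re-evaluated on every search.
Collection allowedUUIDs = loadAllowedUUIDsForUser();   // hypothetical helper
Filter securityFilter =
    new CachingWrapperFilter(new QueryWrapperFilter(createQuery(allowedUUIDs)));

// searcher.search(userQuery, securityFilter, 10);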

Re: Incorrect Token Offset when using multiple fieldable instance

2008-03-05 Thread Renaud Delbru
Do you know if there will be side-effects if, in DocumentWriter$FieldData#invertField, we replace offset = offsetEnd+1; with offset = stringValue.length();? I still do not understand the reason for this choice of incrementation of the start offset. Regards.

Michael McCandless wrote:
> This is ho

Re: More IndexDeletionPolicy questions

2008-03-05 Thread Michael McCandless
OK got it. Yes, sharing an index over NFS requires your own DeletionPolicy. So presumably you have readers on different machines accessing the index over NFS, and then one machine that does the writing to that index? How do you plan to get which commit point each of these readers is c
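
A sketch of what such a policy could look like on 2.3 (keeping the N most recent commits is only one possible heuristic; how long older commits must survive really depends on how long your NFS readers hold them open):

import java.util.List;

import org.apache.lucene.index.IndexCommitPoint;
import org.apache.lucene.index.IndexDeletionPolicy;

// Illustrative policy: keep the N most recent commit points so readers on
// other NFS clients have time to finish with older segments.
public class KeepLastNCommitsDeletionPolicy implements IndexDeletionPolicy {
    private final int keep;

    public KeepLastNCommitsDeletionPolicy(int keep) { this.keep = keep; }

    public void onInit(List commits) { prune(commits); }

    public void onCommit(List commits) { prune(commits); }

    private void prune(List commits) {
        // commits are ordered oldest first, newest last
        int deletable = commits.size() - keep;
        for (int i = 0; i < deletable; i++) {
            ((IndexCommitPoint) commits.get(i)).delete();
        }
    }
}

The policy is handed to the IndexWriter via the constructor that accepts an IndexDeletionPolicy.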

Re: Incorrect Token Offset when using multiple fieldable instance

2008-03-05 Thread Michael McCandless
This is how Lucene has worked for quite some time (since 1.9). When there are multiple fields with the same name in one Document, each field's offset starts from the last offset (offset of the last token) seen in the previous field. If tokens are skipped at the end there's no way IndexWri
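
In other words, for a document built like the fragment below, the offsets of the tokens from the second "body" value continue from the last token offset of the first value rather than restarting at 0:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Document doc = new Document();
doc.add(new Field("body", "first part",  Field.Store.NO, Field.Index.TOKENIZED));
doc.add(new Field("body", "second part", Field.Store.NO, Field.Index.TOKENIZED));
// The start offset of "second" is computed from the end offset of "part" in
// the first value (plus one), not from 0.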

Re: NO_NORM and TOKENIZED

2008-03-05 Thread Michael McCandless
Correct, they are logically orthogonal, and I agree the API is somewhat confusing since "NO_NORMS" is mixing up two things. To get a tokenized field without norms you can create the field with Index.TOKENIZED, and then call setOmitNorms(true). Note that norms "spread" during merges, so, i
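
As a fragment (assuming a Document named doc and a String named text), that is:

import org.apache.lucene.document.Field;

// Tokenized field with norms omitted: there is no single Field.Index constant
// for this combination, so set omitNorms explicitly.
Field body = new Field("body", text, Field.Store.NO, Field.Index.TOKENIZED);
body.setOmitNorms(true);
doc.add(body);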