How to build Lucene in Action examples

2009-02-27 Thread tolkienGR
Hi! I'm new to Lucene. I started reading Lucene in Action (first edition) and downloaded the code from http://www.manning.com/hatcher2/. I read somewhere that this code was written for an old version of Lucene and that I should download the code for the new version from here: http://www.manning.com/hatche

Re: TopDocCollector

2009-02-27 Thread Yonik Seeley
On Fri, Feb 27, 2009 at 6:43 AM, wrote:
> Looking into TopDocCollector code, I have some questions:
>
> * How can a hit have a score of <=0?

A function query or a negative boost would do it. Solr has always allowed all scores through without screening out <=0.

-Yonik
http://www.lucidimagination.co
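To make the negative-boost case concrete, here is a toy model, not Lucene's actual Similarity or Scorer code: it assumes a document's score is simply the sum of boost × raw contribution over its matching clauses. A negative boost can then push a score to zero or below, and a collector that screens with `score > 0.0f` (the check quoted from TopDocCollector.collect in this thread) silently drops such matches, while an unscreened collector, as Yonik describes for Solr, keeps them. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified scoring model (not Lucene's classes):
// score = sum over matching clauses of boost * raw contribution.
class ScoreScreeningDemo {
    static float score(float[] rawContributions, float[] boosts) {
        float total = 0.0f;
        for (int i = 0; i < rawContributions.length; i++) {
            total += boosts[i] * rawContributions[i]; // a negative boost flips the sign
        }
        return total;
    }

    // Returns the doc ids (array indexes) a collector would keep.
    // screenNonPositive = true mimics the "score > 0.0f" check;
    // false keeps every match regardless of its score.
    static List<Integer> collect(float[] scores, boolean screenNonPositive) {
        List<Integer> kept = new ArrayList<Integer>();
        for (int doc = 0; doc < scores.length; doc++) {
            if (!screenNonPositive || scores[doc] > 0.0f) {
                kept.add(doc);
            }
        }
        return kept;
    }
}
```

With one clause boosted by -2.0, or two clauses whose boosted contributions cancel out, the screened collector keeps only the positively scored document.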

Re: TopDocCollector

2009-02-27 Thread Michael McCandless
wrote:
> Looking into TopDocCollector code, I have some questions:
>
> * How can a hit have a score of <=0?

I'm not sure...

> * What happens if the first hit has the highest score of all hits? It seems that topDocs would then contain only this doc!?

That works fine, because hq.size() is sti
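Why a strong first hit does not starve the result list can be seen in a minimal top-N collector. This is a sketch assuming the usual bounded min-heap design, not Lucene's TopDocCollector or HitQueue: it uses `java.util.PriorityQueue`, and the class name `TopNCollector` is hypothetical. While the queue holds fewer than `numHits` entries, every hit is inserted no matter how it compares to earlier ones; only once the queue is full does a new hit have to beat the weakest entry (the heap's head).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Simplified top-N collector (hypothetical; not Lucene's TopDocCollector).
// A min-heap keeps the best numHits hits; its head is the weakest of them.
class TopNCollector {
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int numHits;
    private final PriorityQueue<Hit> pq;
    int totalHits = 0;

    TopNCollector(int numHits) {
        this.numHits = numHits;
        this.pq = new PriorityQueue<Hit>(Math.max(1, numHits),
                (a, b) -> Float.compare(a.score, b.score));
    }

    void collect(int doc, float score) {
        if (score <= 0.0f) return;        // same screening as the quoted collect()
        totalHits++;
        if (pq.size() < numHits) {
            pq.add(new Hit(doc, score));  // queue not full: always insert
        } else if (score > pq.peek().score) {
            pq.poll();                    // evict the weakest kept hit
            pq.add(new Hit(doc, score));
        }
        // otherwise the hit cannot enter the top N and is dropped
    }

    // Doc ids ordered by descending score.
    List<Integer> topDocs() {
        List<Hit> hits = new ArrayList<Hit>(pq);
        hits.sort((a, b) -> Float.compare(b.score, a.score));
        List<Integer> docs = new ArrayList<Integer>();
        for (Hit h : hits) docs.add(h.doc);
        return docs;
    }
}
```

Collecting scores 5.0, 2.0, 3.0, 1.0 with `numHits = 3` returns docs 0, 2, 1: the first (best) hit is kept, and the next two still enter because the queue was not yet full.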

Re: Merging two tokenized fields

2009-02-27 Thread liat oren
Thanks for your answer. I will store both texts (I have my own objects' ids that I use to identify the documents) and will index the text after the merge.

Thank you,
Liat

2009/2/26 Erick Erickson
> Reconstructing a field from an index is
> 1> slow
> 2> lossy (what about stemmed words? stopword
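Erick's "lossy" point, and why merging the original texts before indexing is safer, can be shown with a toy analyzer. Everything here is hypothetical and much cruder than a real Lucene analyzer: it only lowercases and drops a tiny, made-up stopword list. Even so, joining its tokens back together cannot recover the original text, whereas merging the stored originals and analyzing once loses nothing you still need.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy analyzer (hypothetical, not Lucene's): lowercase + drop a few stopwords.
class MergeBeforeIndexing {
    static final Set<String> STOPWORDS =
            new HashSet<String>(Arrays.asList("the", "a", "of", "and"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.split("\\s+")) {
            String t = raw.toLowerCase();
            if (!t.isEmpty() && !STOPWORDS.contains(t)) tokens.add(t);
        }
        return tokens;
    }

    // Merge the two stored original texts first, then analyze the result once.
    static List<String> indexMerged(String first, String second) {
        return analyze(first + " " + second);
    }

    // What "reconstructing" a field from its indexed tokens would give back.
    static String reconstruct(List<String> tokens) {
        return String.join(" ", tokens);
    }
}
```

For example, `reconstruct(analyze("The Art of War"))` yields `"art war"`: the case and the stopwords are gone for good, which is why keeping the original texts around is the better plan.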

Re: queryNorm effect on score

2009-02-27 Thread Peter Keegan
Got it. This is another example of why scores can't be compared between (even similar) queries. (we don't) Thanks.

On Fri, Feb 27, 2009 at 11:39 AM, Yonik Seeley wrote:
> On Fri, Feb 27, 2009 at 9:15 AM, Peter Keegan wrote:
> > Any comments about this? Is this just the way queryNorm works or

Re: queryNorm effect on score

2009-02-27 Thread Yonik Seeley
On Fri, Feb 27, 2009 at 9:15 AM, Peter Keegan wrote:
> Any comments about this? Is this just the way queryNorm works or is this a bug?

That's just the way it works... since it's applied to all clauses, it really just changes the range of scores returned, not relative ordering of documents or an
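Yonik's explanation can be checked numerically. Lucene's DefaultSimilarity computes queryNorm as `1 / sqrt(sumOfSquaredWeights)`; the rest of this sketch is a simplified model (hypothetical class name, raw per-document scores passed in directly) that applies that one factor to every document of a query. Within a query the ranking cannot change, but the same raw score maps to different final scores under two queries with different `sumOfSquaredWeights`, which is why scores are not comparable across queries.

```java
import java.util.Arrays;

// Simplified model: queryNorm is one constant factor per query, applied to
// every clause, so it scales all document scores of that query equally.
class QueryNormDemo {
    // Same formula as DefaultSimilarity.queryNorm: 1 / sqrt(sumOfSquaredWeights).
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    static float[] normalize(float[] rawScores, float sumOfSquaredWeights) {
        float norm = queryNorm(sumOfSquaredWeights);
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }

    // Document indexes sorted by descending score.
    static Integer[] ranking(final float[] scores) {
        Integer[] idx = new Integer[scores.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Float.compare(scores[b], scores[a]));
        return idx;
    }
}
```

With raw scores `{2.0, 0.5, 1.25}` the ranking is 0, 2, 1 both before and after normalizing with `sumOfSquaredWeights = 4` (queryNorm 0.5). A raw score of 2.0 becomes 1.0 under that query but 0.5 under a query whose `sumOfSquaredWeights` is 16.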

Query against newly created index does not work

2009-02-27 Thread Lukas, Ray
I can now create indexes with Nutch and see them in Luke. This is fantastic news; well, for me it is beyond fantastic. Now I would like to (need to) query them, and to that end I wrote the following code segment:

int maxHits = 1000;
NutchBean nutchBean = new N

Re: queryNorm effect on score

2009-02-27 Thread Peter Keegan
Any comments about this? Is this just the way queryNorm works or is this a bug?

Thanks,
Peter

On Fri, Feb 20, 2009 at 4:03 PM, Peter Keegan wrote:
>
> The explanation of scores from the same document returned from 2 similar queries differ in an unexpected way. There are 2 fields involved, 'con

Re: How to run Lucene in Action TestCase Examples

2009-02-27 Thread Erick Erickson
I have no clue about NetBeans, but I *do* know that you'll need to provide more details than "it fails" to get any meaningful help. What version of JUnit? Lucene? NetBeans? What error messages/stack traces? Imagine you were trying to respond to your own email knowing nothing except what you wrote

Re: Use of scanned documents for text extraction and indexing

2009-02-27 Thread Bastian Buch
You can use Tesseract, an open-source OCR engine maintained by Google. It is native C/C++ code, so to use it from Java you need either JNI or direct process creation. There is no PDF support, but you can use ImageMagick to convert those documents on the fly. The engine scans documents line by line without trying
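The "direct process creation" route could look roughly like the sketch below. It assumes the ImageMagick `convert` and `tesseract` executables are installed and on the PATH, that Ghostscript is available for ImageMagick's PDF input, and that the Tesseract command line takes an image and an output base name and writes `<base>.txt`; the exact flags (`-density 300`, `-depth 8`) and the class and method names are illustrative assumptions, and multi-page handling varies between Tesseract versions.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: shell out to ImageMagick and Tesseract instead of JNI.
// Assumes both tools are on the PATH; flags are illustrative, not prescriptive.
class OcrPipeline {
    // ImageMagick: rasterize the PDF to a TIFF image (PDF input needs Ghostscript).
    static List<String> convertCommand(String pdf, String tiff) {
        return Arrays.asList("convert", "-density", "300", pdf, "-depth", "8", tiff);
    }

    // Tesseract: recognize the image; the text is written to <outputBase>.txt.
    static List<String> ocrCommand(String image, String outputBase) {
        return Arrays.asList("tesseract", image, outputBase);
    }

    // Runs one external command, forwarding its output, and fails on a non-zero exit.
    static void run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException(command.get(0) + " exited with code " + exit);
        }
    }

    // PDF -> TIFF -> text file; the caller then reads workBase + ".txt" and indexes it.
    static void pdfToText(String pdf, String workBase) throws IOException, InterruptedException {
        String tiff = workBase + ".tif";
        run(convertCommand(pdf, tiff));
        run(ocrCommand(tiff, workBase));
    }
}
```

Keeping the command construction separate from `run` makes it easy to swap flags or tool paths without touching the process-handling code.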

ApacheCon Lucene Meetup

2009-02-27 Thread Grant Ingersoll
If you're in or around Amsterdam during the week of ApacheCon (Mar 23-27), check out the Lucene Meetup we are organizing: http://wiki.apache.org/lucene-java/LuceneMeetupMarch2009

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/So

TopDocCollector

2009-02-27 Thread spring
Looking into TopDocCollector code, I have some questions:

* How can a hit have a score of <=0?
* What happens if the first hit has the highest score of all hits? It seems that topDocs would then contain only this doc!?

public void collect(int doc, float score) {
  if (score > 0.0f) {

Re: ThreadLocal / Memory Problems with Analyzer class

2009-02-27 Thread rviper
ok, also see https://issues.apache.org/jira/browse/LUCENE-1186

ThreadLocal / Memory Problems with Analyzer class

2009-02-27 Thread rviper
Hi,

Environment: Lucene 2.4, JDK 1.6.

I'm using Quartz jobs to schedule indexing tasks. Currently I'm creating a new instance of an analyzer each time I open the index, and after some time I get an out-of-memory error.

Analyzer class:
private ThreadLocal tokenStreams;

since the analyzer class is usi
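One common way to avoid this pattern is to share a single analyzer instead of creating one per job. The sketch below uses a hypothetical stand-in, not Lucene's Analyzer: it only mimics an analyzer that caches per-thread state in a `ThreadLocal` field. A `ThreadLocal` stores its value in each thread's own map, so when many short-lived analyzer instances are created on long-lived scheduler threads, their cached values may stay reachable until the JDK happens to clean up stale entries. Sharing one instance means each worker thread holds a single cached entry, and `ThreadLocal.remove()` can drop it explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an analyzer with a per-thread cache
// (in the spirit of "private ThreadLocal tokenStreams"); NOT Lucene's Analyzer.
class CachingAnalyzer {
    private final ThreadLocal<List<String>> cache = new ThreadLocal<List<String>>();

    // Returns this thread's reusable buffer, creating it on first use.
    List<String> tokenBuffer() {
        List<String> buffer = cache.get();
        if (buffer == null) {
            buffer = new ArrayList<String>();
            cache.set(buffer);
        }
        buffer.clear();
        return buffer;
    }

    // Tokenizes into the shared per-thread buffer; callers must not keep it.
    List<String> tokenize(String text) {
        List<String> buffer = tokenBuffer();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) buffer.add(t);
        }
        return buffer;
    }

    // Drops the cached entry for the calling thread only.
    void release() {
        cache.remove();
    }
}

// Every indexing job uses the same analyzer instance instead of a new one
// each time the index is opened.
class IndexingJobs {
    static final CachingAnalyzer SHARED_ANALYZER = new CachingAnalyzer();

    static int indexDocument(String text) {
        return SHARED_ANALYZER.tokenize(text).size();
    }
}
```

Whether a shared instance is safe for a given real analyzer depends on that analyzer being thread-safe, so check its documentation before applying the same pattern.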