How to filter results below a particular score

2006-09-19 Thread Bhavin Pandya
Hi all, how do I put a limit in Lucene so that it doesn't return any document with a score less than 0.25? Thanks. Bhavin Pandya

Re: How to filter results below a particular score

2006-09-19 Thread karl wettin
On 9/19/06, Bhavin Pandya <[EMAIL PROTECTED]> wrote: Hi all, how do I put a limit in Lucene so that it doesn't return any document with a score less than 0.25? You implement a HitCollector and break out when you reach such a low score.
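Karl's suggestion might look roughly like the sketch below, using the HitCollector API of the Lucene 1.9/2.0 era; the class name, the list of matching doc ids and the way the threshold is applied are illustrative assumptions, not code from the thread. Note that a HitCollector sees documents in doc-id order (with raw, unnormalized scores), so the collector filters each document rather than literally breaking out of the loop.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.lucene.search.HitCollector;

    // Keeps only documents whose raw score is at least the given threshold.
    public class MinScoreCollector extends HitCollector {
        private final float minScore;
        private final List<Integer> matchingDocs = new ArrayList<Integer>();

        public MinScoreCollector(float minScore) {
            this.minScore = minScore;
        }

        public void collect(int doc, float score) {
            // Raw scores here are not normalized the way Hits normalizes them.
            if (score >= minScore) {
                matchingDocs.add(Integer.valueOf(doc));
            }
        }

        public List<Integer> getMatchingDocs() {
            return matchingDocs;
        }
    }

Usage would be something like searcher.search(query, new MinScoreCollector(0.25f)), after which getMatchingDocs() holds the surviving document ids.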

Question about termDocs.read(docs, freqs)

2006-09-19 Thread Kroehling, Thomas
Hi, I am trying to write a WildcardFilter in order to prevent TooManyBooleanClauses and high memory usage. I wrap a Filter in a ConstantScoreQuery. I enumerate over the WildcardTerms for a query. This way I can set a maximum number of terms which I will evaluate. If too many terms match, I throw an

Re: Question about termDocs.read(docs, freqs)

2006-09-19 Thread Erick Erickson
I'll side-step the explanations part of your mail since I don't know how to answer.. But a few observations, see below. On 9/19/06, Kroehling, Thomas <[EMAIL PROTECTED]> wrote: Hi, I am trying to write a WildcardFilter in order to prevent TooManyBooleanClauses and high memory usage. I wrap a Fi

AW: Question about termDocs.read(docs, freqs)

2006-09-19 Thread Kroehling, Thomas
Thanks for the answer. It is not really necessary for me to read the documents. That's what you get if you find code searching the net and using it without really thinking or understanding it. I will just step through the terms and set the bits as you said. I will add some maximum number of term
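A rough sketch of the direction Thomas and Erick are converging on: enumerate the terms matching the wildcard with a WildcardTermEnum, set one bit per matching document, and give up once a maximum number of terms has been expanded. The class name, the maxTerms cutoff and the choice to signal overflow with an IOException are assumptions for illustration, not code from the thread.

    import java.io.IOException;
    import java.util.BitSet;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.WildcardTermEnum;

    // Sets a bit for every document containing a term that matches the
    // wildcard pattern, refusing to expand more than maxTerms terms.
    public class WildcardFilter extends Filter {
        private final Term pattern;
        private final int maxTerms;

        public WildcardFilter(Term pattern, int maxTerms) {
            this.pattern = pattern;
            this.maxTerms = maxTerms;
        }

        public BitSet bits(IndexReader reader) throws IOException {
            BitSet bits = new BitSet(reader.maxDoc());
            WildcardTermEnum termEnum = new WildcardTermEnum(reader, pattern);
            TermDocs termDocs = reader.termDocs();
            try {
                int termCount = 0;
                do {
                    Term term = termEnum.term();
                    if (term == null) {
                        break;  // no (more) matching terms
                    }
                    if (++termCount > maxTerms) {
                        throw new IOException("too many terms match " + pattern);
                    }
                    termDocs.seek(term);
                    while (termDocs.next()) {
                        bits.set(termDocs.doc());
                    }
                } while (termEnum.next());
            } finally {
                termDocs.close();
                termEnum.close();
            }
            return bits;
        }
    }

Wrapping an instance in a ConstantScoreQuery, as Thomas already does, keeps the expansion out of a BooleanQuery entirely, at the price of constant (non-tf/idf) scoring for that clause.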

Analysis/tokenization of compound words

2006-09-19 Thread Otis Gospodnetic
Hi, How do people typically analyze/tokenize text with compounds (e.g. German)? I took a look at GermanAnalyzer hoping to see how one can deal with that, but it turns out GermanAnalyzer doesn't treat compounds in any special way at all. One way to go about this is to have a word dictionary and

Re: Question about termDocs.read(docs, freqs)

2006-09-19 Thread Erick Erickson
Glad I actually wrote something helpful .. Memory for filters shouldn't be a problem; filters take up 1 bit per document (plus some tiny overhead for the BitSet). I think the time is actually spent on the number of terms that match each wildcard, as well as the number of terms. Really, I expec

Re: Analysis/tokenization of compound words

2006-09-19 Thread Jonathan O'Connor
Otis, I can't offer you any practical advice, but as a student of German, I can tell you that beginners find it difficult to read German words and split them properly. The larger your vocabulary the easier it is. The whole topic sounds like an AI problem: A possible algorithm for German (no ide

Re: Analysis/tokenization of compound words

2006-09-19 Thread Marvin Humphrey
On Sep 19, 2006, at 9:21 AM, Otis Gospodnetic wrote: How do people typically analyze/tokenize text with compounds (e.g. German)? I took a look at GermanAnalyzer hoping to see how one can deal with that, but it turns out GermanAnalyzer doesn't treat compounds in any special way at all. O

Re: How to filter results below a particular score

2006-09-19 Thread Paul Elschot
On Tuesday 19 September 2006 11:49, karl wettin wrote: > On 9/19/06, Bhavin Pandya <[EMAIL PROTECTED]> wrote: > > Hi all, > > > > How do I put a limit in Lucene so that it doesn't return any document with a score less than 0.25? > > You implement a HitCollector and break out when you reach such a low sco

Re: How to filter results below a particular score

2006-09-19 Thread Paul Elschot
Sorry, I sent the message before completing it. On Tuesday 19 September 2006 19:45, Paul Elschot wrote: > On Tuesday 19 September 2006 11:49, karl wettin wrote: > > On 9/19/06, Bhavin Pandya <[EMAIL PROTECTED]> wrote: > > > Hi all, > > > > > > How do I put a limit in Lucene so that it doesn't return any do

Help wanted

2006-09-19 Thread S R
Hello, I have just downloaded Lucene. I am not an expert in Java. Could someone lead me through the first few steps? Thank you

Re: Help wanted

2006-09-19 Thread Yonik Seeley
On 9/19/06, S R <[EMAIL PROTECTED]> wrote: I have just downloaded Lucene. I am not an expert in Java. Could someone lead me through the first few steps? The first few steps to what? First, figure out if you want straight lucene-java or another application using Lucene. Lucene is a library that

Re: Help wanted

2006-09-19 Thread S R
Thanks Yonik for the reply. What I want is to index a set of text documents (about 200 .txt files) in a Windows environment so I can search them. What I am doing is actually evaluating different search and indexing tools. Thank you. Yonik Seeley <[EMAIL PROTECTED]> wrote: On
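For reference, indexing a directory of .txt files with the 2.0-era API can be as small as the sketch below, patterned loosely after the demo that ships with Lucene; the class name, field names and command-line arguments are placeholders, not anything from this thread.

    import java.io.File;
    import java.io.FileReader;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    // Indexes every .txt file in a directory into a freshly created index.
    public class TxtIndexer {
        public static void main(String[] args) throws Exception {
            File docDir = new File(args[0]);   // e.g. C:\docs
            String indexDir = args[1];         // e.g. C:\index

            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
            File[] files = docDir.listFiles();
            for (int i = 0; i < files.length; i++) {
                if (!files[i].getName().endsWith(".txt")) {
                    continue;
                }
                Document doc = new Document();
                // Store the path so hits can be mapped back to files.
                doc.add(new Field("path", files[i].getPath(),
                        Field.Store.YES, Field.Index.UN_TOKENIZED));
                // Tokenize the file contents for searching (contents not stored).
                doc.add(new Field("contents", new FileReader(files[i])));
                writer.addDocument(doc);
            }
            writer.optimize();
            writer.close();
        }
    }

Searching is then a matter of opening an IndexSearcher on the same directory and running a QueryParser query against the "contents" field, as the Getting Started guide mentioned later in the thread walks through.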

DisjunctionMaxQuery explanation

2006-09-19 Thread Find Me
I was trying to print out the score explanation for a DisjunctionMaxQuery. Though there is a hit score > 0 for the results, there is no detailed explanation. Am I doing something wrong? In the following output, each hit has two lines. The first line is the hit score and the second line is the expl

Re: DisjunctionMaxQuery explanation

2006-09-19 Thread Chris Hostetter
: In the following output, each hit has two lines. The first line is the hit : score and the second line is the explanation given by the : DisjunctionMaxQuery. how are you printing the Explanation? .. are you using the toString()? can you post a small self contained code example showing how you

Re: DisjunctionMaxQuery explanation

2006-09-19 Thread Find Me
public void explainSearchScore(String indexLocation, DisjunctionMaxQuery disjunctQuery) { IndexSearcher searcher = new IndexSearcher(IndexReader.open(indexLocation)); Hits hits = searcher.search(disjunctQuery); if (hits == null) return; for (int i = 0; i < hits.leng

Re: DisjunctionMaxQuery explanation

2006-09-19 Thread Find Me
Forgot to add the hits.score() to print out the hit's score. public void explainSearchScore(String indexLocation, DisjunctionMaxQuery disjunctQuery) { IndexSearcher searcher = new IndexSearcher(IndexReader.open(indexLocation)); Hits hits = searcher.search(disjunctQuery);

Re: DisjunctionMaxQuery explanation

2006-09-19 Thread Chris Hostetter
The "i" you pass to Hits.score is the index of the result in that Hits object ... the "i" you pass to Searcher.explain should be the absolute docid (the searcher has no way of knowing about your Hits, or what order they are in). Try something like... searcher.explain(disjunctQuery, hits

Re: Analysis/tokenization of compound words

2006-09-19 Thread eks dev
Hi Otis, it depends what you need to do with it. If you only need this as a "kind of stemming" for searching documents, the solution is not all that complex. If you need linguistically correct splitting, then it gets complicated. For the first case: build a SuffixTree with your dictionary (hope you

Re: Analysis/tokenization of compound words

2006-09-19 Thread eks dev
I just remembered one minor thing that made our life easier: the recursive loop has a primitive stripEndings() method that removes most of the variable endings (all these ungs/ungen/...) before looking up in the SuffixTree. This reduces your dictionary needs dramatically. I think this is partially done
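Pulling the two mails together, the recursive lookup with linking-letter handling might look roughly like the sketch below. The dictionary handling (a plain Set rather than the SuffixTree eks dev uses), the minimum part length and the backtracking strategy are assumptions for illustration; a real version would live inside a TokenFilter and, as eks dev says, also strip common variable endings such as -ung/-ungen before each lookup.

    import java.util.List;
    import java.util.Set;

    // Greedy, recursive splitting of a compound into dictionary words.
    // Assumes both the input word and the dictionary are lowercased.
    public class CompoundSplitter {
        private static final int MIN_PART = 3;
        private final Set<String> dictionary;

        public CompoundSplitter(Set<String> dictionary) {
            this.dictionary = dictionary;
        }

        // Returns true if the whole word can be covered by dictionary
        // entries, appending the parts it found to result.
        public boolean split(String word, List<String> result) {
            if (word.length() == 0) {
                return true;
            }
            for (int end = word.length(); end >= MIN_PART; end--) {
                String head = word.substring(0, end);
                String part = null;
                if (dictionary.contains(head)) {
                    part = head;
                } else if (head.endsWith("s") && head.length() > MIN_PART
                        && dictionary.contains(head.substring(0, head.length() - 1))) {
                    // Drop a trailing linking "s" (the "Fugenelement"
                    // mentioned further down in the thread).
                    part = head.substring(0, head.length() - 1);
                }
                if (part != null) {
                    int mark = result.size();
                    result.add(part);
                    if (split(word.substring(end), result)) {
                        return true;
                    }
                    // Backtrack: this head did not lead to a complete split.
                    while (result.size() > mark) {
                        result.remove(result.size() - 1);
                    }
                }
            }
            return false;
        }
    }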

Re: Help wanted

2006-09-19 Thread Simon Willnauer
Rather than writing yet another introduction to Lucene, I'll just give you a hand with Google. Google query: lucene java intro http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html This should lead you to what you are looking for. best regards simon On 9/19/06, S R <[EMAIL PROTECTED]> wrote

Re: How to filter results below a particular score

2006-09-19 Thread Chris Hostetter
please see the FAQ "Can I filter by score?" ... http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-912c1f237bb00259185353182948e5935f0c2f03 : Date: Tue, 19 Sep 2006 14:07:43 +0530 : From: Bhavin Pandya <[EMAIL PROTECTED]> : Reply-To: java-user@lucene.apache.org, Bhavin Pandya <[EMAIL PROTECTE

Re: Help wanted

2006-09-19 Thread Mark Miller
I'll one up you: http://www.manning.com/hatcher2/ Might as well save yourself a whole lot of time and just buy the book. If you're going to use Lucene it might as well be required. Simon Willnauer wrote: Rather than writing yet another introduction to Lucene, I'll just give you a hand with Google

Re: Help wanted

2006-09-19 Thread Michael McCandless
Mark Miller wrote: I'll one up you: http://www.manning.com/hatcher2/ Might as well save yourself a whole lot of time and just buy the book. If you're going to use Lucene it might as well be required. There is also "Getting Started" on the Lucene web site: http://lucene.apache.org/java/doc

Lucene-based frameworks/servers: Solr, Nutch, Compass - which one is for what?

2006-09-19 Thread Vladimir Olenin
Hi, A couple of people here have mentioned Solr as a 'new' Lucene-based search server. But Nutch is also Lucene-based. Also, there is an OpenSymphony initiative called 'Compass', which is more an integration framework than a server. I wonder if anyone can come up with a small summary of the scope

Re: Lucene-based frameworks/servers: Solr, Nutch, Compass - which one is for what?

2006-09-19 Thread Otis Gospodnetic
Hi Vladimir, Yes, you are close. Solr doesn't use SOAP, though, and JSON is only one of its outputs. Solr can be described as a REST-ish web service. You trigger it via HTTP GET requests and responses are XML, or JSON, or something else in the future. I think you are right about Compass, bu
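For illustration, querying such a service is a plain HTTP GET. The sketch below assumes a local Solr instance on its default example port with the standard /solr/select handler and simply prints whatever XML comes back; none of this comes from the mail itself.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLEncoder;

    // Sends one query to a (hypothetical) local Solr instance and dumps
    // the raw XML response to stdout.
    public class SolrGetDemo {
        public static void main(String[] args) throws Exception {
            String q = URLEncoder.encode("title:lucene", "UTF-8");
            URL url = new URL("http://localhost:8983/solr/select?q=" + q);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), "UTF-8"));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
        }
    }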

Re: Analysis/tokenization of compound words

2006-09-19 Thread Daniel Naber
On Tuesday 19 September 2006 22:41, eks dev wrote: > ahh, another one: when you strip the suffix, check if the last char of the remaining > "stem" is "s" (magic thing in German) and delete it if it is not the only > letter. Do not ask why; a long unexplained mystery of the German language. This is called "Fugenelement" a