Re: What is the difference between the "AND" and "+" operator?

2010-11-30 Thread maven apache
2010/11/30 Chris Hostetter > > : Subject: What is the difference between the "AND" and "+" operator? > > In this query, "y" is mandatory, but documents that also match "x" will > score higher than documents that only match "y"... > >x +y > > In both of these queries, "x" and "y" are both
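
The "x +y" behaviour described above (a mandatory clause plus an optional clause that only raises the score) can be illustrated with a toy model in plain Java. This is a sketch only: the class `RequiredOptionalSketch` and its `score` method are hypothetical names, it does not call Lucene, and it counts matching clauses instead of using Lucene's real scoring formula.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model (assumption): a document's score is the number of matching clauses.
// Lucene's real scoring is more involved; only the ordering idea is shown here.
class RequiredOptionalSketch {

    // Returns -1 when a required term is missing (the document does not match),
    // otherwise the count of required plus optional terms found in the text.
    static int score(String docText, List<String> required, List<String> optional) {
        Set<String> tokens = new HashSet<>(Arrays.asList(docText.toLowerCase().split("\\s+")));
        int score = 0;
        for (String term : required) {
            if (!tokens.contains(term)) {
                return -1; // a mandatory clause failed, so the document is rejected
            }
            score++;
        }
        for (String term : optional) {
            if (tokens.contains(term)) {
                score++; // optional clauses never reject, they only add to the score
            }
        }
        return score;
    }

    public static void main(String[] args) {
        List<String> required = Arrays.asList("y"); // the "+y" clause
        List<String> optional = Arrays.asList("x"); // the bare "x" clause
        System.out.println(score("x y", required, optional)); // matches both clauses
        System.out.println(score("y z", required, optional)); // matches only the mandatory clause
        System.out.println(score("x z", required, optional)); // rejected: "y" is missing
    }
}
```

In this model a document containing both terms ranks above one containing only "y", and a document without "y" is never returned, which is the distinction the reply draws.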

Re: What is the difference between the "AND" and "+" operator?

2010-11-30 Thread Anshum
with a single '=' :) -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Nov 30, 2010 at 3:03 PM, maven apache wrote: > 2010/11/30 Chris Hostetter > > > > > : Subject: What is the difference between the "AND" and "+" operator? > > > > In this query, "y" is mandatory, but documents that also ma

field cross search in lucene

2010-11-30 Thread maven apache
Hi: I have two documents (title | body): "Lucene In Action" | "A high-performance, full-featured text search engine library."; "Lucene Practice" | "Use lucene in your application". Now, I search "lucene performance" using private String[] f = { "title", "body"}; private

Re: field cross search in lucene

2010-11-30 Thread Shai Erera
Can you try to do: QueryParser qp = new MultiFieldQueryParser(params); qp.setDefaultOperator(Operator.AND); qp.parse(query); See if that helps -- the static parse method instantiates its own QP and therefore you cannot tell it that the default OP is AND. Shai On Tue, Nov 30, 2010 at 1:42 PM, mav
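
To make the intent of the AND default operator concrete, here is a plain-Java sketch that does not use Lucene at all. It assumes the behaviour the MultiFieldQueryParser javadocs describe for AND: each query term must occur in at least one of the searched fields. The class `CrossFieldAndSketch` and its methods are hypothetical, and the crude substring matching stands in for real analysis.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Toy illustration (assumption): every term is required, but any field may satisfy it.
class CrossFieldAndSketch {

    // Builds a two-field document; field names follow the thread's example.
    static Map<String, String> doc(String title, String body) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("title", title);
        d.put("body", body);
        return d;
    }

    // True when each whitespace-separated query term occurs in at least one field.
    // Substring matching is a simplification of what an Analyzer would do.
    static boolean matchesAllTerms(Map<String, String> doc, String query) {
        for (String term : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            boolean found = false;
            for (String value : doc.values()) {
                if (value.toLowerCase(Locale.ROOT).contains(term)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    // Renders the per-term expansion, e.g. "+(title:a body:a) +(title:b body:b)".
    static String expand(String[] fields, String query) {
        StringBuilder sb = new StringBuilder();
        for (String term : query.trim().split("\\s+")) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("+(");
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) {
                    sb.append(' ');
                }
                sb.append(fields[i]).append(':').append(term);
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> first = doc("Lucene In Action",
                "A high-performance, full-featured text search engine library.");
        Map<String, String> second = doc("Lucene Practice", "Use lucene in your application");
        System.out.println(expand(new String[] {"title", "body"}, "lucene performance"));
        System.out.println(matchesAllTerms(first, "lucene performance"));
        System.out.println(matchesAllTerms(second, "lucene performance"));
    }
}
```

With the two documents from the original question, only the first one satisfies both terms ("lucene" in the title, "performance" in the body), which appears to be the result the poster was after.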

Re: field cross search in lucene

2010-11-30 Thread Ian Lea
Sounds like you need to call setDefaultOperator(AND_OPERATOR). See the javadocs for MultiFieldQueryParser constructors. They give examples. Personally I tend to get confused with all the Should/Must/And/Or combinations with searches on multiple terms across multiple fields and wherever possible

Re: field cross search in lucene

2010-11-30 Thread maven apache
2010/11/30 Shai Erera > Can you try to do: > QueryParser qp = new MultiFieldQueryParser(params); > qp.setDefaultOperator(Operator.AND); > qp.parse(query); > > See if that helps -- the static parse method instantiates its own QP and > therefore you cannot tell it that the default OP is AND. > > Th

Re: field cross search in lucene

2010-11-30 Thread Anshum
You could change Occur.SHOULD to Occur.MUST for both fields. This should work for you if what I understood is what you wanted. -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Nov 30, 2010 at 5:12 PM, maven apache wrote: > Hi: I have two documents: > > title body > Luc

Re: field cross search in lucene

2010-11-30 Thread maven apache
2010/11/30 Ian Lea > Sounds like you need to call setDefaultOperator(AND_OPERATOR). See > the javadocs for MultiFieldQueryParser constructors. They give > examples. > > Personally I tend to get confused with all the Should/Must/And/Or > combinations with searches on multiple terms across multip

Re: precision and recall in lucene

2010-11-30 Thread Robert Muir
On Mon, Nov 29, 2010 at 8:01 AM, Yakob wrote: > hello all > I was wondering, if I want to measure precision and recall in lucene > then what's the best way for me to do it? is there any sample source > code that I can use? > Have a look at contrib/benchmark under the org.apache.lucene.benchmark.q
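
For readers new to the two measures asked about here, the definitions can be sketched in plain Java. This is not the contrib/benchmark code mentioned in the reply; the class `PrecisionRecallSketch`, the document ids, and the relevance judgments are all made up for illustration, and the retrieved list is assumed to contain no duplicates.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch: precision = relevant retrieved / retrieved,
//                 recall    = relevant retrieved / relevant.
class PrecisionRecallSketch {

    // Counts retrieved documents that are judged relevant.
    static int hits(List<String> retrieved, Set<String> relevant) {
        int count = 0;
        for (String id : retrieved) {
            if (relevant.contains(id)) {
                count++;
            }
        }
        return count;
    }

    static double precision(List<String> retrieved, Set<String> relevant) {
        if (retrieved.isEmpty()) {
            return 0.0; // convention chosen here for an empty result list
        }
        return (double) hits(retrieved, relevant) / retrieved.size();
    }

    static double recall(List<String> retrieved, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0; // convention chosen here when nothing is judged relevant
        }
        return (double) hits(retrieved, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        // Hypothetical search results and relevance judgments for one query.
        List<String> retrieved = Arrays.asList("d1", "d2", "d3", "d4");
        Set<String> relevant = new HashSet<>(Arrays.asList("d1", "d3", "d5"));
        System.out.println("precision = " + precision(retrieved, relevant));
        System.out.println("recall    = " + recall(retrieved, relevant));
    }
}
```

In the sample data two of the four retrieved documents are relevant and two of the three relevant documents were retrieved, so precision is 2/4 and recall is 2/3.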

Re: Analyzer

2010-11-30 Thread Erick Erickson
WhitespaceAnalyzer does just that, splits the incoming stream on white space. From the javadocs for StandardAnalyzer: A grammar-based tokenizer constructed with JFlex This should be a good tokenizer for most European-language documents: - Splits words at punctuation characters, removing pun

Re: field cross search in lucene

2010-11-30 Thread maven apache
Hi: Sorry for the late reply, I was just doing some tests. Using QueryParser.setDefaultOperator(..) does work, but I found it works only for English character searching. I have made a test comparing three approaches: 1) MultiFieldQueryParser.parse. 2) Set the Operator of MultiFieldQueryParser. 3) Using a

Retrieving payload attribute in highlighter

2010-11-30 Thread Fabiano Nunes
Hello, I'm trying to retrieve payloads from the highlighted terms via the Highlighter class. In my tests, all terms returned from Highlighter have null as payload. Example: Highlighter h = new Highlighter(new Formatter() { public String highlightTerm(String originalText, TokenGroup tokenGroup) { Token

Re: precision and recall in lucene

2010-11-30 Thread Yakob
On 11/30/10, Robert Muir wrote: > > Have a look at contrib/benchmark under the > org.apache.lucene.benchmark.quality package. > There is code (for example > org.apache.lucene.benchmark.quality.trec.QueryDriver) that can run an > experiment and output what you need for trec_eval.exe > I think ther

Re: precision and recall in lucene

2010-11-30 Thread Robert Muir
On Tue, Nov 30, 2010 at 10:46 AM, Yakob wrote: > can you tell me what went wrong? what is the difference between > topicsFile and qrelsFile anyway? > Well, it's hard to tell what you are supplying as topics and qrels. Have a look at /src/lia/benchmark in the LIA2 sample code: it has an example top

Re: Retrieving payload attribute in highlighter

2010-11-30 Thread Fabiano Nunes
I've figured out the PayloadSpanUtil class. It's exactly what I'm expecting. But I'm concerned about the warning message in the API docs (indeed, I think I don't understand it). Is there any other approach? Can I have the same results retrieving the termPositions without performance issues? Thanks. O

Re: Retrieving payload attribute in highlighter

2010-11-30 Thread Erick Erickson
That's the Lucene developers' way of saying "we don't guarantee backwards compatibility". The devs go to great lengths to honor the contract of not changing public APIs without going through a deprecation process, which causes quite a lot of work. But that conflicts with the desirable process of h

Keyword extraction from pdf to text

2010-11-30 Thread McGibbney, Lewis John
Hello list, I am currently attempting to extract keywords from pdf documents, my aim is then to begin constructing a domain ontology using the words which are extracted. I do not need to index anything at this stage, but wish to extract and push the output as plain text into a text file. An exa

Re: Keyword extraction from pdf to text

2010-11-30 Thread Ian Lea
If I've understood you correctly, you want to pump text into a lucene Analyzer and grab the output and do something else with that. If that is right, you can use code based on something like this: for (String s : array-of-input-texts) { Analyzer anl = new xxxAnalyzer(whatever)
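
The idea of pushing text through a tokenizing step and keeping the output can be sketched without Lucene. The code below is a stand-in, not the Analyzer-based loop from the reply: it lowercases the text, splits on anything that is not a letter or digit, and drops stop words. The class `KeywordSketch`, the `keywords` method, and the tiny stop-word list are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java stand-in for an analyzer: lowercase, split, filter stop words.
class KeywordSketch {

    // Hypothetical stop-word list; a real analyzer ships a much larger one.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "an", "and", "the", "of"));

    static List<String> keywords(String text) {
        List<String> out = new ArrayList<>();
        // Split on runs of characters that are neither letters nor decimal digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // In the poster's scenario the input would be text already extracted from a PDF.
        String text = "A high-performance, full-featured text search engine library.";
        for (String keyword : keywords(text)) {
            System.out.println(keyword); // one keyword per line, ready to write to a text file
        }
    }
}
```

Swapping this stand-in for a real Lucene Analyzer would add stemming and language-aware tokenization; extracting the text from the PDF itself is a separate step that this sketch does not cover.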

Re: Retrieving payload attribute in highlighter

2010-11-30 Thread Fabiano Nunes
Ok. I'll go ahead. Just one more thing: the apidocs warning says "(...) IndexReader should only contain doc of interest, best to use MemoryIndex (..)". How can I build a reader with a subset of docs? Thanks! On Tue, Nov 30, 2010 at 2:31 PM, Erick Erickson wrote: > That's the Lucene developers' w

term vector - WITH_POSITIONS_OFFSETS vs YES in terms of search performance

2010-11-30 Thread Maricris Villareal
Hi, Could someone tell me the effect (if any) of having term vectors set to WITH_POSITIONS_OFFSETS vs YES in terms of search performance? I did some testing and the results were inconclusive. In one case, WITH_POSITIONS_OFFSETS was searched faster than YES, in all others, it was the reverse. Is

Re: term vector - WITH_POSITIONS_OFFSETS vs YES in terms of search performance

2010-11-30 Thread Michael McCandless
The performance impact should only be at indexing time, unless you actually retrieve the vectors for some number of hits at search time. Mike On Tue, Nov 30, 2010 at 2:28 PM, Maricris Villareal wrote: > Hi, > > Could someone tell me the effect (if any) of having term vectors set to > WITH_POSITI

Re: Retrieving payload attribute in highlighter

2010-11-30 Thread Erick Erickson
Warning, ignorance alert. I'm not all that up on the guts of this one. But take a look at MemoryIndex, there's an example there. The gist is that you create a MemoryIndex on the fly and index the doc in question into it, then you can get the IndexReader from the IndexSearcher associated with the M

problem with incremental update in lucene

2010-11-30 Thread Yakob
I am creating a program that can index many text files in different folders. That means every folder that has text files gets indexed and its index is stored in another folder, so this other folder acts like a universal index of all files on my computer. I am using lucene to achieve this b

Twitter Search + big Hadoop, Dec. 8th at Seattle Scalability Meetup

2010-11-30 Thread Bradford Stephens
Greetings, The Seattle Scalability Meetup isn't slacking for the holidays. We've got an awesome lineup for Wed, December 8 at 7pm: http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup/ -Jake Mannix from Twitter will talk about the Twitter Search infrastructure (with distributed Lucene) -Chris

Wikileaks Iraq log

2010-11-30 Thread Seid Muhie
Dear all. Sorry if you find this question off topic. I need to do some research for my text processing course on the WikiLeaks Iraq war log text documents, but I am unable to get the data. Could anybody give me a hint, please? Thank you. Seid M.