Re: SIGSEGV in JCCEnv::setClassPath

2011-10-25 Thread Stein, Ruben
Hi Uwe, thanks a lot for the pointer. Best regards from Bremen, to Bremen - Ruben -Original Message- From: Uwe Schindler Reply-To: Date: Tue, 25 Oct 2011 08:56:53 +0200 To: Subject: RE: SIGSEGV in JCCEnv::setClassPath >Hi Ruben, > >This mailing list is about Lucene Core (Java), whic

Re: reusing the term-frequency count while indexing

2011-10-25 Thread Simon Willnauer
On Tue, Oct 25, 2011 at 5:08 AM, prasenjit mukherjee wrote: > That's exactly what I was trying to avoid :( > > I can afford to do that during indexing time, but it will be > time-consuming to do that at search time. Huh? I don't understand: if you provide the terms at indexing time, Lucene keeps track of

Re: setting up lucene for use on mac OSX

2011-10-25 Thread janwen
Which classes are missing? Check your JRE settings. 2011-10-25 janwen | China website: http://www.qianpin.com/ From: Daniel Quach Date: 2011-10-25 12:09 Subject: setting up lucene for use on mac OSX To: java-user Cc: Hi all, I am unable to get the lucene demo to run on my macbook pro. I downloade

Re: reusing the term-frequency count while indexing

2011-10-25 Thread prasenjit mukherjee
On Tue, Oct 25, 2011 at 1:17 PM, Simon Willnauer wrote: > On Tue, Oct 25, 2011 at 5:08 AM, prasenjit mukherjee > wrote: >> That's exactly what I was trying to avoid :( >> >> I can afford to do that during indexing time, but it will be >> time-consuming to do that at search time. > > Huh? I don't underst

Re: reusing the term-frequency count while indexing

2011-10-25 Thread Rene Hackl-Sommer
Use term boosts? "solr^3 rocks^2 apache" http://lucene.apache.org/java/3_4_0/queryparsersyntax.html#Boosting%20a%20Term On 25.10.2011 11:19, prasenjit mukherjee wrote: During search time I get the following input (only for 1 field) = "solr:3 rocks:2 apache:1". For this I have to create the
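A small sketch of turning the "term:count" input into the boost syntax Rene suggests. The rewrite is needed because in query-parser syntax a colon means field:term, so "solr:3" would be read as the term "3" in a field named "solr". The class TermBoosts and its toBoostQuery method are hypothetical; counts are passed through without validation, and terms containing query-parser special characters are not escaped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rewrites "term:count" pairs as query-parser boosts.
final class TermBoosts {

    /** "solr:3 rocks:2 apache:1" -> "solr^3 rocks^2 apache" (a count of 1 needs no boost). */
    static String toBoostQuery(String input) {
        List<String> clauses = new ArrayList<>();
        for (String pair : input.trim().split("\\s+")) {
            if (pair.isEmpty()) {
                continue;                      // blank input yields one empty piece
            }
            int colon = pair.lastIndexOf(':');
            if (colon <= 0 || colon == pair.length() - 1) {
                clauses.add(pair);             // no usable count: keep the token unchanged
                continue;
            }
            String term = pair.substring(0, colon);
            String count = pair.substring(colon + 1);
            clauses.add("1".equals(count) ? term : term + "^" + count);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(toBoostQuery("solr:3 rocks:2 apache:1"));
    }
}
```

Running main prints solr^3 rocks^2 apache, which can then be handed to the query parser for the single field in question.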

Re: reusing the term-frequency count while indexing

2011-10-25 Thread prasenjit mukherjee
Thanks, this is helpful. Is the effect (in ranking) going to be the same as passing multiple terms? I will definitely try it out. On Tue, Oct 25, 2011 at 3:21 PM, Rene Hackl-Sommer wrote: > Use term boosts? "solr^3 rocks^2 apache" > > http://lucene.apache.org/java/3_4_0/queryparsersyntax.html#Bo

"AND" Query "under the hood" ?

2011-10-25 Thread sol myr
Hi, Could I please ask another question regarding Lucene "under the hood" / performance. I wondered how "AND" queries are implemented? Say we query for "+hello +world". Would Lucene simply find 2 lists of documents ("documents containing HELLO",  and "documents containing WORLD"), and then in

Re: "AND" Query "under the hood" ?

2011-10-25 Thread Simon Willnauer
On Tue, Oct 25, 2011 at 2:18 PM, sol myr wrote: > Hi, > > Could I please ask another question regarding Lucene "under the hood" / > performance. > > I wondered how "AND" queries are implemented? > Say we query for "+hello +world". > Would Lucene simply find 2 lists of documents ("documents contai
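For the "+hello +world" question: conceptually, a pure conjunction can be answered without building both full document sets, by stepping through the two doc-ID-sorted postings lists together and always advancing whichever list is behind to the other list's current document. Lucene's conjunction scoring does this through iterator advance() calls backed by skip data in the postings; the sketch below is a simplified stand-in that assumes plain sorted int arrays and uses a galloping (exponential) search in place of skip lists. Conjunction, intersect and advance are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of a two-term AND ("+hello +world") over sorted doc ID lists.
final class Conjunction {

    /** Doc IDs that appear in both sorted, duplicate-free postings lists. */
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> hits = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                hits.add(a[i]);               // both terms occur in this document
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i = advance(a, i, b[j]);      // a is behind: jump to first doc >= b[j]
            } else {
                j = advance(b, j, a[i]);      // b is behind: jump to first doc >= a[i]
            }
        }
        return hits;
    }

    /**
     * First index k > from with list[k] >= target (list.length if none),
     * assuming list[from] < target. Gallops 1, 2, 4, ... ahead, then binary-searches.
     */
    static int advance(int[] list, int from, int target) {
        int lo = from;                        // invariant: list[lo] < target
        int step = 1;
        while (lo + step < list.length && list[lo + step] < target) {
            lo += step;
            step <<= 1;
        }
        int to = Math.min(lo + step + 1, list.length);
        int r = Arrays.binarySearch(list, lo + 1, to, target);
        return r >= 0 ? r : -r - 1;           // exact hit, or insertion point = first doc > target
    }
}
```

The work is driven by how far each list can jump, so a rare term paired with a common one skips most of the common term's postings instead of reading them all.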

Re: Bet you didn't know Lucene can...

2011-10-25 Thread mark harwood
>>using Lucene that don't fit under the core premise of full text search  I've had several use cases over the years that use features peculiar to Lucene but here's a very simple one I came across today that illustrates its raw index lookup capability: I needed a fast, scalable and persistent "S

Re: Bet you didn't know Lucene can...

2011-10-25 Thread Erik Hatcher
At the group where I worked at UVa once upon a time, a coworker built Juxta, this way cool tool to diff multiple versions of a document visually with heat maps and "difference"-o-meters, and it leverages Lucene analyzers to extract words and positions and such. You can find it here: http://www.
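Extracting words and positions, as described above, is what an analyzer's token stream provides: each token carries its text and where it sits in the field. As a stdlib-only illustration, and not Lucene's Analyzer API, the hypothetical WordPositions class below splits on non-alphanumeric characters, lowercases, and numbers the words; a real analyzer chain may add stop-word removal, stemming and so on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for an analyzer: emits (term, position) pairs.
final class WordPositions {

    static final class Token {
        final String term;
        final int position;   // 0-based word index in the text

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    /** Splits on anything that is not a letter or digit and lowercases each word. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        for (String word : text.split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) {
                continue;     // a leading separator produces an empty first piece
            }
            tokens.add(new Token(word.toLowerCase(Locale.ROOT), position++));
        }
        return tokens;
    }
}
```

With positions attached, two versions of a text can be compared word by word, which is the kind of input a diffing tool needs.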

Possible to do an indexorder sort over a MultiSearcher?

2011-10-25 Thread Alexander Devine
Hi all, I'm trying to provide a way to efficiently allow a client to page over all of the documents in multiple Lucene indexes that I'm querying with a MultiSearcher (~1-2 million docs). Unfortunately, I can't use the standard paging algorithm of getting TopDocs to the last record needed and th

Re: Possible to do an indexorder sort over a MultiSearcher?

2011-10-25 Thread Uwe Schindler
Hi, MultiReader is the way to go. MultiSearcher is broken and therefore deprecated; see the javadocs since Lucene 3.1. Uwe -- Uwe Schindler H.-H.-Meier-Allee 63, 28213 Bremen http://www.thetaphi.de Alexander Devine wrote: Hi all, I'm trying to provide a way to efficiently allow a client t
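Why a MultiReader can stand in for several indexes: it exposes them as one reader whose doc ID space is the concatenation of the sub-indexes' doc IDs, each sub-index offset by the total maxDoc of those before it. An index-order sort over an IndexSearcher on such a reader therefore visits the first index's documents, then the second's, and so on. The hypothetical CompositeDocIds class sketches only that offset arithmetic (global doc ID to sub-index and local doc ID), not MultiReader itself.

```java
// Hypothetical model of how a composite reader numbers documents across sub-indexes.
final class CompositeDocIds {
    private final int[] starts;   // starts[i] = first global doc ID of sub-index i; last slot = total

    CompositeDocIds(int... maxDocs) {
        starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    int maxDoc() {
        return starts[starts.length - 1];
    }

    /** Sub-index holding the global doc ID: the last i with starts[i] <= globalDoc. */
    int subIndex(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= maxDoc()) {
            throw new IllegalArgumentException("doc out of range: " + globalDoc);
        }
        int lo = 0, hi = starts.length - 2;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;    // round up so lo always moves forward
            if (starts[mid] <= globalDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;                            // empty sub-indexes are skipped naturally
    }

    /** Doc ID within its own sub-index. */
    int localDoc(int globalDoc) {
        return globalDoc - starts[subIndex(globalDoc)];
    }
}
```

For sub-indexes of 3, 4 and 2 documents, global doc 8 is local doc 1 of the third sub-index.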

Re: Possible to do an indexorder sort over a MultiSearcher?

2011-10-25 Thread Uwe Schindler
Hi, Additionally, since the latest 3.x version (not sure if it's already in 3.4), there is a new searchAfter method in IndexSearcher that allows deep paging. As MultiSearcher is deprecated, it is not supported there, so use MultiReader with IndexSearcher. Uwe -- Uwe Schindler H.-H.-Meier-Allee
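The idea behind searchAfter, roughly: instead of collecting the top (page number times page size) hits and discarding all but the last page, the caller passes in the last hit of the previous page, and only hits that rank strictly after it compete for a queue of size n. Every matching document is still scored, but memory stays bounded however deep the paging goes. The sketch below shows that filtering over an in-memory list of hits, assuming relevance order (higher score first, lower doc ID on ties); DeepPaging, Hit and this searchAfter are hypothetical and do not reproduce Lucene's signature.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model of "search after" paging over an in-memory list of scored hits.
final class DeepPaging {

    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Relevance order: higher score first, ties broken by lower doc ID. */
    static final Comparator<Hit> RELEVANCE =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                      .thenComparingInt(h -> h.doc);

    /** Up to n hits ranking strictly after `after` (pass null for the first page). */
    static List<Hit> searchAfter(List<Hit> matches, Hit after, int n) {
        // Head of the queue is the worst hit kept so far.
        PriorityQueue<Hit> queue = new PriorityQueue<>(RELEVANCE.reversed());
        for (Hit h : matches) {
            boolean alreadyShown = after != null
                    && (h.score > after.score || (h.score == after.score && h.doc <= after.doc));
            if (alreadyShown) {
                continue;                      // ranked on an earlier page
            }
            queue.add(h);
            if (queue.size() > n) {
                queue.poll();                  // drop the worst so the queue never exceeds n
            }
        }
        List<Hit> page = new ArrayList<>(queue);
        page.sort(RELEVANCE);
        return page;
    }
}
```

To page through everything, call searchAfter with null, then repeatedly with the last hit of the page just returned until a page comes back empty.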

Re: Bet you didn't know Lucene can...

2011-10-25 Thread Grant Ingersoll
On Oct 25, 2011, at 11:26 AM, mark harwood wrote: >>> using Lucene that don't fit under the core premise of full text search > > I've had several use cases over the years that use features peculiar to > Lucene but here's a very simple one I came across today that illustrates its > raw index l

Re: Bet you didn't know Lucene can...

2011-10-25 Thread Dawid Weiss
Avg lookup time slightly less than a HashSet? Interesting. Is the code to these benchmarks available somewhere? Dawid On Tue, Oct 25, 2011 at 9:57 PM, Grant Ingersoll wrote: > > On Oct 25, 2011, at 11:26 AM, mark harwood wrote: > using Lucene that don't fit under the core premise of full te

Re: Bet you didn't know Lucene can...

2011-10-25 Thread Mark Harwood
> Avg lookup time slightly less than a HashSet? Interesting. Yep, HashSet comparison was a surprise to me too. I threw it in as a datapoint for what I thought would be the fastest option on the example dataset but clearly not a long-term answer to my problem as it costs so much in RAM. Lucene s

Re: Bet you didn't know Lucene can...

2011-10-25 Thread Dawid Weiss
> Lucene started out at an avg 3ms but subsequent runs took it down > dramatically due to OS file caching. The all-in-memory hashset implementation > clearly did not demonstrate the same speed ups between runs. I'm not saying the benchmark was wrong or anything, but this is surprising. I mean, the