Lucene 2.9.0 / BooleanQuery problem

2009-10-28 Thread Michel Nadeau
Hi ! I spent all night trying to get a simple BooleanQuery working and I really can't figure out what is my problem. See this very simple program : public class test { @SuppressWarnings("deprecation") public static void main(String[] args) throws ParseException, CorruptIndexException, Lo

Re: Lucene 2.9.0 / BooleanQuery problem

2009-10-28 Thread Michel Nadeau
which hit nothing, because this term may be stop-listed out of > your index! > > Can you run the test again with no stop words in your query, and see what > it > gives? > > -jake > > On Wed, Oct 28, 2009 at 7:12 PM, Michel Nadeau wrote: > > > Hi ! > > > >

ChainedFilter in Lucene 2.9

2009-11-19 Thread Michel Nadeau
Hi ! Can someone tell me what is replacing ChainedFilter in Lucene 2.9? I used to do it like this - h = searcher.search(q, cluCF, cluSort); Where cluCF is a ChainedFilter declared like this - Filter cluCF = new ChainedFilter(cluFilters, ChainedFilter.AND); cluFilters is a Filter[] containing

Performance problems with Lucene 2.9

2009-11-30 Thread Michel Nadeau
Hi, we use Lucene to store around 300 millions of records. We use the index both for conventional searching, but also for all the system's data - we replaced MySQL with Lucene because it was simply not working at all with MySQL due to the amount of records. Our problem is that we have HUGE perform

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Michel Nadeau
erything, not caring for scores). > > Shai > > On Mon, Nov 30, 2009 at 5:47 PM, Michel Nadeau wrote: > > > Hi, > > > > we use Lucene to store around 300 millions of records. We use the index > > both > > for conventional searching, but also for all the s

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Michel Nadeau
> You can add clauses w/ OR, AND, NOT etc. > > > > Note that in Lucene 2.9, you can avoid scoring documents very easily, > > which > > is a performance win if you don't need scores (i.e. if you just want to > > match everything, not caring for scores). > >

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Michel Nadeau
> Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Michel Nadeau [mailto:aka...@gmail.com] > > Sent: Monday, November 30, 2009 5:10 PM > > To:

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Michel Nadeau
first used, so second+ queries should be faster. The > Wiki has some timing/speedup advice. > > Best > Erick > > > On Mon, Nov 30, 2009 at 11:10 AM, Michel Nadeau wrote: > > > What is the main difference between Hits and Collectors? > > > > - Mike > > a

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Michel Nadeau
> > > > If you do not sort at all and do not score your results, TopDocs is not > > very > > useful, because the first 200 hits cannot be ranked. > > > > - > > Uwe Schindler > > H.-H.-Meier-Allee 63, D-28213 Bremen > > http://www.thetaphi.d

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Michel Nadeau
); Thanks! - Mike aka...@gmail.com On Mon, Nov 30, 2009 at 12:06 PM, Michel Nadeau wrote: > I'm currently trying something like this - > > TopFieldDocs tfd = searcher.search(new MatchAllDocsQuery(), cluCF, 200, > cluSort); > > cluCF = filters > cluSort = sorts > > N

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Michel Nadeau
into the first 200 hits (if n=200). > > > > > > > If you use Sort, the returned > > > > TopDocs will be sorted. > > > > > > > > If you do not sort at all and do not score your results, TopDocs is > > not > > > > very > >

Moving to Lucene 3.0

2009-11-30 Thread Michel Nadeau
Hi ! I'm trying to fix my code to remove everything that is deprecated in order to move to Lucene 3.0. I fixed many many items but I can't find the answer to some questions. See items in red below: *#1. Opening an index* *idx = FSDirectory.getDirectory(new File(INDEX)); reader = IndexReader.open(

Returns nothing when sorting

2009-12-10 Thread Michel Nadeau
Hi ! I have a quite small Lucene 3.0.0 index with around 400,000 documents in it. I'm trying to sort my results like this : TopDocs td; td = searcher.search(q, cluCF, 10, cluSort); ScoreDoc[] hits = td.scoreDocs; My cluCF is a ChainedFilter containing at least one filter, and cluSort is a float

Re: Returns nothing when sorting

2009-12-11 Thread Michel Nadeau
By the way the same search + filter combination but with a sort on another field (string) works. It seems only the float sort isn't working. The float sort is working correctly in other conditions though. I'm very puzzled ! - Mike aka...@gmail.com On Fri, Dec 11, 2009 at 2:52 AM, Mic

Lower/Uppercase problem when searching in a not-analyzed field

2009-12-14 Thread Michel Nadeau
Hi ! My Lucene 3.0.0 index contains a field "DOMAIN" that contains an Internet domain name - like * www.DomainName.com * www.domainname.com * www.DomainName.com/path/to/document/doc.html?a=2 This field is indexed like this - doc.add(new Field("DOMAIN", sValue, Field.Store.YES, Field.Index.NOT_A
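A minimal sketch of one common way around case sensitivity in a not-analyzed field: fold the value to lowercase before it is indexed and again before it is used in a query, so both sides compare equal. This is not necessarily what the thread settled on. The class and method names are hypothetical, and the sketch uses plain Java only (no Lucene calls).

```java
import java.util.Locale;

public class DomainKey {
    // Hypothetical helper: produce the form of the value that would be
    // stored in the not-analyzed field and also used at query time.
    public static String normalize(String raw) {
        // Locale.ROOT avoids locale-specific case rules (e.g. Turkish dotless i).
        return raw.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Both spellings from the post collapse to the same key.
        System.out.println(normalize("www.DomainName.com"));
        System.out.println(normalize("www.domainname.com"));
    }
}
```

Note that this folds the whole value, including any path or query string after the host name. If path case matters, only the host part should be lowercased.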

Tokenized fields in Lucene 3.0.0

2009-12-15 Thread Michel Nadeau
Hi, I just realized that since I upgraded from Lucene 2.x to 3.0.0 (and removed all deprecated things), searches like that don't work anymore: test AND blue test NOT blue (test AND blue) OR red etc. Before 3.0.0, I was inserting my fields like this: doc.add(new Field("content", sValues[j], Fiel

Re: Tokenized fields in Lucene 3.0.0

2009-12-15 Thread Michel Nadeau
ark Miller wrote: > Any more info to share? > > In 2.9, Tokenized literally == Analyzed. > >/** @deprecated this has been renamed to {@link #ANALYZED} */ >public static final Index TOKENIZED = ANALYZED; > > Michel Nadeau wrote: > > Hi, > > >

Re: Tokenized fields in Lucene 3.0.0

2009-12-15 Thread Michel Nadeau
Forget it - I found the problem. There was an escaping problem on the search-client side. Sorry about that. - Mike aka...@gmail.com On Tue, Dec 15, 2009 at 3:48 PM, Michel Nadeau wrote: > I search like this - > > IndexReader reader = IndexReader.open(idx, true); > IndexSearc

ConstantScoreQuery without filters

2010-02-11 Thread Michel Nadeau
Hi, I use ConstantScoreQuery to find all documents in an index like this: td = searcher.search(new ConstantScoreQuery(cluCF), null, md, cluSort); * cluCF is a Filter * md is int = 999 * cluSort is a Sort My problem is that I don't always have a filter (cluCF) - so sometimes its value is 'null'

Re: ConstantScoreQuery without filters

2010-02-11 Thread Michel Nadeau
I think I solved my problem - used MatchAllDocsQuery() - is that the best solution ? - Mike aka...@gmail.com On Thu, Feb 11, 2010 at 3:50 PM, Michel Nadeau wrote: > Hi, > > I use ConstantScoreQuery to find all documents in an index like this: > > td = searcher.search(new Con

Lucene Challenge - sum, count, avg, etc.

2010-03-31 Thread Michel Nadeau
Hi, We're currently in the process of switching many of our screens from MySQL to Lucene because MySQL simply dies because we have too much data and it's becoming too long to generate the stats we need. So here's one MySQL query that we use to find out our Top 10 Affiliates : SELECT SUM(sale_amo
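A rough sketch of the kind of "SUM per affiliate, keep the top N" aggregation the post describes, done as a post-process in application code over values already read from matching documents. The SQL in the post is cut off, so the grouping key and the names below (affiliates, amounts, topBySum) are assumptions for illustration. The sketch uses plain Java 8 collections, not any Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopAffiliates {
    // Sums the amounts per affiliate and returns the n largest totals,
    // highest first. affiliates[i] and amounts[i] describe one sale.
    public static List<Map.Entry<String, Double>> topBySum(
            String[] affiliates, double[] amounts, int n) {
        Map<String, Double> totals = new HashMap<>();
        for (int i = 0; i < affiliates.length; i++) {
            totals.merge(affiliates[i], amounts[i], Double::sum);
        }
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(totals.entrySet());
        // Sort descending by the summed amount.
        ranked.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return ranked.subList(0, Math.min(n, ranked.size()));
    }

    public static void main(String[] args) {
        String[] affiliates = {"a1", "a2", "a1", "a3"};
        double[] amounts = {10.0, 5.0, 2.5, 7.0};
        for (Map.Entry<String, Double> e : topBySum(affiliates, amounts, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

This keeps one running total per distinct affiliate in memory, so it only suits cases where the number of distinct keys is modest compared with the number of sales.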

Re: Lucene Challenge - sum, count, avg, etc.

2010-04-01 Thread Michel Nadeau
minutes > DBSight customer, a shopping comparison site, (anonymous per request) got > 2.6 Million Euro funding! > > > > > Michel Nadeau wrote: > >> Hi, >> >> We're currently in the process of switching many of our screens from MySQL >> to Lucene b

Re: Lucene Challenge - sum, count, avg, etc.

2010-04-01 Thread Michel Nadeau
> I too am trying to achieve something. > > I am thinking of storing the integer values in payloads and then > using spanquery classes to compute the respective SUMs > > -Prasen > > On Thu, Apr 1, 2010 at 6:47 AM, Michel Nadeau wrote: > > Hi, > > > > We

Re: Lucene Challenge - sum, count, avg, etc.

2010-04-01 Thread Michel Nadeau
use case. > > Again didn't get the "sorting" part. SUM() will return only 1 > aggregated value, so what do you want to sort it on ? > > -Prasen > > On Thu, Apr 1, 2010 at 7:44 AM, Michel Nadeau wrote: > > Are you planning to be able to sort by these SUMs? A

Re: Lucene Challenge - sum, count, avg, etc.

2010-04-01 Thread Michel Nadeau
es" ) aren't huge, > sorting can probably be done as a post-process. > > Still dont see any need of joins here. > > > On Thu, Apr 1, 2010 at 7:16 PM, Michel Nadeau wrote: > > Hi, > > > > Here's an example of raw data that would be in my Sales index

Re: Lucene Challenge - sum, count, avg, etc.

2010-04-01 Thread Michel Nadeau
ate_Lucene_Database_Search_in_3_minutes >>> DBSight customer, a shopping comparison site, (anonymous per request) got >>> 2.6 Million Euro funding! >>> >>> >>> prasenjit mukherjee wrote: >>> >>> >>>> This looks like a use case more

Re: Lucene Challenge - sum, count, avg, etc.

2010-04-01 Thread Michel Nadeau
emo: http://search.dbsight.com > Lucene Database Search in 3 minutes: > http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes > DBSight customer, a shopping comparison site, (anonymous per request) got > 2.6 Million Euro funding! > > > > Michel Nadea

"Natural sorting" of documents in a Lucene index - possible?

2010-08-16 Thread Michel Nadeau
Hi, we are building an application using Lucene and we have HUGE data sets (our index contains millions and millions and millions of documents), which obviously cause us very important problems when sorting. In fact, we disabled sorting completely because the servers were just exploding when tryin

Re: "Natural sorting" of documents in a Lucene index - possible?

2010-08-17 Thread Michel Nadeau
ing and what > you're trying to return as well as how you're measuring before > we can say much > > Along with how much memory you're giving your JVM to work with, > what "exploding" means. Are you CPU bound? IO bound? Swapping? > You need to characterize

Re: "Natural sorting" of documents in a Lucene index - possible?

2010-08-17 Thread Michel Nadeau
e to each document, > that's > also a memory hog. Not to mention whether capitalization counts. > > You might enumerate the terms in your index for each of the sortable fields > to figure out what the total number of unique terms each is and use that as > a basis for reducing

Re: "Natural sorting" of documents in a Lucene index - possible?

2010-08-17 Thread Michel Nadeau
he price though by > having to change your queries and sorts to respect all 6 fields... > > But I'd only really go there after seeing if other options don't work. > > > Best > Erick > > On Tue, Aug 17, 2010 at 3:35 PM, Michel Nadeau wrote: > > > Would our a

Re: "Natural sorting" of documents in a Lucene index - possible?

2010-08-18 Thread Michel Nadeau
m On Tue, Aug 17, 2010 at 4:08 PM, Ian Lea wrote: > Using NumericField for dates and other numbers is likely to help a > lot, and removes padding problems. I'd try that first, or just sort > the top n hits yourself. > > > -- > Ian. > > > On Tue, Aug 17, 2010

Re: "Natural sorting" of documents in a Lucene index - possible?

2010-08-18 Thread Michel Nadeau
Alpha test > > 4 120 Charlie test > > > > Already sorted on the Count. > > > > Thanks! > > > > - Mike > > aka...@gmail.com > > > > > > On Tue, Aug 17, 2010 at 4:08 PM, Ian Lea wrote: > > > >> Using NumericF

Re: "Natural sorting" of documents in a Lucene index - possible?

2010-08-18 Thread Michel Nadeau
, Aug 18, 2010 at 11:26 AM, Michel Nadeau wrote: > Thanks ! > > - Mike > aka...@gmail.com > > > On Wed, Aug 18, 2010 at 10:37 AM, Ian Lea wrote: > >> > But - to come back to my original question... is there any way to have a >> > "n

Lucandra - Any experiences?

2010-08-23 Thread Michel Nadeau
Hi, we are currently considering to switch from Lucene + Cassandra to *Lucandra*, mainly for the following reasons: * Ability to have many threads writing in the same index at the same time; * Live results without the need to close/re-open the index reader; * Easy scaling to many nodes thanks to

Re: Lucandra - Any experiences?

2010-09-03 Thread Michel Nadeau
of views from the community. Good or > bad, i'd love to hear experiences with it. > > Jordon > > On Aug 23, 2010, at 12:21 PM, Michel Nadeau wrote: > > > Hi, > > > > we are currently considering to switch from Lucene + Cassandra to > *Lucandra*, >

Re: Lucandra - Any experiences?

2010-09-03 Thread Michel Nadeau
Yeah, exactly... it seems absolutely no one knows Lucandra. - Mike aka...@gmail.com On Fri, Sep 3, 2010 at 11:06 AM, Jordon Saardchit wrote: > Hence my reluctance :) > > Jordon > > On Sep 3, 2010, at 5:44 AM, Michel Nadeau wrote: > > > Anyone? > > > > - Mike