Hi !
I spent all night trying to get a simple BooleanQuery working and I really
can't figure out what my problem is. See this very simple program:
public class test {
@SuppressWarnings("deprecation")
public static void main(String[] args) throws ParseException,
CorruptIndexException, Lo
which hit nothing, because this term may be stop-listed out of
> your index!
>
> Can you run the test again with no stop words in your query, and see what
> it
> gives?
>
> -jake
>
> On Wed, Oct 28, 2009 at 7:12 PM, Michel Nadeau wrote:
>
> > Hi !
> >
> > >
Hi !
Can someone tell me what is replacing ChainedFilter in Lucene 2.9?
I used to do it like this -
h = searcher.search(q, cluCF, cluSort);
Where cluCF is a ChainedFilter declared like this -
Filter cluCF = new ChainedFilter(cluFilters, ChainedFilter.AND);
cluFilters is a Filter[] containing
Hi,
we use Lucene to store around 300 million records. We use the index both
for conventional searching and for all the system's data - we replaced
MySQL with Lucene because MySQL simply could not handle the number of
records. Our problem is that we have HUGE perform
erything, not caring for scores).
>
> Shai
>
> On Mon, Nov 30, 2009 at 5:47 PM, Michel Nadeau wrote:
>
> > Hi,
> >
> > we use Lucene to store around 300 million records. We use the index
> > both
> > for conventional searching, but also for all the s
> You can add clauses w/ OR, AND, NOT etc.
> >
> > Note that in Lucene 2.9, you can avoid scoring documents very easily,
> > which
> > is a performance win if you don't need scores (i.e. if you just want to
> > match everything, not caring for scores).
> >
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -Original Message-
> > From: Michel Nadeau [mailto:aka...@gmail.com]
> > Sent: Monday, November 30, 2009 5:10 PM
> > To:
first used, so second+ queries should be faster. The
> Wiki has some timing/speedup advice.
>
> Best
> Erick
>
>
> On Mon, Nov 30, 2009 at 11:10 AM, Michel Nadeau wrote:
>
> > What is the main difference between Hits and Collectors?
> >
> > - Mike
> > a
> >
> > If you do not sort at all and do not score your results, TopDocs is not
> > very
> > useful, because the first 200 hits cannot be ranked.
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.d
);
Thanks!
- Mike
aka...@gmail.com
On Mon, Nov 30, 2009 at 12:06 PM, Michel Nadeau wrote:
> I'm currently trying something like this -
>
> TopFieldDocs tfd = searcher.search(new MatchAllDocsQuery(), cluCF, 200,
> cluSort);
>
> cluCF = filters
> cluSort = sorts
>
> N
into the first 200 hits (if n=200).
> > >
> > > > If you use Sort, the returned
> > > > TopDocs will be sorted.
> > > >
> > > > If you do not sort at all and do not score your results, TopDocs is
> > not
> > > > very
> >
Hi !
I'm trying to fix my code to remove everything that is deprecated in order
to move to Lucene 3.0. I fixed many items but I can't find the answer
to some questions. See items in red below:
*#1. Opening an index*
*idx = FSDirectory.getDirectory(new File(INDEX));
reader = IndexReader.open(
Hi !
I have a quite small Lucene 3.0.0 index with around 400,000 documents in it.
I'm trying to sort my results like this :
TopDocs td;
td = searcher.search(q, cluCF, 10, cluSort);
ScoreDoc[] hits = td.scoreDocs;
My cluCF is a ChainedFilter containing at least one filter, and cluSort is a
float
By the way the same search + filter combination but with a sort on another
field (string) works. It seems only the float sort isn't working. The float
sort is working correctly in other conditions though.
I'm very puzzled !
- Mike
aka...@gmail.com
On Fri, Dec 11, 2009 at 2:52 AM, Mic
Hi !
My Lucene 3.0.0 index contains a field "DOMAIN" that contains an Internet
domain name - like
* www.DomainName.com
* www.domainname.com
* www.DomainName.com/path/to/document/doc.html?a=2
This field is indexed like this -
doc.add(new Field("DOMAIN", sValue, Field.Store.YES,
Field.Index.NOT_A
Hi,
I just realized that since I upgraded from Lucene 2.x to 3.0.0 (and removed
all deprecated things), searches like these don't work anymore:
test AND blue
test NOT blue
(test AND blue) OR red
etc.
Before 3.0.0, I was inserting my fields like this:
doc.add(new Field("content", sValues[j], Fiel
ark Miller wrote:
> Any more info to share?
>
> In 2.9, Tokenized literally == Analyzed.
>
> /** @deprecated this has been renamed to {@link #ANALYZED} */
> public static final Index TOKENIZED = ANALYZED;
>
> Michel Nadeau wrote:
> > Hi,
> >
>
Forget it - I found the problem. There was an escaping problem on the
search-client side.
Sorry about that.
- Mike
aka...@gmail.com
On Tue, Dec 15, 2009 at 3:48 PM, Michel Nadeau wrote:
> I search like this -
>
> IndexReader reader = IndexReader.open(idx, true);
> IndexSearc
Hi,
I use ConstantScoreQuery to find all documents in an index like this:
td = searcher.search(new ConstantScoreQuery(cluCF), null, md, cluSort);
* cluCF is a Filter
* md is int = 999
* cluSort is a Sort
My problem is that I don't always have a filter (cluCF) - so sometimes its
value is 'null'
I think I solved my problem - used MatchAllDocsQuery() - is that the best
solution ?
- Mike
aka...@gmail.com
On Thu, Feb 11, 2010 at 3:50 PM, Michel Nadeau wrote:
> Hi,
>
> I use ConstantScoreQuery to find all documents in an index like this:
>
> td = searcher.search(new Con
Hi,
We're currently in the process of switching many of our screens from MySQL
to Lucene because MySQL simply can't cope with the amount of data we have,
and generating the stats we need is taking too long.
So here's one MySQL query that we use to find out our Top 10 Affiliates :
SELECT SUM(sale_amo
minutes
> DBSight customer, a shopping comparison site, (anonymous per request) got
> 2.6 Million Euro funding!
>
>
>
>
> Michel Nadeau wrote:
>
>> Hi,
>>
>> We're currently in the process of switching many of our screens from MySQL
>> to Lucene b
> I too am trying to achieve something.
>
> I am thinking of storing the integer values in payloads and then
> using spanquery classes to compute the respective SUMs
>
> -Prasen
>
> On Thu, Apr 1, 2010 at 6:47 AM, Michel Nadeau wrote:
> > Hi,
> >
> > We
use case.
>
> Again didn't get the "sorting" part. SUM() will return only 1
> aggregated value, so what do you want to sort it on ?
>
> -Prasen
>
> On Thu, Apr 1, 2010 at 7:44 AM, Michel Nadeau wrote:
> > Are you planning to be able to sort by these SUMs? A
es" ) aren't huge,
> sorting can probably be done as a post-process.
>
> Still don't see any need of joins here.
>
>
> On Thu, Apr 1, 2010 at 7:16 PM, Michel Nadeau wrote:
> > Hi,
> >
> > Here's an example of raw data that would be in my Sales index
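The suggestion quoted above (compute the SUMs, then sort as a post-process) can be sketched in plain Java, independent of Lucene. This is only an illustration under assumptions: the `Sale` class, its `affiliate` and `amount` fields, and the `topAffiliates` helper are hypothetical names, not taken from the thread, and the sketch assumes Java 8 or later and that the matching documents have already been read into memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AffiliateTotals {

    /** Hypothetical stand-in for one matching document: an affiliate key and a sale amount. */
    public static final class Sale {
        final String affiliate;
        final double amount;

        public Sale(String affiliate, double amount) {
            this.affiliate = affiliate;
            this.amount = amount;
        }
    }

    /**
     * Sums the amounts per affiliate, then returns the n largest totals,
     * largest first. Assumes n is not negative.
     */
    public static List<Map.Entry<String, Double>> topAffiliates(Iterable<Sale> sales, int n) {
        // Step 1: aggregate, the equivalent of SUM(...) with GROUP BY.
        Map<String, Double> totals = new HashMap<>();
        for (Sale s : sales) {
            totals.merge(s.affiliate, s.amount, Double::sum);
        }
        // Step 2: sort the (much smaller) set of totals as a post-process.
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(totals.entrySet());
        ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
    }
}
```

The sort only touches one entry per affiliate, so it stays cheap as long as the number of distinct affiliates is small compared to the number of sales.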
ate_Lucene_Database_Search_in_3_minutes
>>> DBSight customer, a shopping comparison site, (anonymous per request) got
>>> 2.6 Million Euro funding!
>>>
>>>
>>> prasenjit mukherjee wrote:
>>>
>>>
>>>> This looks like a use case more
emo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> DBSight customer, a shopping comparison site, (anonymous per request) got
> 2.6 Million Euro funding!
>
>
>
> Michel Nadea
Hi,
we are building an application using Lucene and we have HUGE data sets (our
index contains millions and millions and millions of documents), which
obviously causes us serious problems when sorting. In fact, we
disabled sorting completely because the servers were just exploding when
tryin
ing and what
> you're trying to return as well as how you're measuring before
> we can say much
>
> Along with how much memory you're giving your JVM to work with,
> what "exploding" means. Are you CPU bound? IO bound? Swapping?
> You need to characterize
e to each document,
> that's
> also a memory hog. Not to mention whether capitalization counts.
>
> You might enumerate the terms in your index for each of the sortable fields
> to figure out the total number of unique terms in each, and use that as
> a basis for reducing
he price though by
> having to change your queries and sorts to respect all 6 fields...
>
> But I'd only really go there after seeing if other options don't work.
>
>
> Best
> Erick
>
> On Tue, Aug 17, 2010 at 3:35 PM, Michel Nadeau wrote:
>
> > Would our a
m
On Tue, Aug 17, 2010 at 4:08 PM, Ian Lea wrote:
> Using NumericField for dates and other numbers is likely to help a
> lot, and removes padding problems. I'd try that first, or just sort
> the top n hits yourself.
>
>
> --
> Ian.
>
>
> On Tue, Aug 17, 2010
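One way to read "just sort the top n hits yourself" is to keep only the n best items in a bounded heap while iterating over the matches, instead of sorting everything. The following is a minimal plain-Java sketch of that idea, not Lucene code: the `TopN` class and its `largest` method are hypothetical names, and it assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopN {

    /**
     * Returns the n largest items according to cmp, largest first.
     * At most n items are held in memory at any time.
     */
    public static <T> List<T> largest(Iterable<T> items, int n, Comparator<T> cmp) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the head is always the smallest of the current top n.
        PriorityQueue<T> heap = new PriorityQueue<>(n, cmp);
        for (T item : items) {
            if (heap.size() < n) {
                heap.add(item);
            } else if (cmp.compare(item, heap.peek()) > 0) {
                // The new item beats the weakest kept item, so replace it.
                heap.poll();
                heap.add(item);
            }
        }
        // Only the n survivors need a full sort.
        List<T> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder(cmp));
        return result;
    }
}
```

For example, `TopN.largest(values, 200, Comparator.naturalOrder())` would keep the 200 largest values while never holding more than 200 of them at once.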
Alpha test
> > 4 120 Charlie test
> >
> > Already sorted on the Count.
> >
> > Thanks!
> >
> > - Mike
> > aka...@gmail.com
> >
> >
> > On Tue, Aug 17, 2010 at 4:08 PM, Ian Lea wrote:
> >
> >> Using NumericF
, Aug 18, 2010 at 11:26 AM, Michel Nadeau wrote:
> Thanks !
>
> - Mike
> aka...@gmail.com
>
>
> On Wed, Aug 18, 2010 at 10:37 AM, Ian Lea wrote:
>
>> > But - to come back to my original question... is there any way to have a
>> > "n
Hi,
we are currently considering switching from Lucene + Cassandra to *Lucandra*,
mainly for the following reasons:
* Ability to have many threads writing in the same index at the same time;
* Live results without the need to close/re-open the index reader;
* Easy scaling to many nodes thanks to
of views from the community. Good or
> bad, i'd love to hear experiences with it.
>
> Jordon
>
> On Aug 23, 2010, at 12:21 PM, Michel Nadeau wrote:
>
> > Hi,
> >
> > we are currently considering switching from Lucene + Cassandra to
> *Lucandra*,
>
Yeah, exactly... it seems absolutely no one knows Lucandra.
- Mike
aka...@gmail.com
On Fri, Sep 3, 2010 at 11:06 AM, Jordon Saardchit wrote:
> Hence my reluctance :)
>
> Jordon
>
> On Sep 3, 2010, at 5:44 AM, Michel Nadeau wrote:
>
> > Anyone?
> >
> > - Mike