I'd like to add a line count field to my indexed document. The obvious way
is to read my file twice: once to tokenize it and add its content to a
field in the document, and once to count the number of lines and add that
to another field.
Any idea how I can optimize this and read the file only once?
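One option is to build the content string yourself while counting newlines in the
same pass, then feed both fields from that buffer. A rough sketch (the "lineCount"
field name is just a placeholder):

// single pass over the file: accumulate the content and count lines together
BufferedReader in = new BufferedReader(new FileReader(file));    // java.io
StringBuffer content = new StringBuffer();
int lineCount = 0;
for (String line = in.readLine(); line != null; line = in.readLine()) {
    content.append(line).append('\n');
    lineCount++;
}
in.close();

Document doc = new Document();
doc.add(new Field("content", content.toString(), Field.Store.YES, Field.Index.TOKENIZED));
// store the count as a single untokenized term so the analyzer leaves it alone
doc.add(new Field("lineCount", String.valueOf(lineCount), Field.Store.YES, Field.Index.UN_TOKENIZED));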
: Very good points, I hadn't considered the term frequency of the digits
: affecting scoring. As an aside, can that aspect of the score be ignored for
: these fields?
The easiest way is to use a boost that is so low it's insignificant, or
you could subclass TermQuery and override getSimilarity t
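For the low-boost route, something along these lines (the field name and value are
made up, and the exact boost is whatever is small enough to be lost in the noise):

// give the digits clause a negligible boost so it barely contributes to the score
TermQuery idClause = new TermQuery(new Term("id", "00104"));
idClause.setBoost(0.0001f);
booleanQuery.add(idClause, true, false);   // required, not prohibited (1.4-style add)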
Thanks Yonik, there they are
-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 28, 2006 4:59 PM
To: java-user@lucene.apache.org
Subject: Re: Inside a Boolean Query
On 2/28/06, Seeta Somagani <[EMAIL PROTECTED]> wrote:
> Is there a way that I can detect the composition of a BooleanQuery,
On 2/28/06, Seeta Somagani <[EMAIL PROTECTED]> wrote:
> Is there a way that I can detect the composition of a BooleanQuery,
> rather than just extract the individual terms?
Hi Seeta,
I think BooleanQuery.getClauses() might be what you are looking for.
-Yonik
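For example, something like this walks the clauses (assuming the 1.9-style
BooleanClause accessors):

BooleanClause[] clauses = booleanQuery.getClauses();
for (int i = 0; i < clauses.length; i++) {
    Query q = clauses[i].getQuery();
    if (q instanceof TermQuery) {
        System.out.println("term clause: " + ((TermQuery) q).getTerm());
    } else if (q instanceof PhraseQuery) {
        System.out.println("phrase clause: " + q);
    } else if (q instanceof BooleanQuery) {
        System.out.println("nested boolean with "
            + ((BooleanQuery) q).getClauses().length + " clauses");
    } else {
        System.out.println("other clause: " + q.getClass().getName());
    }
}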
Hi,
I need to return the context of the terms along with the results.
The approach that I'm using is to
1) detect what kind of query it is,
2) extract the terms of the query,
3) fetch the context of the individual terms, and
4) finally join them depending on the
Very good points, I hadn't considered the term frequency of the digits
affecting scoring. As an aside, can that aspect of the score be ignored for
these fields?
I need to spend more time with FunctionQuery, I haven't given it the
attention it deserves.
Great feedback, thanks for the notes.
-- j
Hoss,
Your observation about the spaces seems very likely. I therefore removed the
spaces, padded the numbers and also tried using the RangeFilter, but still I
got the same result. Upon closer inspection of my code, I found that I was
tokenizing the "id" field, which was rendering that field illeg
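In case it helps anyone else searching the archives, a minimal sketch of the padded,
untokenized-id setup (pad width and variable names are only illustrative):

// index the id as one untokenized, zero-padded term so term order matches numeric order
String paddedId = new java.text.DecimalFormat("00000").format(id);
doc.add(new Field("id", paddedId, Field.Store.YES, Field.Index.UN_TOKENIZED));

// query side: pad the bounds the same way, then either a RangeQuery...
Query idRange = new RangeQuery(new Term("id", "00104"), new Term("id", "00200"), true);
// ...or a RangeFilter next to the content query, so the id never affects scoring
Hits hits = searcher.search(contentQuery, new RangeFilter("id", "00104", "00200", true, true));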
Michael -
Great thoughts, and thanks for the feedback.
Following on the Range Query approach, how is performance? I found the
range approach (albeit with the exact values) to be slower than the
parsed-string approach I posited.
On the custom scoring, is the distance element for sorting or as a
I'm in the same boat as Michael on this one. It's not a matter of finding
the right technology to do geo-locational calculations, but rather being
able to accomplish that task in conjunction with keyword search.
-- j
On 2/28/06, Bryzek.Michael <[EMAIL PROTECTED]> wrote:
>
> Our geo searches are
Our geo searches are combined with keyword searches. We previously performed
all of our queries in the database (Oracle 10g w/ interMedia for the
unstructured portion) but found that it was easier to scale search outside the
database than within.
-Original Message-
From: John Powers
I don't know if this matters, but we do all of our geolocating in SQL
with decent speed. All the trig is in the query itself and then we can
limit to top 5, top 10, etc. for what we show. Is the data such that you
need Lucene? Can I ask what causes it to be beyond a database's
ability?
-Orig
Jeff -
This is an interesting approach. On our end, we have experimented with
two variants:
Variant 1: Use Range Query
Rather than precomputing the boolean clauses yourself, index encoded
latitude and longitude values and use a Range Query. We encode by
adding 1000 to each of the values. Note: W
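A sketch of that kind of encoding (the pad pattern and field names here are my own,
not necessarily what was actually used):

// shift the values positive and zero-pad so lexicographic term order matches numeric order
java.text.DecimalFormat fmt = new java.text.DecimalFormat("0000.000000");
doc.add(new Field("lat", fmt.format(latitude + 1000.0), Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.add(new Field("lng", fmt.format(longitude + 1000.0), Field.Store.YES, Field.Index.UN_TOKENIZED));

// the bounding box then becomes two required range clauses
BooleanQuery box = new BooleanQuery();
box.add(new RangeQuery(new Term("lat", fmt.format(minLat + 1000.0)),
                       new Term("lat", fmt.format(maxLat + 1000.0)), true), true, false);
box.add(new RangeQuery(new Term("lng", fmt.format(minLng + 1000.0)),
                       new Term("lng", fmt.format(maxLng + 1000.0)), true), true, false);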
: Geo definition:
: Boxing around a center point. It's not critical to do a radius search with
: a given circle. A boxed approach allows for taller or wider frames of
: reference, which are applicable for our use.
if you are just looking to confine your results to a box then I think
RangeFilter is
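e.g. two RangeFilters ANDed together and handed to the search method, so the
lat/lng terms never influence scoring (field names assume zero-padded values as
discussed elsewhere in the thread):

final Filter latFilter = new RangeFilter("lat", minLatPadded, maxLatPadded, true, true);
final Filter lngFilter = new RangeFilter("lng", minLngPadded, maxLngPadded, true, true);
// a tiny Filter that intersects the two bit sets
Filter boxFilter = new Filter() {
    public java.util.BitSet bits(IndexReader reader) throws java.io.IOException {
        java.util.BitSet bits = latFilter.bits(reader);
        bits.and(lngFilter.bits(reader));
        return bits;
    }
};
Hits hits = searcher.search(keywordQuery, boxFilter);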
I've been wrestling with a way to index and search data with a
geo-positional aspect. By a geo-positional search, I want to constrain
search results within a given location range. Furthermore, I want to allow
the user to set/change the geo-positional boundaries as needed for their
search. This i
: price and about 10 more additional fields. I want to not just find
: something in the index; I also want to get the lists of all brands and
: prices. The list of brands is needed for displaying all of the products
: and the quantity of products of this brand for a certain search request.
1) iterati
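One way the iteration can look, assuming "brand" is indexed as a single untokenized
term per product (a sketch, not tuned for millions of documents):

// bit set of documents matching the user's query
java.util.BitSet matches = new QueryFilter(searchQuery).bits(reader);

// walk every term in the "brand" field and count how many matching docs carry it
java.util.Map brandCounts = new java.util.HashMap();
TermEnum terms = reader.terms(new Term("brand", ""));
try {
    do {
        Term t = terms.term();
        if (t == null || !"brand".equals(t.field())) break;
        int count = 0;
        TermDocs docs = reader.termDocs(t);
        while (docs.next()) {
            if (matches.get(docs.doc())) count++;
        }
        docs.close();
        if (count > 0) brandCounts.put(t.text(), new Integer(count));
    } while (terms.next());
} finally {
    terms.close();
}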
there are a couple of things that could be happening that make your
results unexpected...
: But, when I enter the query - id: [104 TO 200] content: "Marbella
: España" it's just returning me all the results while ignoring the range.
1) if you really have a space between the "id:" and the "[104 T
Otis Gospodnetic wrote:
Regarding performance fix - if you can be more precise (is it really
just more or less or is it as good as before), that would be great
for those of us itching to use 1.9.
To be more precise: The patch reduced the time required to build one large
index from 13 to 11 hours.
Otis Gospodnetic wrote:
> Regarding performance fix - if you can be more precise (is it really
> just more or less or is it as good as before), that would be great
> for those of us itching to use 1.9.
Yes, I can confirm that performance differs by no more than 3.1 fraggles.
;-)
--
Hi Eric,
Regarding performance fix - if you can be more precise (is it really just more
or less or is it as good as before), that would be great for those of us
itching to use 1.9.
Thanks,
Otis
- Original Message
From: Eric Jain <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent:
Please ignore the ContextQueryParser... I dumped that and switched back to the
QueryParser which still gives me the same result.
Thanks.
Seeta Somagani
-Original Message-
From: Seeta Somagani [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 28, 2006 10:54 AM
To: java-user@lucene.apache
Hi,
My documents are in the following format.
doc.add(new Field ("id",page, Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field ("content",fileContent.toString(), Field.Store.YES,
Field.Index.TOKENIZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
I need to make a query on
Yes, the bottleneck is definitely in Lucene. The index is quite big, three
files with more than 1 GB.
We are querying for HTML extracts, with the id together, but it can return
almost 50 extracts for each ID, and just the first 2 will be used. We could
as well do 10 queries (that's the max number
Do you want the first 2 docs, regardless of score, with the same
property or do you want the 2 highest scoring docs with the same property?
You might look at the HitCollector search method on IndexSearcher. Btw,
the Filter that is required can be null. The HitCollector interface
allows you t
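A sketch of the collector for the "first 2 per id" reading of the question. The
"extractId" field name is invented, and note that a HitCollector sees documents in
index order, not score order, so for the two highest-scoring per id you would
collect (doc, score) pairs and sort afterwards:

final IndexSearcher searcher = new IndexSearcher(indexDirectory);
final java.util.Map countsPerId = new java.util.HashMap();
final java.util.List firstTwoPerId = new java.util.ArrayList();

searcher.search(query, null, new HitCollector() {   // the Filter argument may be null
    public void collect(int doc, float score) {
        try {
            String id = searcher.doc(doc).get("extractId");
            int[] count = (int[]) countsPerId.get(id);
            if (count == null) countsPerId.put(id, count = new int[1]);
            if (count[0]++ < 2) firstTwoPerId.add(new Integer(doc));
        } catch (java.io.IOException e) {
            throw new RuntimeException(e.toString());
        }
    }
});

Fetching the stored document for every hit is the slow part; if the id field is a
single untokenized term, FieldCache may be a cheaper way to get at the per-document
value.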
Does anyone know a solution for that?
I know there's a method that returns a TopDocs, but it needs a filter, and in
my case, I'll need the first 2 of each doc with the same value in a given
property.
On 27/02/06, emerson cargnin <[EMAIL PROTECTED]> wrote:
>
> Hi all
>
> Due a performance problem, I
On Feb 28, 2006, at 6:53 AM, Haritha_Parvatham wrote:
I believe that Snowball uses the Porter stemming algorithm.
The Snowball stemmer came from Porter, yes. There was a looser
stemming algorithm prior to Snowball though, which the
PorterStemFilter uses.
Anyway
Is there any
other a
On Feb 28, 2006, at 8:11 AM, Samuru Jackson wrote:
Also heed the other recommendations in LIA and don't necessarily use
Filters when BooleanQuery clauses will suffice. There is overhead
involved in the Filter mechanism in terms of executing multiple
queries to build all the filters you're prop
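For comparison, the same restriction expressed both ways (the "category" field and
value are only an example):

// as a required clause on the query itself
BooleanQuery combined = new BooleanQuery();
combined.add(userQuery, true, false);                                      // must match the user's terms
combined.add(new TermQuery(new Term("category", "books")), true, false);   // and the restriction
Hits viaClause = searcher.search(combined);

// as a Filter, which mainly pays off when the same restriction is reused across many searches
Hits viaFilter = searcher.search(userQuery,
        new QueryFilter(new TermQuery(new Term("category", "books"))));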
Anton Potehin wrote:
I have a problem.
There is an index which contains about 6,000,000 records (15,000,000
will follow soon); the size is 4GB. The index is optimized and consists of only
one segment. This index stores products. Each product has a brand, a
price and about 10 more additional fields. I
> Also heed the other recommendations in LIA and don't necessarily use
> Filters when BooleanQuery clauses will suffice. There is overhead
> involved in the Filter mechanism in terms of executing multiple
> queries to build all the filters you're proposing.
I'm aware of the fact that using multip
Hi Erik,
I am sorry for using the same subject line.
I believe that Snowball uses the Porter stemming algorithm. Is there any
other alternative to Snowball? I want a stemmer that supports
multiple languages.
Please help me configure Lucene step by step on my system. I have
downloaded Lucene
Haritha - please do not hijack threads (meaning you're replying to a
message with one subject, but starting a new one with the same
"subject" line). Please create a brand new message to the list with
a new subject.
The SnowballAnalyzer is available in Lucene, which incorporates the
Snowb
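Usage is along these lines once the snowball contrib jar is on the classpath (the
index path is a placeholder; the constructor argument names the language):

// org.apache.lucene.analysis.snowball.SnowballAnalyzer
Analyzer analyzer = new SnowballAnalyzer("English");   // "German", "French", etc. also work
IndexWriter writer = new IndexWriter("/path/to/index", analyzer, true);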
Hi,
Lucene uses stemmers to support multiple languages. The stemming
algorithm differs from language to language.
Can you tell me how many different types of stemmers are available and which
stemmers Lucene supports? I believe it supports the Snowball stemmer. I have
downloaded the Snowball stemmer; it support
On Feb 28, 2006, at 6:10 AM, Samuru Jackson wrote:
Hi again!
2) Use a QueryFilter with that same TermQuery, and apply
that Filter
to your search method.
Thanks for the hint - I just bought "Lucene in Action" and now I'm
more into it :-)
But at this point I'm facing some Filter pro
On Feb 28, 2006, at 6:14 AM, Haritha_Parvatham wrote:
Hi,
Is there someone who can guide me in deploying Lucene 1.4.3?
I have the Lucene 1.4.3 sources. Please tell me the procedure to run
Lucene on my system. I am using Windows as the OS.
First steps are to familiarize yourself with just what exactly Lucene
Hi,
Is there someone who can guide me in deploying Lucene 1.4.3?
I have the Lucene 1.4.3 sources. Please tell me the procedure to run
Lucene on my system. I am using Windows as the OS.
Thanks,
-Original Message-
From: Samuru Jackson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 28, 2006 4:41 PM
Hi again!
> 2) Use a QueryFilter with that same TermQuery, and apply that Filter
> to your search method.
Thanks for the hint - I just bought "Lucene in Action" and now I'm
more into it :-)
But at this point I'm facing some Filter problems again.
As proposed in LiA the easiest way would
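For reference, the QueryFilter version of the earlier hint looks roughly like this
(field and value invented):

// restrict the search to documents matching the TermQuery, without affecting scores
Filter filter = new QueryFilter(new TermQuery(new Term("owner", "samuru")));
Hits hits = searcher.search(mainQuery, filter);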
Daniel Naber wrote:
A fix has now been committed to trunk in SVN; it should be part of the next
1.9 release.
Performance seems to have recovered, more or less, thanks!
Hi all,
I need an official Maven2 package and wanted to build one. Then I saw
the following in the documentation of Maven:
"Maven partners
The following sites automatically sync their project repositories with the
central one. If you want a project from any of these sites to be uploaded
to ibib
A few days ago someone on this list asked how to efficiently "update"
documents in the index, i.e.,
delete the old version of the document (found by some unique id field) and
add the new version.
The problem was that opening and closing the IndexReader and IndexWriter
after each document
was inefficient.
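The usual batching pattern, sketched with invented names; on 1.4.x the delete call
is IndexReader.delete(Term) rather than deleteDocuments(Term):

// 1) one IndexReader for all the deletes
IndexReader reader = IndexReader.open(indexDir);
for (java.util.Iterator it = updatedDocs.iterator(); it.hasNext();) {
    Document d = (Document) it.next();
    reader.deleteDocuments(new Term("uid", d.get("uid")));
}
reader.close();

// 2) one IndexWriter for all the adds
IndexWriter writer = new IndexWriter(indexDir, analyzer, false);
for (java.util.Iterator it = updatedDocs.iterator(); it.hasNext();) {
    writer.addDocument((Document) it.next());
}
writer.close();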
I have a problem.
There is an index which contains about 6,000,000 records (15,000,000
will follow soon); the size is 4GB. The index is optimized and consists of only
one segment. This index stores products. Each product has a brand, a
price and about 10 more additional fields. I want to not just find
so