I am working on Lucene... I have prepared a document about in-depth
indexing. Unfortunately I can't attach it to the mail due to site
constraints, but I can send it to your personal email ..
---
Sachin
-----Original Message-----
From: Ajani, Akil (Cognizant) [mailto:[EMAIL PROTECTED]
Sent:
>
> : The query you want is
> : name:[A TO C] name:[G TO K]
> : (each clause being SHOULD, or put another way, an implicit "OR" in between).
> :
> : The problem may be how you analyze the name field... is it tokenized at all?
> : If so, you might be matching on first, last, and middle names, and the
Hello Otis & all,
I benchmarked it only subjectively - a typical FieldCache-based sort was overkill
for my humble server (I now give sharehound about 200M of RAM for a 14mln index, which
takes about 12G on disk). When sorting (using FieldCache) for the first time after
an index change, Lucene takes the whole in
: The query you want is
: name:[A TO C] name:[G TO K]
: (each clause being SHOULD, or put another way, an implicit "OR" in between).
:
: The problem may be how you analyze the name field... is it tokenized at all?
: If so, you might be matching on first, last, and middle names, and the
: combinatio
I have been reading the lists for a couple of weeks now, and I noticed people
asking about placing their indexes into an RDBMS. What is the advantage of
that?
So far Lucene has been able to solve all my problems, but I am curious how else
people are using it (especially with an RDBMS).
TIA
Is it possible with Lucene to limit a proximity query to a phrase, to
determine whether two words are in the same phrase? Along the same train of
thought, is it possible to determine whether two words in the same phrase are
separated by a word, or a list of words? Like for example Virus (some
other words)
Hi Tom,
The query you want is
name:[A TO C] name:[G TO K]
(each clause being SHOULD, or put another way, an implicit "OR" in between).
The problem may be how you analyze the name field... is it tokenized at all?
If so, you might be matching on first, last, and middle names, and the
combination of
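
A minimal sketch of why the two range clauses need to be SHOULD rather than MUST, assuming a single untokenized term per name. It deliberately does not use the Lucene API: the class and method names are hypothetical, and the inclusive string comparison is only a simplified stand-in for a range clause such as name:[A TO C].

```java
final class RangeClauseDemo {
    // Inclusive lexicographic range over one untokenized term.
    // Assumption: this models a range clause; it is not Lucene code.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // Both clauses SHOULD: a name matches if it falls in either range.
    static boolean matchesShould(String name) {
        return inRange(name, "A", "C") || inRange(name, "G", "K");
    }

    // Both clauses MUST: a name has to fall in both ranges at once,
    // which no single term can do because the ranges do not overlap.
    static boolean matchesMust(String name) {
        return inRange(name, "A", "C") && inRange(name, "G", "K");
    }
}
```

With this model, "Baker" and "Hill" match the SHOULD form and nothing matches the MUST form. Note that under a plain string comparison a term such as "Carter" sorts after "C", so the upper bounds in this sketch behave as exact-term limits rather than first-letter limits.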
: Another quick question on the score. If my custom Query returns a score
: that can be any value, and this custom Query is used together with
: other standard Queries in a BooleanQuery, how do I ensure the value returned by
: the custom Query doesn't 'overshadow' the values returned by other
Rajiv,
You may want to take a look at http://akismet.com/development/ - I don't
believe it's open source but it may be worth looking into.
Regards,
Bruce Ritchie
> -----Original Message-----
> From: Rajiv Roopan [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 04, 2006 4:32 PM
> To: ja
> I was wondering if anyone knows of an open source
> spam filter that I can add to my project to scan
> the posts (which are just plain text) for spam?
I am not aware of any (which does not mean there is none), but just wanted
to draw your attention to a related discussion
http://www.nabble.com/
Hi -
I'm having a bit of trouble building a query to match a range of
values in a field that is not continuous.
For an example, say I want to find all people with last names
starting with A-C, and G-K.
If I use MUST on each element of the range, then I get nothing. This
I think I understan
Hello, I'm currently running a site which allows users to post. Lately posts
have been getting out of hand. I was wondering if anyone knows of an open
source spam filter that I can add to my project to scan the posts (which are
just plain text) for spam?
thanks in advance.
Rajiv
Chris, thanks again for your reply. Really appreciate your help.
Another quick question on the score. If my custom Query returns a score
that can be any value, and this custom Query is used together with
other standard Queries in a BooleanQuery, how do I ensure the value returned by
the cu
: (1) Must the values returned by DocValues (returned from ValueSource)
: always be between 0.0 and 1.0? How does this value affect the overall document
: score, assuming there are other Query clauses as well that are applied to
: the document (on other fields)?
The "values" returned by the various
Hi.
I've been wondering if anyone has tried to compare the performance of
any 'native' Java DB as an index storage mechanism vs. Lucene's custom
implementation? I'm assuming that DB products should provide some
functionality for 'free' right out of the box (correct me if I'm wrong):
- easily managabl
On Wed, Oct 04, 2006 at 01:55:06PM +, eks dev said:
> have you considered hadoop "light" messaging RPC, should have
> significantly smaller latencies than RMI
Yes, it's one of the things I'm looking at.
-
To unsubscribe, e-
Erick, thanks for your reply.
I have LIA, but sorting is not the solution I am looking for. If
I sort, I will lose the relevancy from searches on other fields. I want
the number proximity to be one of the many fields that are searched. So
the "num" field will contribute to the ov
have you considered hadoop "light" messaging RPC, should have significantly
smaller latencies than RMI
----- Original Message -----
From: Simon Wistow <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, 4 October, 2006 3:26:38 PM
Subject: Re: Searching documents on big index by u
> Preliminary experimentation with a RemoteSearch/ParallelMultiSearcher
> combination found that there were issues with the RMI causing
> significant blocking.
>
> I'm currently playing around with trying alternative messaging
> approaches so that I can also load balance requests.
Wow, it is very i
On Wed, Oct 04, 2006 at 08:14:38AM -0400, Haines, Ronald C. (LNG-DAY) said:
> I too am interested in learning more about a large scale distributed
> Lucene model.
I'm also building a large scale (billions of documents) Lucene index.
Preliminary experimentation with a RemoteSearch/ParallelMultiSear
Indeed, I am using a somewhat complex Query (4 fields with OR).
My index has the fields Title, Sub-title, Content, and Author,
and I search them with one query, as a web search engine does.
Thank you for the details about weight.
So I need to avoid remote calls to rewrite() and docFreq().
I'll try to make Hits object
My index grows periodically.
Response time is now 1 sec for 10G of indexes.
I am worried about what the response time will be in the future
for 20G, 30G, ... and 50G of indexes.
I'll try a remote Hits (result set) object, with the SearchMaster merging
the top N of them.
Thank you.
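
A minimal sketch of the "merge the top N" step mentioned above, assuming each remote searcher returns a list of (document id, score) pairs. The Hit class and mergeTopN method are hypothetical and are not Lucene classes; the sketch also assumes that scores from different indexes are directly comparable, which in practice depends on how the term statistics were computed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TopNMerger {
    // Hypothetical result record returned by one remote index.
    static final class Hit {
        final String docId;
        final float score;
        Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Keeps the n highest-scoring hits across all per-index result lists.
    static List<Hit> mergeTopN(List<List<Hit>> perIndex, int n) {
        List<Hit> merged = new ArrayList<>();
        if (n <= 0) {
            return merged;
        }
        // Min-heap on score: the weakest of the current top n sits at the head.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (List<Hit> hits : perIndex) {
            for (Hit h : hits) {
                if (heap.size() < n) {
                    heap.add(h);
                } else if (h.score > heap.peek().score) {
                    heap.poll();   // drop the weakest hit
                    heap.add(h);
                }
            }
        }
        merged.addAll(heap);
        // Best hit first.
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return merged;
    }
}
```

Because only n hits are held at any time, each remote index needs to send back at most its own top n, which keeps the result sets crossing the network small.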
Erick Erickson wrote:
OK, you're now officially bey
Keep in mind that, depending on your queries (lots of terms, wildcards,
date ranges), you can spend quite a bit of time during the 'Weight'
calculation... this all happens pre-search. During the Weight
calculation, you will be making remote calls to the rewrite() and
docFreq() methods. There will
One helpful thing to do is call IndexWriter.setInfoStream(...) and
save the resulting output. This prints details about which segments
were merged, and what the merged segment name is. This might provide
some useful details; for example, was your deleted segments file one
that was just merged away
Hes Siemelink wrote:
> It happens from time to time... but I don't know how to reproduce it.
>
> Rebuilding this particular index unfortunately takes about 10 hrs, so it's
> not feasible to delete the index and rebuild it when this happens... our
> users would be missing a lot of search result
Don't interpret my responses as *recommending* a database, since I don't know
much about your problem space. It may or may not be the right choice.
Mostly, I was thinking that your particular use of Lucene as stated wasn't
playing to Lucene's strengths.
It may well be that Lucene is a fine choice
OK, you're now officially beyond my competence, so I'll have to wait for
people who actually know
Although if I read your stats right, you're getting approximately 1 sec
response time over 10M documents on a 10G index. That's not bad at all. What
kind of response time do you need?
On 10/3/0
Sorry if this is a re-post, but I got an "undeliverable" error last time I
tried to post it, something about SPAM. The nerve of that filter!
I don't have my book handy, but you might want to check out "Lucene In
Action". There's an example of how to create an index of restaurants
Hi Mike, thank you for your detailed reply. I put my answers inline.
On 10/4/06, Michael McCandless <[EMAIL PROTECTED]> wrote:
Hes Siemelink wrote:
> It happens from time to time... but I don't know how to reproduce it.
>
> Rebuilding this particular index unfortunately takes about 10 hrs, so
I've started to look into this (and the whole JavaCC syntax). I'll keep
you posted on my results.
Patrick
Erik Hatcher wrote:
Currently AND/OR/NOT are hardcoded into the .jj file. A patch to make
this configurable would be welcome!
Erik
On Oct 3, 2006, at 11:15 AM, Patrick Turcotte w
Hes Siemelink wrote:
It happens from time to time... but I don't know how to reproduce it.
Rebuilding this particular index unfortunately takes about 10 hrs, so it's
not feasible to delete the index and rebuild it when this happens... our
users would be missing a lot of search results then!
The
On Oct 4, 2006, at 2:18 AM, Ronnie Kolehmainen wrote:
Wouldn't the easiest fix be to just alter the user's query string before passing
it to the query parser (moving the semantics of your search app outside of Lucene)?
Something like:
str.replaceAll(" ET ", " AND ").replaceAll(" OU ", " OR ").
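
A minimal sketch of the pre-processing idea above, written as plain Java with no Lucene dependency. The class and method names are hypothetical, and it assumes the only localized keywords are ET and OU in upper case; operators inside quoted phrases would be rewritten too, which may or may not be acceptable for a given application.

```java
import java.util.regex.Pattern;

final class OperatorTranslator {
    // Match the keyword only when it has whitespace on both sides.
    // Lookarounds leave the whitespace unconsumed, so consecutive
    // keywords separated by a single space are each matched.
    private static final Pattern ET = Pattern.compile("(?<=\\s)ET(?=\\s)");
    private static final Pattern OU = Pattern.compile("(?<=\\s)OU(?=\\s)");

    // Rewrites French boolean keywords to the AND/OR keywords
    // before the string is handed to the query parser.
    static String translate(String userQuery) {
        String rewritten = ET.matcher(userQuery).replaceAll("AND");
        return OU.matcher(rewritten).replaceAll("OR");
    }
}
```

The translated string, for example the result of OperatorTranslator.translate("chat ET chien OU souris"), is what would then be passed to the query parser in place of the raw user input.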
It happens from time to time... but I don't know how to reproduce it.
Rebuilding this particular index unfortunately takes about 10 hrs, so it's
not feasible to delete the index and rebuild it when this happens... our
users would be missing a lot of search results then!
There are a couple of wor
I once had the same problem and, judging by Jira, I am not alone. I deleted the index
and rebuilt it from source, and the problem was gone. I'm unable to
reproduce it. Are you able to reproduce the problem?
Karel
java.io.FileNotFoundException: /lucene-indexes/mediafragments/_8km.fnm
(No
---
Thanks, Erick! I'll try using a LIKE query against the database.
Hi all
I'm having trouble with a FileNotFoundException that pops up every once in a
while. Everything works fine in my application (description below), but
after running for some time (e.g. 20 hours) an exception like this one may
occur:
java.io.FileNotFoundException: /lucene-indexes/mediafragment
Thanks Chris.
After spending half a day "really" looking into FunctionQuery (and related
classes), and re-reading about Weight and Scorer, I think I am beginning to
understand a bit. But I have more questions.
(1) Must the values returned by DocValues (returned from ValueSource)
always be between 1.0 and 0
37 matches