Hi Mike,
Zoie itself doesn't do anything with the distributed side of things -
it just plays nicely with it. Zoie, at its core, exposes a couple of
primary interfaces (well, this is a slightly simplified form of them):
interface IndexReaderFactory { List getIndexReaders(); },
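The second interface is cut off above; for a rough idea of the shape of
the pair, here is a sketch (the DataConsumer half and all exact
signatures below are my guesses, not necessarily Zoie's real API):

import java.io.IOException;
import java.util.Collection;
import java.util.List;
import org.apache.lucene.index.IndexReader;

// Sketch of the two halves of a Zoie-like contract: one side hands out
// index readers for searching, the other side swallows indexing events.
interface IndexReaderFactory {
    List<IndexReader> getIndexReaders() throws IOException;
}

interface DataConsumer<V> {
    void consume(Collection<V> events) throws IOException; // hypothetical
}

A distributed layer can then presumably sit on top: each node is a
factory/consumer pair, and a broker merges results from many nodes'
readers.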
Hi Jake,
Zoie looks like a really cool project. I'd like to learn more about
the distributed part of the setup. Any way you could describe that
here or on the wiki?
-Mike
On Thu, Oct 8, 2009 at 9:24 PM, Jake Mannix wrote:
> On Thu, Oct 8, 2009 at 7:00 PM, Angel, Eric wrote:
>
>>
>> Does anyone have any recommendations? I've looked at Katta, but it doesn't
My deepest apologies for the spam, everyone. I slipped on my G-mail button :)
On Fri, Oct 9, 2009 at 9:09 PM, Bradford Stephens
wrote:
> Hey Eric,
>
> My consulting company specializes in scalable, real-time search with
> distributed Lucene. I'm more than happy to chat, if you'd like! :)
>
> Cheers,
> Bradford
Hey Eric,
My consulting company specializes in scalable, real-time search with
distributed Lucene. I'm more than happy to chat, if you'd like! :)
Cheers,
Bradford
On Thu, Oct 8, 2009 at 7:00 PM, Angel, Eric wrote:
>
> Does anyone have any recommendations? I've looked at Katta, but it doesn't
>
Great Scott (hah!) - please do report back, even if it just works fine
and you have no more questions; I'd like to know whether this really is
what you were after and actually works for you.
Note that the FieldCache is kinda "magic" - it's lazy (so the first
query will be slow and you should fire
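A minimal warm-up along those lines, assuming Lucene 2.9 and a float
field named model_1_score (both assumptions), might look like:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.store.Directory;

public class FieldCacheWarmer {
    // Touch the FieldCache once per reader+field so the lazy load
    // happens before the first user-facing query instead of during it.
    public static void warm(Directory dir) throws IOException {
        IndexReader reader = IndexReader.open(dir);
        float[] perDoc = FieldCache.DEFAULT.getFloats(reader, "model_1_score");
        // perDoc[docId] stays cached for the lifetime of this reader
    }
}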
Thanks Jake! I will test this out and report back soon in case it's helpful
to others. Definitely appreciate the help.
Scott
On Fri, Oct 9, 2009 at 3:33 PM, Jake Mannix wrote:
> On Fri, Oct 9, 2009 at 3:07 PM, scott w wrote:
>
> > Example Document:
> > model_1_score = 0.9
> > model_2_score = 0.3
On Fri, Oct 9, 2009 at 3:07 PM, scott w wrote:
> Example Document:
> model_1_score = 0.9
> model_2_score = 0.3
> model_3_score = 0.7
>
> I want to be able to pass in the following map at query time:
> {model_1_score=0.4, model_2_score=0.7} and have that map get used as input
> to a custom score function
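One way that shape can come out with Lucene 2.9's function package (a
sketch; the class name and weight plumbing are mine, not from this
thread):

import org.apache.lucene.search.Query;
import org.apache.lucene.search.function.CustomScoreQuery;
import org.apache.lucene.search.function.ValueSourceQuery;

public class WeightedModelQuery extends CustomScoreQuery {
    private final float[] weights; // parallel to the model-score queries

    public WeightedModelQuery(Query subQuery, ValueSourceQuery[] modelScores,
                              float[] weights) {
        super(subQuery, modelScores);
        this.weights = weights;
    }

    @Override
    public float customScore(int doc, float subQueryScore, float[] valSrcScores) {
        float s = 0f;
        for (int i = 0; i < valSrcScores.length; i++) {
            s += weights[i] * valSrcScores[i]; // e.g. 0.4*model_1 + 0.7*model_2
        }
        return s; // ignore the text score; rank purely by the weighted models
    }
}

At query time you would pass one FieldScoreQuery per entry in the map,
e.g. new FieldScoreQuery("model_1_score", FieldScoreQuery.Type.FLOAT),
with the weights array in the same order.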
Hi Jake --
Sorry for the confusion. I have two similar but slightly different use
cases in mind: the example I gave you corresponds to one use case, while
the code corresponds to the other, slightly more complicated one. Ignore
the original example, and let me restate the one I have in mind so it
Hey Scott,
I'm still not sure I understand what your dynamic boosts are for: they
are the names of fields, right, not terms in the fields? So in terms
of your example { company = microsoft, city = redmond, size = big },
the three possible choices for keys in your map are company, city,
or size,
(Apologies if this message gets sent more than once. I received an error
sending it the first two times so sent directly to Jake but reposting to
group.)
Hi Jake --
Thanks for the feedback.
What I am trying to implement is a way to custom score documents using a
scoring function that takes as input
Right exactly. I looked into payload initially and realized it wouldn't work
for my use case.
On Fri, Oct 9, 2009 at 2:00 PM, Grant Ingersoll wrote:
> Oops, just reread and realized you wanted query time weights. Payloads are
> an index time thing.
>
>
> On Oct 9, 2009, at 5:49 PM, Grant Ingersoll wrote:
If you are really using all of that precision (down to the second), the
short answer is YES.
If you can remove much of that precision (only keep down to the day,
for example), then you may be able to get perfectly good performance
with strings alone when the range is only over a small set of terms,
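For reference, the numeric version of a seconds-precision date in
Lucene 2.9 looks roughly like this (the field name and the long
encoding are just examples):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class NumericDateExample {
    public static void example() {
        // Index time: one trie-encoded long per document.
        Document doc = new Document();
        doc.add(new NumericField("date", Field.Store.YES, true)
            .setLongValue(20091009150700L)); // e.g. yyyyMMddHHmmss as a long

        // Search time: numeric range plus numeric sort, no string padding.
        NumericRangeQuery range = NumericRangeQuery.newLongRange(
            "date", 20091001000000L, 20091009235959L, true, true);
        Sort byDate = new Sort(new SortField("date", SortField.LONG));
    }
}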
Oops, just reread and realized you wanted query time weights.
Payloads are an index time thing.
On Oct 9, 2009, at 5:49 PM, Grant Ingersoll wrote:
If you are trying to add specific term weights to terms in the index
and then incorporate them into scoring, you might benefit from
payloads and the PayloadTermQuery option.
If you are trying to add specific term weights to terms in the index
and then incorporate them into scoring, you might benefit from
payloads and the PayloadTermQuery option. See http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
-Grant
On Oct 8, 2009, at 11:56 AM
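For anyone following along, the index-time/query-time split looks
roughly like this in 2.9 (the delimiter convention, field name, and
text are placeholders):

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
import org.apache.lucene.analysis.payloads.FloatEncoder;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.payloads.AveragePayloadFunction;
import org.apache.lucene.search.payloads.PayloadTermQuery;

public class PayloadExample {
    public static void example() {
        // Index time: "lucene|2.5" stores 2.5 as the payload of term "lucene".
        TokenStream ts = new DelimitedPayloadTokenFilter(
            new WhitespaceTokenizer(new StringReader("lucene|2.5 rocks|0.1")),
            '|', new FloatEncoder());

        // Query time: PayloadTermQuery reads the payloads back; your
        // Similarity's scorePayload decides how they affect the score.
        PayloadTermQuery q = new PayloadTermQuery(
            new Term("body", "lucene"), new AveragePayloadFunction());
    }
}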
Hi,
I have a Date field in my Lucene index that is currently stored as a
String field with format: MMDDHHMISS. I perform RangeFilter on it
when searching and also sort the results specifying it as a String
field. My question is, will converting it to a Numeric field and start
using Numeri
I can provide some preliminary numbers (we will need to do some detailed
analysis and post it somewhere):
Dataset: medline
Starting index: empty
Add only, no updates, for 30 min
Maximum indexing load: 1000 docs/sec
Under stress, we take indexing events (add only) and stream into both
systems: Z
Michael McCandless wrote:
On Fri, Oct 9, 2009 at 3:26 PM, Paul Taylor wrote:
It still relies on super.getRangeQuery() for non-numeric fields. If
you don't have non-numeric fields that accept range queries you can
simply call NumericRangeQuery.newXXXRange directly.
For some indexes I hav
Michael McCandless wrote:
On Fri, Oct 9, 2009 at 3:26 PM, Paul Taylor wrote:
I currently use NumberTools.longToString() to add integer fields to
an index and allow range searching; when searching, I preprocess the
query (using regular expressions) and convert integer fields to Numb
The dimensions sound good. It's unclear whether you're going to post a
chart again, numbers, or code. There's a LUCENE-1577 Jira issue for
code.
On Fri, Oct 9, 2009 at 12:37 PM, Jake Mannix wrote:
> Jason,
>
> We've been running some perf/load/stress tests lately, but on a suggestion
>
> from Ted D
Hi Paul,
for creating NumericFields, just refer to the JavaDoc. As Mike said, on
the query side you can create NumericRangeQuery directly (recommended) -
see the javadocs. If you want to use QueryParser, you have to customize
it, as QueryParser does not support NumericRangeQuery natively.
Uwe
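The QueryParser customization Uwe mentions usually takes the shape of a
getRangeQuery override, something like this sketch (which fields are
numeric, and the precision, are assumptions):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class NumericAwareQueryParser extends QueryParser {
    public NumericAwareQueryParser(String defaultField, Analyzer a) {
        super(Version.LUCENE_29, defaultField, a);
    }

    @Override
    protected Query getRangeQuery(String field, String part1, String part2,
                                  boolean inclusive) throws ParseException {
        if ("duration".equals(field)) { // hypothetical numeric field
            return NumericRangeQuery.newLongRange(field,
                Long.valueOf(part1), Long.valueOf(part2), inclusive, inclusive);
        }
        return super.getRangeQuery(field, part1, part2, inclusive); // text fields unchanged
    }
}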
On Fri, Oct 9, 2009 at 3:26 PM, Paul Taylor wrote:
> I currently use NumberTools.longToString() to add integer fields to
> an index and allow range searching; when searching, I preprocess
> the query (using regular expressions) and convert integer
> fields to NumberTools.longToString bef
Jason,
We've been running some perf/load/stress tests lately, but on a
suggestion from Ted Dunning, I've been trying to come up with a more
"realistic" set of stress tests and indexing rates to see where NRT
performs well and where it does not, instead of just indexing at
maximum rate, looping
Jake and John,
It would be interesting and enlightening to see NRT performance
numbers in a variety of configurations. The best way to go about
this is to post benchmarks that others may run in their
environment which can then be tweaked for their unique edge
cases. I wish I had more time to work
Hi
I currently use NumberTools.longToString() to add integer fields to an
index and allow range searching; when searching, I preprocess the query
(using regular expressions) and convert integer fields to
NumberTools.longToString before it is parsed by the QueryParser, then
when I re
Scott,
To reiterate what Erick and Andrzej said: calling
IndexReader.document(docId) in your inner scoring loop is the source of
your performance problem - iterating over all those stored fields is
what is killing you.
To do this a better way, can you try to explain exactly what this Scorer
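The usual fix here, for the archives: pull the values out of the
FieldCache once per reader and index into the array per hit, instead of
loading stored fields per document. A minimal sketch (field name
assumed):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

public class ModelScoreLookup {
    private final float[] modelScore; // one slot per docId, loaded once per reader

    public ModelScoreLookup(IndexReader reader) throws IOException {
        // Cheap array access per hit instead of IndexReader.document(docId),
        // which decompresses every stored field of the document.
        this.modelScore = FieldCache.DEFAULT.getFloats(reader, "model_1_score");
    }

    public float scoreFor(int docId) {
        return modelScore[docId];
    }
}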
Thanks for the suggestions Erick. I am using Lucene 2.3. Terms are
stored, and given Andrzej's comments in the follow-up email, it sounds
like it's not the stored-field issue. I'll keep investigating...
thanks,
Scott
On Thu, Oct 8, 2009 at 8:06 AM, Erick Erickson wrote:
> I suspect your problem here
Hi,
we also index linguistic data, but (someone correct me if I'm wrong)
you have to deal with what the Lucene store is offering.
You can store, usable on the search side:
- a term (TermAttribute)
- the position of the term (PositionIncrementAttribute)
- an arbitrary payload (PayloadAttrib
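A skeletal filter that touches those attributes together, with the
actual linguistic logic stubbed out (everything here is a placeholder,
not the poster's pipeline):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.index.Payload;

public final class LinguisticTagFilter extends TokenFilter {
    private final TermAttribute termAtt = addAttribute(TermAttribute.class);
    private final PayloadAttribute payAtt = addAttribute(PayloadAttribute.class);

    public LinguisticTagFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) return false;
        // Positions pass through untouched; attach one annotation byte per term.
        payAtt.setPayload(new Payload(new byte[] { tagFor(termAtt.term()) }));
        return true;
    }

    private byte tagFor(String term) {
        return 0; // hypothetical: POS tag, morphology id, etc.
    }
}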
I am quite new to Lucene, but I have searched the FAQs and consulted
the mailing list archive. I debugged through the source code as well.
I have written an Analyzer that analyzes a stream by sending it to a
whole pipeline of linguistic processing and uses the internal
representation to construct
Got it -- thanks, Mark! (Recently I read elsewhere in the archives of this
list about the value or lack thereof of segments.gen, so skipping that file
was in the back of my mind as well.)
Chris
On Thu, Oct 8, 2009 at 3:04 PM, Mark Miller wrote:
> Nigel wrote:
> > Thanks, Mark. That makes sense
Thank you.
Starting from CachingTokenFilter was indeed the correct way to proceed.
Regards
Enrico
2009/10/8 Uwe Schindler
> restoreState only restores the token contents, not the complete stream. So
> you cannot roll back the token stream (and this was also not possible with
> the old API). Th
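For the archives, two-pass consumption with CachingTokenFilter looks
like this (field name and text are placeholders):

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CachingTokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class TwoPassExample {
    // CachingTokenFilter records tokens on the first pass; reset() rewinds
    // to the start of the cache, which a plain TokenStream cannot do.
    public static void consumeTwice(Analyzer analyzer, String text) throws IOException {
        TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
        CachingTokenFilter cached = new CachingTokenFilter(ts);
        while (cached.incrementToken()) { /* first pass */ }
        cached.reset(); // rewind over the cached tokens
        while (cached.incrementToken()) { /* second pass, same tokens */ }
    }
}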
Were there any exceptions inside Lucene, before the hang?
The fact that you're hitting AlreadyClosedException is a spooky sign
-- that means IW thinks you had in fact closed the writer, but then
used it again.
For increasing indexing throughput, I'd start here:
http://wiki.apache.org/lucene-
Hi Mike
There are other threads involved but none are simultaneously modifying
the index.
There is one thread that retrieves the total count every 2 seconds on
the index for GUI display:
public long getTotalMessageCount(Volume volume) throws
MessageSearchException {
if (volum
You can use o.a.l.index.CheckIndex to fix the index. It will remove
references to any segments that are missing or have problems during
testing. First run it without -fix to see what problems there are.
Then take a backup of the index. Then run it with -fix. The index
will lose all docs in those segments.
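Concretely, that is something like (jar name and path are placeholders):

java -cp lucene-core-2.9.0.jar org.apache.lucene.index.CheckIndex /path/to/index
java -cp lucene-core-2.9.0.jar org.apache.lucene.index.CheckIndex /path/to/index -fix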
Are there other threads involved, besides the one hung in close? Can
you post their stack traces?
This stack trace seems to indicate that IW believes another thread is
in the process of closing.
Can you call IndexWriter.setInfoStream and post the output leading to the hang?
Mike
On Fri, Oct 9,
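For reference, enabling that trace is a one-liner on the writer (the
log path below is just an example):

import java.io.FileNotFoundException;
import java.io.PrintStream;
import org.apache.lucene.index.IndexWriter;

public class InfoStreamSetup {
    // Route IndexWriter's internal trace to a file so the moments
    // leading up to the hang are captured.
    public static void enable(IndexWriter writer) throws FileNotFoundException {
        writer.setInfoStream(new PrintStream("/tmp/iw-infostream.log"));
    }
}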
Hi Michael / Uwe / others
Sorry for the repost... it just does not look like the earlier message
I sent went through.
FYI: there are no large Lucene merges taking place.
Jamie Band wrote:
Hi Michael
Thanks for your help. Here are the stacks:
index processor [TIME_WAITING] CPU time: 33:01
java
Incidentally, there are no Lucene merge threads doing any work. See
attached.
Hi Michael
Thanks for your help. Here are the stacks:
index processor [TIME_WAITING] CPU time: 33:01
java.lang.Object.wait(long)
org.apache.lucene.index.IndexWriter.doWait()
org.apache.lucene.index.IndexWriter.shouldClose()
org.apache.lucene.index.IndexWriter.close(boolean)
org.apache.lucene.ind
We use a statistical approach, so we have little language-dependent
context in our search.
A simplified description:
Our data gets indexed with a "normal" analyzer in a data index.
In a second step we index all terms of defined search fields with a
different analyzer which uses bigrams on the ch
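Presumably that second analyzer is something along these lines, using
the contrib n-gram filter (all of this is my guess at their setup, not
their code):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.ngram.NGramTokenFilter;

public class BigramAnalyzer extends Analyzer {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Character bigrams per term: language-independent matching.
        TokenStream ts = new LowerCaseFilter(new WhitespaceTokenizer(reader));
        return new NGramTokenFilter(ts, 2, 2);
    }
}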
I am using Lucene 2.9.0 and Compass 2.2.0, configured for the JDBC
store. In my Oracle DB there are 22000 users in the User_ table. I am
unable to index all 22000 users; it stops at 13000. The problem is that
table LUCENE_10109 (my JDBC index table) is getting populated with a
number of records. Now I have 2772