You can solve your problem at search time by passing a custom Similarity class.
In particular, consider subclassing SweetSpotSimilarity ... instead of a
truly "flat" tf function, it makes it easy for you to define a
"sweetspot" so 2 instances of a word can score a lot higher than 1
instance.
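For example, a minimal sketch of that setup, assuming the contrib-misc jar
(where SweetSpotSimilarity lives) is on the classpath; the factors and index
path below are illustrative placeholders, not tuned values:

    import org.apache.lucene.misc.SweetSpotSimilarity;
    import org.apache.lucene.search.IndexSearcher;

    SweetSpotSimilarity sim = new SweetSpotSimilarity();
    // hyperbolic tf: the score climbs steeply between 1 and 2 occurrences,
    // so a 2nd instance is worth far more than the default sqrt(freq) gives
    sim.setHyperbolicTfFactors(1.0f, 2.0f, Math.E, 1.5f);

    IndexSearcher searcher = new IndexSearcher("/path/to/index");
    searcher.setSimilarity(sim);

Since tf is computed at search time, this needs no re-index; only
length-norm changes would require one.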
Well, your query results are consistent with what Luke is
reporting. So I'd go back and test your assumptions. I
suspect that you're not indexing what you think you are.
For your test document, I'd just print out what you're indexing
and the field it's going into, *for each field*. That is, every
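In code, that check might look like this (a sketch, assuming a Document
named doc, placed just before writer.addDocument(doc)):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Fieldable;

    // dump every field actually going into the index; stringValue() is
    // null for binary or Reader-valued fields
    for (Object o : doc.getFields()) {
        Fieldable f = (Fieldable) o;
        System.out.println(f.name() + " => '" + f.stringValue() + "'"
            + " indexed=" + f.isIndexed() + " tokenized=" + f.isTokenized());
    }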
So my apologies for the duplicate comments; I went to get proof of
duplicates and was confused, as we apparently have duplicates across
different shards now in our distributed setup (a bug on our end). I assumed
when I saw duplicates that it was the same problem as last time. Still
doesn't help m
Hi Erick
Thanks for your reply.
I have used Luke to inspect the document and I am somewhat confused.
For example, when I view the index using the Overview tab of Luke, I get
the following:
freq  field  text
1     body   test
1     id     1234
1     name   rtfDocumentToIndex.rtf
1     path   rtfD
So you have a segment (_tej) with 22201 docs, all but 30 of which are
deleted, and somehow one of the posting lists in _tej.frq is referencing an
out-of-bounds docID, 34950. Odd...
Are you sure the IO system doesn't have any consistency issues? What
environment are you running on (machine, OS, filesystem)?
Also, this (Solr server going down during an add) should not be able to
cause this kind of corruption.
Mike
On Fri, Jan 2, 2009 at 3:47 PM, Brian Whitman wrote:
> I will but I bet I can guess what happened -- this index has many duplicates
> in it as well (same uniqueKey id multiple times) - this happened to us once
> before and it was because the solr server went down during an add.
That should no lon
I don't think there is any API support for this, but in theory it is
possible, as long as you aren't changing the size. It sounds like it
could work for you since you just plan to do it offline after indexing
and presumably you don't have anything else going on, right? I think
hacking it
Casing is usually handled by the analyzer. Since you construct
the term query programmatically, it doesn't go through
any analyzers and so is not converted to lower case for
searching, as was done automatically for you when you
indexed with StandardAnalyzer.
As for why you aren't getting hits, it
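One quick way to test that is to run the same analyzer over your text and
print the tokens it produces (a sketch against the 2.4 TokenStream API; the
sample text is made up):

    import java.io.StringReader;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    TokenStream ts = new StandardAnalyzer().tokenStream("body",
        new StringReader("Amin indexed this Document"));
    Token t = new Token();
    while ((t = ts.next(t)) != null) {
        // prints: amin, indexed, document -- lowercased, and the
        // stopword "this" is dropped
        System.out.println(t.term());
    }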
Here's checkindex:
NOTE: testing will be more thorough if you run java with
'-ea:org.apache.lucene', so assertions are enabled
Opening index @ /vol/solr/data/index/
Segments file=segments_vxx numSegments=8 version=FORMAT_HAS_PROX [Lucene 2.4]
1 of 8: name=_ks4 docCount=2504982
compound=false
I will but I bet I can guess what happened -- this index has many duplicates
in it as well (same uniqueKey id multiple times) - this happened to us once
before and it was because the solr server went down during an add. We may
have to re-index, but I will run CheckIndex now. Thanks
(Thread for dupe
It looks like your index has some kind of corruption. Were there any other
exceptions prior to this one, or, any previous problems with the OS/IO
system?
Can you run CheckIndex (java org.apache.lucene.index.CheckIndex to see
usage) and post the output?
Mike
Brian Whitman wrote:
> I am getting
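For anyone following along, the invocation Mike describes looks roughly like
this (the jar name is a guess for a 2.4 setup; -fix permanently drops corrupt
segments and their documents, so back the index up first):

    java -ea:org.apache.lucene -cp lucene-core-2.4.0.jar \
        org.apache.lucene.index.CheckIndex /vol/solr/data/index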
Nuno,
Check towards the end of this article:
http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Nuno Seco
> To: java-...@lucene.apache.org
> Sent: Friday, January 2, 2009 12:53:14 PM
I am getting this on a 10GB index (via solr 1.3) during an optimize:
Jan 2, 2009 6:51:52 PM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: background merge hit exception: _ks4:C2504982
_oaw:C514635 _tll:C827949 _tdx:C18372 _te8:C19929 _tej:C22201 _1agw:C1717926
_1agz:C1 into
Hello,
the easiest way would be to construct the combined document using the
data from your primary source rather than reconstructing it from the
index. If the source data is no longer available, you could still
reconstruct a token stream. The data is, however, a bit spread out, so it
can tur
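For the token-stream route, a rough sketch, assuming the fields were indexed
with Field.TermVector.WITH_POSITIONS and docId is the document number (without
stored term vectors you would have to walk every term in the index instead,
which is far slower):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermPositionVector;

    IndexReader reader = IndexReader.open("/path/to/index");
    TermPositionVector tpv =
        (TermPositionVector) reader.getTermFreqVector(docId, "body");
    String[] terms = tpv.getTerms();
    for (int i = 0; i < terms.length; i++) {
        // place terms[i] at each of its positions to approximate the
        // original token stream (anything the analyzer dropped is gone)
        int[] positions = tpv.getTermPositions(i);
    }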
Hi,
I have already indexed documents. I want to recombine them into new
documents. Is this possible without the original documents - only with the
index?
Example:
doc1, doc2, doc3 are indexed.
I want a new indexed doc4 which is indexed as if I had concatenated doc1,
doc2, doc3 into doc4 and then
Lucene implements ACID (like modern databases), with the restriction
that only one transaction may be open at a time.
So, once commit (your step 4) is called and succeeds, Lucene
guarantees that any prior changes (e.g. your step 2) are written to
stable storage and will not be lost ("durability").
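In code, the sequence from the question looks like this (a sketch; the path
and doc are placeholders):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    IndexWriter writer = new IndexWriter("/path/to/index",        // (1) open
        new StandardAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED);
    writer.addDocument(doc);                                      // (2) write
    writer.optimize();                                            // (3) optimize
    writer.commit();            // (4) durable once this call returns
    writer.close();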
Hi Hoss,
Before posting this question, I did try FieldNormModifier approach.
It did modify it.
From one big segment it added 7 more small segments per field.
However, upon testing this index, the norms problem still occurs with the same
stack trace error.
This leads me to believe that FieldN
Hi
I have tried this and it doesn't work. I don't understand why using
"amin" instead of "Amin" would work; is it not case insensitive?
I tried "test" for field "body" and this works. Any other terms don't
work, for example:
"document"
"indexed"
these are tokens that were extracted when
Hi,
I was reading the 2.4 javadoc as well as other sources but couldn't
find a clear answer.
I need to know whether the sequence
(1) open index writer -> (2) write something to index -> (3)
optimize index -> (4) commit
can corrupt the index / lose the data written at the point of (2)
after (4) is
Basically, Lucene stores analyzed tokens and looks up matches based on
those tokens.
"Amin" after StandardAnalyzer is "amin", so you need to use new Term("body",
"amin") instead of new Term("body", "Amin") to search.
--
Chris Lu
-
Instant Scalable Full-Text Search