Re: Re: Re: Search Problem

2009-01-02 Thread tom
AUTOMATIC REPLY: LUX is closed until 5th January 2009.

Re: Re: Search Problem

2009-01-02 Thread tom
AUTOMATIC REPLY: LUX is closed until 5th January 2009.

Re: Any way to ignore repeated terms in TF calculation?

2009-01-02 Thread Chris Hostetter
: you can solve your problem at search time by passing a custom Similarity class. In particular, consider subclassing SweetSpotSimilarity ... instead of a truly "flat" tf function, it makes it easy for you to define a "sweetspot" so 2 instances of a word can score a lot higher than 1 instance…
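
The flat end of that spectrum is easy to sketch. This assumes the Lucene 2.4 API; FlatTfSimilarity is a made-up name, and SweetSpotSimilarity itself (in contrib) is the richer option, letting you shape the curve instead of flattening it:

    import org.apache.lucene.search.DefaultSimilarity;

    // A Similarity whose tf() ignores repetition: ten occurrences of a
    // term in a document score the same as one.
    public class FlatTfSimilarity extends DefaultSimilarity {
        public float tf(float freq) {
            return freq > 0 ? 1.0f : 0.0f;
        }
    }

Since tf() is applied at search time, setting this on the IndexSearcher (searcher.setSimilarity(new FlatTfSimilarity())) takes effect without re-indexing.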

Re: Search Problem

2009-01-02 Thread Erick Erickson
Well, your query results are consistent with what Luke is reporting. So I'd go back and test your assumptions. I suspect that you're not indexing what you think you are. For your test document, I'd just print out what you're indexing and the field it's going into, *for each field*. That is, every…
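
A quick way to do that check, sketched against the 2.4 API (doc and writer are assumed to already exist):

    // Dump every field of the Document exactly as it is about to be indexed.
    java.util.List fields = doc.getFields();
    for (int i = 0; i < fields.size(); i++) {
        org.apache.lucene.document.Fieldable f =
                (org.apache.lucene.document.Fieldable) fields.get(i);
        System.out.println(f.name() + " => " + f.stringValue());
    }
    writer.addDocument(doc);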

Re: background merge hit exception

2009-01-02 Thread Brian Whitman
So my apologies for the duplicate comments; I went to go get proof of duplicates and was confused, as we apparently have duplicates across different shards now in our distributed setup (a bug on our end). I assumed when I saw duplicates that it was the same problem as last time. Still doesn't help m…

Re: Search Problem

2009-01-02 Thread Amin Mohammed-Coleman
Hi Erick, thanks for your reply. I have used Luke to inspect the document and I am somewhat confused. For example, when I view the index using the Overview tab of Luke I get the following (frequency, field, term): 1 body test | 1 id 1234 | 1 name rtfDocumentToIndex.rtf | 1 path rtfD…

Re: background merge hit exception

2009-01-02 Thread Michael McCandless
So you have a segment (_tej) with 22201 docs, all but 30 of which are deleted, and somehow one of the posting lists in _tej.frq is referencing an out-of-bounds docID 34950. Odd... Are you sure the IO system doesn't have any consistency issues? What environment are you running on (machine, OS, filesystem…

Re: background merge hit exception

2009-01-02 Thread Michael McCandless
Also, this (Solr server going down during an add) should not be able to cause this kind of corruption. Mike. Yonik Seeley wrote: > On Fri, Jan 2, 2009 at 3:47 PM, Brian Whitman wrote: > > I will but I bet I can guess what happened -- this index has many duplicates in it as well (same uniqu…

Re: background merge hit exception

2009-01-02 Thread Yonik Seeley
On Fri, Jan 2, 2009 at 3:47 PM, Brian Whitman wrote: > I will but I bet I can guess what happened -- this index has many duplicates in it as well (same uniqueKey id multiple times) - this happened to us once before and it was because the solr server went down during an add. That should no longer…

Re: updating payloads

2009-01-02 Thread Grant Ingersoll
I don't think there is any API support for this, but in theory it is possible, as long as you aren't changing the size. It sounds like it could work for you since you just plan to do it offline after indexing, and presumably you don't have anything else going on, right? I think hacking it…

Re: Search Problem

2009-01-02 Thread Erick Erickson
Casing is usually handled by the analyzer. Since you construct the term query programmatically, it doesn't go through any analyzer, and thus is not converted into lower case for searching, as was done automatically for you when you indexed using StandardAnalyzer. As for why you aren't getting hits, it…
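
One way to keep query-time analysis in step with index-time analysis is to build the query through QueryParser with the same analyzer. A sketch against the 2.4 API, using the "body" field from this thread:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;

    // The parser runs "Amin" through StandardAnalyzer, so the query that
    // actually executes is body:amin.
    QueryParser parser = new QueryParser("body", new StandardAnalyzer());
    Query q = parser.parse("Amin");  // declares ParseException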

Re: background merge hit exception

2009-01-02 Thread Brian Whitman
Here's checkindex: NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene', so assertions are enabled. Opening index @ /vol/solr/data/index/ Segments file=segments_vxx numSegments=8 version=FORMAT_HAS_PROX [Lucene 2.4] 1 of 8: name=_ks4 docCount=2504982 compound=false…

Re: background merge hit exception

2009-01-02 Thread Brian Whitman
I will but I bet I can guess what happened -- this index has many duplicates in it as well (same uniqueKey id multiple times) - this happened to us once before and it was because the solr server went down during an add. We may have to re-index, but I will run checkIndex now. Thanks. (Thread for dupe…

Re: background merge hit exception

2009-01-02 Thread Michael McCandless
It looks like your index has some kind of corruption. Were there any other exceptions prior to this one, or any previous problems with the OS/IO system? Can you run CheckIndex (java org.apache.lucene.index.CheckIndex to see usage) and post the output? Mike. Brian Whitman wrote: > I am getting…
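
For reference, the usual invocation, using the index path from Brian's later message (the jar name is an assumption):

    java -ea:org.apache.lucene... -cp lucene-core-2.4.0.jar \
         org.apache.lucene.index.CheckIndex /vol/solr/data/index

In 2.4 CheckIndex also accepts a -fix argument that rewrites the index without the unreadable segments; that deletes every document in those segments, so run it on a copy of the index first.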

Re: Too many open files

2009-01-02 Thread Otis Gospodnetic
Nuno, check towards the end of this article: http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch -- Original Message -- > From: Nuno Seco > To: java-...@lucene.apache.org > Sent: Friday, January 2, 2009 12:53:14 PM…
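
The gist of the era's usual fixes, sketched against the 2.4 IndexWriter API (whether this matches the article's exact advice is an assumption; raising the OS file-descriptor limit with ulimit -n is the other half):

    // writer is an open org.apache.lucene.index.IndexWriter
    writer.setUseCompoundFile(true);  // pack each segment into one .cfs file
    writer.setMergeFactor(10);        // lower value = fewer live segments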

background merge hit exception

2009-01-02 Thread Brian Whitman
I am getting this on a 10GB index (via Solr 1.3) during an optimize: Jan 2, 2009 6:51:52 PM org.apache.solr.common.SolrException log SEVERE: java.io.IOException: background merge hit exception: _ks4:C2504982 _oaw:C514635 _tll:C827949 _tdx:C18372 _te8:C19929 _tej:C22201 _1agw:C1717926 _1agz:C1… into…

Re: Re-combining already indexed documents

2009-01-02 Thread Karl Wettin
Hello, the easiest way would be to construct the combined document using the data from your primary source rather than reconstructing it from the index. If the source data is no longer available you could still reconstruct a token stream. The data is however a bit spread out, so it can tur…
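
A hedged sketch of the reconstruction Karl describes, against the 2.4 API; it only works if the field was indexed with TermVector.WITH_POSITIONS, and reader/docId are placeholders:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermPositionVector;

    // Rebuild an approximate token stream for one field of one document.
    TermPositionVector tpv =
            (TermPositionVector) reader.getTermFreqVector(docId, "body");
    String[] terms = tpv.getTerms();
    for (int i = 0; i < terms.length; i++) {
        int[] positions = tpv.getTermPositions(i);
        // Placing terms[i] at each entry of positions[] recovers the
        // original token order, which can then be fed to doc4's field.
    }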

Re-combining already indexed documents

2009-01-02 Thread spring
Hi, I have already indexed documents. I want to recombine them into new documents. Is this possible without the original documents - only with the index? Example: doc1, doc2, doc3 are indexed. I want a new indexed doc4 which is indexed as if I had concatenated doc1, doc2, doc3 into doc4 and then…

Re: Optimization and commit

2009-01-02 Thread Michael McCandless
Lucene implements ACID (like modern databases), with the restriction that only one transaction may be open at a time. So, once commit (your step 4) is called and succeeds, Lucene guarantees that any prior changes (e.g. your step 2) are written to stable storage and will not be lost ("durability").
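
The exact sequence from the question, sketched with the 2.4 API (dir is a Directory and doc a Document, both assumed to exist):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    IndexWriter w = new IndexWriter(dir, new StandardAnalyzer(),
            IndexWriter.MaxFieldLength.UNLIMITED);  // (1) open index writer
    w.addDocument(doc);                             // (2) write to index
    w.optimize();                                   // (3) optimize
    w.commit();  // (4) after this returns, (2) is durable on stable storage
    w.close();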

Re: Extract the text that was indexed

2009-01-02 Thread Lebiram
Hi Hoss, before posting this question I did try the FieldNormModifier approach. It did modify it: from one big segment it added 7 more small segments per field. However, upon testing this index, the norms problem still occurs with the same stack trace error. This leads me to believe that FieldN…

Re: Search Problem

2009-01-02 Thread Amin Mohammed-Coleman
Hi, I have tried this and it doesn't work. I don't understand why using "amin" instead of "Amin" would work; is it not case insensitive? I tried "test" for field "body" and this works. Other terms don't work, for example "document" and "indexed" - these are tokens that were extracted when…

Optimization and commit

2009-01-02 Thread Mindaugas Žakšauskas
Hi, I was reading the 2.4 javadoc as well as other sources but couldn't find a clear answer. I need to know whether the sequence (1) open index writer -> (2) write something to index -> (3) optimize index -> (4) commit can corrupt the index / lose the data written at the point of (2) after (4) is…

Re: Search Problem

2009-01-02 Thread Chris Lu
Basically Lucene stores analyzed tokens and looks up matches based on those tokens. "Amin" after StandardAnalyzer is "amin", so you need to use new Term("body", "amin"), instead of new Term("body", "Amin"), to search. -- Chris Lu - Instant Scalable Full-Text Search…
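
In code, the difference Chris describes (a sketch against the 2.4 API):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.TermQuery;

    // TermQuery bypasses analysis, so the term text must already match
    // what StandardAnalyzer wrote to the index (lowercased).
    TermQuery misses = new TermQuery(new Term("body", "Amin"));  // 0 hits
    TermQuery match  = new TermQuery(new Term("body", "amin"));  // matches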