Thanks Anshum and Eric.
Well, I was looking for something like searching by domain name in the email
address etc.
How can I reverse the tokens? Can you please explain in little detail?
Thanks,
Aditi
On Thu, Oct 30, 2008 at 10:58 AM, Anshum <[EMAIL PROTECTED]> wrote:
> Hi Aditi,
> As Eric mentio
Mike,
regarding this paragraph:
"To workaround this, on catching an OOME on any of IndexWriter's
methods, you should 1) forcibly remove the write lock
(IndexWriter.unlock static method) and then 2) not call any methods on
the old writer. Even if the old writer has concurrent merges running,
the
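The recovery sequence Mike describes might look roughly like this against the Lucene 2.4 API (a sketch, not tested; `writer`, `directory`, `analyzer`, and `doc` are assumed to exist in scope):

```java
// Sketch of the OOME recovery described above (Lucene 2.4 API).
try {
    writer.addDocument(doc);
} catch (OutOfMemoryError oome) {
    // 1) forcibly remove the write lock (static method, 2.4+)
    IndexWriter.unlock(directory);
    // 2) never call any method on the old writer again;
    //    open a fresh one over the same directory instead
    writer = new IndexWriter(directory, analyzer, false);
}
```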
Hi Aditi,
As Eric mentioned, we'd need to know more to suggest an apt solution.
At the same time, a prefix wildcard is a highly unoptimized operation for Lucene
because of the way the index is stored/read. Ideally you'd at least want to
reverse the tokens as already mentioned.
This is because the
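The reversal trick mentioned above can be sketched in plain Java (the class and method names here are made up for illustration): index each token reversed into a parallel field, then rewrite a leading-wildcard query into a trailing one against that field.

```java
public class ReverseTokenTrick {
    // Reverse a token before indexing it into a parallel "reversed" field.
    static String reverse(String token) {
        return new StringBuilder(token).reverse().toString();
    }

    // Turn a leading-wildcard query like "*gmail.com" into a trailing-wildcard
    // query "moc.liamg*" that runs efficiently against the reversed field.
    static String toTrailingWildcard(String leadingWildcardQuery) {
        String body = leadingWildcardQuery.substring(1); // strip the leading '*'
        return reverse(body) + "*";
    }

    public static void main(String[] args) {
        System.out.println(toTrailingWildcard("*gmail.com")); // moc.liamg*
    }
}
```

Trailing wildcards are cheap because Lucene's term dictionary is sorted by prefix, so the enumeration seeks straight to the first matching term instead of scanning every term.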
Hi,
I am using RegexQuery and Highlighter. My query works fine and I get the
matches, but nothing is printed out by the highlighter.
At the same time, if I use a plain Query, it works fine.
Is something wrong with the code below?
code --
//line -->input string (ie ".*out")
RegexQ
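One common cause (an assumption here, since the snippet above is cut off): the Highlighter's term extraction doesn't see the terms behind a multi-term query like RegexQuery until the query has been rewritten against the reader. A rough sketch (Lucene 2.4 contrib highlighter; `regexQuery`, `indexReader`, `analyzer`, and `text` assumed to exist):

```java
// Rewrite the RegexQuery so it expands to the concrete terms it matched;
// only then can the Highlighter extract terms to mark up.
Query rewritten = regexQuery.rewrite(indexReader);
Highlighter highlighter = new Highlighter(new QueryScorer(rewritten));
String fragment = highlighter.getBestFragment(analyzer, "field", text);
```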
Thanks Mark. I appreciate the help.
I thought our memory might be low but wanted to verify whether there is
any way to control memory usage. I think we'll likely upgrade the
memory on the machines, but that may just delay the inevitable.
Wondering if anyone else has encountered similar issues wit
The term/TermInfo/IndexReader internals stuff is probably on the low end
compared to the size of your field caches (needed for sorting). If you
are sorting by String, I think the space needed is 32 bits x number of
docs, plus an array to hold all of the unique terms. So checking 300 million
docs (I kn
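Mark's back-of-the-envelope estimate can be checked with simple arithmetic (a sketch only; the real FieldCache layout carries more overhead than this, and the unique-term array comes on top):

```java
public class SortCacheEstimate {
    // FieldCache for a String sort field keeps roughly one 32-bit ord
    // per document, plus an array of the unique term values themselves.
    static long ordArrayBytes(long numDocs) {
        return numDocs * 4L; // 32 bits = 4 bytes per doc
    }

    public static void main(String[] args) {
        long docs = 300_000_000L;
        // 1200000000 bytes (~1.1 GiB) per sorted String field,
        // before counting the unique terms themselves
        System.out.println(ordArrayBytes(docs));
    }
}
```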
There's usually only a couple sort fields and a bunch of terms in the
various indices. The terms are user entered on various media so the
number of terms is very large.
Thanks for the help.
Todd
On 10/29/08, Todd Benge <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm the lead engineer for search on a
not in 2.3.2 though.
cheers,
jed.
Michael McCandless wrote:
Or you can use IndexReader.unlock.
Mike
Jed Wesley-Smith wrote:
Michael McCandless wrote:
To workaround this, on catching an OOME on any of IndexWriter's
methods, you should 1) forcibly remove the write lock
(IndexWriter.unlock
How many fields are you sorting on? Lots of unique terms in those
fields?
- Mark
On Oct 29, 2008, at 6:03 PM, "Todd Benge" <[EMAIL PROTECTED]> wrote:
Hi,
I'm the lead engineer for search on a large website using lucene for
search.
We're indexing about 300M documents in ~ 100 indices.
Hi,
I'm the lead engineer for search on a large website using lucene for search.
We're indexing about 300M documents in ~ 100 indices. The indices add
up to ~ 60G.
The indices are sorted into 4 different Multisearcher with the largest
handling ~50G.
The code is basically like the following:
OK I created this issue:
https://issues.apache.org/jira/browse/LUCENE-1430
Mike
Mindaugas Žakšauskas wrote:
Hi,
see my comments between Mike's text:
On Wed, Oct 29, 2008 at 4:05 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
Hmm, so somehow your stored fields file is truncated --
Actually, compound file defaults to true.
One odd thing about your index: it has a single segment with 0 docs.
What was the history that led to this index? Did you create an index,
and then delete all of its documents, and optimize that? Or...
something else?
Mike
Mindaugas Žakšauska
Hi,
see my comments between Mike's text:
On Wed, Oct 29, 2008 at 4:05 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> Hmm, so somehow your stored fields file is truncated -- FieldsReader was
> unable to read the first int.
>
> Are you using compound file format in this index?
I'm not calli
Hmm, so somehow your stored fields file is truncated -- FieldsReader
was unable to read the first int.
Are you using compound file format in this index?
Do you have any idea how your index may have become corrupt?
Do you still have the original corrupt (not yet fixed) index? If so
can yo
Hi,
Following Mike's advice, the actual exception (non-masked, using the
Directory constructor) was as follows:
Exception in thread "main" java.io.IOException: read past EOF
at
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151)
at
org.apache.lucene.store
I think I see how this exception can happen. I think you are hitting
a different exception, which is masked by the exception you're seeing.
Can you run CheckIndex on this index? I think that should show the
actual root cause.
I think another simple way to see the root cause would be to
Hi Darren,
How large is your corpus? The speed you can expect depends on how much
data you load it with. There is a graph in the package level javadocs
that shows this:
http://lucene.apache.org/java/2_4_0/api/contrib-instantiated/org/apache/lucene/store/instantiated/package-summary.html
Hi Erick,
Sorry for not providing the context. The problem is that I couldn't
work out the exact test case for causing this - I will definitely post
one if I find. There's a possible cause for this but I don't want to
speculate as I don't know for sure.
Just to answer (some of) your questions, th
Well, I'd expect it to throw this error if you tried to close
an already-closed FSDirectory, but that's pretty useless since
you don't provide much context around your problem.
Did this just start occurring? Did you just migrate to 2.4 from
a previous version? Are you sure you aren't closing an al
Hmm, this strikes me as there being something wrong with the index,
but it could be a bug, too. Do you get an error if you just run the
BooleanQuery without the filter? How about if you run a simple
TermQuery with the Filter? Can you open the index with Luke? Does
the CheckIndex tool (i
Hi,
We're using Lucene 2.4.0 on Linux. Java version is 1.6.0_06.
Is there any reason why Lucene would be throwing this error:
org.apache.lucene.store.AlreadyClosedException: this Directory is closed
at org.apache.lucene.store.Directory.ensureOpen(Directory.java:220)
at org.apache
Sure, there are many tricks. If you search the mail archives you'll
find a bunch of them.
One would be to reverse the tokens and make your leading
wildcard queries into trailing ones on the reversed field.
But without more details about what you're trying to accomplish,
there's not much really us
Actually, FWIW, just after I posted last night I realized why the
ID was always the same, perhaps it'll be useful as an insight
into how Lucene works...
When you add the same field to a document, all the values are
added and retrieved in order. So calling "hits.doc(i).get(\"id\")"
returns the *firs
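Erick's point can be shown with a tiny sketch (Lucene 2.x API; field name and values are made up):

```java
Document doc = new Document();
doc.add(new Field("id", "first",  Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("id", "second", Field.Store.YES, Field.Index.NOT_ANALYZED));

doc.get("id");       // returns "first" -- always the first value added
doc.getValues("id"); // returns {"first", "second"} in insertion order
```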
It's fine to change any of IW's parameters on an existing index.
Nothing will break.
However, in general, such changes won't be retroactive: they only
apply to future actions the IW will take.
So, changing maxMergeDocs will only prevent future merges from
producing segments larger than
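For example (a sketch against the 2.x API; `directory` and `analyzer` assumed): setting maxMergeDocs on a writer over an existing index changes only what future merges produce, so segments that are already over the limit stay as they are.

```java
IndexWriter writer = new IndexWriter(directory, analyzer, /*create=*/false);
// Applies only from here on: existing oversized segments are left alone,
// but no future merge will produce a segment with more docs than this.
writer.setMaxMergeDocs(1000000);
```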
Hi All,
I have been wanting to do a wildcard search with * as a first letter on an
index.
Is there a way out other than setting QueryParser's setAllowLeadingWildcard()
to true? Because I have heard it is an expensive operation.
Thanks
Aditi
Jed Wesley-Smith wrote:
Yeah, I saw the change to flush(). Trying to work out the correct
strategy for our IndexWriter handling now. We probably should not be
using autocommit for our writers.
autoCommit=true is deprecated as of 2.4.0, and will go away when we
finally get to 3.0, so I th
It looks like this is using Lucene 2.4.0.
Indexing time suddenly increased with respect to what baseline? 2.3?
A previous run on 2.4?
Mike
Birendar Singh Waldiya -X (bwaldiya - TCS at Cisco) wrote:
Hi Gurus,
We are using Lucene for creating indexes on some database column and
suddenly
Or you can use IndexReader.unlock.
Mike
Jed Wesley-Smith wrote:
Michael McCandless wrote:
To workaround this, on catching an OOME on any of IndexWriter's
methods, you should 1) forcibly remove the write lock
(IndexWriter.unlock static method)
IndexWriter.unlock(*) is 2.4 only.
Use the fo