Roger Ford wrote:
[...index size troubles...]
Believe it or not, this 10 million documents was meant to be a single
partition of a much larger dataset. I'm not sure I'm at liberty to
discuss in detail the data I'm indexing - but it's a massive
genealogical database.
Roger,
maybe your data type is [...]
Execute 'ulimit -f' to see what your current limit is... And then change appropriately
after reading the man pages. My redhat machines come up with an unlimited file size
limit. I don't know what the real limit of an "unlimited" limit is, but I haven't
found it yet.
Dan
Ryan Clifton wrote:
[...]
Doug,
You seem to be implying that it is possible to optimize very large indexes. My index
has a couple million records, but more importantly it's about 40 gigs in size. I have
tried many times to optimize it and this always results in hitting the Linux file size
limit. Is there a way to ge[...]
Armbrust, Daniel C. wrote:
If you set your mergeFactor back down to something closer to the default (10), you probably wouldn't have any problems with file handles. The higher you make it, the more open files you will have. When I set it at 90 for performance reasons, I would run out of file handles [...]
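Dan's point about mergeFactor and open files can be made concrete with a back-of-the-envelope estimate. This sketch is illustrative only — the files-per-segment count and the level model are assumptions, not exact Lucene internals:

```java
public class FileHandleEstimate {
    // Rough model: during a merge, up to mergeFactor segments per level can be
    // open at once, and the number of levels grows logarithmically with the
    // document count. filesPerSegment (~8 without compound files) is a guess.
    public static long estimateOpenFiles(long numDocs, int mergeFactor,
                                         int minMergeDocs, int filesPerSegment) {
        int levels = 1;
        long docsPerSegment = minMergeDocs;
        while (docsPerSegment * mergeFactor <= numDocs) {
            docsPerSegment *= mergeFactor;
            levels++;
        }
        return (long) mergeFactor * levels * filesPerSegment;
    }

    public static void main(String[] args) {
        // mergeFactor 10 vs 90 for 10 million docs, assuming ~8 files per segment
        System.out.println(estimateOpenFiles(10_000_000L, 10, 10, 8)); // a few hundred
        System.out.println(estimateOpenFiles(10_000_000L, 90, 10, 8)); // several thousand
    }
}
```

Even with these made-up constants, the shape of the result matches Dan's experience: pushing mergeFactor from 10 to 90 multiplies the worst-case open-file count several times over.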
Hi Claude,
one solution is to make the tokenStream method in the Analyzer subclass
listen to the field name. Example:
public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new StandardTokenizer(reader);
    result = new StandardFilter(result);
    if (!"id".equals(fieldName))  // "id" is a hypothetical field to leave unlowercased
        result = new LowerCaseFilter(result);
    return result;
}
I would say that something definitely went wrong to make your index that big that
early - now that I saw you are only storing one field.
Even if you make your indexes partitioned at 2.5 instead of 10 million (which you
probably don't need to do) I would still recommend that you lower your mergeFactor [...]
Roger,
> Given that on my previous 16GB partition it managed 1.5 million rows
> before failing, it looks like disk space requirements grow
> exponentially
> with number of documents indexed. Can anyone comment whether this
> should be true?
Exponentially? That would be surprising.
When you add documents [...]
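One way to test the linear-vs-exponential question is to compare bytes per document at the two checkpoints Roger reports. A rough sketch — note the second checkpoint includes temporary files from the failed optimize, not just the index:

```java
public class GrowthCheck {
    // KB of disk used per document at a checkpoint; if this stays roughly
    // constant as the document count grows, growth is linear, not exponential.
    public static double kbPerDoc(long docs, long gb) {
        return gb * 1024.0 * 1024.0 / docs;
    }

    public static void main(String[] args) {
        // Roger's numbers from this thread: 1.5M docs filled the 16GB partition,
        // and 3M docs filled the 100GB partition mid-optimize.
        System.out.printf("1.5M docs: %.1f KB/doc%n", kbPerDoc(1_500_000, 16));
        System.out.printf("3.0M docs: %.1f KB/doc%n", kbPerDoc(3_000_000, 100));
        // The jump (~11 -> ~35 KB/doc) is better explained by optimize needing
        // roughly 2x temporary space than by exponential growth of the index.
    }
}
```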
Lichtner, Guglielmo wrote:
That's 46 hits/s. That's not bad, actually.
It's not the time I'm worried about, so much as the disk consumption.
It's just failed optimizing 3 million documents with "No space left on
device". That's 100GB it's used!
Given that on my previous 16GB partition it managed 1.5 million rows [...]
Oh, and you may be short on disk space. You must have double the amount of disk space
available as the final size of your index to call optimize; you may get by with less
disk space if you just do a single merge and never call optimize, but I'm not sure about
this.
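The rule of thumb above is trivial arithmetic, but it is worth checking against the 40GB index mentioned earlier in the thread (a sketch; the exact peak depends on how the merge proceeds):

```java
public class OptimizeSpace {
    // optimize() rewrites the entire index into one new segment before the old
    // files are deleted, so peak disk usage is roughly twice the index size.
    public static long peakGbNeeded(long indexGb) {
        return 2 * indexGb;
    }

    public static void main(String[] args) {
        // Ryan's 40GB index from this thread:
        System.out.println(peakGbNeeded(40) + " GB free needed to optimize");
    }
}
```

So a 40GB index wants about 80GB free during optimize — which also explains how 3 million documents could exhaust a 100GB partition mid-merge.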
Our index of 15 million documents [...]
Roger,
Just to double-check...
> Each document is typically only around 2K in size. Each field is
> free-text indexed, but only the "key" field is stored.
> After experimenting, I've set
> - Java memory to 750MB
> - writer.mergeFactor = 1
> - and run an optimize every 50,000 documents
We are currently doing something similar here.
We have upwards of 15 million documents in our index.
There has been a lot of discussion on this in the past... But I'll give a few details:
My current technique for indexing very large amounts of data is to:
- Set the merge factor to 90
- Leave the m[...]
That's 46 hits/s. That's not bad, actually.
It's an interesting problem. It certainly seems that when indexing such a large
number of documents the indexing should be parallel. So far I have assumed that
Lucene is not able to use multiple threads to speed up the indexing run.
If it did, I guess it [...]
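A common workaround is to build several sub-indexes in parallel threads and merge them once at the end — in Lucene the merge step would be IndexWriter.addIndexes, if your version has it. The skeleton below uses plain lists as stand-ins for sub-indexes so it runs without Lucene:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelIndexSketch {
    // Each worker "indexes" its own slice into a private structure; the slices
    // are merged once at the end, so no writer is shared between threads.
    public static List<String> indexInParallel(List<String> docs, int workers)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<List<String>>> parts = new ArrayList<>();
        int chunk = (docs.size() + workers - 1) / workers;
        for (int w = 0; w < workers; w++) {
            List<String> slice = docs.subList(
                    Math.min(w * chunk, docs.size()),
                    Math.min((w + 1) * chunk, docs.size()));
            // Stand-in for "index this slice into a private sub-index":
            Callable<List<String>> task = () -> new ArrayList<>(slice);
            parts.add(pool.submit(task));
        }
        List<String> merged = new ArrayList<>();
        for (Future<List<String>> f : parts)
            merged.addAll(f.get());  // stand-in for the final addIndexes merge
        pool.shutdown();
        return merged;
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = Arrays.asList("d1", "d2", "d3", "d4", "d5");
        System.out.println(indexInParallel(docs, 2));
    }
}
```

The payoff is that document parsing and inversion, usually the expensive part, runs concurrently, while the single-threaded merge happens only once.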
I'm trying to index 10 million small XML-like documents, extracted
from an Oracle database.
Lucene version is 1.2, and I'm using RedHat 7.0 Advanced Server,
on an AMD XP1800+ with 1GB RAM and 46GB+120GB hard disks. The
database is on a separate machine, connected by thin JDBC.
Each document consists [...]
On Monday, July 28, 2003, at 01:32 AM, Claude Libois wrote:
My question is in the title: how can I use a different Analyzer for
each field of a Document object? My problem is that if I use
LetterTokenizer for a field which contains a String representation of
a number, afterward I can't delete it.
On Monday, July 28, 2003, at 03:12 AM, Kelvin Tan wrote:
AFAIK, there is a one-one mapping between an index and an analyzer.
Not true. The Analyzer base class has a method tokenStream that
accepts the field name. None of the built-in analyzers use the field
name to do anything different based [...]
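The field-name dispatch Erik describes can be sketched in plain Java without Lucene classes, so it runs standalone — the field names and tokenizing rules below are made up for illustration:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class PerFieldDemo {
    // Map each field name to its own "analyzer" (here just a tokenizing function).
    private static final Map<String, Function<String, List<String>>> ANALYZERS =
            new HashMap<>();
    static {
        // "contents": lowercase and split on whitespace (hypothetical field)
        ANALYZERS.put("contents",
                s -> Arrays.asList(s.toLowerCase().split("\\s+")));
        // "id": keep the value whole and untouched, so an exact-match delete works
        ANALYZERS.put("id", Collections::singletonList);
    }

    public static List<String> tokenStream(String fieldName, String text) {
        return ANALYZERS.getOrDefault(fieldName, ANALYZERS.get("contents"))
                        .apply(text);
    }

    public static void main(String[] args) {
        System.out.println(tokenStream("id", "12345"));
        System.out.println(tokenStream("contents", "Foo Bar"));
    }
}
```

This is exactly why Claude's numeric field becomes undeletable under LetterTokenizer: the analyzer for that field discards the digits, so the term you later try to delete by was never indexed.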
I think this may be exactly what I'm looking for!
Thanx a lot,
Russ
I'll let you know how it works out. Thanx again!
-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Monday, July 28, 2003 6:56 AM
To: Lucene Users List
Subject: Re: How can I index JS
You could try using a spider such as Spindle. Don't have the URL, but
I'm sure you can find it via Google. Spindle uses Lucene.
Otis
--- "Pitre, Russell" <[EMAIL PROTECTED]> wrote:
> Reffering to this: http://www.jguru.com/faq/view.jsp?EID=1074516
>
> "To index the content of JS
Perhaps one way to do it is to have 2 separate indices for the 2 analyzers.
Then, depending on which field you wish to search, you can choose from either
index.
AFAIK, there is a one-one mapping between an index and an analyzer.
Kelvin
On Mon, 28 Jul 2003 10:32:21 +0200, Claude Libois said:
>My
My question is in the title: how can I use a different Analyzer for
each field of a Document object? My problem is that if I use
LetterTokenizer for a field which contains a String representation of a
number, afterward I can't delete it. Probably because this analyzer threw
away my number. So I n[...]