Howdy,
Your work is outstanding and will hopefully be adopted soon.
The HDFS distributed Lucene index avoids many of the dependencies
introduced by achieving this another way, using RMI, HTTP
(serialized objects w/ servlets) or Tomcat balancing with MySQL
databases, schemas and connection
We welcome your input. Discussions are mainly on
[EMAIL PROTECTED] now (a thread with the same title).
On 2/7/08, Dennis Kubes <[EMAIL PROTECTED]> wrote:
> This is actually something we were planning on building into Nutch.
>
> Dennis
This is actually something we were planning on building into Nutch.
Dennis
Ning Li wrote:
On 2/6/08, Ted Dunning <[EMAIL PROTECTED]> wrote:
Our best work-around is to simply take a shard out of service during delivery
of an updated index. This is obviously not a good solution.
How many shar
We have quite a few serving the load, but if we are trying to update
relatively often (say every 30 minutes), then having a server out of action
for several minutes really hurts. The outage is that long because you have
to
A) turn off traffic
B) wait for traffic to actually stop
C) move the mult
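
Steps A and B can be sketched roughly as below. This is only an illustration in Python, assuming a single process that fronts one shard; `ShardGate` and its methods are hypothetical names, not part of Lucene, Nutch or Hadoop, and nothing past step B is covered.

```python
import threading


class ShardGate:
    """Hypothetical gate in front of one shard server (not a real Lucene/Hadoop API)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._accepting = True   # False while the shard is out of service
        self._in_flight = 0      # queries admitted but not yet finished

    def try_begin(self):
        """Admit a query if the shard is in service; return False otherwise."""
        with self._cond:
            if not self._accepting:
                return False
            self._in_flight += 1
            return True

    def end(self):
        """Mark one admitted query as finished and wake any waiting drain()."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout=None):
        """Step A: stop admitting queries. Step B: wait for in-flight ones to finish.

        Returns True once no queries are in flight, False if the timeout expired first.
        """
        with self._cond:
            self._accepting = False
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)

    def resume(self):
        """Put the shard back in service."""
        with self._cond:
            self._accepting = True
```

The wait in `drain()` is where the "several minutes" can go if queries are slow to finish, which is why a timeout is exposed rather than waiting forever.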
On 2/6/08, Ted Dunning <[EMAIL PROTECTED]> wrote:
> Our best work-around is to simply take a shard out of service during delivery
> of an updated index. This is obviously not a good solution.
How many shard servers are serving each shard? If it's more than one,
you can have the rest of the shard
Very nice summary.
One of the issues that we have had with multiple search servers is that on
Linux, there can be substantial contention for disk I/O. This means that as
a new index is being written, access to the current index can be stalled for
very long periods of time (sometimes >10s). This
There have been several proposals for a Lucene-based distributed index
architecture.
1) Doug Cutting's "Index Server Project Proposal" at
http://www.mail-archive.com/[EMAIL PROTECTED]/msg00338.html
2) Solr's "Distributed Search" at
http://wiki.apache.org/solr/DistributedSearch
3) Mark Bu