Lucene on a local disk benefits significantly from the local filesystem's
RAM cache (aka the kernel's buffer cache). HDFS has no such local RAM cache
outside of the stream's buffer. The cache would need to be no larger than
the kernel's buffer cache to get an equivalent hit ratio. And I wonder how
reading an index in HDFS compares with a ParallelMultiReader (or whatever
it's called) over RPC on a local filesystem?
I'm missing why you would ever want the Lucene index in HDFS for
reading.
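Ian's hit-ratio point can be illustrated with a toy LRU simulation (the access trace and cache sizes here are invented for illustration; a real kernel page cache is more sophisticated, but the shape of the result is the same: a cache smaller than the hot working set gets a far worse hit ratio):

```python
from collections import OrderedDict

def lru_hit_ratio(accesses, cache_size):
    """Replay a block-access trace through a fixed-size LRU cache
    and return the fraction of accesses served from cache."""
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)          # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)     # evict least recently used
    return hits / len(accesses)

# Skewed trace: a hot working set of 50 blocks re-read 20 times,
# followed by cold blocks each touched once.
trace = list(range(50)) * 20 + list(range(50, 1000))

small = lru_hit_ratio(trace, 10)    # cache smaller than the working set
large = lru_hit_ratio(trace, 100)   # cache covers the working set
```

With the cyclic hot set larger than the small cache, LRU thrashes and `small` comes out at zero, while `large` serves every repeat read from cache.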
Ian
Ning Li ning.li...@gmail.com writes:
I should have pointed out that Nutch index build and contrib/index
targets
ratio...
Cheers,
Ning
On Mon, Mar 16, 2009 at 5:36 PM, Doug Cutting cutt...@apache.org wrote:
Ning Li wrote:
With
http://issues.apache.org/jira/browse/HADOOP-4801, however, it may
become feasible to search on HDFS directly.
I don't think HADOOP-4801 is required. It would help, certainly
Or you can check out the index contrib. The difference between the two is that:
- In Nutch's indexing map/reduce job, indexes are built in the
reduce phase. Afterwards, they are merged into a smaller number of
shards if necessary. The last time I checked, the merge process does
not use map/reduce.
-
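The reduce-side shard build described above can be sketched as a toy, single-process model (the hash partitioner, tokenizer, and shard count are illustrative assumptions, not contrib/index's or Nutch's actual code):

```python
import hashlib
from collections import defaultdict

def map_phase(docs, num_shards):
    """Map: route each (doc_id, text) pair to a shard by hashing the doc id."""
    partitions = defaultdict(list)
    for doc_id, text in docs:
        shard = int(hashlib.md5(doc_id.encode()).hexdigest(), 16) % num_shards
        partitions[shard].append((doc_id, text))
    return partitions

def reduce_phase(partitions):
    """Reduce: each reducer builds its own shard's inverted index locally."""
    shards = {}
    for shard, shard_docs in partitions.items():
        index = defaultdict(set)
        for doc_id, text in shard_docs:
            for term in text.lower().split():
                index[term].add(doc_id)
        shards[shard] = index
    return shards

docs = [("d1", "hadoop builds indexes"), ("d2", "lucene indexes text")]
shards = reduce_phase(map_phase(docs, num_shards=2))
```

The point of the design is that each reducer writes one complete shard, so no cross-reducer coordination is needed during the build; only the optional merge into fewer shards happens afterwards.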
On 8/12/08, Deepika Khera [EMAIL PROTECTED] wrote:
I was imagining the two concepts of i) using hadoop.contrib.index to index
documents and ii) providing search in a distributed fashion to be all in
one box.
Ideally, yes. However, while it's good to use map/reduce when
batch-building indexes, there
1) Katta and Distributed Lucene are different projects though, right? Both
being based on kind of the same paradigm (Distributed Index)?
The design of Katta and that of Distributed Lucene were quite different
the last time I checked. I pointed out the Katta project because you can
find the code for
http://wiki.apache.org/hadoop/DistributedLucene
and hadoop.contrib.index are two different things.
For information on hadoop.contrib.index, see the README file in the package.
I believe you can find code for http://wiki.apache.org/hadoop/DistributedLucene
at http://katta.wiki.sourceforge.net/.
You can build Lucene indexes using Hadoop Map/Reduce. See the index
contrib package in the trunk. Or is that still not what you are
looking for?
Regards,
Ning
On 4/4/08, Aayush Garg [EMAIL PROTECTED] wrote:
No, currently my requirement is to solve this problem with Apache Hadoop. I am
trying
Hi,
Nutch builds Lucene indexes. But Nutch is much more than that. It is a
web search application that crawls the web, inverts links and
builds indexes. Each step is one or more Map/Reduce jobs. You can find
more information at http://lucene.apache.org/nutch/
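The link-inversion step mentioned above is a classic Map/Reduce example. A minimal single-process sketch (not Nutch's actual code, which runs this as a distributed job over crawl data):

```python
from collections import defaultdict

def invert_links(outlink_pages):
    """One conceptual Map/Reduce pass: the map step emits a (target, source)
    pair for every outlink; the shuffle groups pairs by target; the reduce
    step yields each page's list of inlinks."""
    # Map: emit (target, source) for every edge.
    pairs = []
    for source, targets in outlink_pages:
        for target in targets:
            pairs.append((target, source))
    # Shuffle + Reduce: group sources by target.
    inlinks = defaultdict(list)
    for target, source in pairs:
        inlinks[target].append(source)
    return dict(inlinks)

graph = [("a", ["b", "c"]), ("b", ["c"])]
print(invert_links(graph))  # {'b': ['a'], 'c': ['a', 'b']}
```

Inlink lists computed this way feed anchor-text and link-based scoring in the index-building jobs that follow.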
The Map/Reduce job to build
Hi,
Is there any interest in a contrib package to build/update a Lucene index?
I should have asked the question before creating the JIRA issue and
attaching the patch. In any case, more details can be found at
https://issues.apache.org/jira/browse/HADOOP-2951
Regards,
Ning
We welcome your input. Discussions are mainly on
[EMAIL PROTECTED] now (a thread with the same title).
On 2/7/08, Dennis Kubes [EMAIL PROTECTED] wrote:
This is actually something we were planning on building into Nutch.
Dennis
that an
application has more control over where to store the primary and replicas of
an HDFS block. This feature may be useful for other HDFS applications (e.g.,
HBase). We would like to collaborate with other people who are interested in
adding this feature to HDFS.
Regards,
Ning Li
On 2/6/08, Ted Dunning [EMAIL PROTECTED] wrote:
Our best work-around is to simply take a shard out of service during delivery
of an updated index. This is obviously not a good solution.
How many shard servers are serving each shard? If it's more than one,
you can have the rest of the shard
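Ted's suggestion amounts to a rolling update: drain one replica, install the new index on it, put it back, and only then touch the next copy. A hedged sketch (`Replica` and the `install` callback are hypothetical stand-ins, not Katta or Distributed Lucene APIs):

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.in_service = True
        self.version = 1

def rolling_update(replicas, install):
    """Update one replica at a time so the others keep serving the shard;
    at most one copy is ever out of service (assumes >= 2 replicas)."""
    for replica in replicas:
        replica.in_service = False          # drain just this copy
        install(replica)                    # deliver the updated index
        replica.in_service = True           # restore before the next one
        # invariant: at most one replica per shard is out of service
        assert sum(r.in_service for r in replicas) >= len(replicas) - 1

replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
rolling_update(replicas, lambda r: setattr(r, "version", 2))
```

With a single replica per shard this degenerates into exactly the work-around described above, which is why the "how many servers per shard?" question matters.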