I am guessing that the idea behind not putting the indexes in HDFS is
(1) to maximize performance; (2) they are relatively transient, meaning
the data they are created from could be in HDFS, but the indexes
themselves are just local. To avoid having to recreate them, a backup
copy could be k
Hi Ning,
In continuation with our offline conversation, here is a public
expression of interest in your work and a description of our work. Sorry
in advance for the length; I hope that folks here will be able to
collaborate, share experiences, or give us some pointers...
1) We are
Doug Cutting wrote:
Ning,
I am also interested in starting a new project in this area. The
approach I have in mind is slightly different, but hopefully we can come
to some agreement and collaborate.
I'm interested in this too.
My current thinking is that the Solr search API is the appropri
Ning,
I am also interested in starting a new project in this area. The
approach I have in mind is slightly different, but hopefully we can come
to some agreement and collaborate.
My current thinking is that the Solr search API is the appropriate
model. Solr's facets are an important featur
I'm pretty sure that what you describe is the case, especially taking into
consideration that PageRank (what drives their search results) is a per
document value that is probably recomputed after some long time interval. I
did see a MapReduce algorithm to compute PageRank as well. However I do
think
(trimming excessive cc-s)
Ning Li wrote:
No. I'm curious too. :)
On Feb 6, 2008 11:44 AM, J. Delgado <[EMAIL PROTECTED]> wrote:
I assume that Google also has a distributed index over their
GFS/MapReduce implementation. Any idea how they achieve this?
I'm pretty sure that MapReduce/GFS/BigTabl
One main focus is to provide fault tolerance in this distributed index
system. Correct me if I'm wrong, but I think SOLR-303 is focusing on merging
results from multiple shards right now. We'd like to start an open source
project for a fault-tolerant distributed index system (or join if one
already exi
No. I'm curious too. :)
On Feb 6, 2008 11:44 AM, J. Delgado <[EMAIL PROTECTED]> wrote:
> I assume that Google also has a distributed index over their
> GFS/MapReduce implementation. Any idea how they achieve this?
>
> J.D.
>
I work for IBM Research. I read the Rackspace article. Rackspace's Mailtrust
has a similar design. Happy to see an existing application on such a system.
Do they plan to open-source it? Is the AOL project an open source project?
On Feb 6, 2008 11:33 AM, Clay Webster <[EMAIL PROTECTED]> wrote:
>
>
Clay Webster wrote:
There seem to be a few other players in this space too.
Are you from Rackspace?
(http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data)
AOL also has a Hadoop/Solr project going on.
CNET does not have much brewing there. Although Yo
I assume that Google also has a distributed index over their
GFS/MapReduce implementation. Any idea how they achieve this?
J.D.
On Feb 6, 2008 11:33 AM, Clay Webster <[EMAIL PROTECTED]> wrote:
>
> There seem to be a few other players in this space too.
>
> Are you from Rackspace?
> (http://highsc