OK, I know that starting with a complex design is usually unnecessary. On
the other hand, if your resources and needs justify it, and you have a
bottleneck in your current design, it would be a real failure not to plan a
new one.

We have terabytes of data and we have dedicated developers on the Hadoop
and HBase side. There are many machines in our architecture (we are
currently running tests and making improvements). The data is stored in a
distributed way, and it is indexed in a distributed way in SolrCloud.

However, there is a bottleneck in this architecture: reading data from
HBase and sending it to SolrCloud is not as fast as the other parts of the
system. If we keep the current architecture without resolving that problem,
I think it will be a design fault.

That's why I asked this question, and it seems reasonable to me. I know
that some people store Lucene indexes in HBase, and that is the right
design for them. Sending data via SolrJ from MapReduce jobs may be another
good fit for our needs, and I think there may be people in the community
who have tried it, or at least thought about it. Thanks for the answers.
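
To make the idea concrete, here is a rough, untested sketch of what I mean:
a map-only Hadoop job that scans an HBase table and pushes documents to
SolrCloud through CloudSolrServer. The table name, column family, field
names, and ZooKeeper addresses are placeholders, not our real
configuration:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class HBaseToSolrCloudJob {

  public static class IndexMapper
      extends TableMapper<NullWritable, NullWritable> {

    private CloudSolrServer solr;

    @Override
    protected void setup(Context context) throws IOException {
      // One SolrCloud-aware client per map task; it routes updates to the
      // right shard leaders via ZooKeeper. Addresses are placeholders.
      solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
      solr.setDefaultCollection("collection1");
    }

    @Override
    protected void map(ImmutableBytesWritable row, Result value,
        Context context) throws IOException {
      // Build a Solr document from one HBase row; the family/qualifier
      // names ("f", "content") are made up for this example.
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", Bytes.toString(row.get()));
      doc.addField("content", Bytes.toString(
          value.getValue(Bytes.toBytes("f"), Bytes.toBytes("content"))));
      try {
        solr.add(doc); // in a real job these calls should be batched
      } catch (SolrServerException e) {
        throw new IOException(e);
      }
    }

    @Override
    protected void cleanup(Context context) throws IOException {
      try {
        solr.commit();
      } catch (SolrServerException e) {
        throw new IOException(e);
      } finally {
        solr.shutdown();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "hbase-to-solrcloud");
    job.setJarByClass(HBaseToSolrCloudJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);       // larger scanner batches for MR throughput
    scan.setCacheBlocks(false); // don't pollute the region server block cache

    TableMapReduceUtil.initTableMapperJob("webpage", scan, IndexMapper.class,
        NullWritable.class, NullWritable.class, job);
    job.setOutputFormatClass(NullOutputFormat.class);
    job.setNumReduceTasks(0);   // map-only: each mapper indexes its own split
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

This only shows the shape of the approach; a real job would batch the
updates, handle retries, and control the number of concurrent mappers so
that the SolrCloud cluster is not flooded.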

2013/7/5 Roman Chyla <roman.ch...@gmail.com>

> I don't want to sound negative, but I think it is a valid question to
> consider - lack of information and a certain mental rigidity may make it
> sound bad. First of all, this is probably not for a few gigabytes of
> data, and I can imagine that building indexes on the side where the data
> lives is much faster/cheaper than sending the data to Solr. If we think
> of the index as the product of the map phase, then the 'reduce' part may
> be this:
> http://wiki.apache.org/solr/MergingSolrIndexes
>
> I don't really know enough about CloudSolrServer and how the cloud would
> fit in there.
>
> roman
>
> On Fri, Jul 5, 2013 at 12:23 PM, Jack Krupansky <j...@basetechnology.com>
> wrote:
>
> > Software developers are sometimes compensated based on the degree of
> > complexity that they deal with.
> >
> > And managers are sometimes compensated based on the number of people they
> > manage, as well as the degree of complexity of what they manage.
> >
> > And... training organizations can charge more and have a larger pool of
> > eager customers when the subject matter has higher complexity.
> >
> > And... consultants and contractors will be in higher demand and able to
> > charge more, based on the degree of complexity that they have mastered.
> >
> > So, more complexity results in greater opportunity for higher income!
> >
> > (Oh, and, writers and book authors have more to write about and readers
> > are more eager to purchase those writings as well, especially if the
> > subject matter is constantly changing.)
> >
> > Somebody please remind me I said this any time you catch me trying to
> > argue for Solr to be made simpler and easier to use!
> >
> > -- Jack Krupansky
> >
> > -----Original Message----- From: Walter Underwood
> > Sent: Friday, July 05, 2013 12:11 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Sending Documents via SolrServer as MapReduce Jobs at Solrj
> >
> >
> > Why is it better to require another large software system (Hadoop), when
> > it works fine without it?
> >
> > That just sounds like more stuff to configure, misconfigure, and cause
> > problems with indexing.
> >
> > wunder
> >
> > On Jul 5, 2013, at 4:48 AM, Furkan KAMACI wrote:
> >
> >> We are using Nutch to crawl web sites and it stores documents in HBase.
> >> Nutch uses SolrJ to send documents to be indexed. We have Hadoop in our
> >> ecosystem as well. I think that there should be an implementation in
> >> SolrJ that sends documents (via CloudSolrServer or something like that)
> >> as MapReduce jobs. Is there any implementation for it, or is it not a
> >> good idea?
> >>
> >
> >
> >
>
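
P.S. For the MergingSolrIndexes approach Roman mentions above: the merge
itself is a CoreAdmin call, roughly like this (the core name and index
paths are placeholders):

curl 'http://localhost:8983/solr/admin/cores?action=mergeindexes&core=collection1&indexDir=/data/index-part-00000&indexDir=/data/index-part-00001'

So the map side could build plain Lucene indexes where the data lives, and
the "reduce" step would merge them into a live core this way.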
