Hi Amit,

Currently I am designing a Learning Management System based on Hadoop and
HBase. Right now I want to integrate Nutch with Solr as part of the crawler
module, so that users will only be able to search relevant documents from
specific sources. And since crawling and indexing take so much time (maybe
5 to 6 hours for ~5 GB), I am hoping that if anything happens to the
server, there will be replicas to back it up.

I just saw what SolrCloud can do, but I will need to check whether Nutch is
able to work with it. Not knowing what other constraints I will run into, I
was asking whether I could just output the Solr directory into HDFS in the
first place.

Cheers.
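For reference, the copy Otis suggested further down can be scripted roughly
as below. This is only a sketch: both paths are hypothetical examples, not
ones from this thread, and you would substitute your actual Solr core's data
directory and your own HDFS backup location.

```shell
# Sketch only: assumes a stock Solr core layout and a running HDFS cluster.
# SOLR_INDEX and HDFS_BACKUP are example paths, adjust them to your setup.
SOLR_INDEX=/var/solr/data/collection1/data/index
HDFS_BACKUP=/backups/solr/$(date +%Y%m%d)

# Pause updates (or at least issue a commit) first so the on-disk index is
# consistent, then push it into HDFS as a cold backup.
hadoop fs -mkdir -p "$HDFS_BACKUP"
hadoop fs -copyFromLocal "$SOLR_INDEX" "$HDFS_BACKUP"
```

Note this only gives a cold backup to restore from, not a serving copy:
searching an index that lives in HDFS directly would be slow, which is the
point Otis makes in the quoted thread.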

On Thursday, March 7, 2013, Amit Nithian wrote:

> Joseph,
>
> Doing what Otis said will do literally what you want, which is copying the
> index to HDFS. It's no different from copying it to a different machine,
> which, by the way, is what Solr's master/slave replication scheme does.
> Alternatively, I think people are starting to set up new Solr instances
> with SolrCloud, which doesn't have the concept of master/slave but rather a
> series of nodes with the option of having replicas (what I believe to be
> backup nodes) so that you have the redundancy you want.
>
> Honestly, HDFS used in the way you are looking for is probably no
> different from storing your Solr index in a RAIDed storage format, but I
> don't pretend to know much about RAID arrays.
>
> What exactly are you trying to achieve from a systems perspective? Why do
> you want Hadoop in the mix here, and how does copying the index to HDFS
> help you? If SolrCloud seems complicated, try just setting up a simple
> master/slave replication scheme, as that's really easy.
>
> Cheers
> Amit
>
>
> On Wed, Mar 6, 2013 at 9:55 PM, Joseph Lim <ysli...@gmail.com> wrote:
>
> > Hi Amit,
> >
> > so you mean that if I just want to get redundancy for Solr in HDFS, the
> > best way to do it is what Otis suggested, using the following command:
> >
> > hadoop fs -copyFromLocal <localsrc> URI
> >
> > Ok, let me try out SolrCloud, as I will need to make sure it works well
> > with Nutch too..
> >
> > Thanks for the help..
> >
> >
> > On Thu, Mar 7, 2013 at 5:47 AM, Amit Nithian <anith...@gmail.com> wrote:
> >
> > > Why wouldn't SolrCloud help you here? You can set up shards, replicas,
> > > etc. to have redundancy, because HDFS isn't designed to serve real-time
> > > queries, as far as I understand. If you are using HDFS as a backup
> > > mechanism, then to me you'd be better served having multiple slaves
> > > tethered to a master (in a non-cloud environment) or setting up
> > > SolrCloud; either option would give you more redundancy than copying an
> > > index to HDFS.
> > >
> > > - Amit
> > >
> > >
> > > On Wed, Mar 6, 2013 at 12:23 PM, Joseph Lim <ysli...@gmail.com> wrote:
> > >
> > > > Hi Upayavira,
> > > >
> > > > sure, let me explain. I am setting up Nutch and Solr in a Hadoop
> > > > environment. Since I am using HDFS, in the event of any crash of the
> > > > localhost (running Solr), I will still have the shards of data stored
> > > > in HDFS.
> > > >
> > > > Thank you so much =)
> > > >
> > > > On Thu, Mar 7, 2013 at 1:19 AM, Upayavira <u...@odoko.co.uk> wrote:
> > > >
> > > > > What are you actually trying to achieve? If you can share what you
> > are
> > > > > trying to achieve maybe folks can help you find the right way to do
> > it.
> > > > >
> > > > > Upayavira
> > > > >
> > > > > On Wed, Mar 6, 2013, at 02:54 PM, Joseph Lim wrote:
> > > > > > Hello Otis ,
> > > > > >
> > > > > > Is there any configuration where it will index into HDFS instead?
> > > > > >
> > > > > > I tried Crawlzilla and Lily, but I hope to update specific
> > > > > > packages, such as Hadoop only or Nutch only, when there are
> > > > > > updates.
> > > > > >
> > > > > > That's why I would prefer to install them separately.
> > > > > >
> > > > > > Thanks so much. Looking forward to your reply.
> > > > > >
> > > > > > On Wednesday, March 6, 2013, Otis Gospodnetic wrote:
> > > > > >
> > > > > > > Hello Joseph,
> > > > > > >
> > > > > > > You can certainly put them there, as in:
> > > > > > >   hadoop fs -copyFromLocal <localsrc> URI
> > > > > > >
> > > > > > > But searching such an index will be slow.
> > > > > > > See also: http://katta.sourceforge.net/
> > > > > > >
> > > > > > > Otis
> > > > > > > --
> > > > > > > Solr & ElasticSearch Support
> > > > > > > http://sematext.com/
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Mar 6, 2013 at 7:50 AM, Joseph Lim <ysli...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > > Would like to know how I can put the indexed Solr shards into
> > > > > > > > HDFS?
> > > > > > > >
> > > > > > > > Thanks..
> > > > > > > >
> > > > > > > > Joseph
> >
>


-- 
Best Regards,
*Joseph*
