Hi Joseph,

I believe Nutch can index into Solr/SolrCloud just fine.  Sounds like that
is the approach you should take.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/
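For reference, with Nutch 1.x the indexing step is typically pushed to Solr from the command line, and the on-disk index can then be copied into HDFS as suggested further down the thread. A rough sketch; the Solr URL, crawl directory layout, and local index path are all assumptions for illustration, not a definitive setup:

```shell
# Hypothetical sketch: index Nutch crawl data into Solr, then back the
# on-disk index up to HDFS. Paths and the Solr URL are assumptions.

# Push crawled segments into a running Solr instance (Nutch 1.x solrindex job):
bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb \
    -linkdb crawl/linkdb crawl/segments/*

# Copy the resulting Solr index directory into HDFS for redundancy:
hadoop fs -copyFromLocal /path/to/solr/data/index /backup/solr-index
```

Both commands depend on a local Nutch install and a running Hadoop/Solr environment, so treat them as a template to adapt rather than something to run verbatim.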





On Thu, Mar 7, 2013 at 12:10 AM, Joseph Lim <ysli...@gmail.com> wrote:

> Hi Amit,
>
> Currently I am designing a Learning Management System based on Hadoop and
> HBase. Right now I want to integrate Nutch with Solr in it as part of the
> crawler module, so that users will only be able to search relevant
> documents from specific sources. And since crawling and indexing take so
> much time (maybe 5 to 6 hours for ~5 GB), I hope that if anything happens
> to the server, there will be replicas to back it up.
>
> I just saw what SolrCloud can do, but I will need to check whether Nutch
> is able to work with it. Not knowing what other constraints I will
> encounter, I was asking if I can just output the Solr dir into HDFS in the
> first place.
>
> Cheers.
>
> On Thursday, March 7, 2013, Amit Nithian wrote:
>
> > Joseph,
> >
> > Doing what Otis said will do literally what you want, which is copying
> > the index to HDFS. It's no different than copying it to a different
> > machine, which, by the way, is what Solr's master/slave replication
> > scheme does. Alternatively, I think people are starting to set up new
> > Solr instances with SolrCloud, which doesn't have the concept of
> > master/slave but rather a series of nodes with the option of having
> > replicas (what I believe to be backup nodes) so that you have the
> > redundancy you want.
> >
> > Honestly, HDFS in the way that you are looking for is probably no
> > different than storing your Solr index in a RAIDed storage format, but I
> > don't pretend to know much about RAID arrays.
> >
> > What exactly are you trying to achieve from a systems perspective? Why
> > do you want Hadoop in the mix here, and how does copying the index to
> > HDFS help you? If SolrCloud seems complicated, try just setting up a
> > simple master/slave replication scheme, as that's really easy.
> >
> > Cheers
> > Amit
> >
> >
> > On Wed, Mar 6, 2013 at 9:55 PM, Joseph Lim <ysli...@gmail.com> wrote:
> >
> > > Hi Amit,
> > >
> > > So you mean that if I just want to get redundancy for Solr in HDFS,
> > > the best way to do it is what Otis suggested, using the following
> > > command:
> > >
> > > hadoop fs -copyFromLocal <localsrc> URI
> > >
> > > OK, let me try out SolrCloud, as I will need to make sure it works
> > > well with Nutch too.
> > >
> > > Thanks for the help..
> > >
> > >
> > > On Thu, Mar 7, 2013 at 5:47 AM, Amit Nithian <anith...@gmail.com> wrote:
> > >
> > > > Why wouldn't SolrCloud help you here? You can set up shards and
> > > > replicas etc. to have redundancy, because HDFS isn't designed to
> > > > serve real-time queries as far as I understand. If you are using
> > > > HDFS as a backup mechanism, to me you'd be better served having
> > > > multiple slaves tethered to a master (in a non-cloud environment)
> > > > or setting up SolrCloud; either option would give you more
> > > > redundancy than copying an index to HDFS.
> > > >
> > > > - Amit
> > > >
> > > >
> > > > On Wed, Mar 6, 2013 at 12:23 PM, Joseph Lim <ysli...@gmail.com> wrote:
> > > >
> > > > > Hi Upayavira,
> > > > >
> > > > > Sure, let me explain. I am setting up Nutch and Solr in a Hadoop
> > > > > environment. Since I am using HDFS, in the event of any crash on
> > > > > the localhost (running Solr), I will still have the shards of data
> > > > > stored in HDFS.
> > > > >
> > > > > Thank you so much =)
> > > > >
> > > > > On Thu, Mar 7, 2013 at 1:19 AM, Upayavira <u...@odoko.co.uk> wrote:
> > > > >
> > > > > > What are you actually trying to achieve? If you can share what
> > > > > > you are trying to achieve, maybe folks can help you find the
> > > > > > right way to do it.
> > > > > >
> > > > > > Upayavira
> > > > > >
> > > > > > On Wed, Mar 6, 2013, at 02:54 PM, Joseph Lim wrote:
> > > > > > > Hello Otis ,
> > > > > > >
> > > > > > > Is there any configuration where it will index into HDFS
> > > > > > > instead?
> > > > > > >
> > > > > > > I tried Crawlzilla and Lily, but I hope to update a specific
> > > > > > > package, such as Hadoop only or Nutch only, when there are
> > > > > > > updates.
> > > > > > >
> > > > > > > That's why I would prefer to install them separately.
> > > > > > >
> > > > > > > Thanks so much. Looking forward to your reply.
> > > > > > >
> > > > > > > On Wednesday, March 6, 2013, Otis Gospodnetic wrote:
> > > > > > >
> > > > > > > > Hello Joseph,
> > > > > > > >
> > > > > > > > You can certainly put them there, as in:
> > > > > > > >   hadoop fs -copyFromLocal <localsrc> URI
> > > > > > > >
> > > > > > > > But searching such an index will be slow.
> > > > > > > > See also: http://katta.sourceforge.net/
> > > > > > > >
> > > > > > > > Otis
> > > > > > > > --
> > > > > > > > Solr & ElasticSearch Support
> > > > > > > > http://sematext.com/
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Mar 6, 2013 at 7:50 AM, Joseph Lim <ysli...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > > Would like to know how I can put the indexed Solr shards
> > > > > > > > > into HDFS?
> > > > > > > > >
> > > > > > > > > Thanks..
> > > > > > > > >
> > > > > > > > > Joseph
> > >
> >
>
>
> --
> Best Regards,
> *Joseph*
>
