Hi Joseph,

I believe Nutch can index into Solr/SolrCloud just fine. Sounds like that
is the approach you should take.
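For example, with a Nutch 1.x layout the indexing step can be pointed at a
Solr URL; this is just a sketch, and the Solr URL and crawl paths below are
placeholders you would adjust for your own setup:

```shell
# Push a completed Nutch 1.x crawl into Solr (paths/URL are placeholders).
# Assumes crawl/crawldb, crawl/linkdb and crawl/segments already exist
# from a previous inject/generate/fetch/parse cycle.
bin/nutch solrindex http://localhost:8983/solr/ \
    crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
```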
Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Thu, Mar 7, 2013 at 12:10 AM, Joseph Lim <ysli...@gmail.com> wrote:
> Hi Amit,
>
> Currently I am designing a Learning Management System based on Hadoop
> and HBase. Right now I want to integrate Nutch with Solr as part of the
> crawler module, so that users will only be able to search relevant
> documents from specific sources. And since crawling and indexing take so
> much time (might be 5 to 6 hours for ~5 GB), I hope that if anything
> happens to the server, there will be replicas to back it up.
>
> I just saw what SolrCloud can do, but will need to check whether Nutch
> is able to work with it. Not knowing what other constraints I will
> encounter, I was asking if I can just output the Solr dir into HDFS in
> the first place.
>
> Cheers.
>
> On Thursday, March 7, 2013, Amit Nithian wrote:
>
> > Joseph,
> >
> > Doing what Otis said will do literally what you want, which is copying
> > the index to HDFS. It's no different than copying it to a different
> > machine, which, by the way, is what Solr's master/slave replication
> > scheme does. Alternatively, I think people are starting to set up new
> > Solr instances with SolrCloud, which doesn't have the concept of
> > master/slave but rather a series of nodes with the option of having
> > replicas (what I believe to be backup nodes) so that you have the
> > redundancy you want.
> >
> > Honestly, HDFS in the way that you are looking for is probably no
> > different than storing your Solr index in a RAIDed storage format, but
> > I don't pretend to know much about RAID arrays.
> >
> > What exactly are you trying to achieve from a systems perspective? Why
> > do you want Hadoop in the mix here, and how does copying the index to
> > HDFS help you? If SolrCloud seems complicated, try just setting up a
> > simple master/slave replication scheme, as that's really easy.
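As a rough sketch of that master/slave setup (Solr 3.x/4.x style; the
hostname and the list of conf files are placeholders), the replication
handler is configured in each core's solrconfig.xml:

```xml
<!-- On the master: publish a new index version after every commit. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On each slave: poll the master and pull new index versions. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Slaves stay read-only copies of the master's index, which gives you the
redundancy without involving HDFS at all.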
> >
> > Cheers
> > Amit
> >
> > On Wed, Mar 6, 2013 at 9:55 PM, Joseph Lim <ysli...@gmail.com> wrote:
> >
> > > Hi Amit,
> > >
> > > So you mean that if I just want to get redundancy for Solr in HDFS,
> > > the best way to do it is as per what Otis suggested, using the
> > > following command:
> > >
> > > hadoop fs -copyFromLocal <localsrc> URI
> > >
> > > OK, let me try out SolrCloud, as I will need to make sure it works
> > > well with Nutch too.
> > >
> > > Thanks for the help.
> > >
> > > On Thu, Mar 7, 2013 at 5:47 AM, Amit Nithian <anith...@gmail.com> wrote:
> > >
> > > > Why wouldn't SolrCloud help you here? You can set up shards and
> > > > replicas etc. to have redundancy, because HDFS isn't designed to
> > > > serve real-time queries as far as I understand. If you are using
> > > > HDFS as a backup mechanism, to me you'd be better served having
> > > > multiple slaves tethered to a master (in a non-cloud environment)
> > > > or setting up SolrCloud; either option would give you more
> > > > redundancy than copying an index to HDFS.
> > > >
> > > > - Amit
> > > >
> > > > On Wed, Mar 6, 2013 at 12:23 PM, Joseph Lim <ysli...@gmail.com> wrote:
> > > >
> > > > > Hi Upayavira,
> > > > >
> > > > > Sure, let me explain. I am setting up Nutch and Solr in a Hadoop
> > > > > environment. Since I am using HDFS, in the event that there are
> > > > > any crashes on the localhost (running Solr), I will still have
> > > > > the shards of data being stored in HDFS.
> > > > >
> > > > > Thank you so much =)
> > > > >
> > > > > On Thu, Mar 7, 2013 at 1:19 AM, Upayavira <u...@odoko.co.uk> wrote:
> > > > >
> > > > > > What are you actually trying to achieve? If you can share what
> > > > > > you are trying to achieve, maybe folks can help you find the
> > > > > > right way to do it.
> > > > > >
> > > > > > Upayavira
> > > > > >
> > > > > > On Wed, Mar 6, 2013, at 02:54 PM, Joseph Lim wrote:
> > > > > > > Hello Otis,
> > > > > > >
> > > > > > > Is there any configuration where it will index into HDFS
> > > > > > > instead?
> > > > > > >
> > > > > > > I tried Crawlzilla and Lily, but I hope to update a specific
> > > > > > > package, such as Hadoop only or Nutch only, when there are
> > > > > > > updates.
> > > > > > >
> > > > > > > That's why I would prefer to install them separately.
> > > > > > >
> > > > > > > Thanks so much. Looking forward to your reply.
> > > > > > >
> > > > > > > On Wednesday, March 6, 2013, Otis Gospodnetic wrote:
> > > > > > >
> > > > > > > > Hello Joseph,
> > > > > > > >
> > > > > > > > You can certainly put them there, as in:
> > > > > > > > hadoop fs -copyFromLocal <localsrc> URI
> > > > > > > >
> > > > > > > > But searching such an index will be slow.
> > > > > > > > See also: http://katta.sourceforge.net/
> > > > > > > >
> > > > > > > > Otis
> > > > > > > > --
> > > > > > > > Solr & ElasticSearch Support
> > > > > > > > http://sematext.com/
> > > > > > > >
> > > > > > > > On Wed, Mar 6, 2013 at 7:50 AM, Joseph Lim <ysli...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > > Would like to know how can I put the indexed Solr shards
> > > > > > > > > into HDFS?
> > > > > > > > >
> > > > > > > > > Thanks..
> > > > > > > > >
> > > > > > > > > Joseph
>
> --
> Best Regards,
> *Joseph*
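For completeness, the copy Otis describes might look something like the
sketch below; the local index path (Solr 4.x example layout) and the HDFS
destination are hypothetical, and you would want to pause indexing or take
a snapshot first so the copied segment files are consistent:

```shell
# Copy a quiesced Solr index directory into HDFS as a crude backup.
# Both paths are placeholders for your own layout and namenode.
hadoop fs -mkdir /backups/solr
hadoop fs -copyFromLocal example/solr/collection1/data/index \
    hdfs://namenode:8020/backups/solr/index-$(date +%Y%m%d)
```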