Re: NUTCH, SOLR and HBase integration

Tony Mullins Fri, 28 Jun 2013 00:00:01 -0700

If you are using the latest Nutch 2.x and running your jobs with the -crawlId
parameter, make sure you pass the same -crawlId to your DBUpdateJob as well.


updatedb $commonOptions -crawlId $CRAWL_ID
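
To make that concrete, here is a rough sketch of the step-by-step sequence with
one crawl ID passed to every job. The crawl ID "mycrawl", the urls seed
directory, the Solr URL, the -topN value and the DRY_RUN switch are all
placeholders of mine, and the flags are the ones I believe the Nutch 2.x
bin/nutch commands accept, so check the usage output of your own version first.

```shell
#!/bin/sh
# Sketch only: pass the same crawl ID to every Nutch 2.x step.
# CRAWL_ID, SOLR_URL, NUTCH and the "urls" seed directory are placeholder values.
CRAWL_ID="${CRAWL_ID:-mycrawl}"
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
NUTCH="${NUTCH:-bin/nutch}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Every step gets the same -crawlId, including updatedb and solrindex.
# The && chain stops at the first step that fails.
crawl_steps() {
  run "$NUTCH" inject urls -crawlId "$CRAWL_ID" &&
  run "$NUTCH" generate -topN 50 -crawlId "$CRAWL_ID" &&
  run "$NUTCH" fetch -all -crawlId "$CRAWL_ID" &&
  run "$NUTCH" parse -all -crawlId "$CRAWL_ID" &&
  run "$NUTCH" updatedb -crawlId "$CRAWL_ID" &&
  run "$NUTCH" solrindex "$SOLR_URL" -all -crawlId "$CRAWL_ID"
}

crawl_steps
```

With DRY_RUN left at 1 this only prints the six commands, so you can compare
them with what you have been typing by hand before running anything for real.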

Tony.


On Fri, Jun 28, 2013 at 11:39 AM, Mariam Salloum
<[email protected]> wrote:

> Hi Tejas,
>
> Thanks for your response. I'm using the latest version, solr-4.3.1.
>
>
> On Jun 27, 2013, at 11:10 PM, Tejas Patil <[email protected]>
> wrote:
>
> > Which version of Solr are you using? It should work well with Solr 3.x.
> >
> > http://wiki.apache.org/nutch/NutchTutorial
> >
> >
> > On Thu, Jun 27, 2013 at 11:02 PM, Mariam Salloum
> > <[email protected]> wrote:
> >
> >> I'm having problems with integrating SOLR and NUTCH. I have done the
> >> following:
> >>
> >> 1 - Installed/configured NUTCH, SOLR, and HBase.
> >>
> >> 2 - The crawl script did not work for me, so I'm using the step-by-step
> >> commands
> >>
> >> 3 - I ran inject, generate, fetch, and parse and all ran successfully. I'm
> >> able to see the table in HBase and see the fetch and parse flags set for
> >> the entries.
> >>
> >> 4 - I copied the /conf/schema.xml from the Nutch directory into the SOLR
> >> config directory and verified it's using the right schema.xml file.
> >>
> >> 5 - I made sure that I updated schema.xml to set the indexed and stored
> >> properties to true:
> >> <field name="content" type="text" stored="true" indexed="true"/>
> >>
> >> 6 - Finally, I started SOLR and tried running bin/nutch solrindex …
> >>
> >> SOLR runs without errors (checked the solr.log). However, nothing is
> >> loaded to SOLR. It states number of documents loaded is 0, and the query
> >> *:* returns nothing.
> >>
> >> What could be the problem? Any ideas will be appreciated.
> >>
> >> Thanks
> >>
> >> Mariam
>
>
