Thanks Rick

> On Dec 26, 2019, at 2:30 AM, Zara Parst <[email protected]> wrote:
> 
> Hi. Is it possible to crawl three different websites, like
> 
> 1. https://www.urgenthomework.com/
> 2. https://www.myassignmenthelp.net/
> 3. https://www.assignmenthelp.net/
> 
> in a single Nutch configuration and then send each site's indexed pages to
> the corresponding core [uah, mah, yah] in Solr? I tried to achieve this with
> exchanges and writer IDs. Please see my configuration below.
> 
> -------------exchange.xml---------------------------------
> 
> <exchange id="uahIndexernew" class="default">
>   <writers>
>     <writer id="indexer_solr_1" />
>   </writers>
>   <params>
>     <param name="expr" value="doc.getFieldValue('host')=='urgenthomework.com'" />
>   </params>
> </exchange>
> 
> <exchange id="mahIndexernew" class="default">
>   <writers>
>     <writer id="indexer_solr_2" />
>   </writers>
>   <params>
>     <param name="expr" value="doc.getFieldValue('host')=='myassignmenthelp.net'" />
>   </params>
> </exchange>
> 
> <exchange id="yahIndexernew" class="default">
>   <writers>
>     <writer id="indexer_solr_3" />
>   </writers>
>   <params>
>     <param name="expr" value="doc.getFieldValue('host')=='assignmenthelp.net'" />
>   </params>
> </exchange>
> 
> 
> 
> ---------------------------------index.writers.xml----------------------------------------
> 
> <writer id="indexer_solr_1"
> class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
>    <parameters>
>      <param name="type" value="http" />
>      <param name="url" value="http://localhost:8983/solr/uah" />
>      <param name="collection" value="" />
>      <param name="weight.field" value="" />
>      <param name="commitSize" value="1000" />
>      <param name="auth" value="false" />
>      <param name="username" value="username" />
>      <param name="password" value="password" />
>    </parameters>
>    <mapping>
>      <copy>
>        <!-- <field source="title" dest="content" />
>        <field source="metatag.description" dest="content" />
>        <field source="metatag.keywords" dest="content" /> -->
>      </copy>
>      <rename></rename>
>      <remove>
>        <field source="segment" />
>        <field source="host" />
>        <field source="url" />
>        <!-- <field source="metatag.description" />
>        <field source="metatag.keywords" />
>        <field source="date" />
>        <field source="url" />
>         -->
>      </remove>
>    </mapping>
>  </writer>
> 
> 
>  <writer id="indexer_solr_2"
> class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
>    <parameters>
>      <param name="type" value="http" />
>      <param name="url" value="http://localhost:8983/solr/mah" />
>      <param name="collection" value="" />
>      <param name="weight.field" value="" />
>      <param name="commitSize" value="1000" />
>      <param name="auth" value="false" />
>      <param name="username" value="username" />
>      <param name="password" value="password" />
>    </parameters>
>    <mapping>
>      <copy>
>      </copy>
>      <rename></rename>
>      <remove>
>        <field source="segment" />
>        <field source="host" />
>        <field source="url" />
>      </remove>
>    </mapping>
>  </writer>
> 
> 
> 
>  <writer id="indexer_solr_3"
> class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
>    <parameters>
>      <param name="type" value="http" />
>      <param name="url" value="http://localhost:8983/solr/yah" />
>      <param name="collection" value="" />
>      <param name="weight.field" value="" />
>      <param name="commitSize" value="1000" />
>      <param name="auth" value="false" />
>      <param name="username" value="username" />
>      <param name="password" value="password" />
>    </parameters>
>    <mapping>
>      <copy>
>      </copy>
>      <rename></rename>
>      <remove>
>        <field source="segment" />
>        <field source="host" />
>        <field source="url" />
>      </remove>
>    </mapping>
>  </writer>
> 
> ---------------------------------------------------------------------------------------------------------------
> 
> However, it is not pushing data into the corresponding cores; instead, it
> sends data from the different domains into one core. Please do let me know.
> I am sure there has to be a way to achieve this. I didn't try with
> subcollections.xml. Do you think I can achieve it using subcollections?
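
Two things in the quoted exchange configuration may be worth checking. This is a hedged sketch, not a confirmed fix. In the stock conf/exchanges.xml shipped with Nutch 1.x releases that support exchanges, expression-based routing uses the class org.apache.nutch.exchange.jexl.JexlExchange, while class="default" is reserved for the fallback exchange. The 'host' field also normally carries the full host name, so for the URLs above it would presumably be 'www.urgenthomework.com' rather than 'urgenthomework.com'. Assuming the exchange-jexl plugin is enabled in plugin.includes, one entry might look like this (the ids are reused from the quoted message; the other two sites would follow the same pattern):

```xml
<!-- Sketch only: assumes the exchange-jexl plugin is listed in plugin.includes -->
<exchanges>
  <exchange id="uahIndexernew" class="org.apache.nutch.exchange.jexl.JexlExchange">
    <writers>
      <writer id="indexer_solr_1" />
    </writers>
    <params>
      <!-- Assumption: the 'host' field holds the full host name, including "www." -->
      <param name="expr" value="doc.getFieldValue('host')=='www.urgenthomework.com'" />
    </params>
  </exchange>
</exchanges>
```

Whether this resolves the routing would need to be confirmed by re-running the index step and checking which core receives the documents.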
