Thanks 

> On Dec 27, 2019, at 7:58 PM, Richard Lavin <[email protected]> wrote:
> 
> Thanks Rick
> 
>> On Dec 26, 2019, at 2:30 AM, Zara Parst <[email protected]> wrote:
>> 
>> Hi, is it possible to crawl three different websites like
>> 
>> 1. https://www.urgenthomework.com/
>> 2. https://www.myassignmenthelp.net/
>> 3. https://www.assignmenthelp.net/
>> 
>> in a single Nutch configuration and then send the respective indexed pages to
>> the corresponding cores [uah, mah, yah] in Solr? I tried to achieve it with
>> exchanges and writer IDs. Please see my configuration below.
>> 
>> -------------exchange.xml---------------------------------
>> 
>> <exchange id="uahIndexernew" class="default">
>>   <writers>
>>     <writer id="indexer_solr_1" />
>>   </writers>
>>   <params>
>>     <param name="expr" value="doc.getFieldValue('host')=='urgenthomework.com'" />
>>   </params>
>> </exchange>
>> 
>> <exchange id="mahIndexernew" class="default">
>>   <writers>
>>     <writer id="indexer_solr_2" />
>>   </writers>
>>   <params>
>>     <param name="expr" value="doc.getFieldValue('host')=='myassignmenthelp.net'" />
>>   </params>
>> </exchange>
>> 
>> <exchange id="yahIndexernew" class="default">
>>   <writers>
>>     <writer id="indexer_solr_3" />
>>   </writers>
>>   <params>
>>     <param name="expr" value="doc.getFieldValue('host')=='assignmenthelp.net'" />
>>   </params>
>> </exchange>
>> 
>> ---------------------------------index.writers.xml----------------------------------------
>> 
>> <writer id="indexer_solr_1"
>> class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
>>   <parameters>
>>     <param name="type" value="http" />
>>     <param name="url" value="http://localhost:8983/solr/uah" />
>>     <param name="collection" value="" />
>>     <param name="weight.field" value="" />
>>     <param name="commitSize" value="1000" />
>>     <param name="auth" value="false" />
>>     <param name="username" value="username" />
>>     <param name="password" value="password" />
>>   </parameters>
>>   <mapping>
>>     <copy>
>>       <!-- <field source="title" dest="content" />
>>       <field source="metatag.description" dest="content" />
>>       <field source="metatag.keywords" dest="content" /> -->
>>     </copy>
>>     <rename></rename>
>>     <remove>
>>       <field source="segment" />
>>       <field source="host" />
>>       <field source="url" />
>>       <!-- <field source="metatag.description" />
>>       <field source="metatag.keywords" />
>>       <field source="date" />
>>       <field source="url" />
>>        -->
>>     </remove>
>>   </mapping>
>> </writer>
>> 
>> 
>> <writer id="indexer_solr_2"
>> class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
>>   <parameters>
>>     <param name="type" value="http" />
>>     <param name="url" value="http://localhost:8983/solr/mah" />
>>     <param name="collection" value="" />
>>     <param name="weight.field" value="" />
>>     <param name="commitSize" value="1000" />
>>     <param name="auth" value="false" />
>>     <param name="username" value="username" />
>>     <param name="password" value="password" />
>>   </parameters>
>>   <mapping>
>>     <copy>
>>     </copy>
>>     <rename></rename>
>>     <remove>
>>       <field source="segment" />
>>       <field source="host" />
>>       <field source="url" />
>>     </remove>
>>   </mapping>
>> </writer>
>> 
>> 
>> 
>> <writer id="indexer_solr_3"
>> class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
>>   <parameters>
>>     <param name="type" value="http" />
>>     <param name="url" value="http://localhost:8983/solr/yah" />
>>     <param name="collection" value="" />
>>     <param name="weight.field" value="" />
>>     <param name="commitSize" value="1000" />
>>     <param name="auth" value="false" />
>>     <param name="username" value="username" />
>>     <param name="password" value="password" />
>>   </parameters>
>>   <mapping>
>>     <copy>
>>     </copy>
>>     <rename></rename>
>>     <remove>
>>       <field source="segment" />
>>       <field source="host" />
>>       <field source="url" />
>>     </remove>
>>   </mapping>
>> </writer>
>> 
>> ---------------------------------------------------------------------------------------------------------------
>> 
>> But it is not pushing data into the corresponding cores; instead, it sends
>> data from the different domains into one core. Please let me know what I am
>> missing. I am sure there has to be a way to achieve it. I didn't try with
>> subcollection.xml. Do you think I can achieve it using subcollection?
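
One thing that may be worth checking, shown here in Python rather than in Nutch itself: the three seed URLs all use a `www.` hostname, while the `expr` values compare against the bare domain. Assuming the `host` field carries the full hostname of the URL (an assumption; nothing in this thread confirms it), an exact string comparison would never match. The `ROUTES` table and both helper functions are hypothetical and only illustrate the comparison; they are not part of Nutch or Solr.

```python
from urllib.parse import urlparse

# Seed URLs exactly as listed in the question.
SEED_URLS = [
    "https://www.urgenthomework.com/",
    "https://www.myassignmenthelp.net/",
    "https://www.assignmenthelp.net/",
]

# Hypothetical routing table: the host strings used in the expr values,
# mapped to the Solr core each writer points at.
ROUTES = {
    "urgenthomework.com": "uah",
    "myassignmenthelp.net": "mah",
    "assignmenthelp.net": "yah",
}


def route_exact(url):
    """Route by exact hostname equality, mirroring host == '<domain>'."""
    return ROUTES.get(urlparse(url).hostname)


def route_without_www(url):
    """Route after stripping a leading 'www.' from the hostname."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return ROUTES.get(host)


if __name__ == "__main__":
    for url in SEED_URLS:
        print(url, route_exact(url), route_without_www(url))
```

If the assumption holds, either comparing against the `www.` form of each hostname or normalizing the host before comparing would be the kind of change to try; this sketch does not show how Nutch itself evaluates the expression.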