Thanks
> On Dec 27, 2019, at 7:58 PM, Richard Lavin <[email protected]> wrote:
>
> Thanks Rick
>
>> On Dec 26, 2019, at 2:30 AM, Zara Parst <[email protected]> wrote:
>>
>> Hi,
>>
>> Is it possible to crawl three different websites, such as
>>
>> 1. https://www.urgenthomework.com/
>> 2. https://www.myassignmenthelp.net/
>> 3. https://www.assignmenthelp.net/
>>
>> in a single Nutch configuration, and then send each site's indexed pages to
>> its corresponding Solr core (uah, mah, yah)? I tried to achieve this with
>> exchanges and writer IDs. Please see my configuration below.
>>
>> -------------exchange.xml---------------------------------
>>
>> <exchange id="uahIndexernew" class="default">
>>   <writers>
>>     <writer id="indexer_solr_1" />
>>   </writers>
>>   <params>
>>     <param name="expr" value="doc.getFieldValue('host')=='urgenthomework.com'" />
>>   </params>
>> </exchange>
>>
>> <exchange id="mahIndexernew" class="default">
>>   <writers>
>>     <writer id="indexer_solr_2" />
>>   </writers>
>>   <params>
>>     <param name="expr" value="doc.getFieldValue('host')=='myassignmenthelp.net'" />
>>   </params>
>> </exchange>
>>
>> <exchange id="yahIndexernew" class="default">
>>   <writers>
>>     <writer id="indexer_solr_3" />
>>   </writers>
>>   <params>
>>     <param name="expr" value="doc.getFieldValue('host')=='assignmenthelp.net'" />
>>   </params>
>> </exchange>
>>
>> ---------------------------------index.writers.xml----------------------------------------
>>
>> <writer id="indexer_solr_1" class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
>>   <parameters>
>>     <param name="type" value="http" />
>>     <param name="url" value="http://localhost:8983/solr/uah" />
>>     <param name="collection" value="" />
>>     <param name="weight.field" value="" />
>>     <param name="commitSize" value="1000" />
>>     <param name="auth" value="false" />
>>     <param name="username" value="username" />
>>     <param name="password" value="password" />
>>   </parameters>
>>   <mapping>
>>     <copy>
>>       <!-- <field source="title" dest="content" />
>>       <field source="metatag.description" dest="content" />
>>       <field source="metatag.keywords" dest="content" /> -->
>>     </copy>
>>     <rename></rename>
>>     <remove>
>>       <field source="segment" />
>>       <field source="host" />
>>       <field source="url" />
>>       <!-- <field source="metatag.description" />
>>       <field source="metatag.keywords" />
>>       <field source="date" />
>>       <field source="url" />
>>       -->
>>     </remove>
>>   </mapping>
>> </writer>
>>
>> <writer id="indexer_solr_2" class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
>>   <parameters>
>>     <param name="type" value="http" />
>>     <param name="url" value="http://localhost:8983/solr/mah" />
>>     <param name="collection" value="" />
>>     <param name="weight.field" value="" />
>>     <param name="commitSize" value="1000" />
>>     <param name="auth" value="false" />
>>     <param name="username" value="username" />
>>     <param name="password" value="password" />
>>   </parameters>
>>   <mapping>
>>     <copy>
>>     </copy>
>>     <rename></rename>
>>     <remove>
>>       <field source="segment" />
>>       <field source="host" />
>>       <field source="url" />
>>     </remove>
>>   </mapping>
>> </writer>
>>
>> <writer id="indexer_solr_3" class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
>>   <parameters>
>>     <param name="type" value="http" />
>>     <param name="url" value="http://localhost:8983/solr/yah" />
>>     <param name="collection" value="" />
>>     <param name="weight.field" value="" />
>>     <param name="commitSize" value="1000" />
>>     <param name="auth" value="false" />
>>     <param name="username" value="username" />
>>     <param name="password" value="password" />
>>   </parameters>
>>   <mapping>
>>     <copy>
>>     </copy>
>>     <rename></rename>
>>     <remove>
>>       <field source="segment" />
>>       <field source="host" />
>>       <field source="url" />
>>     </remove>
>>   </mapping>
>> </writer>
>>
>> ---------------------------------------------------------------------------------------------------------------
>>
>> But it is not pushing data into the corresponding cores; instead, it sends
>> data from the different domains into a single core. Please let me know. I am
>> sure there has to be a way to achieve this. I didn't try with
>> subcollections.xml. Do you think I can achieve it using subcollections?
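
The three exchange expressions above amount to a host-to-core lookup. Below is a minimal plain-Java model of that lookup, offered only as a sketch: it is not Nutch's exchange API, the class and method names are hypothetical, and it assumes the indexed host field carries the host exactly as it appears in the seed URLs (that is, with the www. prefix). Under that assumption, an exact comparison against the bare domain never matches, while stripping a leading "www." first routes each seed to the intended core.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java model of the intended host-to-core routing.
// NOT Nutch code: it only mirrors the logic of the three JEXL expressions,
// assuming the host field equals the URL's host (e.g. "www.urgenthomework.com").
public class HostRouting {

    // Bare domain -> Solr core name, as in the quoted exchange/writer setup.
    static final Map<String, String> CORE_BY_HOST = new LinkedHashMap<>();
    static {
        CORE_BY_HOST.put("urgenthomework.com", "uah");
        CORE_BY_HOST.put("myassignmenthelp.net", "mah");
        CORE_BY_HOST.put("assignmenthelp.net", "yah");
    }

    // Host as java.net.URI reports it.
    static String host(String url) {
        return URI.create(url).getHost();
    }

    // Exact comparison, like doc.getFieldValue('host')=='...'; null if no match.
    static String exactMatch(String url) {
        return CORE_BY_HOST.get(host(url));
    }

    // Strips a leading "www." before the exact lookup; null if no match.
    static String normalizedMatch(String url) {
        String h = host(url);
        if (h != null && h.startsWith("www.")) {
            h = h.substring("www.".length());
        }
        return CORE_BY_HOST.get(h);
    }

    public static void main(String[] args) {
        String[] seeds = {
            "https://www.urgenthomework.com/",
            "https://www.myassignmenthelp.net/",
            "https://www.assignmenthelp.net/"
        };
        for (String url : seeds) {
            // Prints the host, the exact-match result and the normalized result.
            System.out.println(host(url) + " exact=" + exactMatch(url)
                + " normalized=" + normalizedMatch(url));
        }
    }
}
```

The sketch deliberately uses an exact comparison after normalization rather than a suffix test such as endsWith("assignmenthelp.net"), because myassignmenthelp.net also ends with assignmenthelp.net and would be routed to the wrong core.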

