Jonathan et al,

Thank you for the reply. I used this approach and it is working, with one minor issue: the "one to many" requirement for each group. The intent is to use a filter query within Solr on the group data element. I have tried the following:

group="filter1,filter2"
group="filter1","filter2"
group=filter1,filter2
group=filter1 filter2
group=filter1 group=filter2

Each of these choices creates a single value assigned to group. Do you have any suggestions on how to format the seed.txt file to support the "one to many" option, i.e. so that each filter value can be used as a filter query element within Solr?

For those who find this thread searching for a similar solution, here is how to implement urlmeta:

1. Turn on the plugin by adding urlmeta to the plugin.includes property in nutch-site.xml. urlmeta is a standalone item within the plugin value:

   ....|index-(basic|anchor|metadata)|urlmeta|indexer-solr|....

2. Add the urlmeta.tags property to the nutch-site.xml file, listing the keywords you want to use as values:

   <property>
     <name>urlmeta.tags</name>
     <value>group1,group2</value>
   </property>

3. In your seed.txt file, add the tag values for the URLs as needed. Make sure they are tab delimited (\t below stands for a literal tab character):

   http://www.domain1.com \tgroup1=foo \tgroup2=bar
   http://www.domain2.com \tgroup1=faa \tgroup2=bur

On Thu, Apr 2, 2015 at 9:36 AM, Jonathan Cooper-Ellis <[email protected]> wrote:

> Hey Jeff,
>
> Check out the urlmeta plugin. You can inject metadata in with your seed
> list and propagate it to outlinks.
>
> On Thu, Apr 2, 2015 at 10:09 AM, Jeff Cocking <[email protected]>
> wrote:
>
> > Environment: Nutch 1.9, Solr 5.0
> >
> > I am trying to define a group (category) of websites. Each website will
> > have assigned group (1 to many). The assignment is known before the
> > creation of seed.txt file. All pages within the website should inherit the
> > assigned group(s). The assigned group(s) need to be passed to Solr for
> > faceted search.
> >
> > For example:
> > www.site1.com group1, group2 group3
> > All pages within www.site1.com inherit group1, group2, group3
> >
> > www.site2.com group2, group4, group5
> > All pages within www.site2.com inherit group2, group4, group5
> >
> > Thoughts on ways to accomplish this?
> >
> > Thank you in advance.
> >
> > jeff
>
> --
> Jonathan Cooper-Ellis
> Field Enablement Engineer
> <http://www.cloudera.com>
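
P.S. For anyone who would rather generate the seed file than hand-edit the tabs, here is a minimal Python sketch of the seed.txt layout from the urlmeta steps (the URL, then one key=value pair per tag, all separated by real tab characters). The helper names seed_line and write_seed_file and the sites mapping are my own and purely illustrative; they are not part of Nutch. This only automates the formatting and does not solve the "one to many" question above.

```python
def seed_line(url, tags):
    """Build one seed.txt line: the URL, then tab-separated key=value pairs."""
    fields = [url] + [f"{key}={value}" for key, value in tags.items()]
    return "\t".join(fields)


def write_seed_file(path, sites):
    """Write one line per site; `sites` maps a URL to a dict of tag -> value."""
    with open(path, "w", encoding="utf-8") as handle:
        for url, tags in sites.items():
            handle.write(seed_line(url, tags) + "\n")


# Hypothetical example data mirroring the seed.txt entries in this thread.
sites = {
    "http://www.domain1.com": {"group1": "foo", "group2": "bar"},
    "http://www.domain2.com": {"group1": "faa", "group2": "bur"},
}

for url, tags in sites.items():
    # repr() makes the tab characters visible as \t when printed.
    print(repr(seed_line(url, tags)))
```

Calling write_seed_file("seed.txt", sites) writes the same lines to disk. The tag names used as keys need to match the ones listed in urlmeta.tags.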

