Jonathan et al

Thank you for the reply. I used this approach and it is working, with one
minor issue: the "one to many" requirement for each group. The intent is to
use a filter query within Solr on the group data element. I have tried the
following in seed.txt:

group="filter1,filter2"
group="filter1","filter2"
group=filter1,filter2
group=filter1 filter2
group=filter1   group=filter2

Each of these choices creates a single variable assigned to group. Do you
have any suggestions on how to format the seed.txt file to support the "one
to many" option, i.e. so that each filter value can be used as a filter
query element within Solr?
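
To make the goal concrete, here is a rough sketch of the kind of filter
query I want to end up with. It only builds the request URL using Solr's
standard q and fq parameters. The host, the core name "collection1" and the
helper name build_select_url are made up for illustration, and it assumes
the group field ends up holding each value separately, which is exactly the
open question above.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, group_values, field="group"):
    """Build a Solr /select URL with one fq clause per group value.

    base_url and field are illustrative assumptions, not values from a
    real installation.
    """
    params = [("q", query)]
    for value in group_values:
        # Quote the value so it is matched as a single term or phrase.
        params.append(("fq", f'{field}:"{value}"'))
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and core name.
url = build_select_url("http://localhost:8983/solr/collection1", "*:*", ["filter1"])
print(url)
```

Passing more than one value adds more fq clauses, and Solr intersects them,
so a document would need to carry every listed group to match.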


For those who find this thread searching for a similar solution, here is
how to implement urlmeta:

1. Turn on the plugin by adding urlmeta to the plugin.includes property
in nutch-site.xml. urlmeta is a standalone entry within the property value:
 ....|index-(basic|anchor|metadata)|urlmeta|indexer-solr|....
2. Add the urlmeta.tags property to nutch-site.xml. List the keywords you
want to use as tag names in its value.
<property>
  <name>urlmeta.tags</name>
  <value>group1,group2</value>
</property>
3. In your seed.txt file, add the tag values for the URLs as needed. Make
sure they are tab delimited.
   http://www.domain1.com   /tgroup1=foo   /tgroup2=bar
   http://www.domain2.com   /tgroup1=faa   /tgroup2=bur
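
If you would rather generate the seed file than hand-edit it, a small
script can guarantee that the separators are real tab characters. This is
only a sketch: the helper name seed_line is made up, and it assumes the
"/t" in the example above stands for a tab between the URL and each
name=value pair.

```python
def seed_line(url, metadata):
    """Return one seed.txt line: the URL, then tab-separated name=value pairs."""
    parts = [url] + [f"{name}={value}" for name, value in metadata.items()]
    return "\t".join(parts)


# Same URLs and tag values as in the example above.
seeds = [
    ("http://www.domain1.com", {"group1": "foo", "group2": "bar"}),
    ("http://www.domain2.com", {"group1": "faa", "group2": "bur"}),
]

for url, metadata in seeds:
    print(seed_line(url, metadata))
```

Redirect the output to seed.txt, and keep the tag names in sync with the
urlmeta.tags value from step 2.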



On Thu, Apr 2, 2015 at 9:36 AM, Jonathan Cooper-Ellis <
[email protected]> wrote:

> Hey Jeff,
>
> Check out the urlmeta plugin. You can inject metadata in with your seed
> list and propagate it to outlinks.
>
> On Thu, Apr 2, 2015 at 10:09 AM, Jeff Cocking <[email protected]>
> wrote:
>
> > Environment:  Nutch 1.9, Solr 5.0
> >
> > I am trying to define a group (category) of websites. Each website will
> > have assigned group (1 to many). The assignment is known before the
> > creation of seed.txt file.  All pages within the website should inherit the
> > assigned group(s). The assigned group(s) need to be passed to Solr for
> > faceted search.
> >
> > For example:
> > www.site1.com group1, group2 group3
> > All pages within www.site1.com inherit group1, group2, group3
> >
> > www.site2.com group2, group4, group5
> > All pages within www.site2.com inherit group2, group4, group5
> >
> > Thoughts on ways to accomplish this?
> >
> > Thank you in advance.
> >
> > jeff
> >
>
>
>
> --
> Jonathan Cooper-Ellis
> Field Enablement Engineer
> <http://www.cloudera.com>
>
