Yes. If a value is changed in the seed.txt file, will the new value be
used when the page is re-crawled/fetched?

Sorry for being vague.

jeff

On Fri, Apr 3, 2015 at 12:19 PM, Jonathan Cooper-Ellis <
[email protected]> wrote:

> If a value is changed in seed.txt?
>
> On Fri, Apr 3, 2015 at 12:44 PM, Jeff Cocking <[email protected]>
> wrote:
>
> > I figured I might have to inject a CSV blob and manually explode it in a
> > custom filter.
> >
> > As a second question, if a value is changed, will the new value propagate
> > with the normal fetching cycle?
> >
> > thank you.
> >
> > jeff
> >
> > On Thu, Apr 2, 2015 at 5:06 PM, Jonathan Cooper-Ellis <
> > [email protected]> wrote:
> >
> > > Hi Jeff,
> > >
> > > Off the top of my head, the best way to do that might be to inject the
> > > filters as a CSV blob, and write (or modify) an indexing filter to split
> > > up the blob and index them as separate values to a "multiValued" field
> > > in Solr.
> > >
> > > On Thu, Apr 2, 2015 at 3:07 PM, Jeff Cocking <[email protected]>
> > > wrote:
> > >
> > > > Jonathan et al
> > > >
> > > > Thank you for the reply.  I used this approach and it is working with
> > > > one minor issue: the "one to many" requirement for each group. The
> > > > intent is to use a filter query within Solr on the group data element.
> > > > I have tried the following:
> > > >
> > > > group="filter1,filter2"
> > > > group="filter1","filter2"
> > > > group=filter1,filter2
> > > > group=filter1 filter2
> > > > group=filter1   group=filter2
> > > >
> > > > Each of these choices creates a single value assigned to group. Do you
> > > > have any suggestions on how to format the seed.txt file to support the
> > > > "one to many" option, i.e. so that each filter value can be used as a
> > > > filter query element within Solr?
> > > >
> > > >
> > > > For those who find this thread searching for a similar solution, here
> > > > is how to implement urlmeta:
> > > >
> > > > 1. Turn on the plugin by adding urlmeta to the plugin.includes property
> > > > within nutch-site.xml. urlmeta is a standalone item within the plugin
> > > > value:
> > > >  ....|index-(basic|anchor|metadata)|urlmeta|indexer-solr|....
> > > > 2. Add the urlmeta.tags property to the nutch-site.xml file. List the
> > > > tag names you want to propagate and index, comma-delimited.
> > > > <property>
> > > >   <name>urlmeta.tags</name>
> > > >   <value>group1,group2</value>
> > > > </property>
> > > > 3. In your seed.txt file, add the tag values for the URLs as needed.
> > > > Make sure they are tab delimited (\t below stands for a tab character).
> > > >    http://www.domain1.com\tgroup1=foo\tgroup2=bar
> > > >    http://www.domain2.com\tgroup1=faa\tgroup2=bur
> > > >
> > > >
> > > >
> > > > On Thu, Apr 2, 2015 at 9:36 AM, Jonathan Cooper-Ellis <
> > > > [email protected]> wrote:
> > > >
> > > > > Hey Jeff,
> > > > >
> > > > > > Check out the urlmeta plugin. You can inject metadata in with your
> > > > > > seed list and propagate it to outlinks.
> > > > >
> > > > > > On Thu, Apr 2, 2015 at 10:09 AM, Jeff Cocking <[email protected]>
> > > > > > wrote:
> > > > >
> > > > > > Environment:  Nutch 1.9, Solr 5.0
> > > > > >
> > > > > > > I am trying to define a group (category) of websites. Each
> > > > > > > website will have one or more assigned groups (1 to many). The
> > > > > > > assignment is known before the creation of the seed.txt file.
> > > > > > > All pages within the website should inherit the assigned
> > > > > > > group(s). The assigned group(s) need to be passed to Solr for
> > > > > > > faceted search.
> > > > > >
> > > > > > For example:
> > > > > > > www.site1.com group1, group2, group3
> > > > > > All pages within www.site1.com inherit group1, group2, group3
> > > > > >
> > > > > > www.site2.com group2, group4, group5
> > > > > > All pages within www.site2.com inherit group2, group4, group5
> > > > > >
> > > > > > Thoughts on ways to accomplish this?
> > > > > >
> > > > > > Thank you in advance.
> > > > > >
> > > > > > jeff
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Jonathan Cooper-Ellis
> > > > > Field Enablement Engineer
> > > > > <http://www.cloudera.com>
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Jonathan Cooper-Ellis
> > > Field Enablement Engineer
> > > <http://www.cloudera.com>
> > >
> >
>
>
>
> --
> Jonathan Cooper-Ellis
> Field Enablement Engineer
> <http://www.cloudera.com>
>
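
For anyone following the CSV blob suggestion quoted above, here is a minimal
sketch of the splitting step only. It is plain Java with no Nutch
dependencies; the class name GroupBlobSplitter and the sample value are
hypothetical and not part of Nutch or the urlmeta plugin. It assumes the
injected value looks like group=filter1,filter2 and that the values
themselves never contain commas.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch: splits an injected CSV blob
// into the individual values that should be indexed separately.
public class GroupBlobSplitter {

    // Split a comma-separated blob such as "filter1,filter2" into its
    // individual values, trimming whitespace and skipping empty entries.
    public static List<String> split(String blob) {
        List<String> values = new ArrayList<>();
        if (blob == null) {
            return values;
        }
        for (String part : blob.split(",")) {
            String value = part.trim();
            if (!value.isEmpty()) {
                values.add(value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Assumed sample value, as if the seed line carried group=filter1, filter2
        for (String value : split("filter1, filter2")) {
            System.out.println(value);
        }
    }
}
```

Inside a custom indexing filter, each returned value would presumably be added
to the document as its own entry for the group field, with that field declared
multiValued in the Solr schema. That wiring is not shown here and would need
to be checked against the Nutch version in use.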
