See Julien's comments on implementing a custom scoring filter.
-Original message-
> From:SebaZ
> Sent: Mon 25-Jun-2012 11:29
> To: user@nutch.apache.org
> Subject: RE: HTTP REFERER is missing
>
> How can I add the referrer to the outlinks? Can I do it on sche
How can I add the referrer to the outlinks? Can I do it in the schema or
something? Or do I have to hack the crawling code too, as you wrote about the
protocol plugin?
Markus Jelsma-2 wrote
>
> What you can try is to add the referrer to outlinks when parsing records.
> This outlink can be added to CrawlDatum'
What code do you mean? As I wrote earlier, I'm not good at Java programming;
I just use Nutch as an application: install and use, not recode.
Julien Nioche-4 wrote
>
>> > You can write a custom scoringfilter to track the URL of the source,
>> see
>> > the one in urlmeta for an example. It shoul
> > You can write a custom scoringfilter to track the URL of the source, see
> > the one in urlmeta for an example. It should be fairly straightforward to
> > do
> >
> I'm not using the Nutch index. The crawler is sending data to Solr.
>
ScoringFilters are used at pretty much every step of Nutch and are n
Julien Nioche-4 wrote
>
>> >
>> > Nutch cannot do this by default, and it is tricky to implement because
>> > there may not be one unique referrer per page.
>> >
>> I don't really need a unique referrer. All I want is to tell the requested
>> server on which URL the crawler found the link.
>>
>
> You can write
e LinkDB will not populate internal links.
>
>
> -Original message-
>> From:SebaZ <sebastian.zaborowski@>
>> Sent: Wed 20-Jun-2012 16:01
>> To: user@.apache
>> Subject: RE: HTTP REFERER is missing
>>
>>
>> Markus Jelsma-2 wrot
> >
> > Nutch cannot do this by default, and it is tricky to implement because
> > there may not be one unique referrer per page.
> >
> I don't really need a unique referrer. All I want is to tell the requested
> server on which URL the crawler found the link.
>
You can write a custom scoringfilter to track the U
> To: user@nutch.apache.org
> Subject: RE: HTTP REFERER is missing
>
>
> Markus Jelsma-2 wrote
> >
> > Nutch cannot do this by default, and it is tricky to implement because
> > there may not be one unique referrer per page.
> >
> I don't really need a unique ref
Markus Jelsma-2 wrote
>
> Nutch cannot do this by default, and it is tricky to implement because
> there may not be one unique referrer per page.
>
I don't really need a unique referrer. All I want is to tell the requested
server on which URL the crawler found the link.
There is a site whose admin informed me
Hi
Nutch cannot do this by default, and it is tricky to implement because there
may not be one unique referrer per page. What you can try is to add the
referrer to outlinks when parsing records. This outlink can be added to
CrawlDatum's MetaData, which you can then use later to set the referrer. To set t
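
A minimal sketch of the idea above, for illustration only. It does not use the Nutch API: the Entry class is a hypothetical stand-in for a CrawlDatum with its MetaData, and the "_referrer_" key and both method names are assumptions, not anything Nutch defines. It only shows the flow: tag each outlink with the page it was found on at parse time, then read that value back when building the request headers at fetch time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-alone sketch; none of these names come from Nutch. */
class ReferrerSketch {
    // Hypothetical metadata key, chosen only for this example.
    static final String REFERRER_KEY = "_referrer_";

    /** Stand-in for a CrawlDatum: a URL plus a metadata map. */
    static final class Entry {
        final String url;
        final Map<String, String> metaData = new HashMap<>();

        Entry(String url) {
            this.url = url;
        }
    }

    /** Parse step: record the source page in each outlink's metadata. */
    static List<Entry> tagOutlinks(String sourceUrl, List<String> outlinks) {
        List<Entry> result = new ArrayList<>();
        for (String link : outlinks) {
            Entry entry = new Entry(link);
            entry.metaData.put(REFERRER_KEY, sourceUrl);
            result.add(entry);
        }
        return result;
    }

    /** Fetch step: add a Referer header only when a referrer was recorded. */
    static Map<String, String> requestHeaders(Entry entry) {
        Map<String, String> headers = new HashMap<>();
        String referrer = entry.metaData.get(REFERRER_KEY);
        if (referrer != null) {
            headers.put("Referer", referrer);
        }
        return headers;
    }
}
```

If the same URL is found on several pages, this sketch simply keeps whichever referrer was written for that entry, which matches the point that there may not be one unique referrer per page.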