RE: HTTP REFERER is missing

2012-06-25 Thread Markus Jelsma
See Julien's comments on implementing a custom scoring filter.

-Original message-
> From: SebaZ
> Sent: Mon 25-Jun-2012 11:29
> To: user@nutch.apache.org
> Subject: RE: HTTP REFERER is missing
>
> How can I add the referrer to the outlinks? Can I do it on sche

RE: HTTP REFERER is missing

2012-06-25 Thread SebaZ
How can I add the referrer to the outlinks? Can I do it in the schema or something? Or do I have to hack the crawling code too, as you wrote about the protocol plugin?

Markus Jelsma-2 wrote
> What you can try is to add the referrer to outlinks when parsing records.
> This outlink can be added to CrawlDatum'

Re: HTTP REFERER is missing

2012-06-22 Thread SebaZ
What code do you mean? As I wrote earlier, I'm not good at Java programming and just use Nutch as an application: install and use, not recode.

Julien Nioche-4 wrote
>> > You can write a custom scoringfilter to track the URL of the source, see
>> > the one in urlmeta for an example. It shoul

Re: HTTP REFERER is missing

2012-06-21 Thread Julien Nioche
> > You can write a custom scoringfilter to track the URL of the source, see
> > the one in urlmeta for an example. It should be fairly straightforward to do
>
> I'm not using the Nutch index. The crawler is sending data to Solr.

ScoringFilters are used at pretty much every step of Nutch and are n

Re: HTTP REFERER is missing

2012-06-21 Thread SebaZ
Julien Nioche-4 wrote
>> > Nutch cannot do this by default and is tricky to make because there may
>> > not be one unique referrer per page.
>>
>> I don't really need unique referrer. All I want is to inform requested server
>> on which URL crawler found the link.
>
> You can write

RE: HTTP REFERER is missing

2012-06-21 Thread SebaZ
e LinkDB will not populate internal links.

> -Original message-
>> From: SebaZ <sebastian.zaborowski@>
>> Sent: Wed 20-Jun-2012 16:01
>> To: user@.apache
>> Subject: RE: HTTP REFERER is missing
>>
>> Markus Jelsma-2 wrot

Re: HTTP REFERER is missing

2012-06-21 Thread Julien Nioche
> > Nutch cannot do this by default and is tricky to make because there may
> > not be one unique referrer per page.
>
> I don't really need unique referrer. All I want is to inform requested server
> on which URL crawler found the link.

You can write a custom scoringfilter to track the U

RE: HTTP REFERER is missing

2012-06-20 Thread Markus Jelsma
> To: user@nutch.apache.org
> Subject: RE: HTTP REFERER is missing
>
> Markus Jelsma-2 wrote
> > Nutch cannot do this by default and is tricky to make because there may
> > not be one unique referrer per page.
>
> I don't really need unique ref

RE: HTTP REFERER is missing

2012-06-20 Thread SebaZ
Markus Jelsma-2 wrote
> Nutch cannot do this by default and is tricky to make because there may
> not be one unique referrer per page.

I don't really need a unique referrer. All I want is to inform the requested server on which URL the crawler found the link. There is a site whose admin informed me
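
Because several pages can link to the same URL, sending a single Referer means choosing one of them. Below is a minimal sketch in plain Python, assuming a "first page that discovered the link wins" policy. It is not Nutch code: the function name `first_seen_referrers` and the policy itself are illustrative assumptions.

```python
from typing import Dict, Iterable, Tuple


def first_seen_referrers(links: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each target URL to the first source page that linked to it.

    `links` yields (source_url, target_url) pairs in discovery order.
    """
    chosen: Dict[str, str] = {}
    for source, target in links:
        # setdefault keeps the earliest source and ignores later ones.
        chosen.setdefault(target, source)
    return chosen


# Two pages link to page.html; only the first one is kept as its referrer.
discovered = [
    ("http://example.com/a.html", "http://example.com/page.html"),
    ("http://example.com/b.html", "http://example.com/page.html"),
    ("http://example.com/a.html", "http://example.com/other.html"),
]
referrers = first_seen_referrers(discovered)
```

Any other policy (last seen, same-host only) would fit the same shape. The point is only that one source has to be picked per target.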

RE: HTTP REFERER is missing

2012-06-06 Thread Markus Jelsma
Hi,

Nutch cannot do this by default, and it is tricky to implement because there may not be one unique referrer per page. What you can try is to add the referrer to outlinks when parsing records. This outlink can be added to CrawlDatum's MetaData, which you can then use later to set the referrer. To set t
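
The approach described here (record the parent URL on each outlink while parsing, then read it back when building the request) can be sketched in plain Python. This is not Nutch code: `attach_referrer`, `build_request_headers`, and the `"referrer"` metadata key are hypothetical names chosen for illustration, and a real implementation would live in a Nutch plugin written in Java.

```python
from typing import Dict, List

# Hypothetical metadata key; Nutch does not define this name.
REFERRER_KEY = "referrer"


def attach_referrer(parent_url: str, outlinks: List[str]) -> Dict[str, Dict[str, str]]:
    """Parsing step: give every outlink a metadata dict recording the page it was found on."""
    return {link: {REFERRER_KEY: parent_url} for link in outlinks}


def build_request_headers(metadata: Dict[str, str]) -> Dict[str, str]:
    """Fetch step: turn stored metadata into HTTP request headers."""
    headers: Dict[str, str] = {}
    referrer = metadata.get(REFERRER_KEY)
    if referrer:
        # The standard HTTP header name is "Referer" (a historical misspelling of "referrer").
        headers["Referer"] = referrer
    return headers


records = attach_referrer(
    "http://example.com/index.html",
    ["http://example.com/a.html", "http://example.org/b.html"],
)
headers = build_request_headers(records["http://example.com/a.html"])
```

URLs with no stored referrer, such as seed URLs, simply get no Referer header in this sketch.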