These are git patches and work differently than what we are used to at the ASF
(the paths carry a/ and b/ prefixes).
In Nutch's root, use patch -p1 < patchfile for these, or -p0 for the usual SVN-based patches.
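
To make the difference concrete, here is a minimal sketch that can be run in a scratch directory. The file src/example.txt and both patch files are made up for illustration and are not part of Nutch or of NUTCH-1663; the scratch directory only stands in for the Nutch root.

```shell
#!/bin/sh
set -eu

# Scratch directory standing in for the Nutch root (hypothetical layout).
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p src
printf 'line one\nline two\n' > src/example.txt

# Git-style patch: paths carry a/ and b/ prefixes, so strip one level with -p1.
cat > git-style.patch <<'EOF'
--- a/src/example.txt
+++ b/src/example.txt
@@ -1,2 +1,2 @@
 line one
-line two
+line two (git)
EOF
patch -p1 < git-style.patch

# SVN-style patch: paths are already relative to the root, so use -p0.
cat > svn-style.patch <<'EOF'
--- src/example.txt
+++ src/example.txt
@@ -1,2 +1,2 @@
 line one
-line two (git)
+line two (svn)
EOF
patch -p0 < svn-style.patch

# Show the file after both patches have been applied.
cat src/example.txt
```

If in doubt about a patch file, look at its ---/+++ header lines: a leading a/ or b/ means -p1, bare relative paths mean -p0.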

 
 
-----Original message-----
> From:Ralf R. Kotowski <[email protected]>
> Sent: Tuesday 5th November 2013 13:12
> To: [email protected]
> Subject: RE: Language identification
> 
> Thank you,
> 
> I'm still learning how to patch Nutch... not much luck so far...
> 
> -----Original Message-----
> From: ilhami Kalkan [mailto:[email protected]] 
> Sent: Tuesday, November 05, 2013 10:36 AM
> To: [email protected]
> Subject: Re: Language identification
> 
> Hi Ralf,
> 
> I patched the language-filter plugin to filter out or accept pages in
> specified languages during the parse phase.
> 
> NUTCH-1663 <https://issues.apache.org/jira/browse/NUTCH-1663>
> 
> 
> On 02-11-2013 22:05, Julien Nioche wrote:
> > Ralf,
> >
> > The parameter http.accept.language tells the servers you are hitting that
> > they should provide you the content in the languages you specified, but that
> > neither gives you any guarantees nor allows you to filter the content. Look
> > at the languageidentifier plugin as a starting point, then you could add a
> > custom mapreduce job to remove the pages which are not in the languages of
> > interest.
> >
> > HTH
> >
> > Julien
> >
> >
> >
> > On 2 November 2013 17:15, Ralf R. Kotowski <[email protected]> wrote:
> >
> >> Hi,
> >>
> >>
> >>
> >> What is the correct process to only store documents in a desired language?
> >>
> >>
> >>
> >> I'm currently doing this:
> >>
> >>
> >>
> >> <property>
> >> <name>http.accept.language</name>
> >> <value>ja-jp, en-us,en-gb,en;q=0.7,*;q=0.3</value>
> >> <description>Value of the "Accept-Language" request header field.
> >> This allows selecting non-English language as default one to retrieve.
> >> It is a useful setting for search engines build for certain national group.
> >> </description>
> >> </property>
> >>
> >>
> >>
> >> I'm using a seed.txt with URLs I know are in the language I want, but as
> >> the crawl grows it seems I'm starting to get more and more docs in other
> >> languages.
> >>
> >>
> >>
> >>
> >>
> >> Thnx in advance
> >>
> >>
> >
> 
> 
> 
