https://issues.apache.org/jira/browse/NUTCH-1115

On Friday 16 September 2011 13:55:51 Markus Jelsma wrote:
> Anyone?
> 
> I see better results with fixing of embedded params disabled. No more crap
> URLs, and I've never actually seen embedded params used in real life.
> 
> On Tuesday 13 September 2011 13:53:40 Markus Jelsma wrote:
> > Hi,
> > 
> > Another complaint about Nutch's handling of outlinks. Since NUTCH-436 there
> > is better support for embedded segment parameters. This exotic feature,
> > however, causes a lot of invalid outlinks to be generated.
> > 
> > For some reason (most likely bad webmasters, as in my other thread) I see a
> > lot of URLs with embedded params that are not actually meant to be
> > embedded params, such as:
> > 
> > http://<HOST>.nl/webwinkel-tips.html;-plezier/55802-speelspiraal-van-baby-butt.html
> > anchor: TIPS
> > 
> > I would propose an option to disable the fixing of embedded params in
> > DOMContentUtils.
> > 
> > Thoughts?
> > 
> > Thanks,
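
To make the proposal quoted above concrete, here is a minimal sketch in Java.
It assumes the fixing step copies everything after the first ';' in the base
URL onto a link that has no ';' of its own, which is one reading of the
behaviour described; it is not the actual Nutch code. The class name, the
fixEmbeddedParams flag and the example.nl URLs are all hypothetical.

```java
import java.net.URI;

class EmbeddedParamsSketch {

    // fixEmbeddedParams is a hypothetical switch, not an existing Nutch setting.
    static URI resolveOutlink(URI base, String target, boolean fixEmbeddedParams) {
        String baseStr = base.toString();
        int start = baseStr.indexOf(';');

        // Plain resolution when fixing is off, the base has no ';',
        // or the target already carries its own params.
        if (!fixEmbeddedParams || start < 0 || target.indexOf(';') >= 0) {
            return base.resolve(target);
        }

        // Everything from the first ';' in the base is treated as params.
        String params = baseStr.substring(start);

        // Put the params after the target path but before any query string.
        int query = target.indexOf('?');
        String fixed = (query >= 0)
                ? target.substring(0, query) + params + target.substring(query)
                : target + params;
        return base.resolve(fixed);
    }

    public static void main(String[] args) {
        // A page whose path happens to contain a ';' that is not a real param.
        URI base = URI.create("http://example.nl/speel;-plezier/55802-item.html");
        System.out.println(resolveOutlink(base, "/webwinkel-tips.html", true));
        System.out.println(resolveOutlink(base, "/webwinkel-tips.html", false));
    }
}
```

With the flag on, the first call yields
http://example.nl/webwinkel-tips.html;-plezier/55802-item.html, the same shape
as the bad outlink quoted above. With the flag off, the link resolves to
http://example.nl/webwinkel-tips.html.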

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
