Hi, 

Another complaint about Nutch's handling of outlinks. Since NUTCH-436 there is 
better support for embedded segment parameters. This exotic feature, however, 
causes a lot of invalid outlinks to be generated.

For some reason (most likely sloppy webmasters, as in my other thread) I see a 
lot of URLs with embedded params that were never meant to be embedded params, 
such as:

http://<HOST>.nl/webwinkel-tips.html;-plezier/55802-speelspiraal-van-baby-butt.html
anchor: TIPS

I propose adding an option to disable the fixing of embedded params in 
DOMContentUtils. 
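
To make the proposal concrete, here is a rough sketch of what such a switch 
could look like. It is not the actual Nutch code: the class name, the boolean 
flag and the example.nl URLs are hypothetical, and the param handling is a 
simplified version of how I understand the fix to work (the base URL's 
";params" part is carried over to targets that have none). It uses 
java.net.URI only to keep the example self-contained.

```java
import java.net.URI;

class EmbeddedParamsSketch {

    // Resolve an outlink target against the page URL.
    // fixEmbeddedParams is a hypothetical flag; in Nutch it would
    // presumably be read from the configuration.
    static URI resolve(URI base, String target, boolean fixEmbeddedParams) {
        if (!fixEmbeddedParams) {
            // Option disabled: plain relative resolution, nothing carried over.
            return base.resolve(target);
        }
        String baseStr = base.toString();
        int start = baseStr.indexOf(';');
        // Nothing to carry over, or the target already has its own params.
        if (start < 0 || target.indexOf(';') >= 0) {
            return base.resolve(target);
        }
        // Everything from the first ';' in the base is treated as params.
        String params = baseStr.substring(start);
        // Put the params after the path but before any query string.
        int qs = target.indexOf('?');
        String fixed = qs >= 0
                ? target.substring(0, qs) + params + target.substring(qs)
                : target + params;
        return base.resolve(fixed);
    }

    public static void main(String[] args) {
        // Hypothetical page whose URL contains a ';' that is not a real param.
        URI base = URI.create("http://example.nl/baby;-plezier/55802-speelspiraal.html");
        System.out.println(resolve(base, "/webwinkel-tips.html", true));
        System.out.println(resolve(base, "/webwinkel-tips.html", false));
    }
}
```

With the flag enabled, the link to /webwinkel-tips.html picks up the 
";-plezier/..." tail of the page URL, which is the kind of outlink shown 
above. With it disabled, the link resolves to the plain /webwinkel-tips.html.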

Thoughts?

Thanks,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
