Hi,

Another complaint about Nutch's handling of outlinks. Since NUTCH-436 there is better support for embedded segment parameters. This exotic feature, however, causes a lot of invalid outlinks to be generated.
For some reason (most likely bad webmasters, as in my other thread) I see a lot of URLs with embedded params that are not actually meant to be embedded params, such as:

http://<HOST>.nl/webwinkel-tips.html;-plezier/55802-speelspiraal-van-baby-butt.html
anchor: TIPS

I would propose an option to disable the fixing of embedded params in DomContentUtils. Thoughts?

Thanks,
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
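
To illustrate the problem, here is a minimal sketch in plain Python. It is not Nutch code: the function name `split_embedded_params` and the `enabled` switch are hypothetical and only stand in for the proposed option. It assumes that a parser treating ';' as the start of embedded params splits the path at the first semicolon, and that the space inside the example URL above is a line-wrapping artifact.

```python
# Path portion of the example URL from this thread (host omitted,
# whitespace from line wrapping removed).
EXAMPLE_PATH = "/webwinkel-tips.html;-plezier/55802-speelspiraal-van-baby-butt.html"


def split_embedded_params(path, enabled=True):
    """Return (path, params), splitting at the first ';' when enabled.

    `enabled` is a hypothetical switch standing in for the proposed
    option; when it is False the path is returned untouched.
    """
    if not enabled:
        return path, ""
    # Everything after the first ';' is treated as embedded params,
    # whether or not the webmaster meant it that way.
    head, _, params = path.partition(";")
    return head, params


if __name__ == "__main__":
    print(split_embedded_params(EXAMPLE_PATH))
    print(split_embedded_params(EXAMPLE_PATH, enabled=False))
```

With the switch on, the rest of the path after the semicolon ends up classified as params, which is the kind of misreading that could lead to invalid outlinks. With it off, the path is passed through as written.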

