Anyone? I see better results with the fixing of embedded params disabled. No more crap URLs, and I've never actually seen embedded params used in real life.
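
To make the idea concrete, here is a rough sketch of what such a switch could look like. This is not the Nutch code: the class EmbeddedParamsSketch, the resolve() method, the fixEmbeddedParams flag and the example.com URLs are all made up for illustration, and it only relies on java.net.URI from the JDK.

```java
import java.net.URI;

// Illustrative sketch only, not the Nutch implementation.
// EmbeddedParamsSketch, resolve() and fixEmbeddedParams are hypothetical names.
class EmbeddedParamsSketch {

    // Resolve a relative target against a base URL. When fixEmbeddedParams is
    // true and only the base carries ";params", the params are copied onto
    // the target before resolving. When it is false, the target is resolved as-is.
    static String resolve(String base, String target, boolean fixEmbeddedParams) {
        URI baseUri = URI.create(base);
        if (fixEmbeddedParams) {
            String basePath = baseUri.getRawPath();
            int semi = (basePath == null) ? -1 : basePath.indexOf(';');
            if (semi >= 0 && target.indexOf(';') < 0) {
                String params = basePath.substring(semi);
                // Place the params after the target path but before any query string.
                int query = target.indexOf('?');
                if (query >= 0) {
                    target = target.substring(0, query) + params + target.substring(query);
                } else {
                    target = target + params;
                }
            }
        }
        return baseUri.resolve(target).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/shop/page.html;jsessionid=abc";
        // With the fix enabled the base params are carried over to the outlink.
        System.out.println(resolve(base, "tips.html", true));
        // With the fix disabled the outlink is resolved without them.
        System.out.println(resolve(base, "tips.html", false));
    }
}
```

With the flag off, a stray ";" in the base URL never leaks into the resolved outlinks, which is the behaviour I would like to be able to configure.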

On Tuesday 13 September 2011 13:53:40 Markus Jelsma wrote:
> Hi,
>
> Another complaint about Nutch's handling of outlinks. Since NUTCH-436 there is
> better support for embedded segment parameters. This exotic feature,
> however, causes a lot of invalid outlinks to be generated.
>
> For some reason (most likely bad webmasters, as in my other thread) I see a
> lot of URLs with embedded params that are not actually meant to be
> embedded params, such as:
>
> http://<HOST>.nl/webwinkel-tips.html;-plezier/55802-speelspiraal-van-baby-butt.html
> anchor: TIPS
>
> I would propose an option to disable the fixing of embedded params in
> DomContentUtils.
>
> Thoughts?
>
> Thanks,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

