https://issues.apache.org/jira/browse/NUTCH-1115
On Friday 16 September 2011 13:55:51 Markus Jelsma wrote:
> Anyone?
>
> I see better results with fixing embedded params disabled. No more crap
> URLs, and I've never actually seen embedded params used in real life.
>
> On Tuesday 13 September 2011 13:53:40 Markus Jelsma wrote:
> > Hi,
> >
> > Another complaint about Nutch's handling of outlinks. Since NUTCH-436
> > there is better support for embedded segment parameters. This exotic
> > feature, however, causes a lot of invalid outlinks to be generated.
> >
> > For some reason (most likely bad webmasters, as in my other thread) I
> > see a lot of URLs with embedded params that are not actually meant to
> > be embedded params, such as:
> >
> > http://<HOST>.nl/webwinkel-tips.html;-plezier/55802-speelspiraal-van-baby-butt.html
> > anchor: TIPS
> >
> > I would propose an option to disable the fixing of embedded params in
> > DOMContentUtils.
> >
> > Thoughts?
> >
> > Thanks,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
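
To make the proposal above concrete, here is a minimal sketch of what such an option might look like. It is not taken from the Nutch source. It assumes the fix works by carrying the base URL's `;params` suffix over onto relative link targets, and the class name, method name, boolean flag and `example.invalid` host are all hypothetical.

```java
// Hypothetical sketch: gate the embedded-params fix behind a boolean option.
// Assumption (not copied from Nutch): the fix appends the base URL's
// ";params" suffix to a relative target that has no params of its own.
class EmbeddedParamsSketch {

    /**
     * Returns the link target to resolve against the base URL.
     * With the option disabled the target is passed through untouched.
     */
    static String prepareTarget(String base, String target, boolean fixEmbeddedParams) {
        if (!fixEmbeddedParams) {
            return target; // option off: never touch the outlink
        }
        int start = base.indexOf(';');
        if (start < 0 || target.indexOf(';') >= 0) {
            return target; // no params on the base, or target already has its own
        }
        String params = base.substring(start);
        int query = target.indexOf('?');
        // Params belong after the path but before any query string.
        return query >= 0
                ? target.substring(0, query) + params + target.substring(query)
                : target + params;
    }

    public static void main(String[] args) {
        // Hypothetical base URL containing a stray ';' that is not a real parameter.
        String base = "http://example.invalid/shop/page.html;-plezier/55802-item.html";
        System.out.println(prepareTarget(base, "webwinkel-tips.html", true));
        System.out.println(prepareTarget(base, "webwinkel-tips.html", false));
    }
}
```

With the flag on, the stray `;-plezier/...` suffix of the base leaks into every relative outlink on the page, which is the kind of URL quoted above. With the flag off, the target is resolved as written.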

