RE: HTML parsing, script tags,

2017-06-28 Thread Jim Idle
Thanks Ken, that’s probably what I need. I was trying to find a Config class, but it seems I need to use a different mapper, as you say.
Jim
From: Ken Krugler [mailto:kkrugler_li...@transpac.com]
Sent: Wednesday, June 28, 2017 23:06
To: user@tika.apache.org
Subject: Re: HTML parsing, script tags,

Re: HTML parsing, script tags,

2017-06-28 Thread Ken Krugler
Hi Jim,
> On Jun 28, 2017, at 12:07am, Jim Idle wrote:
>
> So right now it looks like the HTML parser only sends through script tags if
> they have a src attribute. Is this likely to change or should I use another parser
> for HTML? I could submit a patch for this of course.
You can use a custom map

HTML parsing, script tags,

2017-06-28 Thread Jim Idle
So right now it looks like the HTML parser only sends through script tags if they have a src attribute. Is this likely to change, or should I use another parser for HTML? I could submit a patch for this, of course. Also, does anyone have an opinion on whether the underlying tag soup stuff is tolerant of HTML in