Re: improving Tika for web contents

2018-07-26 Thread Tim Allison
Y, we're waiting on dl4j so we have a week probably.
On Thu, Jul 26, 2018 at 8:06 AM gbouchar wrote:
>
> Thank you very much, Tim! Do you think it will make it for the next release?
>
> ‐‐‐ Original Message ‐‐‐
> On 26 July 2018 1:58 PM, Tim Allison wrote:
>
> > Y. Sorry. At beach

Re: improving Tika for web contents

2018-07-26 Thread Tim Allison
Y. Sorry. At beach last week. Took care of quick issues yesterday, will try to return to your PRs today. Thank you!
On Thu, Jul 26, 2018 at 5:38 AM gbouchar wrote:
> Greetings everyone!
>
> I have two pull requests related to the use of tika for web contents that
> have been waiting for quite so

improving Tika for web contents

2018-07-26 Thread gbouchar
Greetings everyone!
I have two pull requests related to the use of Tika for web contents that have been waiting for quite some time now.
- [Improving html charset detection](https://github.com/apache/tika/pull/242): None of the current charset detectors in Tika respect the web standards, and i