Hello,
I have 2 .eml files saved by Thunderbird: one contains plain text
content, the other rich HTML content.
Tika recognized the plain text one as a "message/rfc822" file, but
detected the other one incorrectly as "text/html" (and textual content being
incorrectly extr
Also take a look at Scrapy and the work that Hyperion
Grey is doing with Splash and Avatar/HH.
Cheers,
Chris
--
Chris Mattmann
chris.mattm...@gmail.com
-Original Message-
From: Ken Krugler
Reply-To:
Date: Thursday, November 12, 2015 at 10:58 AM
To:
Subject: RE: Extraction table fr
There's no (semi)automated method.
For simple tables you could create a custom ContentHandler that triggers off
the appropriate HTML tags.
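A minimal sketch of such a handler, assuming the parser emits table, tr, td and th elements as SAX events and that tables are not nested. The class name SimpleTableHandler and its getTables() method are hypothetical, not part of Tika; only the standard org.xml.sax classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: collects cell text from simple, non-nested tables. */
public class SimpleTableHandler extends DefaultHandler {
    // tables -> rows -> cell strings
    private final List<List<List<String>>> tables = new ArrayList<>();
    private List<List<String>> currentTable;
    private List<String> currentRow;
    private StringBuilder currentCell;

    // Prefer the namespace-aware local name, fall back to the qualified name.
    private static String nameOf(String localName, String qName) {
        return (localName == null || localName.isEmpty()) ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        switch (nameOf(localName, qName)) {
            case "table":
                currentTable = new ArrayList<>();
                break;
            case "tr":
                if (currentTable != null) currentRow = new ArrayList<>();
                break;
            case "td":
            case "th":
                if (currentRow != null) currentCell = new StringBuilder();
                break;
            default:
                break;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only text inside an open cell is kept.
        if (currentCell != null) currentCell.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (nameOf(localName, qName)) {
            case "td":
            case "th":
                if (currentCell != null && currentRow != null) {
                    currentRow.add(currentCell.toString().trim());
                }
                currentCell = null;
                break;
            case "tr":
                if (currentRow != null && currentTable != null) currentTable.add(currentRow);
                currentRow = null;
                break;
            case "table":
                if (currentTable != null) tables.add(currentTable);
                currentTable = null;
                break;
            default:
                break;
        }
    }

    /** Returns every completed table as a list of rows of cell strings. */
    public List<List<List<String>>> getTables() {
        return tables;
    }
}
```

With Tika, an instance could be passed as the ContentHandler argument of Parser.parse(stream, handler, metadata, context) and the result read back with getTables(). Whether the table elements actually reach the handler depends on the HTML mapper in use, so treat that as something to verify. Nested tables, rowspan and colspan are not handled here.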
But a general purpose extractor is a serious technical challenge.
Companies like Factual have invested heavily in being able to find & extract
this type of stru
Hi
Is there a way to extract tables from an HTML document using Tika?
thanks!
Benjamin