Hi,
Is this currently possible with Tika 0.9 in Nutch branch 1.4? I would have
thought Tika already handled this; however, I have seen no mention of anyone
having problems extracting this from web documents fetched with Nutch, or
even discussing it.
For example, say I had
You simply need to write an HtmlParseFilter; these filters receive the DOM
representation of the page from parse-tika (or parse-html). See the JIRA
entry for the metatags parser for an example and discussion. There is usually
no need to modify parse-html or Tika at all.
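A rough sketch of the DOM walk such a filter might perform. It deliberately leaves out the Nutch plugin plumbing (in Nutch 1.x, implementing `org.apache.nutch.parse.HtmlParseFilter`, declaring the extension in `plugin.xml`, and storing the values in the parse metadata), since those details are version-dependent. The class name `MetaTagWalker` is hypothetical and the XHTML string is made-up test input. It uses only the JDK's `org.w3c.dom` API, the same interface family as the `DocumentFragment` a Nutch parse filter is handed, so `extract` could in principle be called on that fragment directly.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: collects <meta name="..." content="..."> pairs from a DOM tree.
public class MetaTagWalker {

    // Recursively visit every node; element names are compared case-insensitively
    // because HTML parsers may report tags as "META" rather than "meta".
    static void collectMeta(Node node, Map<String, String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "meta".equalsIgnoreCase(node.getNodeName())) {
            Element el = (Element) node;
            String name = el.getAttribute("name"); // "" when the attribute is absent
            if (!name.isEmpty()) {
                out.put(name.toLowerCase(Locale.ROOT), el.getAttribute("content"));
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectMeta(children.item(i), out);
        }
    }

    // Works on any DOM Node: a Document, an Element, or a DocumentFragment.
    public static Map<String, String> extract(Node root) {
        Map<String, String> meta = new LinkedHashMap<>();
        collectMeta(root, meta);
        return meta;
    }

    public static void main(String[] args) throws Exception {
        // Made-up well-formed XHTML standing in for a fetched page.
        String xhtml = "<html><head>"
                + "<meta name=\"keywords\" content=\"nutch, tika\"/>"
                + "<meta name=\"Description\" content=\"Example page\"/>"
                + "</head><body><p>Hello</p></body></html>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        // prints {keywords=nutch, tika, description=Example page}
        System.out.println(extract(doc));
    }
}
```

Keying on the lowercased `name` attribute means `Description` and `description` map to the same entry; a real filter might instead keep the original case or handle `http-equiv` tags as well.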
Julien
On 17 July 2011 16:23, lewis