SWF parser
----------
Key: TIKA-337
URL: https://issues.apache.org/jira/browse/TIKA-337
Project: Tika
Issue Type: New Feature
Components: parser
Reporter: Julien Nioche
Here is an initial implementation of an SWF parser that uses JavaSWF and has
been adapted from A. Bialecki's implementation for Nutch.
The main differences from the Nutch implementation are that we use the
latest version of JavaSWF and do not try to extract text from the actions or
structured URLs. As usual, URLs can be obtained from the extracted text using
ParserPostProcessor.
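
To illustrate the idea of recovering URLs from the extracted text rather than from the SWF structure, here is a minimal sketch. It does not use ParserPostProcessor or any Tika or JavaSWF API; the class OutlinkSketch, the method extractUrls, and the regular expression are all hypothetical and only stand in for whatever link detection is actually applied to the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Tika or JavaSWF.
class OutlinkSketch {

    // Assumption: a deliberately simple pattern. It matches an http or https
    // scheme followed by any run of characters that are not whitespace,
    // quotes, or angle brackets.
    private static final Pattern URL = Pattern.compile("https?://[^\\s\"'<>]+");

    // Returns the distinct URLs found in the text, in order of first appearance.
    static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            String url = m.group();
            if (!urls.contains(url)) {
                urls.add(url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Stand-in for the plain text a parser would emit for an SWF file.
        String text = "Visit http://tika.apache.org/ or https://example.com/a.swf today";
        System.out.println(extractUrls(text));
    }
}
```

A pattern this simple will also pick up trailing punctuation such as a full stop directly after a URL, so real text would need a stricter rule.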
JavaSWF has changed quite a bit since the Nutch integration, and I wanted to
keep this initial port nice and simple. It should be possible to extract the
URLs from the actions using JavaSWF's API; I think this is what they did in
Heritrix.