Error on parsing Visio file

Van Tassell, Kristian Wed, 15 Feb 2012 05:23:33 -0800

Hi everyone,

We're using a Solr environment and will soon be using Lucene as well. The 
bulk of our data is XML and images, but a small percentage of it is in other 
formats such as VSD. Our test suite covers roughly 100 file types (XML, PDF, 
Word, VSD, etc.). So far we've successfully indexed 200 VSD files, but I came 
across one that appears to hang when OfficeParser.parse is called. (The call 
is wrapped so that any exception is caught and logged, but I don't seem to be 
getting one... still checking into that.)


The file opens fine in Visio. I've tried both Tika 0.9 and 1.0.

Is this the proper method to parse a .vsd file? Or do you have other 
suggestions?

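// Set up the context, the metadata holder, and a handler limited to
// 10*1024*1024 characters of extracted text
TikaConfig tc = TikaConfig.getDefaultConfig();
ParseContext context = new ParseContext();
Metadata metadata = new Metadata();
ContentHandler handler = new WriteOutContentHandler(10*1024*1024);

// Open the document from its URL
InputStream fis = new URL(url.toString()).openStream();

// Call the Microsoft Office parser directly
OfficeParser officeParser = new OfficeParser();
officeParser.parse(fis, handler, metadata, context); // hangs here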
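
One way to tell a real hang from a swallowed exception might be to run the 
parse on a worker thread with a time limit. The following is only a minimal 
sketch using java.util.concurrent; the class name ParseTimeoutSketch and the 
helper runWithTimeout are hypothetical, and the lambdas stand in for the 
officeParser.parse call above (assuming Java 8 or later for the lambda syntax).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ParseTimeoutSketch {

    // Hypothetical helper: runs the task on a worker thread and gives up
    // after timeoutMillis. An exception thrown by the task surfaces as an
    // ExecutionException; a hang surfaces as a TimeoutException.
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true); // a stuck worker should not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; code stuck in a tight loop may ignore it.
            future.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a parse that finishes normally.
        String result = runWithTimeout(() -> "parsed", 1000);
        System.out.println(result);

        // Stand-in for a parse that hangs.
        try {
            runWithTimeout(() -> { Thread.sleep(5_000); return "late"; }, 100);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

In real use, the lambda body would wrap the parse call and return the handler 
or its text. Note that cancelling only interrupts the worker thread, so a 
parser spinning in a loop could keep consuming CPU until the process exits.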


Thanks for any information you can provide!
-Kristian
