Hi everyone,

We're using a Solr environment and will soon be using Lucene as well. The bulk of our data is XML and images, but a small percentage is in other formats such as VSD. Our test suite contains roughly 100 file types (XML, PDF, Word, VSD, etc.). So far we've successfully indexed 200 VSD files, but I came across one that appears to hang when OfficeParser.parse is called. (The call is wrapped so that any exception is caught and logged, but nothing is being logged; I'm still looking into that.)
The file opens fine in Visio. I've tried both Tika 0.9 and 1.0. Is this the proper way to parse a .vsd file, or do you have other suggestions?

    TikaConfig tc = TikaConfig.getDefaultConfig();
    ParseContext context = new ParseContext();
    Metadata metadata = new Metadata();
    ContentHandler handler = new WriteOutContentHandler(10*1024*1024);
    InputStream fis = new URL(url.toString()).openStream();
    OfficeParser officeParser = new OfficeParser();
    officeParser.parse(fis, handler, metadata, context); // hangs here

Thanks for any information you can provide!

-Kristian
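
P.S. While the root cause is unknown, one way to keep a single hanging file from stalling a whole indexing run is to execute the parse on a worker thread and give up after a timeout. The following is only a minimal sketch using the standard java.util.concurrent classes. The class name TimedParse, the method runWithTimeout, and the timeout values are hypothetical, and the sleeping task merely stands in for the officeParser.parse(...) call above. It does not explain or fix the hang itself.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: not part of Tika, Solr, or Lucene.
public class TimedParse {

    /**
     * Runs the task on a worker thread.
     * Returns true if it completed within the timeout, false if it timed out.
     * An exception thrown by the task surfaces as an ExecutionException.
     */
    public static boolean runWithTimeout(Callable<Void> task, long timeoutMillis)
            throws ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timed-parse");
            t.setDaemon(true); // a stuck parse should not keep the JVM alive
            return t;
        });
        Future<Void> future = executor.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Requests interruption of the worker. Code spinning in a tight
            // loop that never checks the interrupt flag may keep running.
            future.cancel(true);
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for: officeParser.parse(fis, handler, metadata, context)
        Callable<Void> slowParse = () -> {
            Thread.sleep(5_000);
            return null;
        };
        boolean finished = runWithTimeout(slowParse, 200);
        System.out.println(finished ? "parse finished" : "parse timed out");
    }
}
```

In real use the Callable would wrap the parse call and return null afterwards. Note that a timeout only stops the caller from waiting; if the parser ignores interruption, the worker thread can continue to consume CPU in the background, so the problem file would still need to be logged and investigated.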