Has anyone run into the same problem, or does anyone have experience with this?
jonycus wrote:
>
> Hi all,
>
> I am trying to extract the entire contents of a .doc document. Text, tables
> and bullets work fine, but I am stuck on the images. AFAIK the images in an
> MS Word file are stored as .emz, which is a gzipped EMF file. This is my
> code:
>
>
> // picTable is the document's PicturesTable; picC (the picture index) and
> // PATH (the output folder) are defined elsewhere
> List<Picture> picList = picTable.getAllPictures();
> Picture picture = picList.get(picC);
> String folderPath = PATH;
> String emzPath = folderPath + picture.suggestFullFileName() + ".emz";
>
> // Write the picture bytes as returned by HWPF, assuming they are EMZ
> OutputStream image = new FileOutputStream(emzPath);
> picture.writeImageContent(image);
> image.close();
>
> // Gunzip the .emz into a plain .emf file
> InputStream is = new FileInputStream(new File(emzPath));
> GZIPInputStream gzipis = new GZIPInputStream(is);
> OutputStream emfos = new FileOutputStream(
>         new File(folderPath + picture.suggestFullFileName() + ".emf"));
> byte[] buf = new byte[1024];
> int len;
> while ((len = gzipis.read(buf)) != -1) {
>     emfos.write(buf, 0, len);
> }
> gzipis.close();
> emfos.close();
>
> This should extract the EMF image from the EMZ file. However, my code fails
> because gzipis (the supposed gzip InputStream) is not reading gzip data at
> all: it seems the extracted image is not an EMZ file. I tried another
> approach, saving the Word file as HTML (which stores the images in a
> separate folder), and got the images as .emz and .gif files. The .emz file
> from that export and the one from my extraction differ in size; does that
> mean my extraction is wrong? I can open the .emz file from the HTML export
> with gzip, but not my extracted file, which is reported as not being a
> valid gzip file.
>
> Any help with this?
>
> Best regards,
> Vasko
>
>
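One possible explanation, not confirmed in this thread, is that the bytes HWPF returns for the picture are not gzip data at all: they may be a plain EMF, or zlib/DEFLATE-compressed, with the gzip (.emz) wrapper being something Word adds only when saving as HTML. Rather than assuming gzip, the bytes can be sniffed first. The sketch below is hypothetical (the class `MetafileSniffer` and its methods `detect` and `unwrap` are names invented here, not part of POI). It checks the gzip magic bytes `1F 8B`, the zlib header rules from RFC 1950, and the EMF header (record type 1 with the signature `" EMF"` at byte offset 40), and only decompresses when it recognises a wrapper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical helper: identify and unwrap metafile bytes by their header. */
public final class MetafileSniffer {

    /** Formats this sketch can tell apart from the first few bytes. */
    public enum Kind { GZIP, ZLIB, EMF, UNKNOWN }

    private MetafileSniffer() {}

    public static Kind detect(byte[] data) {
        // gzip streams (which is what a .emz file is) start with 1F 8B.
        if (data.length >= 2 && (data[0] & 0xFF) == 0x1F && (data[1] & 0xFF) == 0x8B) {
            return Kind.GZIP;
        }
        // A raw EMF starts with an EMR_HEADER record (type 1, little-endian)
        // whose signature field at offset 40 holds the bytes " EMF".
        if (data.length >= 44
                && data[0] == 1 && data[1] == 0 && data[2] == 0 && data[3] == 0
                && data[40] == ' ' && data[41] == 'E' && data[42] == 'M' && data[43] == 'F') {
            return Kind.EMF;
        }
        // zlib (RFC 1950): compression method 8 (deflate) in the low nibble of
        // the first byte, and the 16-bit header must be a multiple of 31.
        if (data.length >= 2) {
            int cmf = data[0] & 0xFF;
            int flg = data[1] & 0xFF;
            if ((cmf & 0x0F) == 8 && ((cmf << 8) | flg) % 31 == 0) {
                return Kind.ZLIB;
            }
        }
        return Kind.UNKNOWN;
    }

    /** Returns the metafile bytes, decompressing gzip or zlib if detected. */
    public static byte[] unwrap(byte[] data) throws IOException {
        switch (detect(data)) {
            case GZIP:
                return readAll(new GZIPInputStream(new ByteArrayInputStream(data)));
            case ZLIB:
                return readAll(new InflaterInputStream(new ByteArrayInputStream(data)));
            default:
                // Already plain (or unrecognised): pass the bytes through unchanged.
                return data;
        }
    }

    private static byte[] readAll(InputStream in) throws IOException {
        try (InputStream src = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int len;
            while ((len = src.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
            return out.toByteArray();
        }
    }
}
```

With HWPF you would pass it the picture bytes, e.g. `MetafileSniffer.unwrap(picture.getContent())` (assuming your POI version's `Picture` exposes `getContent()`), and write the result straight to an .emf file. If `detect` reports `EMF`, no decompression step is needed at all.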
--
View this message in context:
http://old.nabble.com/HWPF-image-extraction-problem-tp26300123p26498551.html
Sent from the POI - User mailing list archive at Nabble.com.