Hi Ralph, I haven't tested the PPT extractor with any other languages. I remember reading about other people having problems with different character sets though.
Could you send a before and after example file here or to bugzilla?

-Ryan Rhodes

-----Original Message-----
From: Ralph Scheuer [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 28, 2004 10:01 AM
To: slide
Subject: MSPowerPointExtractor problem

Hello everybody,

When I was searching for a Java class to extract text from PowerPoint files, I accidentally discovered Slide. I pulled the MSPowerPointExtractor class and some other stuff it depends on via CVS and tried it for some text extraction. The method I used looks very similar to the provided example main method (see below).

However, when I tried to extract text from a German PowerPoint presentation, I had some problems with the encoding. I did not know which encoding to use; converting the output to ISO Latin 1 with my text editor solved only part of the problem (some German Umlaute were displayed correctly, some were not).

Is this a known issue or am I doing something wrong? Any hints for me?

Thanks in advance.

Ralph Scheuer

BTW, I am using Mac OS X 10.3.4 with JDK 1.4.2_03; the native encoding on this platform is MacRoman.

    public static String contentStringForData(NSData data){
        StringBuffer buf = new StringBuffer();
        try{
            ByteArrayInputStream input = data.stream();
            MSPowerPointExtractor ex = new MSPowerPointExtractor(null, null);
            Reader reader = ex.extract(input);
            int c;
            // Check for end of stream (-1) before appending, so the
            // end marker itself is never added to the buffer.
            while( (c = reader.read()) != -1 ){
                buf.append((char)c);
            }
        }catch(Exception e){
            // Exceptions are silently ignored here.
        }
        return buf.toString();
    }
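
For illustration, the encoding question above can be sketched in plain Java, independent of Slide. This is only a minimal sketch: the names CharsetDemo, readAll and decode are hypothetical, and nothing here is taken from MSPowerPointExtractor itself. It shows that the same umlaut is represented by different bytes in ISO-8859-1 and UTF-16LE, and that bytes decoded with the wrong charset come out garbled. PowerPoint's binary format can store text runs either as single-byte or as two-byte (UTF-16) records, which might explain why only some Umlaute survived the conversion, but that is an assumption and not something confirmed in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

final class CharsetDemo {

    // Drain a Reader into a String. The end-of-stream marker (-1) is
    // checked before appending, so it never ends up in the result.
    static String readAll(Reader reader) throws IOException {
        StringBuilder buf = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            buf.append((char) c);
        }
        return buf.toString();
    }

    // Decode raw bytes with an explicitly named charset instead of
    // relying on the platform default (MacRoman on the poster's machine).
    static String decode(byte[] bytes, String charsetName) throws IOException {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charsetName);
        try {
            return readAll(reader);
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // The German word "fuer" written with a u-umlaut (U+00FC).
        String expected = "f\u00FCr";

        // ISO-8859-1: the umlaut is the single byte 0xFC.
        byte[] latin1 = { 'f', (byte) 0xFC, 'r' };
        // UTF-16LE: two bytes per character, low byte first.
        byte[] utf16le = { 'f', 0, (byte) 0xFC, 0, 'r', 0 };

        System.out.println(decode(latin1, "ISO-8859-1").equals(expected)); // true
        System.out.println(decode(utf16le, "UTF-16LE").equals(expected));  // true

        // Wrong charset: every UTF-16LE byte becomes its own character,
        // so the result has 6 characters instead of 3.
        System.out.println(decode(utf16le, "ISO-8859-1").length());        // 6
    }
}
```

If the extractor's Reader already delivers correctly decoded characters, a remaining problem could also lie in how the resulting String is written out or displayed, since that step again uses the platform default encoding unless one is named explicitly.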