MSPowerPointExtractor problem

Ralph Scheuer Wed, 28 Jul 2004 07:02:02 -0700

Hello everybody,

When I was searching for a Java class to extract text from PowerPoint files, I accidentally discovered Slide.

I pulled the MSPowerPointExtractor class and some other stuff it depends on via CVS and tried it for some text extraction.

The method I used looks very similar to the provided example main method (see below).

However, when I tried to extract text from a German PowerPoint presentation, I had some problems with the encoding. I did not know which encoding to use; converting the output to ISO Latin 1 with my text editor solved only part of the problem (some German umlauts were displayed correctly, some were not).

Is this a known issue or am I doing something wrong? Any hints for me?

Thanks in advance.

Ralph Scheuer

BTW, I am using Mac OS X 10.3.4 with JDK 1.4.2_03; the native encoding on this platform is MacRoman.


    public static String contentStringForData(NSData data){

        StringBuffer buf = new StringBuffer();
        try{
            ByteArrayInputStream input = data.stream();
            MSPowerPointExtractor ex = new MSPowerPointExtractor(null, null);

            Reader reader = ex.extract(input);

            // Check for end of stream (-1) before appending, so that
            // (char)-1 never ends up in the buffer.
            int c;
            while( (c = reader.read()) != -1 )
                {
                    buf.append((char)c);
                }
        }catch(Exception e){
            // Report the failure instead of silently ignoring it.
            e.printStackTrace();
        }

        return buf.toString();
    }
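
One way to check whether the platform default encoding (MacRoman here) is involved is to write the extracted text with an explicitly named charset rather than relying on the default. The following is only a minimal sketch under that assumption: the class EncodingCheck and its methods are hypothetical helpers, not part of Slide, and it does not show whether the extractor itself decodes the PowerPoint data correctly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper class for illustration only; not part of Slide.
class EncodingCheck {

    // Encode text with an explicitly named charset instead of the platform default.
    static byte[] encode(String text, String charsetName) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            Writer writer = new OutputStreamWriter(out, Charset.forName(charsetName));
            writer.write(text);
            writer.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Render bytes as space-separated lowercase hex for easy comparison.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "\u00e4"; // LATIN SMALL LETTER A WITH DIAERESIS
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8:      " + hex(encode(sample, "UTF-8")));      // c3 a4
        System.out.println("ISO-8859-1: " + hex(encode(sample, "ISO-8859-1"))); // e4
    }
}
```

If a text editor then opens the output with a different encoding than the one used for writing, umlauts will appear garbled even though the Java String itself is correct, so the same charset has to be chosen on both sides.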
