Hi Nick, thanks for your response! I didn't use POIFSViewer, but I now know the structure of my PDF OLE object. Unfortunately, this isn't enough...
Here is what I did. First, I created a Word 2003 XML file with Word and imported a PDF file. The PDF is recognized as a package (not as a PDF file) because there was no program to handle PDF files on that computer. These are the important parts:

<w:docOleData>
  <w:binData w:name="oledata.mso">
    0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/ ...
  </w:binData>
</w:docOleData>

<o:OLEObject Type="Embed" ProgID="Package" ShapeID="_x0000_i1025" DrawAspect="Content" ObjectID="_1277043057"/>

In the Word XML file the OLE object is Base64-encoded. I decoded it and wrote it to a binary file (OleObject.bin), which I inspected (first with 7-Zip, later with POIFS). The structure of OleObject.bin is the following:

+ Root Entry
++ _1277043057
+++ [3]OleObjectInfo
+++ [1]Ole10Native
+++ [1]Ole
+++ [1]CompObj

Ole10Native contains my PDF with a custom header that Word added. To get to this content I had to:
1. Create a POIFSFileSystem based on OleObject.bin.
2. Get the entry "_1277043057" and write it to disk (as "_1277043057").
3. Strip the first 4 bytes of "_1277043057".
4. Use the inflate algorithm to decompress it as "_1277043057_decompressed".
5. Create a POIFSFileSystem again, based on the decompressed "_1277043057_decompressed".
6. Write the contents listed above to disk.
==> I could then open my PDF file.

So far, so good. Then I tried it the other way round. After packaging the content again, I tried to open the file in Word, and Word complained that it can't open the file because "The server application, the source file, or the element wasn't found" (this is only my translation).

Then I looked for the step that fails. Steps 1 to 4 also worked in the other direction, but creating "_1277043057_decompressed" seemed not to work. When I compared the original "_1277043057_decompressed" to the generated one, there are many similarities (file size and most of the content), but the first part of the original file contains more information. I had a look at it in a text editor.
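For illustration, steps 3 and 4 and their reverse could be sketched with the JDK alone as below. This is a minimal sketch, not Word's or POI's documented format handling: the class and method names (OleStreamCodec, decodeBinData, prefixOf, unpack, pack) are hypothetical, it assumes the bytes after the 4-byte prefix are zlib-wrapped deflate data, and it copies the prefix back unchanged because its meaning is not established here. The POIFS steps (1-2 and 5-6) are left to Apache POI and are not shown.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Base64;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, not part of Apache POI.
// ASSUMPTIONS: data after the 4-byte prefix is zlib-wrapped deflate
// (raw deflate would need new Inflater(true) / new Deflater(level, true)),
// and the prefix is reused verbatim when packing.
final class OleStreamCodec {
    static final int PREFIX_LENGTH = 4;

    private OleStreamCodec() {}

    // Decodes the text inside <w:binData w:name="oledata.mso">;
    // the MIME decoder skips line breaks inside the Base64 text.
    static byte[] decodeBinData(String base64Text) {
        return Base64.getMimeDecoder().decode(base64Text);
    }

    // Returns the 4 bytes that step 3 strips, so they can be restored later.
    static byte[] prefixOf(byte[] stream) {
        if (stream.length < PREFIX_LENGTH) {
            throw new IllegalArgumentException("stream shorter than prefix");
        }
        return Arrays.copyOf(stream, PREFIX_LENGTH);
    }

    // Steps 3 and 4: skip the prefix, then inflate the remainder.
    static byte[] unpack(byte[] stream) {
        if (stream.length < PREFIX_LENGTH) {
            throw new IllegalArgumentException("stream shorter than prefix");
        }
        Inflater inflater = new Inflater(); // expects a zlib header (assumption)
        inflater.setInput(stream, PREFIX_LENGTH, stream.length - PREFIX_LENGTH);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated data or preset dictionary needed");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("data after the prefix is not zlib", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Reverse of unpack: deflate the rebuilt storage and put the old prefix in front.
    static byte[] pack(byte[] prefix, byte[] storage) {
        if (prefix.length != PREFIX_LENGTH) {
            throw new IllegalArgumentException("prefix must be " + PREFIX_LENGTH + " bytes");
        }
        Deflater deflater = new Deflater(); // zlib-wrapped output, default level
        deflater.setInput(storage);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(prefix, 0, prefix.length);
        byte[] buf = new byte[8192];
        try {
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
        } finally {
            deflater.end();
        }
        return out.toByteArray();
    }
}
```

As a usage example, decodeBinData("0M8R4KGxGuE=") returns the bytes D0 CF 11 E0 A1 B1 1A E1, the OLE2 compound-file signature at the start of the oledata.mso data. In the workflow above, unpack would be applied to the bytes of the "_1277043057" entry (read via POI, e.g. with a DocumentInputStream) and its result passed to new POIFSFileSystem(new ByteArrayInputStream(...)) for step 5; for the reverse direction, pack(prefixOf(original), rebuilt) reuses the original prefix. If that prefix turns out to encode something like the uncompressed length (not verified here), it would have to be recomputed instead of copied.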
The extra information in the original is some kind of metadata:
1. The alphabet
2. The structure of the OLE object: "R.o.o.t. .E.n.t.r.y .... O.l.e. ... C.o.m.p.O.b.j...."
3. The kind of OLE object: "P.a.c.k.a.g.e"

Does anyone know how I can get this information into my file?

Cheers,
Helmut

P.S. The reverse engineering is based on this excellent article: http://www.trustedsource.org/download/research_publications/CAlme_VBOct06.pdf

-------- Original Message --------
> Date: Thu, 24 Jul 2008 11:42:10 +0100 (BST)
> From: Nick Burch <[EMAIL PROTECTED]>
> To: POI Users List <[email protected]>
> Subject: Re: Can POIFS convert PDF to OLE

> On Thu, 24 Jul 2008, Helmut Ziegler wrote:
> > Actually the Word document should also carry other documents like other
> > word files.
>
> I'd suggest dumping out the stream(s), and looking at them with things
> like org.apache.poi.poifs.dev.POIFSViewer
>
> Start by seeing if you can change one bit of one file in the poifs stream,
> and have the change noticed. If that works, but adding a new poifs stream
> doesn't, then there are extra things in the poifs stream that need to be
> set up. I think you're probably going to need to run diff quite a bit,
> across two files (one that works, one that doesn't) and see what's
> different.
>
> Nick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
