Re: Reading PDF - a cry for help

Dar Scott Thu, 29 Sep 2011 09:51:18 -0700

On Sep 29, 2011, at 9:24 AM, Ken Ray wrote:
> Are you looking at just extracting the images? Or other relevant parts of the 
> PDF? The reason I ask is that it looks like binary data is always contained 
> between two lines: "stream" and "endstream", so extracting just the streaming 
> data should be pretty quick to do; although the next step would be going to 
> read the bytes of what was extracted and then determine if it's an image or 
> some other thing that had to be represented with a "stream" in the PDF...



There are a couple of issues that complicate this in general.

The parameters needed to process a stream have to be parsed first, and they 
can be far away from the stream itself in the file.
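
One common case of "far away" is a stream whose /Length entry is an indirect 
reference (for example "/Length 12 0 R"), so the actual number lives in another 
object elsewhere in the file. The sketch below is only an illustration in 
Python, not LiveCode: the function name stream_length is made up, and scanning 
the whole file with a regular expression is an assumption for brevity. A real 
reader would locate the object through the cross-reference table.

```python
import re

def stream_length(pdf, stream_dict):
    """Return a stream's /Length, following an indirect reference if present.

    pdf is the whole file as bytes; stream_dict is the raw "<< ... >>"
    dictionary that precedes the "stream" keyword.
    """
    # Indirect form: "/Length 12 0 R" points at object 12, generation 0.
    ref = re.search(rb"/Length\s+(\d+)\s+(\d+)\s+R", stream_dict)
    if ref:
        num, gen = ref.group(1), ref.group(2)
        # Naive lookup (assumption): scan the file for "12 0 obj <int> endobj".
        pattern = rb"(?<!\d)" + num + rb"\s+" + gen + rb"\s+obj\s+(\d+)\s+endobj"
        obj = re.search(pattern, pdf)
        if obj is None:
            raise ValueError("length object not found")
        return int(obj.group(1))
    # Direct form: "/Length 42".
    direct = re.search(rb"/Length\s+(\d+)", stream_dict)
    if direct is None:
        raise ValueError("no /Length entry")
    return int(direct.group(1))
```

The regex scan will not find length objects stored inside compressed object 
streams, which is one more reason the general case is harder than it looks.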

There are many stream filters (some involving complicated compression) and 
they can be nested.  I looked at a corpus of PDF files and, yeah, several are 
used in practice.
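
To make the nesting concrete: a stream dictionary's /Filter entry can be a 
single name or an array of names, and the filters are applied in the listed 
order when decoding. Here is a rough sketch in Python (not LiveCode) that 
handles only FlateDecode and ASCIIHexDecode. The helper names are hypothetical, 
/DecodeParms (predictors and the like) is ignored, and the dictionary is picked 
apart with a regular expression rather than a proper PDF tokenizer.

```python
import binascii
import re
import zlib

def ascii_hex_decode(data):
    """Decode ASCIIHexDecode data: whitespace is ignored, '>' ends the data."""
    digits = b"".join(data.split()).split(b">", 1)[0]
    if len(digits) % 2:
        digits += b"0"  # an odd final digit is treated as if followed by 0
    return binascii.unhexlify(digits)

# Only two of the standard filters; LZWDecode, DCTDecode, etc. are left out.
DECODERS = {
    b"FlateDecode": zlib.decompress,
    b"ASCIIHexDecode": ascii_hex_decode,
}

def filter_names(stream_dict):
    """Return the filter names from a raw "<< ... >>" stream dictionary."""
    m = re.search(rb"/Filter\s*(\[[^\]]*\]|/\w+)", stream_dict)
    return re.findall(rb"/(\w+)", m.group(1)) if m else []

def decode_stream(stream_dict, raw):
    """Apply each listed filter in order to the raw stream bytes."""
    data = raw
    for name in filter_names(stream_dict):
        if name not in DECODERS:
            raise NotImplementedError(name.decode())
        data = DECODERS[name](data)
    return data
```

For a known producer that only ever writes, say, FlateDecode, the table of 
decoders can stay this small, which is the narrower case described below.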

However, if one only needs to parse the output of a specific program or a 
specific model of scanner, then doing the parsing in LiveCode is a lot less 
work.

I hope that makes sense; I'm a little under the weather today.

Dar



_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
