PDF
Contributor: Swapna KM, an employee of MphasiS Software Services

What is PDF?
The Portable Document Format (PDF) is a file format created by Adobe Systems in 1993 for document exchange. PDF is a fixed-layout document format used to represent two-dimensional documents independently of the application software, hardware, and operating system. Each PDF file encapsulates a complete description of a 2-D document (and, with Acrobat 3D, embedded 3-D documents) that includes the text, fonts, images, and 2-D vector graphics that make up the document. PDF is an open standard, and recently took a major step towards becoming ISO 32000.

Why PDF?
1. Multiplatform - Viewable and printable on any platform: Macintosh, Microsoft® Windows®, UNIX®, and many mobile platforms.
2. Extensible - More than 1,800 vendors worldwide offer PDF-based solutions, including creation, plug-in, consulting, training, and support tools.
3. Trusted and reliable - More than 200 million PDF documents on the web today are evidence of how many organizations rely on Adobe PDF to capture information.
4. Maintain information integrity - Adobe PDF files look exactly like the original documents and preserve source file information (text, drawings, 3D, full-color graphics, photos, and even business logic) regardless of the application used to create them.
5. Keep information secure - Digitally sign or password-protect Adobe PDF documents created with Adobe Acrobat® 8 or Adobe LiveCycle software.
6. Searchable - Use full-text search features to locate words, bookmarks, and data fields in documents.
7. Accessible - Adobe PDF documents work with assistive technology to help make information accessible to people with disabilities.

Technical Overview:

File structure:-
A PDF file consists primarily of objects, of which there are eight types: Boolean values (representing true or false), numbers, strings, names, arrays (ordered collections of objects), dictionaries (collections of objects indexed by names), streams (usually containing large amounts of data), and the null object. Objects may be either direct (embedded in another object) or indirect. Indirect objects are numbered with an object number and a generation number. An index table called the xref table gives the byte offset of each indirect object from the start of the file. This design allows efficient random access to the objects in the file, and also allows small changes to be made without rewriting the entire file (incremental update).

Beginning with PDF version 1.5, indirect objects may also be located in special streams known as object streams. This technique reduces the size of files that have large numbers of small indirect objects and is especially useful for Tagged PDF.

There are two layouts for PDF files: non-linear (not optimized) and linear (optimized). Non-linear PDF files consume less disk space than their linear counterparts, but they are slower to access because the data needed to assemble each page is scattered throughout the file. Linear PDF files (also called optimized or web-optimized PDF files) are written to disk in a linear (page order) fashion, which lets a Web browser plugin start displaying them without waiting for the entire file to download. PDF files may be optimized using Adobe Acrobat software or pdfopt, which is part of GPL Ghostscript.

Imaging model:-
The basic design of how graphics are represented in PDF is very similar to that of PostScript, except for the use of transparency.
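Returning to the File structure description: the minimal sketch below, assuming Python 3 and only its standard library, writes three indirect objects (a catalog, a page tree and one empty page), records each object's byte offset in a classic xref table, and then reads one object back by following `startxref` to the table and the table entry to the object's offset. The helper names `build_pdf` and `read_object` are hypothetical, and the reader deliberately ignores object streams, compressed xref streams and incremental updates.

```python
from typing import List


def build_pdf(bodies: List[bytes]) -> bytes:
    """Serialize bodies as indirect objects 1..N (generation 0) plus an xref table."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(bodies, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" from the start of the file
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(bodies) + 1)
    out += b"0000000000 65535 f \n"  # object 0: head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # every entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(bodies) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


def read_object(pdf: bytes, num: int) -> bytes:
    """Random access: startxref -> xref entry -> byte offset -> object."""
    # The last 'startxref' keyword is followed by the offset of the xref table.
    xref_start = int(pdf[pdf.rindex(b"startxref"):].split()[1])
    lines = pdf[xref_start:].split(b"\n")
    if lines[0] != b"xref":
        raise ValueError("expected a classic xref table")
    first, count = (int(x) for x in lines[1].split())
    if not first <= num < first + count:
        raise KeyError(num)
    offset, _gen, kind = lines[2 + num - first].split()
    if kind != b"n":
        raise KeyError(num)  # free entry: the object is not in use
    start = int(offset)
    end = pdf.index(b"endobj", start) + len(b"endobj")
    return pdf[start:end]


if __name__ == "__main__":
    pdf = build_pdf([
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
    ])
    print(read_object(pdf, 3).decode("latin-1"))
```

Because every entry has a fixed 20-byte width, a reader can also compute an entry's position arithmetically instead of splitting lines; splitting is used here only for readability. An incremental update would append new object versions plus a second xref section whose trailer points back to this one, which this sketch does not handle.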
PDF graphics use a device-independent Cartesian coordinate system to describe the surface of a page. A PDF page description can use a matrix to scale, rotate, or skew graphical elements. A key concept in PDF is the graphics state, a collection of graphical parameters that may be changed, saved, and restored by a page description. As of version 1.6, PDF has 24 graphics state properties, of which some of the most important are:
1. The current transformation matrix (CTM), which determines the coordinate system
2. The clipping path
3. The color space
4. The alpha constant, which is a key component of transparency

Vector graphics:-
Vector graphics in PDF, as in PostScript, are constructed with paths. Paths are usually composed of lines and cubic Bézier curves, but can also be constructed from the outlines of text. Unlike PostScript, PDF does not allow a single path to mix text outlines with lines and curves. Paths can be stroked, filled, or used for clipping. Strokes and fills can use any color set in the graphics state, including patterns.

Raster images:-
Raster images in PDF (called Image XObjects) are represented by dictionaries with an associated stream. The dictionary describes properties of the image, and the stream contains the image data.

Text:-
Text in PDF is represented by text elements in page content streams. A text element specifies that characters should be drawn at certain positions. The characters are specified using the encoding of a selected font resource.

Fonts:-
A font object in PDF is a description of a digital typeface. It may either describe the characteristics of a typeface, or it may include an embedded font file.

Encodings:-
Within text strings, characters are shown using character codes (integers) that map to glyphs in the current font through an encoding. There are a number of built-in encodings, including WinAnsi, MacRoman, and a large number of encodings for East Asian languages.
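Several ideas from the Imaging model through Encodings descriptions meet in a page content stream. The minimal sketch below, assuming Python 3's standard library, builds one by hand: `q`/`Q` save and restore the graphics state, `cm` concatenates a matrix onto the CTM, `m`, `l` and `c` build a path from a straight line and a cubic Bézier curve, `S` strokes it, and `BT ... ET` wraps a text element whose string is a sequence of character codes. The font resource name `/F1`, the helper names, and the use of Python's cp1252 codec as a stand-in for WinAnsiEncoding are assumptions for illustration; a real page would also need `/F1` declared in its `/Resources` dictionary.

```python
def apply_matrix(m, x, y):
    """Map (x, y) through a PDF matrix [a b c d e f]: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def pdf_string(text: str) -> bytes:
    """Turn text into a PDF literal string of single-byte character codes.

    Assumption: the font resource uses /WinAnsiEncoding, approximated here by
    Python's cp1252 codec (the two are close but not identical).
    """
    codes = text.encode("cp1252")
    # Backslash and parentheses are special inside (...) and must be escaped.
    codes = codes.replace(b"\\", b"\\\\").replace(b"(", b"\\(").replace(b")", b"\\)")
    return b"(" + codes + b")"


def page_content(label: str) -> bytes:
    """Build a small page content stream; /F1 is a hypothetical font resource name."""
    ops = [
        b"q",                       # save the graphics state
        b"1 0 0 1 72 72 cm",        # concatenate a translation onto the CTM
        b"0 0 1 RG 2 w",            # stroke colour (DeviceRGB blue) and line width
        b"0 0 m 100 0 l",           # path: move to the origin, straight line segment
        b"150 50 200 -50 250 0 c",  # cubic Bezier curve ending at (250, 0)
        b"S",                       # stroke the path
        b"Q",                       # restore: the CTM, colour and width revert
        b"BT /F1 12 Tf 72 720 Td " + pdf_string(label) + b" Tj ET",  # one text element
    ]
    return b"\n".join(ops)


if __name__ == "__main__":
    print(apply_matrix((1, 0, 0, 1, 72, 72), 0, 0))  # where the path's origin lands
    print(page_content("Caf\u00e9").decode("latin-1"))
```

With the `1 0 0 1 72 72` matrix, the path's starting point (0, 0) lands at (72, 72) in default user space, one inch in from the lower-left corner. The "é" is emitted as the single byte 0xE9: the string carries character codes, and it is the font's encoding that decides which glyph each code selects.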
Although the WinAnsi and MacRoman encodings are derived from the historical character sets of the Windows and Macintosh operating systems, fonts using these encodings work equally well on any platform. The encoding mechanisms in PDF were designed for Type 1 fonts, and the rules for applying them to TrueType fonts are complex.

Transparency:-
The original imaging model of PDF was, like PostScript's, opaque: each object drawn on the page completely replaced anything previously marked in the same location. In PDF 1.4 the imaging model was extended to allow transparency. When transparency is used, new objects interact with previously marked objects to produce blending effects. Transparency was added to PDF by means of new extensions designed to be ignored by products written to the PDF 1.3 and earlier specifications. As a result, files that use a small amount of transparency might view acceptably in older viewers, but files making extensive use of transparency could display completely wrong in an older viewer, without any warning.

Interactive elements:-
PDF files may contain interactive elements such as annotations and form fields.

Logical structure and accessibility:-
A PDF may contain structure information to enable better text extraction and accessibility.

Security and signatures:-
A PDF file may be encrypted for security, or digitally signed for authentication.

Subsets:-
Proper subsets of PDF have been, or are being, standardized under ISO for several constituencies:
1. PDF/X for the printing and graphic arts, as ISO 15930 (work done in ISO TC130)
2. PDF/A for archiving in corporate, government, library, and similar environments, as ISO 19005 (work done in ISO TC171)
3. PDF/E for exchange of engineering drawings (work done in ISO TC171)
4. PDF/UA for universally accessible PDF files
A PDF/H variant (PDF for Healthcare) is being developed.[11] However, it may consist more of a set of "best practices" than of a specific format or subset.
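To make the Transparency description concrete, here is a minimal sketch of the simplest case: one object painted with a constant alpha over an opaque backdrop. Under that assumption the PDF 1.4 compositing formula reduces to Cr = (1 - alpha) * Cb + alpha * B(Cb, Cs), where Cb is the backdrop colour, Cs the source colour and B the blend function (the Normal mode simply returns Cs). In a real file the alpha would come from the `CA` or `ca` entry of an ExtGState dictionary and the blend mode from its `BM` entry. Only the Normal and Multiply modes are shown; the function names are hypothetical, and groups, soft masks and non-opaque backdrops are left out.

```python
from typing import Callable, Tuple

Color = Tuple[float, float, float]  # DeviceRGB components, each in [0, 1]


def normal(cb: float, cs: float) -> float:
    return cs  # Normal blend mode: the source colour wins


def multiply(cb: float, cs: float) -> float:
    return cb * cs  # Multiply blend mode: never lighter than either input


def composite(backdrop: Color, source: Color, alpha: float,
              blend: Callable[[float, float], float] = normal) -> Color:
    """PDF 1.4 style compositing, simplified to an opaque backdrop and constant alpha."""
    return tuple((1 - alpha) * cb + alpha * blend(cb, cs)
                 for cb, cs in zip(backdrop, source))


def opaque_paint(backdrop: Color, source: Color, alpha: float) -> Color:
    """What a viewer that ignores the PDF 1.4 extensions effectively paints."""
    return source


if __name__ == "__main__":
    red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    print(composite(red, blue, 0.25))     # transparency-aware viewer
    print(opaque_paint(red, blue, 0.25))  # older viewer: solid blue
```

For a blue object at 25% alpha over red, `composite` gives (0.75, 0.0, 0.25), a reddish purple, whereas a viewer that ignores the alpha paints solid blue (0.0, 0.0, 1.0). That silent mismatch is the "completely wrong, without warning" effect described for older viewers.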
Further References
Wikipedia: http://en.wikipedia.org/wiki/Portable_Document_Format
Other Links: http://www.adobe.com/products/acrobat/adobepdf.html