Author: ks
Date: Sun Jul 15 23:28:46 2007
New Revision: 5716
Log:
- Added design.txt and updated prototypes.
Added:
experimental/Document/design/design.txt
experimental/Document/src/output/
experimental/Document/src/output.php
Removed:
experimental/Document/src/writer.php
experimental/Document/src/writers/
Modified:
experimental/Document/design/requirements.txt
experimental/Document/src/conversion.php
experimental/Document/src/transformer.php
experimental/Document/src/validator.php
Added: experimental/Document/design/design.txt
==============================================================================
--- experimental/Document/design/design.txt (added)
+++ experimental/Document/design/design.txt [iso-8859-1] Sun Jul 15 23:28:46 2007
@@ -1,0 +1,160 @@
+eZ Component: Document, Design
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Design description
+==================
+
+ezcDocConversion
+----------------
+
+The ezcDocConversion class is a wrapper around the actual conversion classes.
+It keeps track of the available formats and the possible conversion paths, and
+chooses a path from the user's input format to the requested output format.
+
+Often direct conversion is impossible and some intermediate formats are needed.
+The sequence of intermediate formats and corresponding conversion classes is
+called a "conversion chain". Sometimes there is more than one chain; in this
+case the shortest one is chosen automatically. To override this choice, use
+getConversionChains() to get all the chains and useChain() to select one of
+them.
+
+This class also takes care of input/output data types, which can be text, DOM
+or file, and converts data (parses/dumps XML and reads/writes files) when
+necessary. Not all data can be converted to another type directly (for example,
+plain text formats can't be converted to DOM); in this case it throws an error.
+
+ezcDocConverterBase
+-------------------
+
+A small abstract class that other converters are based on.
+
+ezcDocParser
+------------
+
+A class based on ezcDocConverterBase for creating text format parsers. Contains
+methods for document parsing using a formal grammar. Exact formal grammars and
+format-specific callback handlers (if needed) are set in derived classes.
+
+ezcDocTransformer
+-----------------
+
+A class based on ezcDocConverterBase for transforming DOM documents using
+special rules and element callback handlers. Contains only methods for document
+transformation. Exact rules and element handlers are set in derived classes.
+
+ezcDocOutput
+------------
+
+Outputs the given document tree as text using a simple internal templating
+system. It also takes care of text indentation to show the structure of the
+document. The concrete templates for element output and the helper formatting
+functions are set in derived classes.
+
+ezcDocOutputTemplate
+--------------------
+
+Implemented in the DocumentTemplateTieIn component. It extends the ezcDocOutput
+class to use the Template component for element output.
+
+ezcDocValidator
+---------------
+
+Validates a document or a separate element against its schema. This class takes
+a schema in RelaxNG format as input.
+
+Converter classes
+-----------------
+
+Classes derived from ezcDocConverterBase or from classes that extend
+ezcDocConverterBase are used to convert a document from one format to another.
+
+
+Algorithms
+==========
+
+Transforming XML
+----------------
+
+The component supports two ways of transforming DOM documents:
+1) Using an XSLT stylesheet. In this case the converter is derived from the
+ ezcDocConverterBase class and applies XSLT stylesheets to the document using
+ the PHP XSL extension.
+2) Using the ezcDocTransformer class. In this case the converter is derived
+ from the ezcDocTransformer class.
+
+The ezcDocTransformer class provides the interface to process the document.
+Its principle is completely different from XSL: it assumes that the result
+of a transformation is the same document, but with a different schema.
+Its main function walks the document tree and calls callback handlers
+for elements. Derived classes contain control arrays and element handlers.
+
+An element's handler can pass information to the handler of the next processed
+element. This makes it possible to handle complex transformations that involve
+many elements. For instance, the handler of a "text" element "knows" that it
+needs a <p> element as its parent. It creates a new <p> element and passes it
+to the next element's handler by reference. So if the next element is also
+text, it already has a new parent element to be attached to.
+
+Parsing text/XML
+----------------
+
+The ezcDocParser class parses the input text and presents it as a DOM tree.
+
+This is not an implementation of a real context-free parser.
+It assumes that the input language is XML-like, i.e. consists
+of elements that have opening and closing parts and some
+content between them (that may contain other elements).
+
+Sometimes it's hard or impossible to formalize input in these terms,
+so some special algorithms or custom element handlers will be used
+in this case.
+
+Document output
+---------------
+
+The ezcDocOutput class outputs the given document tree as text using a simple
+internal templating system. It also takes care of text indentation to show the
+structure of the document.
+
+The concrete templates for element output and the helper formatting functions
+are set in derived classes. Templates are simple strings in which a placeholder
+character or sequence is replaced with another string using str_replace.
+
+The ezcDocOutputTemplate class is implemented in the DocumentTemplateTieIn
+component. It extends ezcDocOutput to use the Template component for element
+output.
+
+Validating documents
+--------------------
+
+ezcDocValidator is used to validate a document or a separate element
+against its schema.
+
+This class takes a schema in RelaxNG format as input, then transforms it
+to an internal format for fast processing. The processed schema is stored
+in a cached .php file for faster access in the future.
+
+The idea for fast validation is using regular expressions and strings.
+Here is an example:
+
+ <element name="elem1">
+   <zeroOrMore>
+     <element name="elem2">
+       ...
+     </element>
+   </zeroOrMore>
+   <element name="elem3">
+     ...
+   </element>
+ </element>
+
+This RelaxNG schema for the element's content can be represented by the regexp:
+
+'#(elem2)*elem3#'
+
+The children of the validated document element can also be represented as a
+string, for instance 'elem2elem2elem3', which is then matched against this
+regexp.
+
+A similar process is used for attributes.
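
The regexp idea from the "Validating documents" section of design.txt can be sketched as follows. This is a minimal illustration, not the ezcDocValidator implementation: the helper name childrenMatchSchema() is hypothetical, and the pattern is anchored with ^ and $ (an assumption beyond the pattern quoted in design.txt) so that the whole child sequence has to match.

```php
<?php
// Hypothetical helper, not part of the component: concatenates the names of
// the child elements and matches the result against a content regexp.
function childrenMatchSchema( DOMElement $element, string $contentRegexp ): bool
{
    $names = '';
    foreach ( $element->childNodes as $child )
    {
        // Skip text nodes, comments etc.; only element names are validated.
        if ( $child instanceof DOMElement )
        {
            $names .= $child->nodeName;
        }
    }
    return preg_match( $contentRegexp, $names ) === 1;
}

$doc = new DOMDocument();
$doc->loadXML( '<elem1><elem2/><elem2/><elem3/></elem1>' );

// The children are presented as 'elem2elem2elem3'.
var_dump( childrenMatchSchema( $doc->documentElement, '#^(elem2)*elem3$#' ) );
```

Plain concatenation is ambiguous when one element name is a prefix of another; a real implementation would probably need a separator between names.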
Modified: experimental/Document/design/requirements.txt
==============================================================================
--- experimental/Document/design/requirements.txt [iso-8859-1] (original)
+++ experimental/Document/design/requirements.txt [iso-8859-1] Sun Jul 15 23:28:46 2007
@@ -106,7 +106,7 @@
- eZ publish 4 XML text
- eZ publish 4 simplified XML text
-The attached picture (document-formats.png) shows which formats will be
+The attached diagram (document-formats.svg) shows which formats will be
supported in the first release of the component and possible directions of
transforming from one format to another.
Modified: experimental/Document/src/conversion.php
==============================================================================
--- experimental/Document/src/conversion.php [iso-8859-1] (original)
+++ experimental/Document/src/conversion.php [iso-8859-1] Sun Jul 15 23:28:46 2007
@@ -39,16 +39,20 @@
/**
* List of available formats
*/
- private $availableFormats = array( 'oehtmlinput', 'oehtml', 'ezxmltext', 'docbook', 'xhtml', 'xhtmlbody' );
+ private $availableFormats = array( 'oe', 'oehtml', 'ezp', 'docbook', 'xhtml', 'xhtmlbody', 'simple', 'simplexml' );
/**
* List of available converter classes
*/
- private $availableConverters = array( 'oehtmlinput' => array( 'oehtml' => 'ezcParserOe' ),
-                                       'oehtml' => array( 'ezxmltext' => 'ezcTransformOeEzp' ),
-                                       'ezxmltext' => array( 'docbook' => 'ezcTransformEzpDocbook',
-                                                             'xhtml' => 'ezcTransformEzpXhtml',
-                                                             'xhtmlbody' => 'ezcWriteEzpXhtml' ) );
+ private $availableConverters = array( 'oe' => array( 'oehtml' => 'ezcParseOE' ),
+                                       'oehtml' => array( 'ezp' => 'ezcTransformOEhtmlEzp' ),
+                                       'ezp' => array( 'docbook' => 'ezcTransformEzpDocbook',
+                                                       'xhtml' => 'ezcTransformEzpXhtml',
+                                                       'xhtmlbody' => 'ezcOutputEzpXhtml',
+                                                       'simplexml' => 'ezcTransformEzpSimplexml' ),
+                                       'docbook' => array( 'ezp' => 'ezcTransformDocbookEzp',
+                                                           'xhtml' => 'ezcTransformDocbookXhtml' ),
+                                       'simple' => array( 'simplexml' => 'ezcParseSimple' ) );
/**
* Sets source format, data type is TEXT, DOM or FILE
*/
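
design.txt says that the shortest conversion chain is chosen automatically. One way to do that over a map shaped like $availableConverters is a breadth-first search. The sketch below is an assumption about how such a lookup could work, not code from conversion.php; findShortestChain() is a hypothetical name and the map is a shortened copy of the one in the diff.

```php
<?php
// Hypothetical helper: breadth-first search over a converter map of the form
// array( sourceFormat => array( targetFormat => converterClass ) ).
// Returns the converter classes of the shortest chain, or null if none exists.
function findShortestChain( array $converters, string $from, string $to ): ?array
{
    $queue = array( array( $from, array() ) );
    $visited = array( $from => true );
    while ( $queue )
    {
        list( $format, $chain ) = array_shift( $queue );
        if ( $format === $to )
        {
            return $chain;
        }
        foreach ( $converters[$format] ?? array() as $next => $class )
        {
            if ( !isset( $visited[$next] ) )
            {
                $visited[$next] = true;
                $queue[] = array( $next, array_merge( $chain, array( $class ) ) );
            }
        }
    }
    return null;
}

$converters = array(
    'oe'      => array( 'oehtml' => 'ezcParseOE' ),
    'oehtml'  => array( 'ezp' => 'ezcTransformOEhtmlEzp' ),
    'ezp'     => array( 'docbook' => 'ezcTransformEzpDocbook',
                        'xhtml' => 'ezcTransformEzpXhtml' ),
    'docbook' => array( 'ezp' => 'ezcTransformDocbookEzp',
                        'xhtml' => 'ezcTransformDocbookXhtml' ),
);

print_r( findShortestChain( $converters, 'oe', 'xhtml' ) );
```

Because the search expands formats level by level, the first chain that reaches the target is one with the fewest conversion steps.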
Added: experimental/Document/src/output.php
==============================================================================
--- experimental/Document/src/output.php (added)
+++ experimental/Document/src/output.php [iso-8859-1] Sun Jul 15 23:28:46 2007
@@ -1,0 +1,71 @@
+<?php
+/**
+ * File containing the ezcDocOutput class
+ *
+ * @package Document
+ * @version //autogen//
+ * @copyright Copyright (C) 2005-2007 eZ systems as. All rights reserved.
+ * @license http://ez.no/licenses/new_bsd New BSD License
+ */
+
+/**
+ * The ezcDocOutput class outputs the given document tree as text using a
+ * simple internal templating system. It also takes care of text indentation
+ * to show the structure of the document.
+ *
+ * The concrete templates for element output and the helper formatting
+ * functions are set in derived classes.
+ *
+ * The ezcDocOutputTemplate class is implemented in the DocumentTemplateTieIn
+ * component. It extends this class to use the Template component for element
+ * output.
+ *
+ */
+
+abstract class ezcDocOutput extends ezcDocConverterBase
+{
+
+ /**
+ * Rules for elements output, specified in derived classes
+ *
+ * $elementsOutput = array( 'element1' => array( 'startTag' => '<element1$>',
+ *                                               'endTag' => "</element1>\n",
+ *                                               'attribute1' => ' attr1="$"',
+ *                                               'attribute2' => ' attr2="$"' ... ),
+ *                          ... );
+ *
+ * The '$' sign is replaced with the attributes string in the tag's start or
+ * end template, and with the attribute's value in attribute templates.
+ *
+ * (There should also be a possibility to set default view)
+ *
+ */
+ protected $elementsOutput;
+
+ /**
+ * String to use for indentations
+ */
+ protected $indentString;
+
+ /** Conversion function
+ * @return output text string
+ */
+ public function convert( $source )
+ {
+ //...
+ return $text;
+ }
+
+ /** Main tree-walk recursive function.
+ * @return string part
+ */
+ protected function elementOutput( $element )
+ {
+
+ }
+
+ protected $sourceDataType = DOM;
+ protected $destDataType = TEXT;
+}
+
+?>
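
The '$' placeholder scheme documented for $elementsOutput can be illustrated with str_replace, which design.txt names as the substitution mechanism. This is only a sketch under assumptions: renderStartTag() and the 'para' rule are hypothetical, and escaping the value with htmlspecialchars is an addition that the prototype does not mention.

```php
<?php
// Hypothetical rules in the shape documented for $elementsOutput.
$elementsOutput = array(
    'para' => array(
        'startTag' => '<p$>',
        'endTag'   => "</p>\n",
        'class'    => ' class="$"',
    ),
);

// Hypothetical helper: renders the start tag of element $name.
function renderStartTag( array $rules, string $name, array $attributes ): string
{
    $attrString = '';
    foreach ( $attributes as $attrName => $value )
    {
        if ( isset( $rules[$name][$attrName] ) )
        {
            // '$' in an attribute template stands for the attribute's value.
            $attrString .= str_replace( '$', htmlspecialchars( $value ), $rules[$name][$attrName] );
        }
    }
    // '$' in the tag template stands for the rendered attributes string.
    return str_replace( '$', $attrString, $rules[$name]['startTag'] );
}

echo renderStartTag( $elementsOutput, 'para', array( 'class' => 'intro' ) );
```

Attributes without a template are silently dropped in this sketch; the note about a default view in the doc comment suggests the real class may want a fallback instead.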
Modified: experimental/Document/src/transformer.php
==============================================================================
--- experimental/Document/src/transformer.php [iso-8859-1] (original)
+++ experimental/Document/src/transformer.php [iso-8859-1] Sun Jul 15 23:28:46 2007
@@ -22,7 +22,6 @@
abstract class ezcDocTransformer extends ezcDocConverterBase
{
-
/**
* Attribute conversion rules:
*
@@ -46,7 +45,7 @@
/** Main walk-tree recursive function.
*/
- protected function transform( $element, &$handlersData )
+ protected function transform( &$element, &$handlersData )
{
// call of the 1st element's handler
$this->initHandler( $element, &$handlersData );
@@ -55,9 +54,10 @@
$child = $element->firstChild;
do
{
+ $next = $child->nextSibling;
$this->transform( $child, &$handlersData );
- $child = $element->nextSibiling;
-
+ $child = $next;
+
}while( $child );
// call of the 2nd element's handler
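
The reason for remembering the next sibling before recursing is that a handler may move or remove the node it is given. The stand-alone sketch below shows the pattern with the DOM extension; walkTree() and the closure are hypothetical and only mimic the first-handler call of transform(). It uses a while loop instead of do/while so that elements without children are handled too.

```php
<?php
// Hypothetical walker: calls $handler on every node, depth-first.
function walkTree( DOMNode $node, callable $handler ): void
{
    $handler( $node );
    $child = $node->firstChild;
    while ( $child !== null )
    {
        // Remember the sibling first: the handler may move or remove $child.
        $next = $child->nextSibling;
        walkTree( $child, $handler );
        $child = $next;
    }
}

$doc = new DOMDocument();
$doc->loadXML( '<section><a/><b/><c/></section>' );

$seen = array();
walkTree( $doc->documentElement, function ( DOMNode $n ) use ( &$seen ) {
    $seen[] = $n->nodeName;
    if ( $n->nodeName === 'b' )
    {
        // Detaching the current node must not interrupt the walk.
        $n->parentNode->removeChild( $n );
    }
} );

echo implode( ',', $seen );
```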
Modified: experimental/Document/src/validator.php
==============================================================================
--- experimental/Document/src/validator.php [iso-8859-1] (original)
+++ experimental/Document/src/validator.php [iso-8859-1] Sun Jul 15 23:28:46 2007
@@ -13,9 +13,8 @@
* against its schema.
*
* This class uses RelaxNG schema format as the input, then transforms it
- * to the inner format for fast processing.
- *
- * (cache to a file?? use cache component?)
+ * to the inner format for fast processing. The processed schema is stored
+ * in a cached .php file for faster access in the future.
*
* The idea for fast validation is using regular expressions and strings.
* Here is an example: