Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba

2012-09-22 Thread Cezar Andrei
See answers inline:
- the error seems to be too general, essentially it always raises 
JAVA-EXCEPTION no matter what goes wrong (e.g. if the given input is not a 
valid pdf)
I made the error message clearer and more specific.

- the java stack trace seems to be sent to standard error
Yes, it goes to standard error.

- "Renders the each page of the PDF document as an image." => "Renders each 
page of the PDF document as an image."
Done.

- the names of the private functions should also adhere to the code conventions 
renderToImages => render-to-images
Done.

- make xqdoc fails because the comments seem to contain invalid xml
/home/mbrantner/zorba/build/URI_PATH/com/zorba-xquery/www/modules/project_xqdoc.xq:142,9:
 user-defined error [err:UE004]: Error processing module zerr:ZXQD0002 -  This 
module provides funtionality to read the text from PDF documents and
 to render PDF documents to images.
 <a href="http://pdfbox.apache.org">Apache PDFBox</a> library is used to
 implement these functions.
 <br />
 <br />
 <b>Note:</b> Since this module has a Java library dependency a JVM required
 to be installed on the system. For Windows: jvm.dll is required on the system
 path ( usually located in C:\Program Files\Java\jre6\bin\client.
 <b>Note:<b> For Debian based Linux distributions install PdfBox and FontBox
 packages: sudo apt-get install libpdfbox-java libfontbox-java
: can not parse as XML for xqdoc: loader parsing error: Opening and ending tag 
mismatch: b line 0 and root
; raised at 
/home/mbrantner/zorba/sandbox/src/runtime/errors_and_diagnostics/errors_and_diagnostics_impl.cpp:81
Done.

- adapt the year in Copyright 2006-2009 The FLWOR Foundation. in the .xq file 
(and some other files also)
Done.

- would it make sense to return one string per page in the pdf instead of one 
big string?
The API doesn't allow it, but I added two optional options to insert a 
user-defined string at the start and end of each page.

- remove commented out code in read-pdf.cpp
Done.

- valgrind shows tons of invalid writes. Why? Are they critical? Is there 
anything we can do?
The JVM always shows up in valgrind, even if nothing is done with it. I was 
careful to free any allocated memory.

- would it make sense to return the images in a streaming fashion (i.e. don't 
create all base64's in a vector)?
No, because all images are written in a single push. And as discussed, 
optimizing away only a copy in some cases isn't worth the effort.

- encoding each image shouldn't be necessary and will probably be wasted effort 
because the images might be written to a file in their binary form
Done.
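
A minimal sketch of how the per-page start and end strings discussed above 
could behave. This is not the module's code: joinPages and its parameter names 
are hypothetical, and it assumes the page texts are already available as 
separate strings.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: concatenates the text of all pages into one string,
// wrapping each page in caller-supplied start and end markers.
std::string joinPages(const std::vector<std::string>& pages,
                      const std::string& startPage,
                      const std::string& endPage)
{
  std::string out;
  for (const std::string& page : pages) {
    out += startPage;  // user-defined string inserted before each page
    out += page;
    out += endPage;    // user-defined string inserted after each page
  }
  return out;
}
```

With distinctive markers, a caller that still receives one big string can split 
it back into pages afterwards.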
-- 
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125338
Your team Zorba Coders is subscribed to branch lp:zorba.

-- 
Mailing list: https://launchpad.net/~zorba-coders
Post to : zorba-coders@lists.launchpad.net
Unsubscribe : https://launchpad.net/~zorba-coders
More help   : https://help.launchpad.net/ListHelp


[Zorba-coders] [Merge] lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba

2012-09-22 Thread Cezar Andrei
Cezar Andrei has proposed merging lp:~zorba-coders/zorba/fread-pdf-trunk into 
lp:zorba.

Requested reviews:
  Matthias Brantner (matthias-brantner)
  Cezar Andrei (cezar-andrei)
  Chris Hillery (ceejatec)
Related bugs:
  Bug #1012417 in Zorba: PDF to XML data convertor
  https://bugs.launchpad.net/zorba/+bug/1012417

For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858

Make doc comments for createBase64Binary more explicit on what parameters they 
expect and what they do.
Change the return value of the getIntValue() method to xs_int.
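
To illustrate what "base64 encoded data" of a given length means relative to 
raw bytes, here is a standalone encoder sketch. It is not Zorba's 
implementation; base64Encode is a hypothetical helper that follows the standard 
base64 alphabet and padding rules.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encodes `length` raw bytes starting at `data`
// into a base64 string (standard alphabet, '=' padding).
std::string base64Encode(const unsigned char* data, std::size_t length)
{
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  out.reserve(4 * ((length + 2) / 3));  // 4 output chars per 3 input bytes
  for (std::size_t i = 0; i < length; i += 3) {
    const std::size_t remaining = length - i;
    // Pack up to three bytes into a 24-bit chunk.
    unsigned long chunk = static_cast<unsigned long>(data[i]) << 16;
    if (remaining > 1) chunk |= static_cast<unsigned long>(data[i + 1]) << 8;
    if (remaining > 2) chunk |= static_cast<unsigned long>(data[i + 2]);
    // Emit four 6-bit groups, padding where input bytes were missing.
    out.push_back(kAlphabet[(chunk >> 18) & 0x3F]);
    out.push_back(kAlphabet[(chunk >> 12) & 0x3F]);
    out.push_back(remaining > 1 ? kAlphabet[(chunk >> 6) & 0x3F] : '=');
    out.push_back(remaining > 2 ? kAlphabet[chunk & 0x3F] : '=');
  }
  return out;
}
```

The encoded form is about a third larger than the raw bytes, so the length of 
the encoded data differs from the length of the binary data it represents.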
-- 
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858
Your team Zorba Coders is subscribed to branch lp:zorba.
=== modified file 'ChangeLog'
--- ChangeLog	2012-09-21 18:25:49 +
+++ ChangeLog	2012-09-22 18:12:23 +
@@ -17,6 +17,7 @@
   * Implemented semantics of null for comparison and arithmetics operations.
   * Positional pagination support for index probes 
   * Recognize the no-copy pragma to avoid copying nodes before insertion into a collection.
+  * Adding new external module read-pdf, it converts PDF documents to text or rendered images.
 
 Optimizations:
   * New memory management for compiler expressions (no more ref counting)

=== modified file 'cmake_modules/ZorbaModule.cmake'
--- cmake_modules/ZorbaModule.cmake	2012-09-17 00:36:37 +
+++ cmake_modules/ZorbaModule.cmake	2012-09-22 18:12:23 +
@@ -114,9 +114,20 @@
 #  relative to CMAKE_CURRENT_SOURCE_DIR)
 #   LINK_LIBRARIES - (optional) List of libraries to link external
 #  function library against
+#   CONFIG_FILES - (optional) List of files to configure with package
+#  information; see below
 #   TEST_ONLY - (optional) Module is for testcases only and should not
 #  be installed
 #
+# CONFIG_FILES - any files specific here will be copied to
+# CMAKE_CURRENT_BINARY_DIR using CONFIGURE_FILE(). They may contain
+# the following @VARIABLES@ which will be substituted:
+#   ZORBA_MODULE_RELATIVE_DIR - directory portion of mangled URI
+#   ZORBA_MODULE_LIBFILE_WE - filename (without extension) portion of
+#  mangled URI
+# The input files should have a .in extension. The resulting file in
+# the build directory will have the .in removed.
+#
 # QQQ this currently doesn't support modules with multiple component
 # .xq files. (Neither does Zorba's automatic loading mechanism, so
 # this probably isn't a huge deal, but worth thinking about.)
@@ -125,7 +136,7 @@
 # file enough to deduce the URI and version?
 MACRO (DECLARE_ZORBA_MODULE)
   # Parse and validate arguments
-  PARSE_ARGUMENTS(MODULE LINK_LIBRARIES;EXTRA_SOURCES
+  PARSE_ARGUMENTS(MODULE LINK_LIBRARIES;EXTRA_SOURCES;CONFIG_FILES
 URI;FILE;VERSION TEST_ONLY ${ARGN})
   IF (NOT MODULE_FILE)
 MESSAGE (FATAL_ERROR "'FILE' argument is required for ZORBA_DECLARE_MODULE()")
@@ -353,6 +364,20 @@
   ${version_infix}  1 ${MODULE_TEST_ONLY})
   ENDFOREACH (version_infix)
 
+  # Configure any module-specified config files.
+  SET (ZORBA_MODULE_RELATIVE_DIR ${module_path})
+  SET (ZORBA_MODULE_LIBFILE_WE ${module_filewe})
+  FOREACH (_config_file ${MODULE_CONFIG_FILES})
+# Strip off .in - can't use GET_FILENAME_COMPONENT as it always removes
+# the longest possible extension
+STRING (REGEX REPLACE "\\.in$" "" _config_filename_we ${_config_file})
+IF (NOT IS_ABSOLUTE ${_config_file})
+  SET (_config_file ${CMAKE_CURRENT_SOURCE_DIR}/${_config_file})
+ENDIF (NOT IS_ABSOLUTE ${_config_file})
+CONFIGURE_FILE (${_config_file}
+  ${CMAKE_CURRENT_BINARY_DIR}/${_config_filename_we} @ONLY)
+  ENDFOREACH (_config_file)
+
   # Last but not least, whip up a test case that ensures the module
   # can at least be compiled. Don't bother for test-only modules
   # (presumably they're there to be tested!).

=== modified file 'include/zorba/item_factory.h'
--- include/zorba/item_factory.h	2012-09-17 00:36:37 +
+++ include/zorba/item_factory.h	2012-09-22 18:12:23 +
@@ -123,8 +123,8 @@
   /** \brief Creates a Base64Binary Item
* see [http://www.w3.org/TR/xmlschema-2/#base64Binary]
*
-   * @param aBinData a pointer to the base6c4 binary data.
-   * @param aLength the length of the base64 binary data.
+   * @param aBinData a pointer to the base64 encoded data. The data is copied from aBinData.
+   * @param aLength the length of the base64 encoded data.
* @return The Base64Binary Item.
*/
   virtual Item
@@ -133,7 +133,7 @@
   /** \brief Creates a Base64Binary Item
* see [http://www.w3.org/TR/xmlschema-2/#base64Binary]
*
-   * @param aStream A stream containing the Base64 encoded data.
+   * @param aStream A stream containing the Base64 encoded data. The data is copied from aStream imediately.
* @return the Base64Binary Item.
*/
   virtual Item
@@ -142,11 +142,11 @@
   
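The CONFIG_FILES change in ZorbaModule.cmake strips only a trailing .in with a 
regex because removing the longest possible extension would drop too much. A 
minimal C++ sketch of that difference follows; both function names are 
hypothetical, and stripLongestExtension only mimics the longest-extension 
behaviour on a bare file name without a directory part.

```cpp
#include <regex>
#include <string>

// Removes a trailing ".in" and nothing else, like the \.in$ regex in the patch.
std::string stripDotIn(const std::string& name)
{
  static const std::regex kDotIn("\\.in$");
  return std::regex_replace(name, kDotIn, "");
}

// Removes everything from the first dot onwards, i.e. the longest extension.
std::string stripLongestExtension(const std::string& name)
{
  const std::string::size_type dot = name.find('.');
  return dot == std::string::npos ? name : name.substr(0, dot);
}
```

For an input such as config.properties.in, the first function keeps 
config.properties, while the second leaves only config.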

[Zorba-coders] [Merge] lp:~zorba-coders/zorba/bug923686 into lp:zorba

2012-09-22 Thread Rodolfo Ochoa
The proposal to merge lp:~zorba-coders/zorba/bug923686 into lp:zorba has been 
updated.

Status: Needs review => Approved

For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/bug923686/+merge/124531
-- 
https://code.launchpad.net/~zorba-coders/zorba/bug923686/+merge/124531
Your team Zorba Coders is subscribed to branch lp:zorba.


[Zorba-coders] [Merge] lp:~zorba-coders/zorba/bug923686 into lp:zorba

2012-09-22 Thread Zorba Build Bot
Validation queue starting for merge proposal.
Log at: 
http://zorbatest.lambda.nu:8080/remotequeue/bug923686-2012-09-22T19-53-57.765Z/log.html
-- 
https://code.launchpad.net/~zorba-coders/zorba/bug923686/+merge/124531
Your team Zorba Coders is subscribed to branch lp:zorba.


Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/bug923686 into lp:zorba

2012-09-22 Thread Zorba Build Bot
The attempt to merge lp:~zorba-coders/zorba/bug923686 into lp:zorba failed. 
Below is the output from the failed tests.


CMake Error at /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake:274 
(message):
  Validation queue job bug923686-2012-09-22T19-53-57.765Z is finished.  The
  final status was:

  

  2 tests did not succeed - changes not commited.


Error in read script: /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake

-- 
https://code.launchpad.net/~zorba-coders/zorba/bug923686/+merge/124531
Your team Zorba Coders is subscribed to branch lp:zorba.


[Zorba-coders] [Merge] lp:~zorba-coders/zorba/bug923686 into lp:zorba

2012-09-22 Thread Zorba Build Bot
The proposal to merge lp:~zorba-coders/zorba/bug923686 into lp:zorba has been 
updated.

Status: Approved => Needs review

For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/bug923686/+merge/124531
-- 
https://code.launchpad.net/~zorba-coders/zorba/bug923686/+merge/124531
Your team Zorba Coders is subscribed to branch lp:zorba.


Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba

2012-09-22 Thread Cezar Andrei
Review: Approve


-- 
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858
Your team Zorba Coders is subscribed to branch lp:zorba.


[Zorba-coders] [Merge] lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba

2012-09-22 Thread Cezar Andrei
The proposal to merge lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba has 
been updated.

Status: Needs review => Approved

For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858
-- 
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858
Your team Zorba Coders is subscribed to branch lp:zorba.


[Zorba-coders] [Merge] lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba

2012-09-22 Thread Cezar Andrei
The proposal to merge lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba has 
been updated.

Commit Message changed to:

Add read-pdf module, which reads the text from a pdf doc and renders its pages 
to images.
Add java.library.path to the jvm in util-jvm module.
Make doc comments for createBase64Binary more explicit on what parameters they 
expect and what they do.
Change the return value of the getIntValue() method to xs_int.

For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858
-- 
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858
Your team Zorba Coders is subscribed to branch lp:zorba.


[Zorba-coders] [Merge] lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba

2012-09-22 Thread Zorba Build Bot
Validation queue starting for merge proposal.
Log at: 
http://zorbatest.lambda.nu:8080/remotequeue/fread-pdf-trunk-2012-09-22T21-29-56.025Z/log.html
-- 
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858
Your team Zorba Coders is subscribed to branch lp:zorba.


[Zorba-coders] [Merge] lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba

2012-09-22 Thread Zorba Build Bot
Validation queue job fread-pdf-trunk-2012-09-22T21-29-56.025Z is finished. The 
final status was:

All tests succeeded!
-- 
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858
Your team Zorba Coders is subscribed to branch lp:zorba.


Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba

2012-09-22 Thread Zorba Build Bot
Voting does not meet specified criteria. Required: Approve > 1, Disapprove < 1, 
Needs Fixing < 1, Pending < 1. Got: 1 Approve, 2 Pending.
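
A minimal sketch of the vote check reported above. The comparison operators 
appear to have been lost in the archived message, so the thresholds here 
(Approve > 1, and Disapprove, Needs Fixing and Pending each < 1) are an 
assumption, and VoteTally and meetsVotingCriteria are hypothetical names, not 
the build bot's code.

```cpp
// Hypothetical tally of review votes on a merge proposal.
struct VoteTally {
  int approve;
  int disapprove;
  int needsFixing;
  int pending;
};

// Assumed criteria: more than one Approve, and no Disapprove,
// Needs Fixing, or Pending votes.
bool meetsVotingCriteria(const VoteTally& t)
{
  return t.approve > 1 && t.disapprove < 1 &&
         t.needsFixing < 1 && t.pending < 1;
}
```

Under these assumed thresholds, the reported tally of 1 Approve and 2 Pending 
fails on both the Approve and the Pending conditions.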
-- 
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858
Your team Zorba Coders is subscribed to branch lp:zorba.


[Zorba-coders] [Merge] lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba

2012-09-22 Thread Zorba Build Bot
The proposal to merge lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba has 
been updated.

Status: Approved => Needs review

For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858
-- 
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858
Your team Zorba Coders is subscribed to branch lp:zorba.
