Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba
See answers inline:

- the error seems to be too general, essentially it always raises JAVA-EXCEPTION no matter what goes wrong (e.g. if the given input is not a valid pdf)
  I adapted the error message to be clearer and more specific.

- the java stack trace seems to be sent to standard error
  It goes to stderr.

- "Renders the each page of the PDF document as an image." => "Renders each page of the PDF document as an image."
  Done.

- the names of the private functions should also adhere to the code conventions: renderToImages => render-to-images
  Done.

- make xqdoc fails because the comments seem to contain invalid xml
  /home/mbrantner/zorba/build/URI_PATH/com/zorba-xquery/www/modules/project_xqdoc.xq:142,9: user-defined error [err:UE004]: Error processing module zerr:ZXQD0002 - This module provides funtionality to read the text from PDF documents and to render PDF documents to images. a href=http://pdfbox.apache.org;Apache PDFBox/a library is used to implement these functions. br / br / bNote:/b Since this module has a Java library dependency a JVM required to be installed on the system. For Windows: jvm.dll is required on the system path ( usually located in C:\Program Files\Java\jre6\bin\client. bNote:b For Debian based Linux distributions install PdfBox and FontBox packages: sudo apt-get install libpdfbox-java libfontbox-java : can not parse as XML for xqdoc: loader parsing error: Opening and ending tag mismatch: b line 0 and root ; raised at /home/mbrantner/zorba/sandbox/src/runtime/errors_and_diagnostics/errors_and_diagnostics_impl.cpp:81
  Done.

- adapt the year in "Copyright 2006-2009 The FLWOR Foundation." in the .xq file (and some other files also)
  Done.

- would it make sense to return one string per page in the pdf instead of one big string?
  The API doesn't allow it, but I added two more options to insert a user-defined string at the start and end of each page.

- remove commented out code in read-pdf.cpp
  Done.

- valgrind shows tons of invalid writes. Why? Are they critical? Is there anything we can do?
  The JVM always shows up in valgrind, even if nothing is done with it. I was careful to free any memory I allocated.

- would it make sense to return the images in a streaming fashion (i.e. don't create all base64's in a vector)?
  No, because all images are written in a single push. And as discussed, optimizing away only a copy in some cases isn't worth the effort.

- encoding each image shouldn't be necessary and will probably be wasted effort because the images might be written to a file in their binary form
  Done.

--
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125338
Your team Zorba Coders is subscribed to branch lp:zorba.

--
Mailing list: https://launchpad.net/~zorba-coders
Post to : zorba-coders@lists.launchpad.net
Unsubscribe : https://launchpad.net/~zorba-coders
More help : https://help.launchpad.net/ListHelp
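The two per-page options mentioned in the reply above (a user-defined string inserted at the start and at the end of each page) can be pictured with a small sketch. This is not the module's code: the helper name joinPages and its parameters are hypothetical, and it only assumes the page texts are already available as separate strings before being joined into the single result.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of the read-pdf module: builds one big
// string from the per-page texts, wrapping every page in caller-supplied
// start and end markers so the page boundaries survive in the output.
std::string joinPages(const std::vector<std::string>& pages,
                      const std::string& pageStart,
                      const std::string& pageEnd)
{
  std::string result;
  for (const std::string& page : pages) {
    result += pageStart;  // marker inserted before the page text
    result += page;
    result += pageEnd;    // marker inserted after the page text
  }
  return result;
}
```

With markers in place, a caller who wants one string per page can split the single result on them, which works around an API that returns only one string.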
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba
Cezar Andrei has proposed merging lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba.

Requested reviews:
  Matthias Brantner (matthias-brantner)
  Cezar Andrei (cezar-andrei)
  Chris Hillery (ceejatec)
Related bugs:
  Bug #1012417 in Zorba: PDF to XML data convertor
  https://bugs.launchpad.net/zorba/+bug/1012417

For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858

Make the doc comments for createBase64Binary more explicit about what parameters they expect and what they do. Change the return value of the getIntValue() method to xs_int.
--
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858
Your team Zorba Coders is subscribed to branch lp:zorba.

=== modified file 'ChangeLog'
--- ChangeLog 2012-09-21 18:25:49 +
+++ ChangeLog 2012-09-22 18:12:23 +
@@ -17,6 +17,7 @@
  * Implemented semantics of null for comparison and arithmetics operations.
  * Positional pagination support for index probes
  * Recognize the no-copy pragma to avoid copying nodes before insertion into a collection.
+ * Adding new external module read-pdf, it converts PDF documents to text or rendered images.
 
 Optimizations:
  * New memory management for compiler expressions (no more ref counting)

=== modified file 'cmake_modules/ZorbaModule.cmake'
--- cmake_modules/ZorbaModule.cmake 2012-09-17 00:36:37 +
+++ cmake_modules/ZorbaModule.cmake 2012-09-22 18:12:23 +
@@ -114,9 +114,20 @@
 # relative to CMAKE_CURRENT_SOURCE_DIR)
 # LINK_LIBRARIES - (optional) List of libraries to link external
 # function library against
+# CONFIG_FILES - (optional) List of files to configure with package
+# information; see below
 # TEST_ONLY - (optional) Module is for testcases only and should not
 # be installed
 #
+# CONFIG_FILES - any files specific here will be copied to
+# CMAKE_CURRENT_BINARY_DIR using CONFIGURE_FILE(). They may contain
+# the following @VARIABLES@ which will be substituted:
+# ZORBA_MODULE_RELATIVE_DIR - directory portion of mangled URI
+# ZORBA_MODULE_LIBFILE_WE - filename (without extension) portion of
+# mangled URI
+# The input files should have a .in extension. The resulting file in
+# the build directory will have the .in removed.
+#
 # QQQ this currently doesn't support modules with multiple component
 # .xq files. (Neither does Zorba's automatic loading mechanism, so
 # this probably isn't a huge deal, but worth thinking about.)
@@ -125,7 +136,7 @@
 # file enough to deduce the URI and version?
 MACRO (DECLARE_ZORBA_MODULE)
  # Parse and validate arguments
- PARSE_ARGUMENTS(MODULE LINK_LIBRARIES;EXTRA_SOURCES
+ PARSE_ARGUMENTS(MODULE LINK_LIBRARIES;EXTRA_SOURCES;CONFIG_FILES
  URI;FILE;VERSION TEST_ONLY ${ARGN})
  IF (NOT MODULE_FILE)
  MESSAGE (FATAL_ERROR 'FILE' argument is required for ZORBA_DECLARE_MODULE())
@@ -353,6 +364,20 @@
  ${version_infix} 1 ${MODULE_TEST_ONLY})
  ENDFOREACH (version_infix)
 
+ # Configure any module-specified config files.
+ SET (ZORBA_MODULE_RELATIVE_DIR ${module_path})
+ SET (ZORBA_MODULE_LIBFILE_WE ${module_filewe})
+ FOREACH (_config_file ${MODULE_CONFIG_FILES})
+# Strip off .in - can't use GET_FILENAME_COMPONENT as it always removes
+# the longest possible extension
+STRING (REGEX REPLACE \\.in$ _config_filename_we ${_config_file})
+IF (NOT IS_ABSOLUTE ${_config_file})
+ SET (_config_file ${CMAKE_CURRENT_SOURCE_DIR}/${_config_file})
+ENDIF (NOT IS_ABSOLUTE ${_config_file})
+CONFIGURE_FILE (${_config_file}
+ ${CMAKE_CURRENT_BINARY_DIR}/${_config_filename_we} @ONLY)
+ ENDFOREACH (_config_file)
+
  # Last but not least, whip up a test case that ensures the module
  # can at least be compiled. Don't bother for test-only modules
  # (presumably they're there to be tested!).

=== modified file 'include/zorba/item_factory.h'
--- include/zorba/item_factory.h 2012-09-17 00:36:37 +
+++ include/zorba/item_factory.h 2012-09-22 18:12:23 +
@@ -123,8 +123,8 @@
  /** \brief Creates a Base64Binary Item
  * see [http://www.w3.org/TR/xmlschema-2/#base64Binary]
  *
- * @param aBinData a pointer to the base6c4 binary data.
- * @param aLength the length of the base64 binary data.
+ * @param aBinData a pointer to the base64 encoded data. The data is copied from aBinData.
+ * @param aLength the length of the base64 encoded data.
  * @return The Base64Binary Item.
  */
  virtual Item
@@ -133,7 +133,7 @@
  /** \brief Creates a Base64Binary Item
  * see [http://www.w3.org/TR/xmlschema-2/#base64Binary]
  *
- * @param aStream A stream containing the Base64 encoded data.
+ * @param aStream A stream containing the Base64 encoded data. The data is copied from aStream imediately.
  * @return the Base64Binary Item.
  */
  virtual Item
@@ -142,11 +142,11 @@
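The ZorbaModule.cmake hunk earlier in this message strips only a trailing .in from each CONFIG_FILES entry, because GET_FILENAME_COMPONENT with NAME_WE removes the longest extension (a file named config.xml.in would become config rather than config.xml). As a rough illustration of that naming rule, here is a minimal C++ sketch. The function name stripDotIn is hypothetical and is not part of Zorba or CMake; it only mirrors what the \.in$ regex in the patch does to a file name.

```cpp
#include <string>

// Hypothetical helper (not Zorba or CMake API): removes exactly one
// trailing ".in" from a file name, leaving any other extension intact.
// Names that do not end in ".in" are returned unchanged.
std::string stripDotIn(const std::string& name)
{
  const std::string suffix = ".in";
  const bool endsWithSuffix =
      name.size() >= suffix.size() &&
      name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
  if (endsWithSuffix) {
    return name.substr(0, name.size() - suffix.size());
  }
  return name;
}
```

Only the final suffix is examined, so a multi-part name keeps its real extension in the build directory, which is the behavior the comment in the patch asks for.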
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/bug923686 into lp:zorba
The proposal to merge lp:~zorba-coders/zorba/bug923686 into lp:zorba has been updated.

Status: Needs review => Approved

For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/bug923686/+merge/124531
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/bug923686 into lp:zorba
Validation queue starting for merge proposal. Log at: http://zorbatest.lambda.nu:8080/remotequeue/bug923686-2012-09-22T19-53-57.765Z/log.html -- https://code.launchpad.net/~zorba-coders/zorba/bug923686/+merge/124531
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/bug923686 into lp:zorba
The attempt to merge lp:~zorba-coders/zorba/bug923686 into lp:zorba failed. Below is the output from the failed tests.

CMake Error at /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake:274 (message):
  Validation queue job bug923686-2012-09-22T19-53-57.765Z is finished. The
  final status was:

  2 tests did not succeed - changes not commited.

Error in read script: /home/ceej/zo/testing/zorbatest/tester/TarmacLander.cmake
--
https://code.launchpad.net/~zorba-coders/zorba/bug923686/+merge/124531
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/bug923686 into lp:zorba
The proposal to merge lp:~zorba-coders/zorba/bug923686 into lp:zorba has been updated.

Status: Approved => Needs review

For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/bug923686/+merge/124531
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba
Review: Approve -- https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba
The proposal to merge lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba has been updated.

Status: Needs review => Approved

For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba
The proposal to merge lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba has been updated.

Commit Message changed to:

Add the read-pdf module, which reads the text from a PDF document and renders its pages to images. Add java.library.path to the JVM in the util-jvm module. Make the doc comments for createBase64Binary more explicit about what parameters they expect and what they do. Change the return value of the getIntValue() method to xs_int.

For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba
Validation queue starting for merge proposal. Log at: http://zorbatest.lambda.nu:8080/remotequeue/fread-pdf-trunk-2012-09-22T21-29-56.025Z/log.html -- https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba
Validation queue job fread-pdf-trunk-2012-09-22T21-29-56.025Z is finished. The final status was: All tests succeeded! -- https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba
Voting does not meet specified criteria. Required: Approve > 1, Disapprove < 1, Needs Fixing < 1, Pending < 1. Got: 1 Approve, 2 Pending. -- https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba
The proposal to merge lp:~zorba-coders/zorba/fread-pdf-trunk into lp:zorba has been updated.

Status: Approved => Needs review

For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/fread-pdf-trunk/+merge/125858