Re: Get DPI

2021-08-26 Thread Евгений Король
Hello. I think that DPI is the value for the image as inserted into the PDF. I would like to get the DPI of the original image, before the matrix transformations. On Thu, 26 Aug 2021 at 22:46, Tilman Hausherr wrote: > On 26.08.2021 at 04:20, Евгений Король wrote: > > Hello. How can I get the DPI value of raster image PDImageXObject w
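A PDF image XObject records only its size in samples (/Width, /Height), so an "original" DPI is not part of the PDF image model. If the image is stored as a JPEG (DCTDecode filter), the embedded JPEG data may still carry the density its creator declared in a JFIF APP0 header. The sketch below reads that header from raw JPEG bytes. It assumes the undecoded bytes have already been obtained (for example through `createInputStream` with DCTDecode as a stop filter, which is how ExtractImages handles JPEGs; check that against the Javadoc of your PDFBox version). It only handles a JFIF segment directly after SOI, so EXIF-only files return empty.

```java
import java.util.OptionalDouble;

/** Reads the horizontal resolution declared in a JPEG's JFIF APP0 segment, if present. */
final class JfifDensity {

    static OptionalDouble horizontalDpi(byte[] jpeg) {
        // Need SOI (FF D8) followed by APP0 (FF E0) and a full JFIF header (18 bytes)
        if (jpeg.length < 18
                || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8
                || (jpeg[2] & 0xFF) != 0xFF || (jpeg[3] & 0xFF) != 0xE0) {
            return OptionalDouble.empty();
        }
        // Bytes 4-5: segment length; bytes 6-10: identifier "JFIF\0"
        if (jpeg[6] != 'J' || jpeg[7] != 'F' || jpeg[8] != 'I' || jpeg[9] != 'F' || jpeg[10] != 0) {
            return OptionalDouble.empty();
        }
        // Bytes 11-12: version; byte 13: units; bytes 14-15: X density (big-endian)
        int units = jpeg[13] & 0xFF;
        int xDensity = ((jpeg[14] & 0xFF) << 8) | (jpeg[15] & 0xFF);
        switch (units) {
            case 1:
                return OptionalDouble.of(xDensity);        // already dots per inch
            case 2:
                return OptionalDouble.of(xDensity * 2.54); // dots per cm converted to dots per inch
            default:
                return OptionalDouble.empty();             // 0 = aspect ratio only, no physical size
        }
    }
}
```

Even when such a value exists, it is only metadata: the size at which the image appears on the page is set by the transformation matrix, and the viewer ignores the JFIF density.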

Re: PDFText2HTML.java Working Example

2021-08-26 Thread Tilman Hausherr
On 27.08.2021 at 00:57, flywire wrote: I couldn't find a working PDFText2HTML.java example. Can you show me one, preferably as an app? It's available as part of the ExtractText command-line utility in pdfbox-app. Use the option "-html" to activate it. java -jar pdfbox-app-2.0.24.jar ExtractText

PDFText2HTML.java Working Example

2021-08-26 Thread flywire
I couldn't find a working PDFText2HTML.java example. Can you show me one, preferably as an app?

Re: PDF2MD - Images

2021-08-26 Thread flywire
Figures, tables, etc. often have a unique caption line, e.g. "Figure N: Description...". After extracting text, I used this workaround to post-process the Markdown files on Win10 with GNU sed (hence the ^^ escape): === display proposed changes for %f in (*.md) do sed -n 's/\(^^Figure \)\([0-9]\+\)\(\: .*\)/\n![]
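The same caption-matching idea can be done in Java without the cmd escaping. The sed command above is cut off, so its exact replacement is unknown; this sketch assumes the goal is to put a Markdown image link on its own paragraph before each "Figure N: ..." caption, and the `figure-N.png` file naming is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FigureLinks {
    // Same pattern as the sed expression: "Figure ", a number, ": ", then the description
    private static final Pattern CAPTION = Pattern.compile("^Figure (\\d+): .*");

    /** Returns the lines with an image link inserted before each figure caption. */
    static List<String> addImageLinks(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = CAPTION.matcher(line);
            if (m.matches()) {
                out.add("");                                     // blank line starts a new Markdown paragraph
                out.add("![](figure-" + m.group(1) + ".png)"); // hypothetical file naming scheme
            }
            out.add(line);
        }
        return out;
    }
}
```

Because `matches()` anchors the whole line, a mention such as "see Figure 2: ..." in the middle of a sentence is left alone.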

Re: ExtractImages Ignoring Textboxes

2021-08-26 Thread Tilman Hausherr
On 24.08.2021 at 02:03, flywire wrote: https://fivedots.coe.psu.ac.th/~ad/jlop/chaps/46.%20Addons.pdf contains text boxes that are extracted as images containing a solid black box. How can I ignore those text boxes while extracting images, without incrementing the image number contained in the filename?
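One possible heuristic (an assumption, not something ExtractImages does) is to decode each image, for example with `PDImageXObject.getImage()`, skip it if every pixel has the same color, and only advance the file counter for images that are kept. A minimal sketch; the `prefix-N.png` naming is hypothetical:

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

final class SolidImageFilter {

    /** True if every pixel has the same ARGB value, e.g. an all-black box. */
    static boolean isSolidColor(BufferedImage img) {
        int first = img.getRGB(0, 0);
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (img.getRGB(x, y) != first) {
                    return false;
                }
            }
        }
        return true;
    }

    /** File names for the kept images; skipped images do not use up a number. */
    static List<String> fileNames(List<BufferedImage> images, String prefix) {
        List<String> names = new ArrayList<>();
        int counter = 1;
        for (BufferedImage img : images) {
            if (isSolidColor(img)) {
                continue; // skip the solid box without incrementing the counter
            }
            names.add(prefix + "-" + counter + ".png");
            counter++;
        }
        return names;
    }
}
```

This also drops any genuine single-color image, and a black box may really be a sign that a mask or soft mask was not applied, so it is worth checking why the box is black before filtering.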

Re: PDF2MD - Codeblocks

2021-08-26 Thread Tilman Hausherr
On 24.08.2021 at 01:55, flywire wrote: https://fivedots.coe.psu.ac.th/~ad/jlop/chaps/46.%20Addons.pdf contains code blocks identified by a change of font, with no other fonts on those lines. I'd like to insert control codes before and after them while I'm extracting text. I'm on Win10 using: java
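In PDFBox, a subclass of PDFTextStripper can see the font of every glyph through `TextPosition.getFont()` in its `writeString(String, List<TextPosition>)` override. The sketch below assumes that information has already been reduced to one entry per line with the set of font names used on it (the `Line` type is hypothetical), and that the code font can be recognized by a name fragment such as "Courier" (also an assumption). It wraps each run of lines whose only font is the code font in Markdown fences.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class CodeBlockFencer {
    private static final String FENCE = "```";

    /** One extracted text line plus the names of the fonts used on it (hypothetical type). */
    static final class Line {
        final String text;
        final Set<String> fonts;

        Line(String text, Set<String> fonts) {
            this.text = text;
            this.fonts = fonts;
        }
    }

    /** Wraps runs of lines that use only the code font in Markdown code fences. */
    static List<String> fence(List<Line> lines, String codeFont) {
        List<String> out = new ArrayList<>();
        boolean inCode = false;
        for (Line line : lines) {
            // contains() because embedded subset fonts have prefixes like "ABCDEF+Courier"
            boolean isCode = !line.fonts.isEmpty()
                    && line.fonts.stream().allMatch(f -> f.contains(codeFont));
            if (isCode != inCode) {
                out.add(FENCE); // entering or leaving a code block
            }
            out.add(line.text);
            inCode = isCode;
        }
        if (inCode) {
            out.add(FENCE); // close a block that runs to the end of the input
        }
        return out;
    }
}
```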

Re: PDF2MD - Codeblocks

2021-08-26 Thread Tilman Hausherr
On 24.08.2021 at 06:17, flywire wrote: With a bit of customisation, PDFBox should be able to convert PDF to Markdown. This probably involves a process like PDFText2HTML.java

Re: PDF2MD - Images

2021-08-26 Thread Tilman Hausherr
On 24.08.2021 at 01:43, flywire wrote: https://fivedots.coe.psu.ac.th/~ad/jlop/chaps/46.%20Addons.pdf contains images, and I'd like to replace them with code while I'm extracting text. I'm on Win10 using: java -jar pdfbox-app-2.0.24.jar ExtractText %1 Required code is: %newline%[](%filename%-%
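ExtractText alone does not report images, so placing a placeholder where an image appears means combining text positions with image positions. A minimal sketch of that merge step, assuming both have already been collected per page as distances from the top edge (the `Item` type and the `top` field are hypothetical), and that each placeholder is a newline followed by `[](name-N.png)`; the part of the required code after `%filename%-` is cut off above, so the `.png)` suffix is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class ImagePlaceholderMerger {

    /** A text line on one page; top = distance from the top edge (hypothetical). */
    static final class Item {
        final double top;
        final String text;

        Item(double top, String text) {
            this.top = top;
            this.text = text;
        }
    }

    /**
     * Merges text lines and image positions top to bottom, emitting a placeholder
     * for each image. Both lists must already be sorted by their top coordinate.
     */
    static List<String> merge(List<Item> textLines, List<Double> imageTops,
                              String baseName, int firstImageNumber) {
        List<String> out = new ArrayList<>();
        int t = 0;
        int i = 0;
        int n = firstImageNumber;
        while (t < textLines.size() || i < imageTops.size()) {
            boolean takeImage = i < imageTops.size()
                    && (t == textLines.size() || imageTops.get(i) < textLines.get(t).top);
            if (takeImage) {
                out.add("");                                   // the %newline% before the placeholder
                out.add("[](" + baseName + "-" + n + ".png)"); // assumed placeholder format
                n++;
                i++;
            } else {
                out.add(textLines.get(t).text);
                t++;
            }
        }
        return out;
    }
}
```

Passing `firstImageNumber` lets the caller continue the numbering from one page to the next.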

Re: Get DPI

2021-08-26 Thread Tilman Hausherr
On 26.08.2021 at 04:20, Евгений Король wrote: Hello. How can I get the DPI value of a raster image (PDImageXObject) while processing a PDF file? https://stackoverflow.com/questions/5472711/dpi-of-image-extracted-from-pdf-with-pdfbox
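The approach in the linked answer boils down to one division: the image's size in pixels over its displayed size in inches, where PDF user space has 72 points per inch. The pixel width comes from `PDImageXObject.getWidth()`, and the displayed width can be taken from the current transformation matrix when the image is drawn (the PrintImageLocations example in pdfbox-examples prints both). A minimal sketch of the arithmetic, ignoring any /UserUnit on the page:

```java
final class EffectiveDpi {

    /**
     * Effective resolution of an image as placed on the page.
     *
     * @param pixels          image width (or height) in samples
     * @param displayedPoints displayed width (or height) in PDF points, 1/72 inch each
     */
    static double dpi(int pixels, double displayedPoints) {
        if (displayedPoints <= 0) {
            throw new IllegalArgumentException("displayed size must be positive");
        }
        return pixels / (displayedPoints / 72.0);
    }
}
```

For example, an image 1200 pixels wide drawn 288 points (4 inches) wide gives `EffectiveDpi.dpi(1200, 288.0)` = 300.0. Horizontal and vertical DPI can differ if the image is scaled non-uniformly, so compute both.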