Re: [PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-24 Thread via GitHub
THausherr commented on PR #180: URL: https://github.com/apache/pdfbox/pull/180#issuecomment-1962396268 All done in https://issues.apache.org/jira/browse/PDFBOX-5768 . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-21 Thread via GitHub
rawatsaurav01 commented on PR #180: URL: https://github.com/apache/pdfbox/pull/180#issuecomment-1957377183 Thank you @THausherr for your acknowledgement.

Re: [PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-21 Thread via GitHub
THausherr commented on PR #180: URL: https://github.com/apache/pdfbox/pull/180#issuecomment-1956613451 Thank you, I've simplified this and committed it. If there isn't any further input then I'll commit to versions 2 and 3 in a few days.

Re: [PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-18 Thread via GitHub
rawatsaurav01 commented on PR #180: URL: https://github.com/apache/pdfbox/pull/180#issuecomment-1951624412 @THausherr I have added and pushed the code, but it is not showing; maybe the PR is closed. Here is the updated code. private static void appendEscaped(StringBuilder builder, char

Re: [PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-18 Thread via GitHub
THausherr commented on PR #180: URL: https://github.com/apache/pdfbox/pull/180#issuecomment-1951372205 I don't see any additional commit, either add it or post the code change here. I assume it's in `appendEscaped()`.

Re: [PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-14 Thread via GitHub
THausherr commented on PR #180: URL: https://github.com/apache/pdfbox/pull/180#issuecomment-1944051450 Sonar found a missing break in `appendEscaped()`, which is no big deal, but then I realized that some of the code isn't needed. At the same time, some needed code should be added, i.e. handling of
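For readers unfamiliar with the issue Sonar flagged: in a Java `switch`, a `case` without a `break` falls through into the next one. The sketch below is only an illustration of that point, not the code from the PR. The class name `MarkdownEscapeSketch`, the helper `escape`, the parameter name `character`, and the choice of which characters to escape are all assumptions made for the example; only the method name `appendEscaped` and its first parameter come from the thread.

```java
// Hypothetical sketch, not the PDFBox implementation.
public final class MarkdownEscapeSketch
{
    private MarkdownEscapeSketch()
    {
    }

    // Appends one character, backslash-escaping a few characters that
    // Markdown treats as markup. The set chosen here is an assumption.
    private static void appendEscaped(StringBuilder builder, char character)
    {
        switch (character)
        {
            // These labels share one body on purpose (intentional grouping).
            case '\\':
            case '*':
            case '_':
            case '`':
            case '#':
            case '[':
            case ']':
                builder.append('\\').append(character);
                break; // without this, control would fall into default
                       // and the character would be appended twice
            default:
                builder.append(character);
                break;
        }
    }

    // Hypothetical convenience wrapper that escapes a whole string.
    public static String escape(String text)
    {
        StringBuilder builder = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++)
        {
            appendEscaped(builder, text.charAt(i));
        }
        return builder.toString();
    }

    public static void main(String[] args)
    {
        System.out.println(escape("2 * 3 = 6 and a_b"));
    }
}
```

Grouping several `case` labels over one body is deliberate fall-through; a forgotten `break` after a body that does work is the accidental kind that static analysis tools report.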

Re: [PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-14 Thread via GitHub
rawatsaurav01 commented on PR #180: URL: https://github.com/apache/pdfbox/pull/180#issuecomment-1943387407 Thank you @THausherr.

Re: [PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-14 Thread via GitHub
THausherr commented on PR #180: URL: https://github.com/apache/pdfbox/pull/180#issuecomment-1943386485 Thank you @rawatsaurav01, I committed to the trunk with some minor changes. I'll commit to 3.0 and 2.0 later.

Re: [PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-14 Thread via GitHub
asfgit closed pull request #180: Enable Native Markdown Extraction in Apache PDFBox URL: https://github.com/apache/pdfbox/pull/180

Re: [PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-13 Thread via GitHub
rawatsaurav01 commented on PR #180: URL: https://github.com/apache/pdfbox/pull/180#issuecomment-1942054023 @THausherr I have changed the author name to Axle. Can you please check it now?

Re: [PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-11 Thread via GitHub
rawatsaurav01 commented on PR #180: URL: https://github.com/apache/pdfbox/pull/180#issuecomment-1938036985 @lehmi @THausherr Thanks a lot for your encouraging comments. I have incorporated the changes. Please review and let me know if any further changes are required.

Re: [PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-11 Thread via GitHub
rawatsaurav01 commented on code in PR #180: URL: https://github.com/apache/pdfbox/pull/180#discussion_r1485735727 ## tools/src/main/java/org/apache/pdfbox/tools/PDFText2Markdown.java: ## @@ -0,0 +1,375 @@ +//package com.pdfexample; + +package org.apache.pdfbox.tools; + + +import

Re: [PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-11 Thread via GitHub
rawatsaurav01 commented on code in PR #180: URL: https://github.com/apache/pdfbox/pull/180#discussion_r1485735631 ## tools/src/main/java/org/apache/pdfbox/tools/PDFText2Markdown.java: ## @@ -0,0 +1,375 @@ +//package com.pdfexample; + +package org.apache.pdfbox.tools; + + +import

Re: [PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-03 Thread via GitHub
lehmi commented on PR #180: URL: https://github.com/apache/pdfbox/pull/180#issuecomment-1925309963 @rawatsaurav01 thanks for the contribution. I've added some comments to your code, and here are some more: * add an Apache license header to the file * remove/replace all HTML related c

Re: [PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-03 Thread via GitHub
lehmi commented on code in PR #180: URL: https://github.com/apache/pdfbox/pull/180#discussion_r1477059422 ## tools/src/main/java/org/apache/pdfbox/tools/PDFText2Markdown.java: ## @@ -0,0 +1,375 @@ +//package com.pdfexample; + +package org.apache.pdfbox.tools; + + +import org.apa

Re: [PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-03 Thread via GitHub
lehmi commented on code in PR #180: URL: https://github.com/apache/pdfbox/pull/180#discussion_r1477059126 ## tools/src/main/java/org/apache/pdfbox/tools/PDFText2Markdown.java: ## @@ -0,0 +1,375 @@ +//package com.pdfexample; + +package org.apache.pdfbox.tools; + + +import org.apa

Re: [PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-03 Thread via GitHub
lehmi commented on PR #180: URL: https://github.com/apache/pdfbox/pull/180#issuecomment-1925308317 @THausherr IMHO it is not needed as the class is more or less a copy of PDFText2HTML

Re: [PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-03 Thread via GitHub
THausherr commented on PR #180: URL: https://github.com/apache/pdfbox/pull/180#issuecomment-1925264378 Seems like a good idea. @lehmi do we need an ICLA for this one?

[PR] Enable Native Markdown Extraction in Apache PDFBox [pdfbox]

2024-02-02 Thread via GitHub
rawatsaurav01 opened a new pull request, #180: URL: https://github.com/apache/pdfbox/pull/180 Propose the addition of native Markdown extraction support in Apache PDFBox to simplify the conversion of PDF content to Markdown, eliminating the need for intermediate HTML conversion.