[GitHub] [drill] dzamo commented on pull request #2359: DRILL-8028: Add PDF Format Plugin

2021-12-01 Thread GitBox
dzamo commented on pull request #2359: URL: https://github.com/apache/drill/pull/2359#issuecomment-983645762 Dear PR author and reviewers. This is a generic message to say that we would like to merge this PR in time for the 1.20 release. Currently we're targeting a master branch fre

[GitHub] [drill] dzamo commented on pull request #2359: DRILL-8028: Add PDF Format Plugin

2021-11-25 Thread GitBox
dzamo commented on pull request #2359: URL: https://github.com/apache/drill/pull/2359#issuecomment-979755114 Copy that @paul-rogers, I [have incorporated your advice above into the Drill 2.0 wiki page](https://github.com/apache/drill/wiki/Drill-2.0-Proposal#project-structure-packaging-and-

[GitHub] [drill] dzamo commented on pull request #2359: DRILL-8028: Add PDF Format Plugin

2021-11-09 Thread GitBox
dzamo commented on pull request #2359: URL: https://github.com/apache/drill/pull/2359#issuecomment-964848311 > The bigger implication of having a `columns` array vs `field_n` is when a user starts with `SELECT *` queries. It makes it harder for BI tools to gather schema metadata and it als

[GitHub] [drill] dzamo commented on pull request #2359: DRILL-8028: Add PDF Format Plugin

2021-11-08 Thread GitBox
dzamo commented on pull request #2359: URL: https://github.com/apache/drill/pull/2359#issuecomment-963148567 > * Sometimes the PDF reader does not read tables perfectly and you get a mix of found headers and not found headers, so that's one reason I took that approach. This consider

[GitHub] [drill] dzamo commented on pull request #2359: DRILL-8028: Add PDF Format Plugin

2021-11-08 Thread GitBox
dzamo commented on pull request #2359: URL: https://github.com/apache/drill/pull/2359#issuecomment-962903289 @cgivre Instead of generating `field_0`, `field_1` when no column names can be determined, is it possible to generate a `columns` array to be accessed using `columns[0], columns[1],

[GitHub] [drill] dzamo commented on pull request #2359: DRILL-8028: Add PDF Format Plugin

2021-11-07 Thread GitBox
dzamo commented on pull request #2359: URL: https://github.com/apache/drill/pull/2359#issuecomment-962895343 @paul-rogers, right, okay it's an expressiveness thing here rather than a scale thing. The expressiveness of Drill SQL ∪ Drill format config JSON falls well short of that of a gene

[GitHub] [drill] dzamo commented on pull request #2359: DRILL-8028: Add PDF Format Plugin

2021-11-07 Thread GitBox
dzamo commented on pull request #2359: URL: https://github.com/apache/drill/pull/2359#issuecomment-962569638 > The problem is, a tool that tries to be both a dessert topping and a floor wax (let's see how old the readers are with this one), ends up being good at neither. @paul-roger

[GitHub] [drill] dzamo commented on pull request #2359: DRILL-8028: Add PDF Format Plugin

2021-11-03 Thread GitBox
dzamo commented on pull request #2359: URL: https://github.com/apache/drill/pull/2359#issuecomment-960472017 @cgivre @paul-rogers, my 2c. I guess some partial precedents for a format plugin like this are ones like format-image and format-esri (as noted), though those do only go after the