James,
If the extra check is costly, you might also observe that all (most?)
existing files have the proper header format. It is only new or changed
files that must be checked. So, you can use Git to determine the change set
on each PR and do the extra format check only on those files.
- Paul
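The idea above (run the costly check only on what a PR touched) might be sketched as follows. This is an illustration, not Drill's actual build tooling: the base ref `origin/master`, the `.java` filter, the marker text, and the rule that a "proper" header is a regular block comment rather than a Javadoc comment are all assumptions made for the example.

```python
import subprocess
from pathlib import Path

# Assumed marker text; adjust to whatever the project's header check expects.
LICENSE_MARKER = "Licensed to the Apache Software Foundation"


def changed_files(base="origin/master"):
    """Return paths added or modified on the current branch relative to base."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=AM", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def has_plain_comment_header(text):
    """True if the text opens with a regular (non-Javadoc) comment holding the license."""
    stripped = text.lstrip()
    if not stripped.startswith("/*") or stripped.startswith("/**"):
        return False
    end = stripped.find("*/")
    return end != -1 and LICENSE_MARKER in stripped[:end]


def files_with_bad_headers(paths):
    """Check only the given files; unchanged files are never read."""
    bad = []
    for p in paths:
        path = Path(p)
        if path.suffix == ".java" and path.is_file():
            if not has_plain_comment_header(path.read_text(encoding="utf-8")):
                bad.append(p)
    return bad
```

In a PR build, something like `files_with_bad_headers(changed_files())` would then report only the new or changed files that fail the check.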
On
Some ideas:
* Time marches on. Drill has a design from ten years back. What modern
environment things do current users need? Integration with Amazon Glue?
Delta Lake/lakehouse/whatever the cool new thing is? Integration with the
latest & greatest BI tools?
* Seems many folks use Drill as a desktop
Hi James,
For some reason, Drill started with the license headers in Javadoc
comments. The (weak) explanation I got was that we never generate Javadoc,
so it didn't really matter. Later, we started converting the headers to
regular comments when convenient.
If we were to generate Javadoc, having
git checkout master
git reset --hard origin/master
Use these with caution: I used a slightly different set to update my own
branch. Caveat emptor. This assumes that your Drill clone is "origin".
- Paul
On Thu, Jan 25, 2024 at 12:48 PM Paul Rogers wrote:
> The symbols in question are some
The symbols in question are some I modified in my recent PR. I wonder if
there was a merge issue somewhere? The PR did get a clean build on the
master branch.
I'll try a build myself to see if I can locate the issue.
- Paul
On Thu, Jan 25, 2024 at 10:56 AM Charles Givre wrote:
> All,
> I jus
Hi Peter,
It sounds like you are on the right track: the new option is the quick
short-term solution. The best long-term solution is to generalize Drill's
date/time type, but that would take much more work. (Drill also has a bug
where the treatment of timezones is incorrect, which forces Drill to
[
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers resolved DRILL-8375.
Resolution: Fixed
> Incomplete support for non-projected complex vect
Hi All,
Happy New Year!
I dusted off the work to add non-projection support in EVF for the UNION
and LIST types. I believe that only REPEATED LIST is missing.
"Non-projection support" just means that you can read a JSON file that
requires a UNION or LIST vector, and tell EVF to NOT actually proj
>> On 2024/01/01 03:16, Charles Givre wrote:
> >>>> I'll throw my .02 here... As a user of Drill, I've only had the
> occasion to use the Union once. However, when I used it, it consumed so
> much memory, we ended up finding a workaround anyway and stopped us
t;>> On 2024/01/01 03:16, Charles Givre wrote:
> >>>> I'll throw my .02 here... As a user of Drill, I've only had the
> occasion to use the Union once. However, when I used it, it consumed so
> much memory, we ended up finding a workaround anyway and stopped
Hi Luoc,
Thanks for reminding me about the EVF V2 work. I got mostly done adding
projection for complex types, then got busy on other projects. I've yet to
tackle the hard cases: unions, repeated unions and repeated lists (which
are, in fact, repeated repeated unions).
The code to handle unprojec
Hi Mike,
I wonder if you've got an array in there somewhere? Either in the data, or
you're creating an array in your code in response to the data?
If you have just scalars, then all you need to do is start a row, write the
scalars, and end the row. The starting and ending are done automagically b
rward, so i figured I should just
> > ask.
> >
> > This is just to get enough working (against local files only) that I can
> be
> > unblocked on creating and testing the rest of the Daffodil-to-Drill
> > metadata bridge and data bridge.
> >
> > My p
nly integer data fields, but should support lots
> of data shapes including vectors, choices, sequences, nested records, etc.
> >
> > Thanks for the help.
> >
> >>
> >>> On Oct 12, 2023, at 2:58 PM, Mike Beckerle <mbecke...@apache.org> wrote:
> >>>
> >>> So when a data form
Hi Mike,
Congrats on the PR. I'll take a look soon.
You asked about initialization. Initialization is a bit tricky in a
distributed system such as Drill. There are a number of things
"initialization" could mean:
* Global, one-time initialization (per Drillbit): Unlike Druid, Drill has
no "lifecy
an equivalent
> Drill TupleMetadata from it. (Or, hopefully retrieve it from a cache)
>
> What objects do I call, or what classes do I have to create to make this
> Drill TupleMetadata available to Drill so it uses it in all the ways a
> static Drill schema can be useful?
>
> I just
Mike,
This is a complex question and has two answers.
First, the standard enhanced vector framework (EVF) used by most readers
assumes a "pull" model: read each record. This is where the next() comes
in: readers just implement this to read the next record. But, the code
under EVF works with a pus
Mike,
Just to echo Charles, thanks for the work; sounds like you are making good
progress.
The question you asked is tricky. Charles is right, the type of the data
structure is a map. The output you showed appears to be from the sqlline
tool. If so, then it helps to understand that sqlline "chea
Hi Mike,
Looks like you are wrestling with two separate issues. The first is how to
read the encoded data that you showed. In Drill, each data format generally
needs its own reader. Drill's reader operator provides all the plumbing
needed to handle multiple format readers, pack data into vectors,
Hi Mike,
I believe I sent a detailed response to this. Did it get through? If not,
I'll try sending it again...
- Paul
On Wed, Sep 13, 2023 at 6:44 AM Mike Beckerle wrote:
> ... sound of crickets on a summer night.
>
> It would really help me if I could get a response to this inquiry, to
Hi Mike,
You asked about how to work with nested data items. As noted in a previous
email, this can be a bit tricky. Drill uses SQL, and SQL does not have good
native support for structured data: it was designed in the 1970s for
record-oriented data (tuples). Several attempts were made to extend
Great progress, Mike!
First, let's address the schema issue. As you've probably noticed, Drill's
original notion was that data needed no schema: the data itself provides
sufficient syntactic structure to let Drill infer schema. Also as you've
noticed, this assumption turned out to be more marketin
Hi Mike,
Good progress! There are a number of factors to consider. Let's work
through them one by one.
First, try the simplest possible query:
SELECT * FROM
If you are using the row set mechanism, grab the schema and print it. (My
memory is hazy, but I do believe that there are methods and cla
IIRC, the syntax for the "provided schema" for arrays is "ARRAY" such
as "ARRAY". This works, however, only if the XML reader uses the
(very complex) EVF framework and has a way to control parsing based on the
data type (and to set the data type based on parsing). The JSON reader has
such an integr
Unless something changed, Drill's build does not compile the .proto files.
Instead, the files are generated manually, and checked into git, on those
rare occasions that the API changes. I seem to recall that there are some
instructions somewhere, but a quick search didn't reveal anything.
- Paul
Hi Mike,
A quick glance at the log suggests a failure in the tests for the JSON
reader, in the Mongo extended types. Drill's date/time support has
historically been fragile. Some tests only work if your machine is set to
use the UTC time zone (or Java is told to pretend that the time zone is UTC).
The
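The time zone sensitivity described above can be illustrated with a small sketch. This is not Drill code: the function name and the fixed UTC-8 offset are assumptions chosen for the example. It shows how the same zone-less timestamp string maps to different instants depending on the zone in which it is interpreted, which is the kind of difference that breaks a test whose expected value was computed on a UTC machine.

```python
from datetime import datetime, timedelta, timezone


def epoch_millis(wall_clock, zone):
    """Interpret a zone-less 'YYYY-MM-DD HH:MM:SS' string in the given zone."""
    naive = datetime.strptime(wall_clock, "%Y-%m-%d %H:%M:%S")
    return int(naive.replace(tzinfo=zone).timestamp() * 1000)


UTC = timezone.utc
# Fixed offset used purely for illustration (no daylight-saving rules).
UTC_MINUS_8 = timezone(timedelta(hours=-8))

utc_value = epoch_millis("2024-01-25 12:00:00", UTC)
local_value = epoch_millis("2024-01-25 12:00:00", UTC_MINUS_8)

# Noon at UTC-8 occurs eight hours after noon UTC.
difference_hours = (local_value - utc_value) / 3_600_000
```

A test that compares against `utc_value` passes only where the default zone is UTC; anywhere else the parsed value shifts by the zone offset.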
Drill can internally handle scalars, arrays (AKA vectors) and maps (AKA
tuples, structs). SQL, however, prefers to work with scalars: there is no
good syntax to reach inside a complex object for, say, a WHERE condition
without also projecting that item as a top-level scalar.
The cool thing, for ML
Paul Rogers created DRILL-8375:
--
Summary: Incomplete support for non-projected complex vectors
Key: DRILL-8375
URL: https://issues.apache.org/jira/browse/DRILL-8375
Project: Apache Drill
Issue
Hi All,
As others have said, the only difference between plans for “small” and “large”
queries is the queue size and memory. As I recall, those are spelled out in the
docs.
Ensure that there is sufficient memory for the slicing up done by the queue,
and the query. Memory is allocated to sorts,
Hi Luoc,
First, what poor soul is asked to deal with large amounts of XML in this
day and age? I thought we were past the XML madness, except in Maven and
Hadoop config files.
XML is much like JSON, only worse. JSON at least has well-defined types
that can be gleaned from JSON syntax. With XML...
Paul Rogers created DRILL-8185:
--
Summary: EVF 2 doesn't handle map arrays or nested maps
Key: DRILL-8185
URL: https://issues.apache.org/jira/browse/DRILL-8185
Project: Apache Drill
Issue
Abhishek used to have that thing running like a charm. Great to see it
getting attention again.
+1
- Paul
On Thu, Mar 17, 2022 at 2:03 AM James Turton wrote:
> Hi dev community!
>
> Many of you need no introduction to the test framework developed by MapR
>
> https://github.com/mapr/drill-test-
Paul Rogers created DRILL-8159:
--
Summary: Upgrade HTTPD, Text readers to use EVF3
Key: DRILL-8159
URL: https://issues.apache.org/jira/browse/DRILL-8159
Project: Apache Drill
Issue Type: New
>
> > On 2022/02/07 21:05, Ted Dunning wrote:
> > > Another option is to store metadata as data in a distributed data
> store.
> > > For static resources, that can scale very well. For highly dynamic
> > > resources like conventional databases behind JDBC connect
Hi All,
Drill, like all open source projects, exists to serve those that use it. To
that end, the best contributions come when some company needs a feature
badly enough that it is worth the effort to develop and contribute a
solution. That's pretty standard, as long as the contribution is general
Paul Rogers created DRILL-8124:
--
Summary: Fix implicit file issue with EVF 2
Key: DRILL-8124
URL: https://issues.apache.org/jira/browse/DRILL-8124
Project: Apache Drill
Issue Type: New Feature
Paul Rogers created DRILL-8123:
--
Summary: Revise scan limit pushdown
Key: DRILL-8123
URL: https://issues.apache.org/jira/browse/DRILL-8123
Project: Apache Drill
Issue Type: New Feature
Paul Rogers created DRILL-8115:
--
Summary: LIMIT pushdown into EVF
Key: DRILL-8115
URL: https://issues.apache.org/jira/browse/DRILL-8115
Project: Apache Drill
Issue Type: New Feature
Congratulations James!
- Paul
On Mon, Jan 24, 2022 at 9:34 AM Charles Givre
wrote:
> The Project Management Committee (PMC) for Apache Drill is pleased to
> announce that we have invited James Turton to join us as a PMC member of
> the Drill project and he has accepted. Please join me in congr
Congratulations!
- Paul
On Mon, Jan 24, 2022 at 9:15 AM Charles Givre wrote:
> The Project Management Committee (PMC) for Apache Drill is pleased to
> announce that we have invited PJ Fanning to join us as a committer to the
> Drill project. PJ is a committer and PMC member for the Apache POI
> >>> available until I looked at a temporary solution..
> >>>
> >>> I use both Eclipse and IDEA, but I use Eclipse more often. I have no
> >>> objection to the use of Lombok, but suggest the following three points
> :
> >>>
> >>> 1
Hi All,
I look at any tool as a cost/benefit tradeoff. If Drill were a typical
business app, with lots of "data objects", then the hassle of Lombok might
be a net win. However, the nature of Drill is that we have very few data
objects. We have lots of Protobuf objects, or Jackson-serialized object
aching a debugger to a running embedded Drill
> with the storage plugin deployed to it, or am I wrong here?
>
> On 2022/01/18 00:32, Paul Rogers wrote:
> > Hi Ted,
> >
> > Thanks for the explanation, makes sense.
> >
> > Ideally, the client side would be somewh
the same "already
> exists" benefit as does Maven.
>
>
>
> On Mon, Jan 17, 2022 at 1:30 PM Paul Rogers wrote:
>
>> Hi Ted,
>>
>> Well said. Just to be clear, I wasn't suggesting that we use
>> Maven-the-build-tool to distribute plugins. Rather, I
g
>> the ability to fetch and install plugins itself without too much
>> trouble, at least for Drill clusters with Internet access.
>> "Sideloading" by downloading from Maven and copying manually would
>> always remain possible.
>>
>> @Paul I
Hi All,
James raises an important issue. I've noticed that it used to be easy to
build and test Drill, now it is a struggle, because of the many odd
external dependencies we have introduced. That acts as a big damper on
contributions: none of us get paid enough to spend more time fighting
builds t
Hey All,
Other members of the Hadoop Ecosystem rely on external systems to handle
permissions: Ranger or Sentry. There is probably something different in the
AWS world.
As you look into security, you'll see that you need to maintain permissions
on many entities: files, connections, etc. You need
Hi Ted,
I like where you're going with how to manage the discussion.
Here's a trick that I saw someone do recently: run the design/discussion as a
PR.
Comments are just code review comments, tagged to a specific line. The "er,
never mind" aspect that Ted talks about is handled by pushing a new versio
info there, and I'd hate to see it get lost.
> -- C
>
> > On Jan 3, 2022, at 7:41 PM, Paul Rogers wrote:
> >
> > Hi All,
> >
> > Thanks Charles for dredging up that old discussion, your memory is better
> > than mine! And, thanks Ted for that summary
e most was the ability to move data
> > between platforms without having to serialize/deserialize the data. From
> my
> > understanding, MapR did some research and didn't find a significant
> > performance advantage and hence didn't really pursue the integration. The
>
Paul Rogers created DRILL-8102:
--
Summary: Tests use significant space outside the drill directory
Key: DRILL-8102
URL: https://issues.apache.org/jira/browse/DRILL-8102
Project: Apache Drill
Paul Rogers created DRILL-8101:
--
Summary: Resolve the TIMESTAMP madness
Key: DRILL-8101
URL: https://issues.apache.org/jira/browse/DRILL-8101
Project: Apache Drill
Issue Type: Bug
Affects
Paul Rogers created DRILL-8100:
--
Summary: JSON record writer does not convert Drill local timestamp to UTC
Key: DRILL-8100
URL: https://issues.apache.org/jira/browse/DRILL-8100
Project: Apache Drill
Paul Rogers created DRILL-8099:
--
Summary: Parquet record writer does not convert Drill local timestamp to UTC
Key: DRILL-8099
URL: https://issues.apache.org/jira/browse/DRILL-8099
Project: Apache Drill
Paul Rogers created DRILL-8087:
--
Summary: {{TestNestedDateTimeTimestamp.testNestedDateTimeCTASExtendedJson}} assumes time zone
Key: DRILL-8087
URL: https://issues.apache.org/jira/browse/DRILL-8087
Paul Rogers created DRILL-8086:
--
Summary: Convert the CSV (AKA "compliant text") reader to EVF V2
Key: DRILL-8086
URL: https://issues.apache.org/jira/browse/DRILL-8086
Project: Ap
Paul Rogers created DRILL-8085:
--
Summary: EVF V2 support in the "Easy" format plugin
Key: DRILL-8085
URL: https://issues.apache.org/jira/browse/DRILL-8085
Project: Apache Drill
Issue
Paul Rogers created DRILL-8084:
--
Summary: Scan LIMIT pushdown fails across files
Key: DRILL-8084
URL: https://issues.apache.org/jira/browse/DRILL-8084
Project: Apache Drill
Issue Type: Bug
Paul Rogers created DRILL-8083:
--
Summary: HttpdLogBatchReader creates unnecessary empty maps
Key: DRILL-8083
URL: https://issues.apache.org/jira/browse/DRILL-8083
Project: Apache Drill
Issue
James Turton wrote:
> Hi Charles
>
> When I first took this idea to Paul I proposed that we attribute
> authorship but he declined that bit. We do have the Git history for the
> wiki, and the lines shown for the last Git commit to affect a page are
> quite visible in the wiki, e
[
https://issues.apache.org/jira/browse/DRILL-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers resolved DRILL-7325.
Resolution: Fixed
A number of individual commits fixed problems found in each operator. This
[
https://issues.apache.org/jira/browse/DRILL-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers resolved DRILL-6953.
Resolution: Fixed
Resolved via a series of individual tickets.
> Merge row set-based JSON rea
Paul Rogers created DRILL-7789:
--
Summary: Exchanges are slow on large systems & queries
Key: DRILL-7789
URL: https://issues.apache.org/jira/browse/DRILL-7789
Project: Apache Drill
Issue
Hi Abhishek,
Downloaded the tar file, installed Drill, cleaned my ZK and poked around in
the UI.
As you noted, you've already run the thousands of unit tests and the test
framework, so no point in trying to repeat that. Our tests, however, don't
cover the UI much at all, so I clicked around on th
Paul Rogers created DRILL-7734:
--
Summary: Revise the result set reader
Key: DRILL-7734
URL: https://issues.apache.org/jira/browse/DRILL-7734
Project: Apache Drill
Issue Type: Improvement
Paul Rogers created DRILL-7733:
--
Summary: Use streaming for REST JSON queries
Key: DRILL-7733
URL: https://issues.apache.org/jira/browse/DRILL-7733
Project: Apache Drill
Issue Type: Improvement
Paul Rogers created DRILL-7730:
--
Summary: Reduce overhead of web queries displayed in HTML
Key: DRILL-7730
URL: https://issues.apache.org/jira/browse/DRILL-7730
Project: Apache Drill
Issue Type
Paul Rogers created DRILL-7729:
--
Summary: Use java.time in column accessors
Key: DRILL-7729
URL: https://issues.apache.org/jira/browse/DRILL-7729
Project: Apache Drill
Issue Type: Improvement
y.
On Sun, May 3, 2020 at 2:42 PM Paul Rogers
wrote:
> Hi Tug,
>
> Glad to hear from you again. Ted's summary is pretty good; here's a bit
> more detail.
>
>
> Presto is another alternative which seems to have gained the most traction
> outside of the Cloud eco
Hi Tug,
Glad to hear from you again. Ted's summary is pretty good; here's a bit more
detail.
Presto is another alternative which seems to have gained the most traction
outside of the Cloud ecosystem on the one hand, and the Cloudera/HortonWorks
ecosystem on the other. Presto does, however, de
Paul Rogers created DRILL-7728:
--
Summary: Drill SPI framework
Key: DRILL-7728
URL: https://issues.apache.org/jira/browse/DRILL-7728
Project: Apache Drill
Issue Type: Improvement
Affects
Paul Rogers created DRILL-7725:
--
Summary: Updates to EVF2
Key: DRILL-7725
URL: https://issues.apache.org/jira/browse/DRILL-7725
Project: Apache Drill
Issue Type: Improvement
Affects
Paul Rogers created DRILL-7724:
--
Summary: Refactor metadata controller batch
Key: DRILL-7724
URL: https://issues.apache.org/jira/browse/DRILL-7724
Project: Apache Drill
Issue Type: Improvement
Paul Rogers created DRILL-7717:
--
Summary: Support Mongo extended types in V2 JSON loader
Key: DRILL-7717
URL: https://issues.apache.org/jira/browse/DRILL-7717
Project: Apache Drill
Issue Type
Hi All,
I think there may be a bit of confusion. It may be true that some of Drill's
dependencies now use the newer version of the library
httpcomponents:httpclient. However, it looks like ES directly depends on the
older flavor.
We have pom file entries which exclude that old version. As a res
Hi All,
This is a quick note for any of you who create or work on format plugins in
Drill. You will see that all existing plugins have been modified so that config
properties are immutable. This note will explain why.
Drill uses storage and format plugins as keys into an internal map. (That's
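The reason map keys need to be immutable can be shown with a generic sketch. This is not Drill code: `MutableConfig`, `FrozenConfig`, and the `extension` field are hypothetical names invented for the example. It demonstrates the general hazard that, when an object used as a key is changed after insertion, its entry can no longer be found under either the old or the new value.

```python
from dataclasses import dataclass, FrozenInstanceError


class MutableConfig:
    """Hypothetical config whose equality and hash depend on a mutable field."""

    def __init__(self, extension):
        self.extension = extension

    def __eq__(self, other):
        return isinstance(other, MutableConfig) and self.extension == other.extension

    def __hash__(self):
        return hash(self.extension)


@dataclass(frozen=True)
class FrozenConfig:
    """Hypothetical immutable config; safe to use as a map key."""
    extension: str


def stranded_after_mutation():
    config = MutableConfig("csv")
    registry = {config: "plugin instance"}
    config.extension = "tsv"  # mutate the key while it sits in the map
    # Neither the old value nor the new value locates the entry any more.
    found_by_old_value = MutableConfig("csv") in registry
    found_by_new_value = MutableConfig("tsv") in registry
    return found_by_old_value, found_by_new_value
```

With the frozen variant, the assignment that causes the problem raises `FrozenInstanceError` instead of silently stranding the entry.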
Paul Rogers created DRILL-7711:
--
Summary: Add data path, parameter filter pushdown to HTTP plugin
Key: DRILL-7711
URL: https://issues.apache.org/jira/browse/DRILL-7711
Project: Apache Drill
Paul Rogers created DRILL-7709:
--
Summary: CTAS as CSV creates files which the "csv" plugin can't read
Key: DRILL-7709
URL: https://issues.apache.org/jira/browse/DRILL-7709
Projec
Hi Charles,
Excellent point. The problem is deeper. Drill serializes plugin configs in the
query plan which it sends to each worker (Drillbit). Why? To avoid race
conditions: if you start a query and then change the plugin config, different
nodes could otherwise see different versions of the config.
Maskin
Paul Rogers created DRILL-7708:
--
Summary: Downgrade maven from 3.6.3 to 3.6.0
Key: DRILL-7708
URL: https://issues.apache.org/jira/browse/DRILL-7708
Project: Apache Drill
Issue Type: Bug
Hi Arina,
Thanks for keeping us up to date!
As it turns out, I use Ubuntu (Linux Mint) for development. Maven is installed
as a package using apt-get. Packages can lag behind a bit. The latest Maven
available via apt-get is 3.6.0.
It is a nuisance to install a new version outside the package m
[
https://issues.apache.org/jira/browse/DRILL-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers resolved DRILL-7655.
Resolution: Fixed
Fixed as part of PR #2052.
> Add Default Schema text box to Edit Query page
Paul Rogers created DRILL-7703:
--
Summary: Support for 3+D arrays in EVF JSON loader
Key: DRILL-7703
URL: https://issues.apache.org/jira/browse/DRILL-7703
Project: Apache Drill
Issue Type
Paul Rogers created DRILL-7701:
--
Summary: EVF V2 Scan Framework
Key: DRILL-7701
URL: https://issues.apache.org/jira/browse/DRILL-7701
Project: Apache Drill
Issue Type: Improvement
Affects
[
https://issues.apache.org/jira/browse/DRILL-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers resolved DRILL-7685.
Resolution: Cannot Reproduce
Tested in Drill 1.18 (snapshot) and found that the provided query
Paul Rogers created DRILL-7697:
--
Summary: Revise query editor in profile page of web UI
Key: DRILL-7697
URL: https://issues.apache.org/jira/browse/DRILL-7697
Project: Apache Drill
Issue Type
[
https://issues.apache.org/jira/browse/DRILL-6672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers resolved DRILL-6672.
Resolution: Not A Problem
Storage and format plugins must be immutable since their entire values
Paul Rogers created DRILL-7696:
--
Summary: EVF v2 Scan Schema Resolution
Key: DRILL-7696
URL: https://issues.apache.org/jira/browse/DRILL-7696
Project: Apache Drill
Issue Type: Improvement
Paul Rogers created DRILL-7690:
--
Summary: Display (major) operators in fragment title bar in Web UI
Key: DRILL-7690
URL: https://issues.apache.org/jira/browse/DRILL-7690
Project: Apache Drill
Paul Rogers created DRILL-7689:
--
Summary: Do not save profiles for trivial queries
Key: DRILL-7689
URL: https://issues.apache.org/jira/browse/DRILL-7689
Project: Apache Drill
Issue Type
Paul Rogers created DRILL-7688:
--
Summary: Provide web console option to see non-default options
Key: DRILL-7688
URL: https://issues.apache.org/jira/browse/DRILL-7688
Project: Apache Drill
Issue
Paul Rogers created DRILL-7687:
--
Summary: Inaccurate memory estimates in hash join
Key: DRILL-7687
URL: https://issues.apache.org/jira/browse/DRILL-7687
Project: Apache Drill
Issue Type: Bug
Paul Rogers created DRILL-7686:
--
Summary: Excessive memory use in partition sender
Key: DRILL-7686
URL: https://issues.apache.org/jira/browse/DRILL-7686
Project: Apache Drill
Issue Type: Bug
Paul Rogers created DRILL-7683:
--
Summary: Add "message parsing" to new JSON loader
Key: DRILL-7683
URL: https://issues.apache.org/jira/browse/DRILL-7683
Project: Apache Drill
Paul Rogers created DRILL-7680:
--
Summary: Move UDF projects before plugins in contrib
Key: DRILL-7680
URL: https://issues.apache.org/jira/browse/DRILL-7680
Project: Apache Drill
Issue Type
te an issue on our JIRA board?
Idan Sheinberg 8:43 AM
Sure
8:43
I'll get to it
cgivre 8:44 AM
I'd like for Paul Rogers to see this as I think he was the author of some of
this.
Idan Sheinberg 8:44 AM
Hmm. I'll keep that in mind
cgivre 8:47 AM
We've been refactoring some of t
Paul Rogers created DRILL-7658:
--
Summary: Vector allocateNew() has poor error reporting
Key: DRILL-7658
URL: https://issues.apache.org/jira/browse/DRILL-7658
Project: Apache Drill
Issue Type
Paul Rogers created DRILL-7640:
--
Summary: EVF-based JSON Loader
Key: DRILL-7640
URL: https://issues.apache.org/jira/browse/DRILL-7640
Project: Apache Drill
Issue Type: Improvement
Affects
Paul Rogers created DRILL-7634:
--
Summary: Rollup of code cleanup changes
Key: DRILL-7634
URL: https://issues.apache.org/jira/browse/DRILL-7634
Project: Apache Drill
Issue Type: Improvement
Paul Rogers created DRILL-7633:
--
Summary: Fixes for union and repeated list accessors
Key: DRILL-7633
URL: https://issues.apache.org/jira/browse/DRILL-7633
Project: Apache Drill
Issue Type