Re: Impala Column Masking Behavior Design

2019-11-12 Thread Todd Lipcon
> release-3.1.2/ql/src/java/org/apache/hadoop/hive/ql/parse/TableMask.java#L86-L155
> and some experiments I did:
> https://docs.google.com/document/d/1LYk2wxT3GMw4ur5y9JBBykolfAs31P3gWRStk21PomM/edit?usp=sharing
>
> Kurt mentions that traditional databases like DB2 follow behavior (b). I think we need to decide which behavior we'd like to support. The advantage of behavior (a) is that there is no security leak, because user X can't guess whether there are customers with phone number '123456789'. The advantage of behavior (b) is that users don't need to rewrite their existing queries after an admin applies column masking policies.
>
> What do you think?
>
> Thanks,
> Quanlong
-- Todd Lipcon Software Engineer, Cloudera

Re: Ubuntu 18.04 in pre-merge tests?

2019-06-24 Thread Todd Lipcon
Perhaps others have some reason why it wouldn't work well, though. -Todd -- Todd Lipcon Software Engineer, Cloudera

Re: Ubuntu 18.04 in pre-merge tests?

2019-06-20 Thread Todd Lipcon
one already and I'm starting Kudu > build there in an attempt to take a look at the issue later tonight. > > I'll keep you posted on my findings. > > > Kind regards, > > Alexey > > On Wed, Jun 19, 2019 at 2:53 PM Todd Lipcon wrote: > >> This same

Re: Ubuntu 18.04 in pre-merge tests?

2019-06-19 Thread Todd Lipcon
.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized: expected > TLS_HANDSHAKE step: SASL_INITIATE > W0612 04:24:57.910481 8897 heartbeater.cc:587] Failed to heartbeat to > 127.0.0.1:7051 (722 consecutive failures): Not authorized: Failed to ping > master at 127.0.0.1:7051: Client connection n

Re: Ubuntu 18.04 in pre-merge tests?

2019-06-17 Thread Todd Lipcon
Otherwise it's very easy to introduce code that uses features not available on el7, for example. > > On Wed, May 22, 2019 at 10:41 AM Todd Lipcon wrote: > > > On Mon, May 20, 2019 at 8:36 PM Jim Apple wrote: > > > > > Maybe now would be a good time to implemen

Re: Ubuntu 18.04 in pre-merge tests?

2019-05-22 Thread Todd Lipcon
sounds great to me :) Personally I don't develop on Ubuntu 18 and in my day job it's not a particularly important deployment platform, so I personally don't think I'll spend much time triaging that build. Todd > > On Mon, May 20, 2019 at 9:09 AM Todd Lipcon wrote: >

Re: Ubuntu 18.04 in pre-merge tests?

2019-05-20 Thread Todd Lipcon
ity members who have made this happen! > > Should we add Ubuntu 18.04 to our pre-merge Jenkins job, replace 16.04 with > 18.04 in our pre-merge Jenkins job, or neither? > > I propose adding 18.04 for now (and so running both 16.04 and 18.04 on > merge) and removing 16.04 when it start

Re: Enabled backend tests for UBSAN

2019-05-06 Thread Todd Lipcon
at they were passing after he fixed the final set of issues > there. > > - Tim > -- Todd Lipcon Software Engineer, Cloudera

Re: Remote read testing in precommit

2019-04-25 Thread Todd Lipcon
> rit.cloudera.org/c/12639/. The next step is to get a Jenkins job running, which I've been working on.
>
> I'd like to run it regularly so we can catch any regressions. Initially I'll just have it email me when it fails, but after it's stable for a week or two I'd like to make it part of the regular set of jobs.
>
> My preference is to run it as part of the precommit jobs, in parallel to the Ubuntu 16.04 tests. It should not extend the critical path of precommit because it only runs the end-to-end tests. We could alternatively run it as a scheduled post-commit job, but that tends to create additional work when it breaks.
>
> What do people think?
>
> - Tim
-- Todd Lipcon Software Engineer, Cloudera

Re: Remove 'hive-benchmark' workload?

2019-04-25 Thread Todd Lipcon
Posted a review: http://gerrit.cloudera.org:8080/13117 On Thu, Apr 25, 2019 at 8:37 AM Bharath Vissapragada wrote: > +1, never used it! > > On Thu, Apr 25, 2019 at 8:20 AM Tim Armstrong > wrote: > > > +1 - do it! > > > > On Thu, Apr 25, 2019 at 1:43 AM Todd

Remove 'hive-benchmark' workload?

2019-04-25 Thread Todd Lipcon
t did load, it looks like it's only a set of ~10 trivial queries which I doubt have any real benchmark interest in modern days. Anyone mind if I excise this cruft? -Todd -- Todd Lipcon Software Engineer, Cloudera

Re: A problem of left outer join when statement contains two aggregation for the same column

2019-04-04 Thread Todd Lipcon
> on t1.a_id = t2.a_id
> ) t;
> +----------+
> | count(1) |
> +----------+
> | 2        |
> +----------+
> Here are the results of the two subqueries without count(1):
> +------+---------+---------+
> | a_id | amount1 | amount2 |
> +------+---------+---------+
> | 1    | 30      | 30      |
> | 2    | NULL    | NULL    |
> +------+---------+---------+
> Why is the count(1) of this result set 1?
> +------+---------+
> | a_id | amount1 |
> +------+---------+
> | 1    | 30      |
> | 2    | NULL    |
> +------+---------+
> Why is the count(1) of this result set 2?
> I want to ask why the first SQL returns just 1 but the second returns 2. Is this correct, or is it an Impala bug? How does Impala deal with count aggregation?
> If I change the sum to another aggregate function like count/max/min, the result is the same. I tested this on versions 2.12.0 and 3.1.0.
-- Todd Lipcon Software Engineer, Cloudera

Re: +2ability on gerrit

2019-03-29 Thread Todd Lipcon
Ah, try now. Seems I added you to the wrong group On Fri, Mar 29, 2019 at 10:33 AM Jim Apple wrote: > Hm, didn't seem to work. > > On Thu, Mar 28, 2019 at 9:28 AM Todd Lipcon wrote: > > > I think you're all set. Give it a shot? > > > > -Todd > >

Re: +2ability on gerrit

2019-03-28 Thread Todd Lipcon
I think you're all set. Give it a shot? -Todd On Tue, Mar 26, 2019 at 8:05 PM Jim Apple wrote: > Hello! I am now using "jbapple", not "jbapple-cloudera", as my gerrit > handle. Can someone with an admin login give me the auths to +2 changes? > > Thanks in

Re: Runtime filter publishing from a join node to its cousin

2019-01-30 Thread Todd Lipcon
l > > is useful.) > > > > I stared at that code for a bit, and I agree with you that it's > plausible. > > I'm also confused by the "bottom-up" comment of generateFilters(): it > seems > > like we walk the plan depth first, and the assignment

Runtime filter publishing from a join node to its cousin

2019-01-30 Thread Todd Lipcon
s "cousin" is beneficial. I think the only necessary restriction is that a RF should not be sent from a hash join node to any descendent of its right child. Keep in mind I'm very new to the Impala planner code and particularly to the runtime filter portion thereof, so I may have missed something. But does the above sound like a plausible bug/missed optimization? -Todd -- Todd Lipcon Software Engineer, Cloudera
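To make the proposed restriction concrete, here is a rough sketch of the check as I read it from the message above: a runtime filter produced at a hash join may target any node except those under the join's right child. The `PlanNode` class, the node names and `is_valid_filter_target` are all made up for illustration and are not Impala's planner API. Treating the right child itself as excluded (not only its descendants) is my assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class PlanNode:
    """Hypothetical plan node for illustration; not Impala's PlanNode."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)


def in_subtree(root: PlanNode, target: PlanNode) -> bool:
    """Return True if `target` is `root` or lies anywhere below it."""
    if root is target:
        return True
    return any(in_subtree(child, target) for child in root.children)


def is_valid_filter_target(join: PlanNode, target: PlanNode) -> bool:
    """A filter built at `join` may go anywhere except under its right child.

    Assumption: children[0] is the left (probe) side and children[1] is the
    right (build) side of the hash join.
    """
    right_child = join.children[1]
    return not in_subtree(right_child, target)


# A small bushy plan: J0 joins the outputs of J1 and J2.
scan_a, scan_b = PlanNode("scan_a"), PlanNode("scan_b")
scan_c, scan_d = PlanNode("scan_c"), PlanNode("scan_d")
j1 = PlanNode("J1", [scan_a, scan_b])
j2 = PlanNode("J2", [scan_c, scan_d])
j0 = PlanNode("J0", [j1, j2])
```

Under this rule `is_valid_filter_target(j2, scan_a)` is allowed even though `scan_a` is not below `j2`, which is the "cousin" case the thread is asking about.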

Re: Adding profile info from Frontend

2018-09-04 Thread Todd Lipcon
I have a WIP patch up here in case anyone's interested in taking an early look: https://gerrit.cloudera.org/c/11388/ -Todd On Tue, Sep 4, 2018 at 5:02 PM, Todd Lipcon wrote: > On Tue, Sep 4, 2018 at 4:46 PM, Andrew Sherman > wrote: > >> Hi Todd, >> >> I'm

Re: Adding profile info from Frontend

2018-09-04 Thread Todd Lipcon
a line of code yet :-D -Todd > > > -Andrew > > On Tue, Sep 4, 2018 at 4:43 PM Todd Lipcon wrote: > > > On Tue, Sep 4, 2018 at 4:28 PM, Andrew Sherman > > wrote: > > > > > Hi Todd, > > > > > > I am making a simple fix for

Re: Adding profile info from Frontend

2018-09-04 Thread Todd Lipcon
coordinate which one goes in first? -Todd > > > On Tue, Sep 4, 2018 at 4:02 PM Todd Lipcon wrote: > > > Hey folks, > > > > I'm working on a patch to add some more diagnostics from the planning > > process into query profiles. > > > > Curr

Re: Adding profile info from Frontend

2018-09-04 Thread Todd Lipcon
or a set of well-known fields defined as a thrift enum instead of relying on strings.. but I don't want to bite off too much in one go here :) > > -- Philip > > > On Tue, Sep 4, 2018 at 4:02 PM Todd Lipcon wrote: > > > Hey folks, > > > > I'm working on a patc

Adding profile info from Frontend

2018-09-04 Thread Todd Lipcon
ith a full TRuntimeProfileNode. I'd then add some capability in the Java side to fill in counters, etc, in this structure. Any concerns with this approach before I go down this path? Are there any compatibility guarantees I need to uphold with the profile output of queries? -Todd -- Todd Lipcon Software Engineer, Cloudera

Re: Issues with query/fragment lifecycle on trunk?

2018-08-30 Thread Todd Lipcon
On Thu, Aug 30, 2018 at 2:48 PM, Todd Lipcon wrote: > On Thu, Aug 30, 2018 at 2:44 PM, Pooja Nilangekar < > pooja.nilange...@cloudera.com> wrote: > >> Hi Todd, >> >> I believe you are right. There are a couple of other race conditions in >> the

Re: Issues with query/fragment lifecycle on trunk?

2018-08-30 Thread Todd Lipcon
scanner threads and scan node are hard to reason about and could be improved more generally. For now I'm trying to use a DebugAction to inject probabilistic failures into the soft memory limit checks to see if I can reproduce it more easily. -Todd > On Thu, Aug 30, 2018 at 1:27 PM Todd L

Re: Issues with query/fragment lifecycle on trunk?

2018-08-30 Thread Todd Lipcon
ay be interesting to look at: > - is there any scan node in the profile which doesn't finish any assigned > scan ranges ? > - if you happen to have a core, it may help to inspect the stack traces of > the scanner threads and the disk io mgr threads to understand their states. > >

Issues with query/fragment lifecycle on trunk?

2018-08-30 Thread Todd Lipcon
n. So, there are still two mysteries: - why did it get stuck in the first place? - why are my "number of running queries" counters stuck at non-zero values? Does anything above ring a bell for anyone? -Todd -- Todd Lipcon Software Engineer, Cloudera

Re: Breaking change for query hints?

2018-08-23 Thread Todd Lipcon
I'm curious: can you describe the view using Hive to see what the stored query consists of? On Thu, Aug 23, 2018, 7:44 PM Quanlong Huang wrote: > Hi all, > > After we upgrade Impala from 2.5 to 2.12, we found some queries on views > failed. The views contain a query hint which I think is the cau

Re: Improving latency of catalog update propagation?

2018-08-21 Thread Todd Lipcon
it would be fine for the update callback to take a long time, no? -Todd On Tue, Aug 21, 2018 at 11:09 AM, Todd Lipcon wrote: > Thanks, Tim. I'm guessing once we switch over these RPCs to KRPC instead > of Thrift we'll alleviate some of the scalability issues and maybe we can

Re: Improving latency of catalog update propagation?

2018-08-21 Thread Todd Lipcon
nt to the statestore to schedule the subscriber update > sooner. This would also work for admission control since coordinators could > notify the statestore when the first query was admitted after the previous > statestore update. > > On Tue, Aug 21, 2018 at 9:41 AM, Todd Lipcon wrote: >

Improving latency of catalog update propagation?

2018-08-21 Thread Todd Lipcon
s (4 seconds). Has anyone looked into optimizing this at all? It seems like we could have metadata changes trigger an immediate "collection" into the C++ side, and have the statestore update callback wait ("long poll" style) for an update rather than skip if there is nothing available. -Todd -- Todd Lipcon Software Engineer, Cloudera
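The "long poll" idea above can be sketched generically: instead of returning immediately when nothing is pending, the callback blocks on a condition variable until a metadata change arrives or a timeout expires. This is only an illustration of the pattern, not the statestore or catalogd code; `UpdateMailbox`, `publish` and `collect` are hypothetical names.

```python
import threading
from typing import List


class UpdateMailbox:
    """Hypothetical holder for pending catalog updates (not Impala code)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._pending: List[str] = []

    def publish(self, update: str) -> None:
        """Called when a metadata change happens; wakes any waiting collector."""
        with self._cond:
            self._pending.append(update)
            self._cond.notify_all()

    def collect(self, timeout_s: float) -> List[str]:
        """Long poll: wait until updates exist or `timeout_s` elapses.

        Returns the pending updates (possibly an empty list on timeout) and
        clears the mailbox.
        """
        with self._cond:
            self._cond.wait_for(lambda: bool(self._pending), timeout=timeout_s)
            updates, self._pending = self._pending, []
            return updates
```

With this shape a change published shortly after the callback starts waiting is picked up right away rather than on the next polling interval; the cost is that the callback now holds a thread for up to `timeout_s`.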

java target version in FE pom?

2018-08-20 Thread Todd Lipcon
accordingly? -Todd -- Todd Lipcon Software Engineer, Cloudera

Update on catalog changes

2018-08-07 Thread Todd Lipcon
ink it has promise longer-term as it simplifies the overall architecture and can help with cross-system metadata consistency. But we can treat fetch-from-catalogd as a nice interim that should bring most of the performance and scalability benefits to users sooner and with less risk. I'll plan to update the original design document to reflect this in coming days. Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera

Re: #pragma once?

2018-08-01 Thread Todd Lipcon
does seem cleaner and our GCC and Clang versions are modern > > enough to support it. > > > > What do people think about switching to that as the preferred way of > > including headers only once? > > > > - Tim > > > -- Todd Lipcon Software Engineer, Cloudera

Re: Re: Impala Incident Sharing

2018-08-01 Thread Todd Lipcon
rote: > >Hi Quanlong, > > > >Thank you for the incident note! You might be interested in > >https://gerrit.cloudera.org/#/c/10998/ which is adding some > instrumentation > >to make it easier to notice with monitoring tools that we're running out > of > >m

Re: Enabling automatic code review precommit job

2018-07-31 Thread Todd Lipcon
ged lines? Then we could more easily just run a single command to ensure that our patches are properly formatted before submitting to review. Or, at the very least, some instructions for running the same flake8-against-only-my-changed-lines that gerrit is running? -Todd > > > On Mon, Jul 30, 20

Re: Impala Incident Sharing

2018-07-31 Thread Todd Lipcon
> ou for reading!
>
> Further thoughts:
> * We might need to mention in the docs that adding Java UDFs may not use the given jar file if the Impala CLASSPATH already contains a jar file containing the same class.
> * We should avoid using Hive builtin UDFs and any other Java UDFs since their memory is not tracked.
> * How can we track memory used in the JVM? HBase, a pure Java project, is able to track its MemStore and BlockCache size. Can we learn from it?
>
> Thanks,
> Quanlong
> --
> Quanlong Huang
> Software Developer, Hulu
-- Todd Lipcon Software Engineer, Cloudera

Re: Enabling automatic code review precommit job

2018-07-30 Thread Todd Lipcon
test the basic mechanism for a bit now. > > > > It excludes docs/ commits. > > > > Let me know if you see any problems with it. > > > > Thanks, > > Tim > > > -- Todd Lipcon Software Engineer, Cloudera

Re: Order of error messages printed

2018-07-24 Thread Todd Lipcon
requirement. However, it's unclear to me if any client of Impala > > makes assumption about the ordering of the output in PrintErrorMap(). So, > > sending this email to the list in case anyone knows anything. > > > > -- > > Thanks, > > Michael > > > -- Todd Lipcon Software Engineer, Cloudera

Re: Broken Impala Build?

2018-07-18 Thread Todd Lipcon
, Pooja Nilangekar < pooja.nilange...@cloudera.com.invalid> wrote: > I am still having the same issue. (I haven't tried git clean either). > > Thanks, > Pooja > > > On Wed, Jul 18, 2018 at 5:30 PM Todd Lipcon > wrote: > > > Anyone else still having problems?

Re: Broken Impala Build?

2018-07-18 Thread Todd Lipcon
ble. > > On Wed, Jul 18, 2018 at 2:16 PM Fredy Wijaya wrote: > > > A CR to fix the issue: https://gerrit.cloudera.org/c/10981/ > > I'm running a dry-run to make sure everything is good with the new build > > number. > > > > On Wed, Jul 18, 2018 at 12:24

Re: Broken Impala Build?

2018-07-18 Thread Todd Lipcon
d338 > E0718 10:03:13.659101 23734 impala-server.cc:289] NoClassDefFoundError: > org/apache/hadoop/fs/Options$ChecksumCombineMode > E0718 10:03:13.659122 23734 impala-server.cc:292] Aborting Impala Server > startup due to improper configuration. Impalad exiting. > -- Todd Lipcon Software Engineer, Cloudera

Re: impalad running as superuser in tests

2018-07-18 Thread Todd Lipcon
On Tue, Jul 17, 2018 at 5:27 PM, Sailesh Mukil wrote: > On Tue, Jul 17, 2018 at 2:47 PM, Todd Lipcon > wrote: > > > Hey folks, > > > > I'm working on a regression test for IMPALA-7311 and found something > > interesting. It appears that in our normal minicl

impalad running as superuser in tests

2018-07-17 Thread Todd Lipcon
erent spoofed username so that the minicluster environment is more authentic to true cluster environments? We can do this easily by setting the HADOOP_USER_NAME environment variable or system property. -Todd -- Todd Lipcon Software Engineer, Cloudera
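As a rough illustration of the environment-variable route mentioned above, a test harness could start a child process with `HADOOP_USER_NAME` set to a spoofed name. The helper names and the username `impala_test_user` are made up, and the child process here is just a Python one-liner standing in for a daemon; this is not how the minicluster scripts are actually written. The system-property route would use the JVM's usual `-D` syntax.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def env_with_hadoop_user(user: str,
                         base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with HADOOP_USER_NAME set to `user`."""
    env = dict(os.environ if base_env is None else base_env)
    env["HADOOP_USER_NAME"] = user
    return env


def run_as(user: str, argv: List[str]) -> subprocess.CompletedProcess:
    """Run `argv` as a child process that sees the spoofed Hadoop username."""
    return subprocess.run(argv, env=env_with_hadoop_user(user),
                          capture_output=True, text=True, check=True)


# Stand-in child process that just echoes the variable it was given.
result = run_as("impala_test_user",
                [sys.executable, "-c",
                 "import os; print(os.environ['HADOOP_USER_NAME'])"])
print(result.stdout.strip())
```

Building a copy of the environment keeps the spoofed name scoped to the child process, so the test runner's own identity is left alone.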

Re: Inconsistent handling of schema in Avro tables

2018-07-16 Thread Todd Lipcon
On Thu, Jul 12, 2018 at 5:07 PM, Bharath Vissapragada < bhara...@cloudera.com.invalid> wrote: > On Thu, Jul 12, 2018 at 12:03 PM Todd Lipcon > wrote: > > > > So, I think my proposal here is: > > > > 1. Query behavior on existing tables > > - If the tab

Re: Inconsistent handling of schema in Avro tables

2018-07-12 Thread Todd Lipcon
ormat is non-Avro, - AND the table contains column types incompatible with Avro (eg tinyint), - THEN disallow changing the file format of an existing partition to Avro -Todd On Wed, Jul 11, 2018 at 9:32 PM, Todd Lipcon wrote: > Turns out it's even a bit more messy. The presence of
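The rule at the top of this message could be sketched as a small predicate. The first condition is truncated in this snippet, so reading it as "the table-level file format is non-Avro" is my assumption. The set of incompatible types below contains only `tinyint` because that is the one example the message names; the real list would need to come from the Avro type mapping. All function and variable names are hypothetical.

```python
from typing import Iterable

# Only 'tinyint' is named in the thread; treat this set as a placeholder.
AVRO_INCOMPATIBLE_TYPES = {"tinyint"}


def may_change_partition_to_avro(table_format: str,
                                 column_types: Iterable[str]) -> bool:
    """Return False when the proposed rule would reject the change.

    Rejects switching an existing partition to Avro if the table format
    (assumed reading of the truncated condition) is non-Avro AND the table
    has at least one column type incompatible with Avro.
    """
    if table_format.lower() == "avro":
        return True
    has_incompatible = any(t.lower() in AVRO_INCOMPATIBLE_TYPES
                           for t in column_types)
    return not has_incompatible
```

For example, `may_change_partition_to_avro("parquet", ["int", "tinyint"])` is rejected, while the same call with `["int", "string"]` is allowed.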

Re: Inconsistent handling of schema in Avro tables

2018-07-11 Thread Todd Lipcon
though still I don't think that would iterate over all partitions in the case of a mixed table. -Todd On Wed, Jul 11, 2018 at 9:03 PM, Bharath Vissapragada < bhara...@cloudera.com.invalid> wrote: > Agreed. > > On Wed, Jul 11, 2018 at 8:55 PM Todd Lipcon > wrote: > > >

Re: Inconsistent handling of schema in Avro tables

2018-07-11 Thread Todd Lipcon
to do this. If > someone > > asked me to support a mixed avro/parquet table I would suggest they > create > > a view. If they kept insisting I would reply "Well it is your funeral." > > > > On Wed, Jul 11, 2018 at 7:51 PM, Todd Lipcon > > wrote: > >

Inconsistent handling of schema in Avro tables

2018-07-11 Thread Todd Lipcon
posed new behavior, we can avoid looking at all partitions. This is important for any metadata design which supports fine-grained loading of metadata to the coordinator. -Todd -- Todd Lipcon Software Engineer, Cloudera

Re: boost::scoped_ptr vs std::unique_ptr

2018-07-05 Thread Todd Lipcon
> nfusion about what the best practice is, particularly for people coming from other code bases. I personally like the distinction, but I don't feel that strongly about it.
>
> What do people think? Should we continue using scoped_ptr or move away from it? There is already a JIRA to make the change but we haven't done it because of the above reasons:
> https://issues.apache.org/jira/browse/IMPALA-3444
>
> - Tim
-- Todd Lipcon Software Engineer, Cloudera

Re: Proposal for a new approach to Impala metadata

2018-07-05 Thread Todd Lipcon
our 12,300 customers." Such a comment might be by a fellow Cloudera employee, or by someone at some other contributor. Happy to remove that heading if it seems like it's not inclusive. -Todd > On Fri, Jun 8, 2018 at 9:54 AM, Todd Lipcon wrote: > >> On Thu, Jun 7, 2018 at 9:

Re: Proposal for a new approach to Impala metadata

2018-06-08 Thread Todd Lipcon
the amount of load on source systems. The downsides, though, are:
- extra hop between impalad and source system on a cache miss
- extra complexity in failure cases (source system becomes a SPOF, and if we replicate it we have more complex consistency to worry about)
- scalability benefits may only be re

Re: Palo, a fork of Impala (?), proposed for the incubator.

2018-06-08 Thread Todd Lipcon
wrote: > https://lists.apache.org/thread.html/74a3f3f945403b50515c658047d3284955288a637207e4f97ecc15d1@%3Cgeneral.incubator.apache.org%3E > > I think it’s worth considering how the two communities could work together > for the benefit of all. > -- Todd Lipcon Software Engineer, Cloudera

Re: jenkins.impala.io account

2018-06-01 Thread Todd Lipcon
t-after-passing-build bit. > Will this give the proper "+1 verified" mark on a gerrit such that it can be committed? -Todd > On Fri, Jun 1, 2018 at 11:35 AM, Todd Lipcon wrote: > > > Hey folks, > > > > Would someone mind generating an account for me on

jenkins.impala.io account

2018-06-01 Thread Todd Lipcon
y review/commit privileges. Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera

Re: Statically link Kudu client

2018-05-22 Thread Todd Lipcon
/kudu-table-sink.cc#L151 > However, when building with -static in ./buildall.sh, the kudu-client is > still linked dynamically (see `ldd be/build/latest/service/impalad`). Is > there a build option to link it statically? > > > Thanks, > Quanlong -- Todd Lipcon Software Engineer, Cloudera

Proposal for a new approach to Impala metadata

2018-05-22 Thread Todd Lipcon
Hey Impala devs, Over the past 3 weeks I have been investigating various issues with Impala's treatment of metadata. Based on data from a number of user deployments, and after discussing the issues with a number of Impala contributors and committers, I've come up with a proposal for a new design.

Re: Re: Best practice to compile and deploy Impala

2018-04-30 Thread Todd Lipcon
he jars. Vice versa, if I only updated something in the front end I'd only re-copy the FE jar to the cluster. That way you only pay the expensive deployment step the first time you set up your cluster. -Todd > At 2018-04-26 00:49:05, "Todd Lipcon" wrote: > >Hi Quanlong,

Re: Best practice to compile and deploy Impala

2018-04-25 Thread Todd Lipcon
really needed. This work is tedious and prone to errors. > > > Is there a best practice for packaging and distributing the binaries after > compiling? > > > Thanks, > Quanlong -- Todd Lipcon Software Engineer, Cloudera

Re: Help with fixing clang-diagnostic-over-aligned

2018-02-14 Thread Todd Lipcon
of alignment and the > default allocator only guarantees 16 bytes > [clang-diagnostic-over-aligned] > boost::shared_ptr impala_server(new > ImpalaServer(&exec_env)); > -- Todd Lipcon Software Engineer, Cloudera

Re: ORC scanner - points for discussion

2018-02-09 Thread Todd Lipcon
-experimental-flags. Users who decide to flip that on are explicitly acknowledging that they're treading on some unproven ground and they might get crashes, correctness issues, etc. Of course the goal should be that, if after a release or two of use the feedback from users is that it works well and few issues have been found, I'd expect it to be marked non-experimental. -Todd -- Todd Lipcon Software Engineer, Cloudera