Hi Gopal, thanks for all the information and suggestions.
The Hive version is 2.0.1 and we use Hive-on-MR as the execution engine.
I think I should create an intermediate table which includes all the
dimensions (including the several kinds of ids), and then use spark-sql to
calculate the distinct values:
> COUNT(DISTINCT monthly_user_id) AS monthly_active_users,
> COUNT(DISTINCT weekly_user_id) AS weekly_active_users,
…
> GROUPING_ID() AS gid,
> COUNT(1) AS dummy
There are two things which prevent Hive from optimizing multiple count distincts.
Another aggregate like a count(1) or a Grouping sets li…
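As a rough sketch of that two-step plan (the table names and the daily_user_id column below are illustrative assumptions; the monthly/weekly columns come from the quoted query):

-- Hypothetical intermediate table: one row per event, with the id column
-- for each dimension already derived.
CREATE TABLE daily_activity_stage STORED AS ORC AS
SELECT dt, app, monthly_user_id, weekly_user_id, daily_user_id
FROM raw_events;

-- Each distinct count then becomes a plain aggregation (e.g. in spark-sql)
-- instead of one Hive-on-MR query mixing several COUNT(DISTINCT ...) with
-- GROUPING_ID() and COUNT(1).
SELECT dt, app,
       COUNT(DISTINCT monthly_user_id) AS monthly_active_users,
       COUNT(DISTINCT weekly_user_id)  AS weekly_active_users,
       COUNT(DISTINCT daily_user_id)   AS daily_active_users
FROM daily_activity_stage
GROUP BY dt, app;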
-- Forwarded message --
From: panfei
Date: 2017-08-23 12:26 GMT+08:00
Subject: Fwd: How to optimize multiple count( distinct col) in Hive SQL
To: hive-...@hadoop.apache.org
-- Forwarded message --
From: panfei
Date: 2017-08-23 12:26 GMT+08:00
Subject: How to optimize multiple count( distinct col) in Hive SQL
Could you do a recursive “ls” in the table or partition directory that you are
trying to read?
Most likely you have files that don’t follow the expected naming convention.
Eugene
From: Aviral Agarwal
Reply-To: "user@hive.apache.org"
Date: Tuesday, August 22, 2017 at 5:39 AM
To: "user@hive.apache.org"
Subjec
Dooh..thanx!
On Tue, Aug 22, 2017, 11:11 AM Alan Gates wrote:
> The address is at the top of the text description, even though it isn’t in
> the location field:
>
> 5470 Great America Parkway, Santa Clara, CA
>
> Alan.
>
> On Mon, Aug 21, 2017 at 5:50 PM, dan young wrote:
>
>> For us out of tow
The address is at the top of the text description, even though it isn’t in
the location field:
5470 Great America Parkway, Santa Clara, CA
Alan.
On Mon, Aug 21, 2017 at 5:50 PM, dan young wrote:
> For us out of town folks, where is the location of this meetup? Says
> Hortonworks but do you hav
Xuefu is planning to give a talk on Hive-on-Spark @Uber at the user meetup
this week. We can check if he can share the presentation on this list for
folks who can't attend the meetup.
https://www.meetup.com/Hive-User-Group-Meeting/events/242210487/
On Mon, Aug 21, 2017 at 11:44 PM, peter zhang
wrote:
Hi,
I am trying to read a Hive ORC transactional table through Spark but I am
getting the following error:
Caused by: java.lang.RuntimeException: serious problem
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
at
org.apache.hadoop.hive.ql.io.orc.OrcInput
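For context, a minimal sketch of the kind of table assumed here (the actual DDL isn't shown in the thread, so names and columns are illustrative):

-- Hive 2.x ACID tables must be bucketed ORC tables marked transactional;
-- their data is written into delta_*/base_* directories rather than plain
-- ORC data files under the table or partition path.
CREATE TABLE events_txn (
  user_id BIGINT,
  event_name STRING
)
CLUSTERED BY (user_id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');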
TL;DR - A Materialized view is a much more useful construct than trying to get
limited indexes to work.
That is a pretty lively project which has been going on for a while with
Druid+LLAP
https://issues.apache.org/jira/browse/HIVE-14486
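As a rough sketch of the materialized-view route (this is the CREATE MATERIALIZED VIEW syntax from later Hive releases; the view name and source table are illustrative, reusing the staging table assumed earlier). The optimizer can then rewrite matching aggregate queries to read the stored result instead of relying on index structures.

-- Illustrative only: pre-aggregate the staging table and store the result
-- as ORC, so queries over the same grouping can be answered from it.
CREATE MATERIALIZED VIEW user_activity_mv
STORED AS ORC
AS
SELECT dt, app, COUNT(*) AS events
FROM daily_activity_stage
GROUP BY dt, app;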
> This seems out of the blue but my initial benchmarks hav