RE: Unexpected query result

2017-08-21 Thread Frank Luo
One possibility is that count(*) returns a cached statistic, while count(distinct field) actually reads the data and performs the computation. Try setting the following and testing again: set hive.compute.query.using.stats=false; From: Igor Kuzmenko [mailto:f1she...@gmail.com] Sent: Monday, August 21, 2017 10:01 AM To:
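A toy model of the suspected mechanism, not Hive code: the class and function names below are hypothetical. It only mimics the idea that, with hive.compute.query.using.stats=true, count(*) can be answered from a stored row-count statistic, which can go stale if data lands in the table without the stats being updated, while count(distinct ...) has to read the rows.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for a Hive table plus its stored row-count stat."""
    rows: list = field(default_factory=list)
    cached_row_count: int = 0  # value recorded when stats were last computed

    def append_without_stats(self, row):
        # Simulates data landing in the table without a stats update,
        # so cached_row_count no longer matches the real data.
        self.rows.append(row)


def count_star(table: ToyTable, use_stats: bool) -> int:
    # Mimics hive.compute.query.using.stats: answer from the stat when allowed.
    return table.cached_row_count if use_stats else len(table.rows)


def count_distinct(table: ToyTable) -> int:
    # In this model, count(distinct ...) always scans the rows.
    return len(set(table.rows))


t = ToyTable(rows=["a", "b"], cached_row_count=2)
t.append_without_stats("c")
print(count_star(t, use_stats=True))   # 2: stale statistic
print(count_star(t, use_stats=False))  # 3: actual scan
print(count_distinct(t))               # 3
```

If the counts agree once the setting is off, recomputing the statistics (ANALYZE TABLE ... COMPUTE STATISTICS) would likely be the longer-term fix.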

RE: how to customize tez query app name

2017-01-20 Thread Frank Luo
So no one has a solution? From: Frank Luo Sent: Thursday, January 19, 2017 6:14 PM To: user@hive.apache.org Cc: Shylaja H. Nagenhalli Subject: how to customize tez query app name When running a Tez query, the “Applications” page shows the job as “HIVE-9c5a8bf1-911b-427a-8d16” for example, which

how to customize tez query app name

2017-01-19 Thread Frank Luo
When running a Tez query, the “Applications” page shows the job as, for example, “HIVE-9c5a8bf1-911b-427a-8d16”, which is not helpful, especially when a ton of jobs are running at the same time. Is there any way to customize the app name? “mapreduce.job.name” works for M/R queries, but not for Tez. Thx!

hive.client.stats.publishers doesn't work in TEZ???

2016-10-19 Thread Frank Luo
I have set hive.client.stats.publishers=com.xxx.MyPublisher; the publisher is called in M/R but not in TEZ. Does anyone have an idea why, or how to fix it? I am using Hive 1.2.1. Thx!

how to connect to kerberosed beeline as myself instead of as 'hive'?

2016-10-13 Thread Frank Luo
I run beeline with the connection string “jdbc:hive2://…;principal=hive/_h...@realm.com”, meaning that the principal is “hive” instead of myself. I understand that when the actual job is launched, it finds the real user from the Kerberos tickets, which is grea

how to dynamically find out hiveserver2's host name?

2016-09-28 Thread Frank Luo
I am trying to use one set of scripts for different Hadoop clusters in different environments, for example DEV/QA/PROD environments with corresponding clusters. The difficulty I am facing is that the host name of hiveserver2 is part of the connection URL, which has to vary between environment
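If HiveServer2 dynamic service discovery is enabled on the clusters (HiveServer2 instances registering themselves in ZooKeeper), the URL can name the ZooKeeper quorum instead of a HiveServer2 host, so the scripts only need a per-environment quorum. A sketch under that assumption: the environment map and host names are placeholders, serviceDiscoveryMode=zooKeeper and zooKeeperNamespace are the standard JDBC URL parameters for this mode, and the namespace must match hive.server2.zookeeper.namespace on the server.

```python
# Hypothetical ZooKeeper quorums per environment; replace with real hosts.
ZK_QUORUMS = {
    "DEV": ["zk-dev1:2181", "zk-dev2:2181", "zk-dev3:2181"],
    "QA": ["zk-qa1:2181", "zk-qa2:2181", "zk-qa3:2181"],
    "PROD": ["zk-prod1:2181", "zk-prod2:2181", "zk-prod3:2181"],
}


def hive_jdbc_url(env: str, database: str = "default",
                  namespace: str = "hiveserver2") -> str:
    """Build a HiveServer2 JDBC URL that locates the server through ZooKeeper.

    Only the quorum differs per environment; the JDBC driver looks up a live
    HiveServer2 instance under the given namespace at connect time.
    """
    try:
        quorum = ",".join(ZK_QUORUMS[env.upper()])
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None
    return (f"jdbc:hive2://{quorum}/{database};"
            f"serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}")


print(hive_jdbc_url("qa"))
```

A shell wrapper could then pass the generated URL to beeline -u, keeping the rest of the scripts identical across environments.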

RE: How to obtain concurrent query executions

2016-09-28 Thread Frank Luo
If you are using Hadoop 2.7 or newer, you can use mapreduce.job.running.map.limit and mapreduce.job.running.reduce.limit to cap the number of map and reduce tasks running simultaneously for each job. Another way is to use the scheduler to limit the queue's capacity. From: Jose Rozanec [mailto:jose.roza...@mercadolibre.com] Sent: Tuesd

RE: multiple selects on a left join give incorrect result

2016-05-03 Thread Frank Luo
From: Frank Luo [mailto:j...@merkleinc.com] Sent: Wednesday, May 04, 2016 1:58 AM To: user@hive.apache.org Cc: Rebecca Yang <yiy...@merkleinc.com> Subject: multiple selects on a left join give incorrect result All, I have found that when doin

multiple selects on a left join give incorrect result

2016-05-03 Thread Frank Luo
All, I have found that when doing multiple selects on a left join, the “on” clause seems to be ignored!!! (It is hard to believe.) Below is a very simple test case, and please tell me I am crazy. I am on HDP 2.3.4.7. CREATE TABLE T_A ( id STRING, val STRING ); CREATE TABLE T_B ( idS

how to convert a map to a string

2016-03-03 Thread Frank Luo
All, Is there a built-in UDF to convert a map to a string? I found str_to_map but could not find one to convert back. Thx in advance. Frank Luo
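Hive's str_to_map(text, delimiter1, delimiter2) splits pairs on delimiter1 (',' by default) and each pair on delimiter2 (':' by default). The inverse that a custom UDF or a TRANSFORM script would have to implement is just the reverse join. The Python sketch below shows that logic (function names are hypothetical); it only round-trips cleanly when no key or value contains either delimiter.

```python
def str_to_map(text: str, pair_delim: str = ",", kv_delim: str = ":") -> dict:
    """Rough Python mimic of Hive's str_to_map(text[, delimiter1, delimiter2]).

    Unlike Hive, the delimiters here are plain strings, not regular expressions.
    """
    result = {}
    for pair in text.split(pair_delim):
        key, sep, value = pair.partition(kv_delim)
        result[key] = value if sep else None  # missing value -> None in this sketch
    return result


def map_to_str(m: dict, pair_delim: str = ",", kv_delim: str = ":") -> str:
    """Inverse of str_to_map, assuming no key or value contains a delimiter."""
    return pair_delim.join(f"{k}{kv_delim}{v}" for k, v in m.items())


encoded = map_to_str({"color": "red", "size": "L"})
print(encoded)              # color:red,size:L
print(str_to_map(encoded))  # {'color': 'red', 'size': 'L'}
```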

RE: /tmp/hive/hive is exceeded: limit=1048576 items=1048576

2016-02-22 Thread Frank Luo
Actually, the directory contains nothing but about a million empty subdirectories. I am sure the NameNode won’t like it. > hadoop fs -du -s /tmp/hive/hive 0 /tmp/hive/hive From: Frank Luo Sent: Monday, February 22, 2016 12:26 PM To: user@hive.apache.org Subject: /tmp/hive/hive is exceeded: limit=1048576 items=1048576 Is

/tmp/hive/hive is exceeded: limit=1048576 items=1048576

2016-02-22 Thread Frank Luo
Is there a setting somewhere to automatically remove old temp files from /tmp/hive/hive? Otherwise every Hive user will face the problem and everyone will have to develop something to fix it, right?
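The limit in the error is the NameNode's per-directory item cap (dfs.namenode.fs-limits.max-directory-items, default 1048576). A common workaround is a periodic cleanup job. The sketch below only decides which entries are old enough to delete; the paths, the listing format and the age threshold are assumptions, and the actual deletion (for example with hdfs dfs -rm -r) is left to the caller.

```python
from datetime import datetime, timedelta


def stale_scratch_dirs(listing, now, max_age=timedelta(days=7)):
    """Pick scratch entries older than max_age.

    `listing` is an iterable of (path, modification_time) pairs, for example
    parsed from `hdfs dfs -ls /tmp/hive/hive`. The 7-day default is an
    arbitrary assumption; keep it well above your longest-running query so
    that directories still in use are not selected.
    """
    cutoff = now - max_age
    return [path for path, mtime in listing if mtime < cutoff]


now = datetime(2016, 2, 22, 12, 0)
listing = [
    ("/tmp/hive/hive/old-session", datetime(2016, 1, 2, 8, 30)),
    ("/tmp/hive/hive/new-session", datetime(2016, 2, 22, 11, 0)),
]
print(stale_scratch_dirs(listing, now))  # ['/tmp/hive/hive/old-session']
```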

RE: bloom filter used in 0.14?

2016-02-03 Thread Frank Luo
Thank you all for this discussion. Very helpful. -----Original Message----- From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com] On Behalf Of Gopal Vijayaraghavan Sent: Thursday, January 28, 2016 7:43 PM To: user@hive.apache.org Subject: Re: bloom filter used in 0.14? > So I am questioning

bloom filter used in 0.14?

2016-01-28 Thread Frank Luo
All, I have a huge table on which I periodically want to select rows with a particular value. For example, suppose I have a table for the entire world population, and I know that the person with id “1234” is a criminal, so I want to pull his information out of the table. Without any optimization, I have to
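For background on the question: the idea behind a bloom filter for this kind of point lookup is that each chunk of data carries a small filter over a column, so a reader can skip any chunk whose filter says id “1234” is definitely absent. Below is a minimal pure-Python bloom filter; the bit count, hash count, SHA-256-based hashing and the chunk layout are illustrative assumptions, not ORC's implementation.

```python
import hashlib


class BloomFilter:
    """Minimal bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# One filter per chunk of ids; only chunks whose filter says "maybe" are read.
chunks = [["1001", "1002"], ["1234", "5678"], ["9000"]]
filters = []
for chunk in chunks:
    bf = BloomFilter()
    for row_id in chunk:
        bf.add(row_id)
    filters.append(bf)

to_scan = [i for i, bf in enumerate(filters) if bf.might_contain("1234")]
print(to_scan)
```

The chunk holding “1234” is always in to_scan; other chunks appear only on a false positive, which gets rarer as num_bits grows relative to the number of ids per chunk.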

how to set a job name for hive queries

2016-01-19 Thread Frank Luo
We are in a multi-tenant environment and want to add a client’s name to each job name so that the client can be informed/involved when a job fails. We can easily do that with M/R jobs, but I haven’t figured out a way to do so for Hive jobs. I googled and found the answer below, but I couldn’t get it to wor

RE: how to get counts as a byproduct of a query

2015-12-03 Thread Frank Luo
anual+WindowingAndAnalytics Select suba.X, suba.Y, suba.countA, subb.Z, subb.countB FROM (SELECT x, y, count(1) OVER (PARTITION BY X) as countA) suba JOIN (SELECT x, z, count(1) OVER (PARTITION BY X) as countB) subb ON (suba.X = subb.X) From: Frank Luo [mailto:j...@merkleinc.com] Sent: Wednesday, Decemb

RE: how to get counts as a byproduct of a query

2015-12-02 Thread Frank Luo
using SQL in hive? On 02 Dec 2015, at 21:26, Frank Luo <j...@merkleinc.com> wrote: Didn’t get any response, so trying one more time. I cannot believe I am the only one facing the problem. From: Frank Luo Sent: Tuesday, December 01, 2015 10:40 PM To: user@hive.apache.org<ma

RE: how to get counts as a byproduct of a query

2015-12-02 Thread Frank Luo
Didn’t get any response, so trying one more time. I cannot believe I am the only one facing the problem. From: Frank Luo Sent: Tuesday, December 01, 2015 10:40 PM To: user@hive.apache.org Subject: how to get counts as a byproduct of a query Very often I need to run a query against a table(s

how to get counts as a byproduct of a query

2015-12-01 Thread Frank Luo
Very often I need to run a query against a table(s) and then collect some counts. I am wondering if there is a way to kill two birds with one stone by scanning the table only once. (I don’t mind saving the counts as a separate file or something like that.) For example, I have tables A and B. I need to do an inner join
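Conceptually, the request is a single pass that both produces the join and tallies rows per key. The plain-Python sketch below shows that shape with a hash join over in-memory lists; it is not Hive code, the row layout is hypothetical, and in Hive the counts would instead come from something like count(1) OVER (PARTITION BY ...) as the replies suggest, or from counters.

```python
from collections import Counter, defaultdict


def join_with_counts(table_a, table_b):
    """Inner-join two lists of (key, value) rows on key, scanning each once.

    Returns (joined_rows, count_a, count_b), where count_a/count_b are the
    per-key row counts of each input, gathered during the same scans.
    """
    count_b = Counter()
    b_by_key = defaultdict(list)
    for key, value in table_b:      # single scan of B: build side + counts
        count_b[key] += 1
        b_by_key[key].append(value)

    count_a = Counter()
    joined = []
    for key, value in table_a:      # single scan of A: probe side + counts
        count_a[key] += 1
        for b_value in b_by_key.get(key, []):
            joined.append((key, value, b_value))
    return joined, count_a, count_b


joined, count_a, count_b = join_with_counts(
    [("x", 1), ("x", 2), ("y", 3)], [("x", "p"), ("z", "q")])
print(joined)                        # [('x', 1, 'p'), ('x', 2, 'p')]
print(dict(count_a), dict(count_b))  # {'x': 2, 'y': 1} {'x': 1, 'z': 1}
```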

Hive timeout while loading hashtable file?

2015-05-19 Thread Frank Luo
I have a pretty straightforward multi-table join that constantly times out at the 300-second limit without any other error. The last several lines in the log are below; any hint as to what went wrong? From the log, it looks like it is failing while loading the "hashtable file from tmp file". 19 12:36:37,332 INFO [main

resources used by a hive query

2015-05-18 Thread Frank Luo
Does anyone know how to find out, in an automated way, how many resources a Hive query uses? Looking at the "Resource Manager UI", I might be able to find out how much one particular job takes in terms of total time/# of MR/memory/cpu/etc. But if my query requires 20 phases, some M/R and some l
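One way to automate this, sketched under assumptions: fetch the applications from the ResourceManager REST API (/ws/v1/cluster/apps) and add up their per-application fields (elapsedTime in milliseconds, memorySeconds in MB-seconds, vcoreSeconds). The hard part is deciding which applications belong to the query; the sketch leaves that to a caller-supplied predicate, for example matching a job-name prefix you set yourself. The sample records and names are hypothetical.

```python
def summarize_query_resources(apps, belongs_to_query):
    """Sum resource usage over the YARN applications that make up one query.

    `apps` holds dicts shaped like the "app" entries returned by the
    ResourceManager REST API; `belongs_to_query` is a caller-supplied
    predicate that picks the applications the query launched.
    """
    metrics = ("elapsedTime", "memorySeconds", "vcoreSeconds")
    totals = {"applications": 0, **{m: 0 for m in metrics}}
    for app in apps:
        if not belongs_to_query(app):
            continue
        totals["applications"] += 1
        for m in metrics:
            # Summed elapsedTime is total job time, not the query's
            # wall-clock time, since stages can run in parallel.
            totals[m] += app.get(m, 0)
    return totals


apps = [
    {"name": "nightly_load stage 1", "elapsedTime": 60000,
     "memorySeconds": 1200, "vcoreSeconds": 60},
    {"name": "nightly_load stage 2", "elapsedTime": 30000,
     "memorySeconds": 800, "vcoreSeconds": 40},
    {"name": "someone else's job", "elapsedTime": 5,
     "memorySeconds": 5, "vcoreSeconds": 5},
]
print(summarize_query_resources(apps, lambda a: a["name"].startswith("nightly_load")))
```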

RE: MapredContext not available when tez enabled

2015-04-24 Thread Frank Luo
Cause found. I had "... limit 5" in the query. Once I take it out, the query runs fine. -----Original Message----- From: Frank Luo [mailto:j...@merkleinc.com] Sent: Wednesday, April 22, 2015 10:50 AM To: user@hive.apache.org Subject: RE: MapredContext not available when tez enab

RE: MapredContext not available when tez enabled

2015-04-22 Thread Frank Luo
Gopal, Here is basically my code, and I can clearly see that configure() was not called; the Javadoc on GenericUDF#configure reads: "This is only called in runtime of MapRedTask.". Also, based on my observation, the query is not executed as an M/R job, because YARN monitoring knows nothing about the job.

MapredContext not available when tez enabled

2015-04-21 Thread Frank Luo
We have a UDF that collects some counts during Hive execution. It had been working fine until Tez was enabled. A bit of digging shows that the GenericUDF#configure method was not called. In this case, is it possible to get counters through other means, or do we have to implement the Counter concept ourselves?

access counter info at the end of query execution

2014-09-30 Thread Frank Luo
All, I developed a UDF to increment counters in certain situations. However, I am not able to find a way to read the counters at the end of a query run. I have looked at HiveDriverRunHook and ExecuteWithHookContext; neither class allows me to access counters. Is there a way to get around this

need to know how to join tables with datatype of "map"

2014-06-20 Thread Frank Luo
I have two Hive external tables mapped to two HTables. Because one of the HTables has sparse columns, I have to use Map as the datatype in Hive for that table. I'd like to know how to join the two tables. Here is my sample: CREATE EXTERNAL TABLE RECORD(RecordID string, BatchDate string, Country string) STORED BY

RE: read a Hive Map without knowing keys

2014-03-18 Thread Frank Luo
at 5:06 PM, Frank Luo <j...@merkleinc.com> wrote: That is helpful, Thx! More dumb questions. What is "select *?" ? Also, any hints on how to find whether the map_keys contains a substring? For example, supposing the map_keys contains emails, I want to see if one of

RE: read a Hive Map without knowing keys

2014-03-17 Thread Frank Luo
On Mon, Mar 17, 2014 at 3:05 PM, Frank Luo <j...@merkleinc.com> wrote: Is there a way to read the Hive Map datatype without knowing the keys? According to the Hive documentation, the only way to read a Map is to access it through keys, i.e. myMap['myKey']. However, in many cases, the ke

read a Hive Map without knowing keys

2014-03-17 Thread Frank Luo
Is there a way to read the Hive Map datatype without knowing the keys? According to the Hive documentation, the only way to read a Map is to access it through keys, i.e. myMap['myKey']. However, in many cases the keys are unknown, for example with HTable sparse columns, so in that kind of situation, what are the ways

RE: how does hive find where is MR job tracker

2013-05-28 Thread Frank Luo
ts the JobTracker from the mapred-site.xml specified within your $HADOOP_HOME/conf. Does the $HADOOP_HOME/conf/mapred-site.xml on the node that runs Hive have the correct value for the JobTracker? If not, changing it to the right one might resolve your issue. Regards Bejoy KS Sent from remote device, P
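To check what the node that runs Hive actually resolves, a short script can read mapred.job.tracker from that node's mapred-site.xml, as the reply suggests. This is a sketch assuming the standard Hadoop configuration/property/name/value layout; the host and port in the sample are placeholders, and XInclude and ${var} substitution are not handled.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_text: str, name: str):
    """Return the value of `name` from a Hadoop *-site.xml document, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


sample = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jt-host.example.com:8021</value>
  </property>
</configuration>"""

print(read_hadoop_property(sample, "mapred.job.tracker"))  # jt-host.example.com:8021
```

On the Hive node you would pass it the contents of $HADOOP_HOME/conf/mapred-site.xml, or of whichever config directory Hive actually loads.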

how does hive find where is MR job tracker

2013-05-28 Thread Frank Luo
I have a Cloudera cluster, version 4.2.0. In the Hive configuration, I have "MapReduce Service" set to "mapreduce1", which is my MR service. However, without setting "mapred.job.tracker", whenever I run a hive command, it always sends the job to the wrong job tracker. Here is the error: java.net.

how to limit mappers for a hive job

2013-04-24 Thread Frank Luo
I am trying to query a huge file with 370 blocks, but it errors out with the message "number of mappers exceeds limit", and my cluster has "mapred.tasktracker.map.tasks.maximum" set to 50. I have tried to set parameters such as hive.exec.mappers.max/ mapred.tasktracker.tasks/ apred.tasktracker
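Mapper count follows the number of input splits, so one common lever is a larger split size rather than a mapper cap. The arithmetic below follows Hadoop's new-API FileInputFormat (split size = max(minSize, min(maxSize, blockSize)), with a 1.1 slop factor) and assumes 128 MB blocks; Hive's own input formats (for example CombineHiveInputFormat) group splits differently, so treat the result as an estimate. The related settings are mapred.min.split.size and mapred.max.split.size (mapreduce.input.fileinputformat.split.minsize and .maxsize in newer Hadoop).

```python
SPLIT_SLOP = 1.1  # slack factor used by Hadoop's FileInputFormat
MB = 1024 * 1024


def split_size(block_size: int, min_size: int, max_size: int) -> int:
    # Same formula as new-API FileInputFormat.computeSplitSize.
    return max(min_size, min(max_size, block_size))


def num_splits(file_length: int, size: int) -> int:
    """Number of splits cut from one splittable file (one mapper per split)."""
    splits, remaining = 0, file_length
    while remaining / size > SPLIT_SLOP:
        splits += 1
        remaining -= size
    if remaining != 0:
        splits += 1
    return splits


# Hypothetical: 370 blocks of 128 MB each.
file_length = 370 * 128 * MB
default = split_size(128 * MB, 1, 2**63 - 1)
bigger = split_size(128 * MB, 1024 * MB, 2**63 - 1)  # min split size 1 GB
print(num_splits(file_length, default))  # 370
print(num_splits(file_length, bigger))   # 47
```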