Re: Total length of orc clustered table is always 2^31 in TezSplitGrouper

2018-07-24 Thread Gopal Vijayaraghavan
> Search ’Total length’ in log sys_dag_xxx, it is 2147483648. This is the INT_MAX “placeholder” value for uncompacted ACID tables. This is because with ACIDv1 there is no way to generate splits against uncompacted files, so this gets “an empty bucket + unknown number of inserts + updates” plac
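A quick arithmetic check of the figure quoted in this reply, offered only as a sketch: 2147483648 is exactly 2^31, which is one more than the largest signed 32-bit integer (2147483647). The "INT_MAX" label in the reply therefore reads as approximate. The variable names below are illustrative and do not come from Hive or Tez.

```python
# Largest value of a signed 32-bit integer (for example, Java's int type).
INT_MAX = 2**31 - 1

# Value quoted from the log line in the message above.
reported_total = 2147483648

# The reported value is exactly 2^31 ...
is_two_pow_31 = reported_total == 2**31

# ... which is one past the signed 32-bit maximum.
distance_from_int_max = reported_total - INT_MAX
```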

Total length of orc clustered table is always 2^31 in TezSplitGrouper

2018-07-24 Thread 何宝宁
Hi, When I was tuning the initial mapper number with Hive+Tez, I found that if an ORC table is clustered, the total length returned by the estimator is always 2^31. Hive: 2.3.3 Tez: 0.8.4 (TezSplitGrouper.java:197) How to replicate: create table test (f1 string, f2 string) clustered by (f1) into 1 buckets stored as

Re: Ranger for standalone hive metastore

2018-07-24 Thread Vihang Karajgaonkar
I am not an expert in Ranger, but as long as Ranger uses HMS public APIs it should work. Some of the HMS APIs (get_partitions_by_expr) may not work if you don't have hive jars in the metastore classpath. However, this API is only used by Hive so I don't think that could be the cause of your issue.

Re: Ranger for standalone hive metastore

2018-07-24 Thread Sandhya Agarwal
Thank you so much for the response. However, I do not see any errors in either the HMS or the Ranger logs. But, just to clarify, I am assuming Ranger is supported even with the standalone hive metastore. On Mon, Jul 23, 2018 at 11:09 PM Vihang Karajgaonkar wrote: > I am not super-familiar with Ranger but d

Re: UDFJson cannot Make Progress and Looks Like Deadlock

2018-07-24 Thread Proust (Feng Guizhou) [FDS Payment and Marketing]
Just FYI, I'm able to make a custom UDF to apply the thread-safe code changes. Thanks a lot for your help Guizhou From: Proust (Feng Guizhou) [FDS Payment and Marketing] Sent: Tuesday, July 24, 2018 4:34:49 PM To: user@hive.apache.org Subject: Re: UDFJson cannot

Re: UDFJson cannot Make Progress and Looks Like Deadlock

2018-07-24 Thread Proust (Feng Guizhou) [FDS Payment and Marketing]
Thanks a lot for pointing this out, it makes the problem clear. As a quick, low-cost workaround that does not require upgrading, I'm considering reimplementing the UDF get_json_object under a new name to avoid the problem. Thanks Guizhou From: Peter Vary Sent: Tuesday, Jul

Re: UDFJson cannot Make Progress and Looks Like Deadlock

2018-07-24 Thread Peter Vary
Hi Guizhou, I would guess that this is caused by HIVE-16196 (UDFJson having thread-safety issues). Try upgrading to a CDH version where this patch is already included (5.12.0 or later). Regards, Peter > On Jul 24, 2018, at 10:15, Proust (Feng

UDFJson cannot Make Progress and Looks Like Deadlock

2018-07-24 Thread Proust (Feng Guizhou) [FDS Payment and Marketing]
Hi Hive Community, We are running Hive on Spark with a CDH cluster: Apache Hive (version 1.1.0-cdh5.10.1). Frequently, a Hive query hangs and makes no progress within UDFJson.evaluate. An example executor thread dump looks like the one below. 3 threads hang within java.util.HashMap$Tree

Re: what's the best practice to create an external hive table based on a csv file on HDFS with 618 columns in header?

2018-07-24 Thread Furcy Pin
Hello, To load your data as parquet, you can either: A. use spark: https://docs.databricks.com/spark/latest/data-sources/read-csv.html and write it directly as a parquet file (df.write.format("parquet").saveAsTable("parquet_table")) B. Load it as a csv file in Hive, and perform a CREATE TABLE par
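For option B with a very wide file, writing the column list by hand is tedious. The following is a minimal sketch, not taken from the thread, that builds a CREATE EXTERNAL TABLE statement from the CSV header line. It assumes every column is typed STRING, a comma delimiter, and one header row to skip; the function name, table name, and HDFS path are hypothetical.

```python
import csv
import io


def build_external_table_ddl(header_line, table_name, hdfs_location):
    """Return a Hive CREATE EXTERNAL TABLE statement with all columns as STRING."""
    # Parse the header with the csv module so quoted column names are handled.
    columns = next(csv.reader(io.StringIO(header_line)))
    # Backtick-quote each column name and type it as STRING (an assumption).
    column_defs = ",\n".join(f"  `{name.strip()}` STRING" for name in columns)
    return (
        f"CREATE EXTERNAL TABLE {table_name} (\n{column_defs}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


# Hypothetical three-column header; a real file would supply all 618 names.
ddl = build_external_table_ddl("id,name,amount", "csv_table", "/data/csv_table")
print(ddl)
```

Typing everything as STRING keeps the sketch simple; columns can be cast to narrower types later when the data is copied into a columnar table.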

Re: Using snappy compresscodec in hive

2018-07-24 Thread Zhefu Peng
Hi Gopal, Thanks for your reply! One more question: is the effect of using the pure-Java version the same as that of using SnappyCodec? In other words, is there any difference between these two methods in the compression result and effect? Looking forward to your reply and help. B