Column Statistics with Parquet

2014-07-24 Thread Suma Shivaprasad
I am trying to enable Column statistics usage with Parquet tables. This is the query I am executing. However on explain, I see that even though Basic stats: COMPLETE is seen, Column stats is seen as NONE. Can someone please explain what else I need to debug/fix this. set

Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery

2014-07-24 Thread Navis류승우
Looks like it's caused by HIVE-7314. Could you try that with hive.cache.expr.evaluation=false? Thanks, Navis 2014-07-24 14:34 GMT+09:00 丁桂涛(桂花) dinggui...@baixing.com: Yes. The output is correct: [tp,p,sp]. I developed the UDF using JAVA in eclipse and exported the jar file into the auxlib

Column Stats with parquet

2014-07-24 Thread Suma Shivaprasad
I am trying to enable Column statistics usage with Parquet tables. This is the query I am executing. However on explain, I see that even though Basic stats: COMPLETE is seen, Column stats is seen as NONE. Can someone please explain what else I need to debug/fix this. set

create table / data type syntax for csv files with comma in the column

2014-07-24 Thread Vidya Sujeet
Hello, I have a csv file that has columns which contain commas within a string enclosed with a . ex: column name: 'Issue' value: Other (phone, health club, etc) Question: What should the data type of 'Issue' be? Or how should I format the table (row format delimited terminated by) so that

[HELP]Hive Statistics

2014-07-24 Thread Navdeep Agrawal
Stuck, need help. I created a small table with multiple partitions, desc (id int, term int) partitioned by id. Whenever I run analyze on any id I am getting perfectly good answers. I am unable to figure out the difference each file is making. New table Table Parameters:

Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery

2014-07-24 Thread 丁桂涛(桂花)
Yeah. After setting hive.cache.expr.evaluation=false, all queries output expected results. And I found that it's related to the getDisplayString function in the UDF. At first the function returns a string regardless of its parameters. And I had to set hive.cache.expr.evaluation = false. But
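The workaround confirmed in this thread can be sketched as a session-level setting followed by the nested query. The property name comes from the thread; my_udf, col_a, col_b and some_table are hypothetical names used only for illustration.

```sql
-- Workaround discussed in this thread for HIVE-7314:
-- disable expression-evaluation caching for the current session.
SET hive.cache.expr.evaluation=false;

-- Hypothetical names: my_udf, col_a, col_b, some_table.
-- The same UDF is applied to two different columns inside a subquery,
-- which is the shape of query reported to return duplicate results.
SELECT t.x, t.y
FROM (
  SELECT my_udf(col_a) AS x, my_udf(col_b) AS y
  FROM some_table
) t;
```

As the message above notes, the behavior was tied to getDisplayString returning the same string regardless of parameters, so a UDF whose display string includes its arguments may not need this setting.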

Fwd: Column Stats with parquet

2014-07-24 Thread Suma Shivaprasad
I am trying to enable Column statistics usage with Parquet tables. This is the query I am executing. However on explain, I see that even though Basic stats: COMPLETE is seen, Column stats is seen as NONE. Can someone please explain what else I need to debug/fix this. set

Re: Hive shell code exception, urgent help needed

2014-07-24 Thread Sarfraz Ramay
Can anyone please help with this? I followed the advice here http://stackoverflow.com/questions/20390217/mapreduce-job-in-headless-environment-fails-n-times-due-to-am-container-exceptio and added the following properties to mapred-site.xml but am still getting the same error.

A question about SessionManager

2014-07-24 Thread Zhanghe (D)
Hey Guys, I'm working with HiveServer2. I know the HiveServer holds a session for each client, and closes it when the client executes 'CloseSession'. But if the client is forced to terminate, like Ctrl+Z or kill -9, the session in HiveServer will not be closed. Does there exist a

Re: A question about SessionManager

2014-07-24 Thread Navis류승우
https://issues.apache.org/jira/browse/HIVE-5799 is for that kind of case, but it is not included in any release yet. Thanks, Navis 2014-07-24 20:04 GMT+09:00 Zhanghe (D) crane.zh...@huawei.com: Hey Guys, I'm working with HiveServer2. I know the HiveServer holds a session for each client, and

Reg:Column Statistics with Parquet

2014-07-24 Thread Sandeep Samudrala
I am trying to enable Column statistics usage with Parquet tables. This is the query I am executing. However on explain, I see that even though Basic stats: COMPLETE is seen, Column stats is seen as NONE. Can someone please explain what else I need to debug/fix this. set

RE: [HELP]Hive Statistics

2014-07-24 Thread Navdeep Agrawal
Well, the problem didn't exactly get solved, but I observed that this kind of behavior persists when I partition my table by a date type; otherwise it's working. Maybe it's worth an issue. Thank you From: Navdeep Agrawal [mailto:navdeep_agra...@symantec.com] Sent: Thursday, July 24, 2014 1:22 PM To:

HIVE 0.12 SUM() returning NULL for decimal values

2014-07-24 Thread Abhishek Gayakwad
I am trying to aggregate one column of decimal type, which is returning me null. If I cast this column to double it returns me some value. following are the steps to recreate this scenario. CREATE TABLE salestemp(sku int, sales decimal); LOAD DATA LOCAL INPATH

python UDF and Avro tables

2014-07-24 Thread Kevin Weiler
Hi All, I hope I’m not duplicating a previous question, but I couldn’t find any search functionality for the user list archives. I have written a relatively simple python script that is meant to take a field from a hive query and transform it (just some string processing through a dict) given

Re: Reg:Column Statistics with Parquet

2014-07-24 Thread Prasanth Jayachandran
You have to explicitly specify the column list in the analyze command to gather column stats. This command will only collect basic stats like number of rows, total file size, raw data size, number of files. analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics; To
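A minimal sketch of the column-level variant described above, assuming Hive's ANALYZE ... FOR COLUMNS syntax. The table name and partition spec are taken from the message; the column names user_id and country are hypothetical, and whether column stats are usable with Parquet depends on the Hive version in use.

```sql
-- Gather column-level statistics by naming the columns explicitly.
-- user_id and country are hypothetical column names.
ANALYZE TABLE user_table PARTITION (dt='2014-06-01', hour='00')
COMPUTE STATISTICS FOR COLUMNS user_id, country;

-- Let the planner fetch column statistics from the metastore.
SET hive.stats.fetch.column.stats=true;

-- Re-run EXPLAIN and check whether "Column stats" is still reported as NONE.
EXPLAIN
SELECT country, count(*)
FROM user_table
WHERE dt='2014-06-01' AND hour='00'
GROUP BY country;
```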

does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?

2014-07-24 Thread Yang
if I do a join of a table based on txt file and a table based on HBase, and say the latter is very large, is HIVE smart enough to utilize the HBase table's index to do the join, instead of implementing this as a regular map reduce job, where each table is scanned fully, bucketed on join keys, and

Re: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?

2014-07-24 Thread Yang
kind of found this http://hortonworks.com/blog/hbase-via-hive-part-1/ From a performance perspective, there are things Hive can do today (ie, not dependent on data types) to take advantage of HBase. There’s also the possibility of an HBase-aware Hive to make use of HBase tables as intermediate

RE: python UDF and Avro tables

2014-07-24 Thread java8964
Are you trying to read the Avro file directly in your UDF? If so, that is not the correct way to do it in UDF. Hive can support Avro file natively. Don't know your UDF requirement, but here is normally what I will do: Create the table in hive as using AvroContainerInputFormat create external

doing upsert possible?

2014-07-24 Thread Yang
if we have a huge table, and every 1 hour only 1% of that has some updates, it would be a huge waste to slurp in the whole table through MR job and write out the new table. instead, if we store this table in HBASE, and use the current HBase+Hive integration, as long as we can do upsert, then we

Re: doing upsert possible?

2014-07-24 Thread Juan Martin Pampliega
Hi Yang. That's correct. You should check out the HBase UDFs in Klout's Brickhouse library https://github.com/klout/brickhouse/tree/master/src/main/java/brickhouse/hbase On Jul 24, 2014 8:07 PM, Yang tedd...@gmail.com wrote: if we have a huge table, and every 1 hour only 1% of that has some

RE: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?

2014-07-24 Thread java8964
I don't think the HBase-Hive integration is smart enough to utilize the indexes existing in HBase. But I think it depends on the version you are using. From my experience, there is a lot of room for improvement in the HBase-Hive integration, especially in pushing logic down into the HBase engine.

CREATE TABLE throwing error for large number of columns (very long script)

2014-07-24 Thread azaz.rasool
I am trying to create a table in Hive. It's a very long script containing a large number of columns, including complex fields like STRUCT, ARRAY etc. * Cannot create full table in one shot using CREATE TABLE statement so need to first run CREATE and then ALTER * If fields

Re: CREATE TABLE throwing error for large number of columns (very long script)

2014-07-24 Thread Prasanth Jayachandran
What version of hive are you using? What file format are you using? Thanks Prasanth Jayachandran On Jul 24, 2014, at 5:03 PM, azaz.ras...@wipro.com azaz.ras...@wipro.com wrote: I am trying to Create a table in Hive. It’s a very long script contained large number of columns and also contains

Re: CREATE TABLE throwing error for large number of columns (very long script)

2014-07-24 Thread Juan Martin Pampliega
Are you using MySQL or Postgres for the Metastore database? On Jul 24, 2014 9:08 PM, Prasanth Jayachandran pjayachand...@hortonworks.com wrote: What version of hive are you using? What file format are you using? Thanks Prasanth Jayachandran On Jul 24, 2014, at 5:03 PM,

Re: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?

2014-07-24 Thread Juan Martin Pampliega
The following article about using Klout's Brickhouse library to access an HBase table as a map through its key might be useful. http://brickhouseconfessions.wordpress.com/2013/08/06/squash-the-long-tail-with-brickhouses-hbase-udfs/ On Jul 24, 2014 8:56 PM, Andrew Mains andrew.ma...@kontagent.com

Re: Hive shell code exception, urgent help needed

2014-07-24 Thread Juan Martin Pampliega
Hi, The actual useful part of the error is: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask If you do a search for this plus EC2 in Google you will find a couple of results that point to memory exhaustion issues. You should try increasing the configured memory

Re: HIVE 0.12 SUM() returning NULL for decimal values

2014-07-24 Thread 丁桂涛(桂花)
try select sum(sales) from salestemp where sales is not null; On Thu, Jul 24, 2014 at 11:10 PM, Abhishek Gayakwad a.gayak...@gmail.com wrote: I am trying to aggregate one column of decimal type, which is returning me null. If I cast this column to double it returns me some value. following

Re: Errors while creating a new table using existing table schema

2014-07-24 Thread Vidya Sujeet
thanks all. I created a new database and it works fine there.. On Sat, Jul 19, 2014 at 1:37 PM, Lefty Leverenz leftylever...@gmail.com wrote: And now it's documented in the DDL wiki: - Use Database

Re: Hive shell code exception, urgent help needed

2014-07-24 Thread Sarfraz Ramay
Hi, Thanks for your reply. I have been following links for the past two days now. Finally got Hadoop natively compiled. Let's see if that solves the problem. Yes, increasing the memory was on my list, but I think I tried that and it didn't work. Memory can be an issue as it is working perfectly fine for