Re: Container out of memory: ORC format with many dynamic partitions

2016-04-29 Thread Jörn Franke
I would still need some time to dig deeper into this. Are you using a specific distribution? Would it be possible to upgrade to a more recent Hive version? However, having so many small partitions is bad practice and seriously affects performance. Each partition should at least contain several

Re: Hive configuration parameter hive.enforce.bucketing does not exist in Hive 2

2016-04-29 Thread Mich Talebzadeh
Ok thanks Lefty Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * http://talebzadehmich.wordpress.com On 30 April 2016 at 02:23, Lefty Leverenz

Container out of memory: ORC format with many dynamic partitions

2016-04-29 Thread Matt Olson
Hi all, I am using Hive 1.0.1 and trying to do a simple insert into an ORC table, creating dynamic partitions. I am selecting from a table partitioned by dt and category, and inserting into a table partitioned by dt, title, and title_type. Other than the partitioning, the tables have the same sche
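For reference, an insert of the kind described in this message might look like the sketch below. The table names `src_events` and `dst_events_orc` and the column `payload` are hypothetical; only the partition columns `dt`, `title`, and `title_type` come from the message. The settings are standard Hive options. `hive.optimize.sort.dynamic.partition` is one that is often tried when many open ORC writers exhaust container memory, but the thread does not confirm that it resolves this case.

```sql
-- Allow dynamic partitioning without requiring a static partition.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Assumption (not from the thread): sort rows by partition key so that
-- a reducer keeps only one record writer open per partition value.
SET hive.optimize.sort.dynamic.partition=true;

-- Hypothetical tables; dynamic partition columns go last in the SELECT.
INSERT OVERWRITE TABLE dst_events_orc PARTITION (dt, title, title_type)
SELECT payload, dt, title, title_type
FROM src_events;
```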

Re: Hive configuration parameter hive.enforce.bucketing does not exist in Hive 2

2016-04-29 Thread Lefty Leverenz
FYI, the removal of hive.enforce.bucketing is documented in the wiki ( hive.enforce.bucketing ) and the JIRA issue that removed it is HIVE-12331

Re: Disable Hive autogather optimization

2016-04-29 Thread Udit Mehta
Thanks Mich. I will test this out and get back to you!

Re: Disable Hive autogather optimization

2016-04-29 Thread Mich Talebzadeh
Apologies, that should read "Udit".

Re: Disable Hive autogather optimization

2016-04-29 Thread Mich Talebzadeh
Hi Unit, *For new tables* Disable stats autogathering in Hive when creating a new table and populating it: SET hive.stats.autogather=false; *Already existing tables* As a workaround, you can try this on already existing tables by manually altering numRows to -1: ALTER TABLE PARTITION S
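A minimal sketch of the "new tables" case follows. The `SET hive.stats.autogather=false;` line is taken from the message; the tables `etl_target` and `etl_staging` and their columns are hypothetical placeholders. The `ALTER TABLE` workaround is cut off in this snippet, so it is not reproduced here.

```sql
-- From the message: turn off automatic stats gathering for this session.
SET hive.stats.autogather=false;

-- Needed for the fully dynamic partition spec used below.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical ETL insert; table and column names are placeholders.
INSERT OVERWRITE TABLE etl_target PARTITION (dt)
SELECT id, amount, dt
FROM etl_staging;
```

Because the setting is applied per session, it can be limited to the one job that must meet its SLA.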

Re: Disable Hive autogather optimization

2016-04-29 Thread Udit Mehta
Hi, Thanks for the replies. We have a scenario where we have an ETL job inserting into a table with thousands of partitions using dynamic partitioning. We have certain SLAs within which we would like the job to finish, and sometimes there are scenarios where they are missed (extra data or a busy c

Re: Hive configuration parameter hive.enforce.bucketing does not exist in Hive 2

2016-04-29 Thread Mich Talebzadeh
Unfortunately that needs to be done, or better, the whole line removed, in every HQL script where it is set to true. Thanks

Re: Hive configuration parameter hive.enforce.bucketing does not exist in Hive 2

2016-04-29 Thread Sergey Shelukhin
You can set hive.conf.validation to false to disable this :)
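As a sketch of that suggestion, assuming it is applied at the top of a legacy script: with validation of `hive.*` property names switched off, the old `hive.enforce.bucketing` line should be accepted instead of raising the "does not exists" error. Placing the setting in the script itself is an assumption here; it could also be set elsewhere in the configuration.

```sql
-- Skip validation of hive.* property names for this session.
SET hive.conf.validation=false;

-- The legacy line is then accepted rather than rejected. Per the thread,
-- Hive 2 treats bucketing enforcement as always on, so this value has no effect.
SET hive.enforce.bucketing=true;
```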

Re: Hive configuration parameter hive.enforce.bucketing does not exist in Hive 2

2016-04-29 Thread Mich Talebzadeh
Well, having it in the old code causes the query to crash as well!

Re: Disable Hive autogather optimization

2016-04-29 Thread Pengcheng Xiong
Hi Udit, Could you be more specific about your problem? For example, what settings do you have, what query do you run, what is the result, and what result do you expect? From what you said, my understanding is that you want to wipe out the basic stats for existing tables? And, could you also let us kno

Re: Hive configuration parameter hive.enforce.bucketing does not exist in Hive 2

2016-04-29 Thread Sergey Shelukhin
This parameter has indeed been removed; it is treated as always true now, because setting it to false just produced incorrect tables.

Analyzing Bitcoin blockchain data with Hive

2016-04-29 Thread Jörn Franke
Dear all, I prepared a small Serde to analyze Bitcoin blockchain data with Hive: https://snippetessay.wordpress.com/2016/04/28/hive-bitcoin-analytics-on-blockchain-data-with-sql/ There are some example queries, but I will add some in the future. Additionally, more unit tests will be added. Let m

Re: Disable Hive autogather optimization

2016-04-29 Thread Mich Talebzadeh
Hi, is this what is detailed in the following Jira? Description: Hive will collect table stats when hive.stats.autogather=true is set during the INSERT OVERWRITE command. And then the users need to collect the column stats themselves using "Analyze" co

Re: Issue with correlated subqueries being case-sensitive

2016-04-29 Thread Mich Talebzadeh
Yes, sounds like a bug in the parser. I am using Hive 2: hive> select count(1) from smallsales where exists(select 1 from sales_staging where smallsales.PROD_ID = sales_staging.PROD_ID); FAILED: SemanticException [Error 10250]: Line 1:59 Invalid SubQuery expression 'PROD_ID': For Exists/Not Exists opera
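If the failure is tied to identifier case, as the thread subject suggests, one variant worth testing is the same query with the correlated column references written in lowercase. This is an assumption to verify, not a confirmed fix; the table and column names are the ones from the message.

```sql
-- Failing form from the message (uppercase column references):
--   ... WHERE smallsales.PROD_ID = sales_staging.PROD_ID
-- Variant to test, assuming identifier case is the trigger:
SELECT count(1)
FROM smallsales
WHERE EXISTS (
  SELECT 1
  FROM sales_staging
  WHERE smallsales.prod_id = sales_staging.prod_id
);
```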

Re: Issue with correlated subqueries being case-sensitive

2016-04-29 Thread jack
Hi, It is just a string literal I used as an example. Replacing it with "1" does not affect the issue in any way. Best regards, /jack

Re: Issue with correlated subqueries being case-sensitive

2016-04-29 Thread Mich Talebzadeh
Why not just try the standard way: SELECT * FROM P WHERE EXISTS(SELECT 1 FROM B WHERE P.ID = B.ID) You don't need '*'; that is not standard SQL as far as I know. HTH

Issue with correlated subqueries being case-sensitive

2016-04-29 Thread jack
Hi, I am having an issue with correlated sub-queries such as the following: SELECT * FROM P WHERE EXISTS (SELECT '*' FROM B WHERE P.ID = B.ID) Both Beeline and a Java JDBC client failed with the following message: Error: Error while compiling statement: FAILED: SemanticException [Error 10250]: Li

solution structure followed in regular hive projects

2016-04-29 Thread mahender bigdata
Hi, We are building a Hive project and would like to know whether there is a recommended hierarchy for organizing scripts in a repository. Currently we have a huge list of HQL files, and it is becoming unmanageable. Is there a solution structure or project structure that regular Hive projects follow? Thanks

Hive configuration parameter hive.enforce.bucketing does not exist in Hive 2

2016-04-29 Thread Mich Talebzadeh
Is the parameter --set hive.enforce.bucketing = true; deprecated in Hive 2, as it causes HQL code not to work? hive> set hive.enforce.bucketing = true; Query returned non-zero code: 1, cause: hive configuration hive.enforce.bucketing does not exists.