RE: Hive on Spark Engine versus Spark using Hive metastore

2016-02-03 Thread Mich Talebzadeh
Hi Jeff, I only have a two-node cluster. Is there any way one can simulate additional parallel runs in such an environment, thus having more than two maps? Thanks, Dr Mich Talebzadeh. LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
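A minimal sketch of one way to get more than two map tasks on a small cluster (not the thread's answer; the values are illustrative assumptions): shrink the input split size so each node runs several maps concurrently, and, under the Spark engine, size the executors to run them in parallel.

-- From the Hive CLI / beeline session, before running the query:
SET mapreduce.input.fileinputformat.split.maxsize=67108864;   -- ~64 MB splits => more map tasks
SET mapreduce.input.fileinputformat.split.minsize=33554432;   -- ~32 MB lower bound
-- Hive on Spark: executor sizing controls how many of those tasks run concurrently (example figures only):
SET spark.executor.instances=4;
SET spark.executor.cores=3;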

RE: Hive on Spark Engine versus Spark using Hive metastore

2016-02-03 Thread Mich Talebzadeh
Hi all, Thanks for all the comments on this thread. The question I posed was simply to clarify, technically, the two approaches: Spark using the Hive metastore, and Hive using the Spark engine. The fact that we get the benefits of both Hive and Spark is tremendous. They both offer in their own

RE: Hive on Spark Engine versus Spark using Hive metastore

2016-02-03 Thread Mich Talebzadeh
I just did some further tests joining a 5-million-row FACT table with 2 DIMENSION tables. SELECT t.calendar_month_desc, c.channel_desc, SUM(s.amount_sold) AS TotalSales FROM sales s, times t, channels c WHERE s.time_id = t.time_id AND s.channel_id = c.channel_id GROUP BY
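For reference, a self-contained version of that star-schema aggregation; the GROUP BY columns are inferred from the SELECT list, since the message is cut off at that point:

SELECT t.calendar_month_desc,
       c.channel_desc,
       SUM(s.amount_sold) AS TotalSales
FROM   sales s, times t, channels c
WHERE  s.time_id = t.time_id
AND    s.channel_id = c.channel_id
GROUP BY t.calendar_month_desc, c.channel_desc;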

Re: Hive on Spark Engine versus Spark using Hive metastore

2016-02-03 Thread Edward Capriolo
Thank you for the speech. There is an infinite list of things Hive does not do / can't do well. There is an infinite list of things Spark does not do / can't do well. Some facts: 1) Spark has a complete fork of Hive inside it. So it's hard to trash Hive without at least noting the fact that it's a

Re: Hive on Spark Engine versus Spark using Hive metastore

2016-02-03 Thread Koert Kuipers
1) Spark bundles hive-metastore and hive-exec to get access to the metastore and SerDes, and I am pretty sure they would like to reduce this if they could, given the kitchen sink of dependencies that Hive is, but that is not easy since Hive was never written as re-usable Java libraries. I imagine

RE: Hive on Spark Engine versus Spark using Hive metastore

2016-02-03 Thread Mich Talebzadeh
OK, thanks. These are my new ENV settings based upon the availability of resources:
export SPARK_EXECUTOR_CORES=12 ## Number of cores for the workers (Default: 1)
export SPARK_EXECUTOR_MEMORY=5G ## Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
export SPARK_DRIVER_MEMORY=2G ## Memory
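An equivalent way to hand those resource figures to Hive on Spark from inside a Hive session, assuming the Spark engine is already configured; the numbers simply mirror the exports above and are not recommendations:

set hive.execution.engine=spark;
set spark.executor.cores=12;   -- cores per executor
set spark.executor.memory=5g;  -- memory per executor
set spark.driver.memory=2g;    -- memory for the driver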

Re: Hive on Spark Engine versus Spark using Hive metastore

2016-02-03 Thread Stephen Sprague
I refuse to take anybody seriously who has a sig file longer than one line, and that there is just plain repugnant. On Wed, Feb 3, 2016 at 1:47 PM, Mich Talebzadeh wrote: > I just did some further tests joining a 5-million-row FACT table with 2 > DIMENSION tables.

HiveException: No type found for column type entry

2016-02-03 Thread Dave Maughan
Hi, We're currently experiencing an issue with a query against a table backed by ORC. Nothing special - any query causes it. We're using HDP 2.2.4.x, so Hive 0.14.0.2.2.4.x. The error we're seeing in the logs is: Caused by: java.lang.RuntimeException: Error creating a batch at
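A hedged diagnostic sketch for narrowing down vectorized ORC read failures like this one; it is a test, not a confirmed fix for this particular error, and the table name is a placeholder:

-- Check whether the failure is tied to the vectorized reader.
set hive.vectorized.execution.enabled=false;
-- Re-run the failing query; if it now succeeds, the problem lies in the
-- vectorized ORC batch path rather than in the data itself.
SELECT * FROM orc_backed_table LIMIT 10;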

Hive optimizer

2016-02-03 Thread Ashok Kumar
Hi, Is the Hive optimizer a cost-based optimizer (CBO), a rule-based optimizer (RBO), or neither? Thanks

Re: Hive optimizer

2016-02-03 Thread John Pullokkaran
It's both. Some of the optimizations are rule based and some are cost based. John From: Ashok Kumar Reply-To: "user@hive.apache.org", Ashok Kumar
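For context, a sketch of the session settings that exercise the cost-based (Calcite) path in stock Hive; defaults vary by release, and the sales table is just the example from earlier in this digest:

set hive.cbo.enable=true;                    -- turn on the cost-based optimizer
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
-- The CBO needs statistics to cost alternative plans:
ANALYZE TABLE sales COMPUTE STATISTICS;
ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS;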

Re: GenericUDF

2016-02-03 Thread Jason Dere
Same flow - initialize() is called on the GenericUDF very soon after construction, by the same methods that created the UDF. Take a look at TypeCheckProcFactory or ExprNodeGenericFuncDesc. From: Anirudh Paramshetti Sent: Tuesday, February
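To place that in the query lifecycle, a small usage sketch; the jar path, class name com.example.MyGenericUDF, table and column are all hypothetical:

ADD JAR /tmp/my-udfs.jar;
CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyGenericUDF';
-- Compiling the statement below constructs the GenericUDF and calls
-- initialize() with the argument ObjectInspectors to resolve its return type;
-- evaluate() then runs once per row at execution time.
SELECT my_udf(col1) FROM some_table;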

RE: bloom filter used in 0.14?

2016-02-03 Thread Frank Luo
Thank you all for this discussion. Very helpful. -Original Message- From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com] On Behalf Of Gopal Vijayaraghavan Sent: Thursday, January 28, 2016 7:43 PM To: user@hive.apache.org Subject: Re: bloom filter used in 0.14? > So I am questioning
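For readers landing on this thread later, a sketch of how ORC bloom filters are declared on a Hive release that supports the orc.bloom.filter table properties (whether 0.14 honors them is exactly what the thread is asking; names and values below are examples):

CREATE TABLE sales_orc (
  time_id     INT,
  channel_id  INT,
  amount_sold DECIMAL(10,2)
)
STORED AS ORC
TBLPROPERTIES (
  'orc.bloom.filter.columns' = 'time_id,channel_id',
  'orc.bloom.filter.fpp'     = '0.05'
);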