Re: Hive 2.x usage

2016-09-14 Thread Mich Talebzadeh
Yep, I agree with what Stephen said. I use Hive 2.0.1 and have not seen an issue so far. We also use Hive on the Spark engine, and of course we can switch to MR with one command within the script (sketched below). I do not subscribe to using open source and then running for cover if things don't work. If you are the nuts-and-bolts type, …
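The "one command" switch is presumably the engine property; a minimal illustration, assuming a script that toggles the execution engine per run:

    -- switch the execution engine per session or per script
    set hive.execution.engine=mr;     -- fall back to MapReduce
    set hive.execution.engine=spark;  -- back to Hive on Spark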

Re: Hive 2.x usage

2016-09-14 Thread Stephen Sprague
> * Are you using Hive-2.x at your org and at what scale?

Yes. We're using 2.1.0. 1.5 PB, 30-node cluster, ~1000 jobs a day. And yeah, Hive 2.1.0 has some issues and can require some finesse wrt the hive-site.xml settings (see the sketch below).

> * Is the release stable enough? Did you notice any correctness issue…
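What that finesse looks like is deployment-specific; a hedged illustration of the kind of settings people end up adjusting on 2.1.x (the property values below are assumptions, not recommendations):

    -- these can live in hive-site.xml or be set per session
    set hive.vectorized.execution.enabled=false;  -- e.g. to sidestep a vectorization issue on one query
    set hive.fetch.task.conversion=none;          -- e.g. to force full jobs where fetch conversion misbehaves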

Re: Hive or Pig - Which one gives best performance for reading HBase data

2016-09-14 Thread Nagabhushanam Bheemisetty
Thanks Franke. Will try that. On Wed, Sep 14, 2016 at 2:05 PM Jörn Franke wrote:
> They should be rather similar; you may gain some performance using Tez or Spark as an execution engine, but in an export scenario do not expect much performance improvement.
> In any scenario avoid having only…

Re: Hive 2.x usage

2016-09-14 Thread Jörn Franke
If you are using a distribution (which you should if you go to production - Apache releases should not be used, due to maintainability, complexity, and interaction with other components such as Hadoop, etc.), then wait until a distribution with 2.x is out. As far as I am aware, there is currently…

Hive 2.x usage

2016-09-14 Thread RD
Hi Folks, We (at my org) are currently planning our move to Hive-2.x. As part of this I wanted to get a sense of how stable the Hive-2.x release is. I thought it would be good to conduct a brief survey on this. I've added a few questions below. It would really be a ton of help if folks could pr…

Re: Hive or Pig - Which one gives best performance for reading HBase data

2016-09-14 Thread Jörn Franke
They should be rather similar; you may gain some performance using Tez or Spark as an execution engine, but in an export scenario do not expect much performance improvement. In any scenario avoid having only one reducer; use several, e.g. by exporting to multiple output files instead of…
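A minimal sketch of that advice in HiveQL (table, column, and path names are placeholders):

    -- request several reducers instead of a single one
    set mapreduce.job.reduces=8;
    -- DISTRIBUTE BY forces a reduce stage, so the export lands in multiple files
    INSERT OVERWRITE DIRECTORY '/tmp/hbase_export'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    SELECT rowkey, col1, col2
    FROM   hbase_backed_table
    DISTRIBUTE BY rowkey;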

Re: Hive On Spark - ORC Table - Hive Streaming Mutation API

2016-09-14 Thread Benjamin Schaff
Hi, Thanks for the answer. I am running a custom build of Spark 1.6.2, meaning the one given in the Hive documentation, i.e. without Hive jars. I set it up in hive-env.sh. I created the istari table as in the documentation, ran an INSERT on it and then a GROUP BY. Everything went on Spark standal…
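The hive-env.sh part of that setup is presumably a single export (the path here is a hypothetical placeholder):

    # hive-env.sh -- point Hive at the custom Spark build (hypothetical path)
    export SPARK_HOME=/opt/spark-1.6.2-bin-without-hive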

Re: Hive On Spark - ORC Table - Hive Streaming Mutation API

2016-09-14 Thread Mich Talebzadeh
Hi, You are using Hive 2. What is the Spark version that runs as the Hive execution engine? I cannot see spark.home in your hive-site.xml, so I cannot figure it out. BTW, you are using Spark standalone as the mode; I tend to use yarn-client. Now back to the above issue: do other queries work OK with…
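For reference, the settings he is asking about can be set per session as well as in hive-site.xml; a sketch with placeholder values:

    set hive.execution.engine=spark;
    set spark.master=yarn-client;      -- the mode Mich prefers; spark://host:7077 for standalone
    set spark.home=/opt/spark-1.6.2;   -- the property he could not find in the hive-site.xml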

RE: What's the best way to find the nearest neighbor in Hive? Any windowing function?

2016-09-14 Thread Markovitz, Dudu
It seems you’ll have to go with JOIN. Here are 2 options.

Dudu

    select  t0.id as id_0
           ,min (named_struct ("dist", abs((t1.price - t0.price)/100) + abs((t1.number - t0.number)/…
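The query is cut off in the archive; a self-contained sketch of the same trick follows (the table and the weights beyond the cut are guesses). min() over a struct compares fields left to right, so putting the distance first makes the minimum carry the nearest neighbour's id along with it:

    -- hypothetical reconstruction; table t(id, price, number) and the /100 weights are assumed
    select  s.id_0
           ,s.nearest.id   as nearest_id
           ,s.nearest.dist as dist
    from   (select  t0.id as id_0
                   ,min(named_struct('dist', abs((t1.price  - t0.price )/100)
                                           + abs((t1.number - t0.number)/100)
                                    ,'id'  , t1.id)) as nearest
            from    t t0
            cross join t t1
            where   t0.id <> t1.id
            group by t0.id) s;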

Hive On Spark - ORC Table - Hive Streaming Mutation API

2016-09-14 Thread Benjamin Schaff
Hi, After several days trying to figure out the problem, I'm stuck with a class cast exception when running a query with Hive on Spark on ORC tables that I updated with the streaming mutation API of Hive 2.0. The context is the following. For Hive: the version is the latest available from the we…

Hive or Pig - Which one gives best performance for reading HBase data

2016-09-14 Thread Nagabhushanam Bheemisetty
Hi, I have a situation where I need to read data from a huge HBase table and dump it into another location as a flat file. I am not interested in all the columns; rather I need only, let's say, 10 out of 100+ columns. So which technology, Hive or Pig, gives better performance? I believe both of them will use SerDe…
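If the Hive route is taken, the usual shape is an external table that maps only the needed columns, then a plain export; a sketch with invented table, column-family, and qualifier names:

    CREATE EXTERNAL TABLE hbase_subset (rowkey STRING, c1 STRING, c2 STRING)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:c1,cf:c2')
    TBLPROPERTIES ('hbase.table.name' = 'big_table');

    INSERT OVERWRITE DIRECTORY '/tmp/flat_export'
    SELECT * FROM hbase_subset;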

Re: ACID transactions on data added from Spark not working

2016-09-14 Thread Mich Talebzadeh
Hi, I believe this is an issue with Spark handling transactional tables in Hive. When you add rows from Spark to an ORC transactional table, the Hive metadata tables HIVE_LOCKS and TXNS are not updated. This does not happen with Hive itself. As a result, these new rows are left in an inconsistent state…
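For contrast, the Hive-side ACID machinery that a direct Spark write bypasses is enabled roughly like this (standard settings from the Hive wiki; writes made outside the DbTxnManager never touch HIVE_LOCKS or TXNS):

    set hive.support.concurrency=true;
    set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
    set hive.compactor.initiator.on=true;
    set hive.compactor.worker.threads=1;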

ACID transactions on data added from Spark not working

2016-09-14 Thread Jack Wenger
Hi there, I'm trying to use ACID transactions in Hive, but I have a problem when the data are added with Spark. First, I created a table with the following statement:

    CREATE TABLE testdb.test(id string, col1 s…
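The statement is truncated in the archive; a typical ACID table declaration of that shape (the second column type, bucketing column, and bucket count are guesses) looks like:

    -- hypothetical completion: ACID requires bucketed ORC plus the transactional property
    CREATE TABLE testdb.test (id string, col1 string)
    CLUSTERED BY (id) INTO 4 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('transactional' = 'true');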

Re: Sqoop: SQL Server to Hive import

2016-09-14 Thread Mich Talebzadeh
In all probability Sqoop loses connection to the path of the file on HDFS. If the file is there, then you can create a Hive external table over it and do an insert/select from that table into the target Hive table. You can also bcp out the data from the MSSQL table, scp the file into HDFS, and load it from there in…
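A sketch of that workaround (the schema and delimiter are placeholders; the path is the one from the error message below):

    CREATE EXTERNAL TABLE staging_loc (id STRING, loc STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 'hdfs://server_name.local:8020/user/root/_STC_CurrentLocation';

    INSERT INTO TABLE target_table SELECT * FROM staging_loc;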

Sqoop: SQL Server to Hive import

2016-09-14 Thread Priyanka Raghuvanshi
We are importing SQL Server data into Hive using Sqoop. Usually it works, but in one scenario it throws the following exception:

    FAILED: SemanticException Line 2:17 Invalid path ''hdfs://server_name.local:8020/user/root/_STC_CurrentLocation'': No files matching path hdfs://server_name.local:8020/user/r…
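The import itself would be something like the following (connection details, credentials, and table names are placeholders, not taken from the thread):

    sqoop import \
      --connect "jdbc:sqlserver://server_name.local:1433;database=mydb" \
      --username sqoop_user --password-file /user/root/.sqoop.pw \
      --table CurrentLocation \
      --hive-import --hive-table testdb.current_location \
      -m 4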