Re: How to run large Hive queries in PySpark 1.2.1

2016-05-26 Thread Nikolay Voronchikhin
Hi Jörn, We will be upgrading to MapR 5.1, Hive 1.2, and Spark 1.6.1 at the end of June. In the meantime, can this still be done with these versions? There is no firewall issue, since we have edge nodes and cluster nodes hosted in the same location with the same NFS mount. On Thu, May 26, 20

Re: Hive and using Pooled Connections

2016-05-26 Thread Mich Talebzadeh
Thanks Alan. My Hive is version 2, transactional, and its metastore is on Oracle. I saw this note stating: Using Oracle as the Metastore DB and "datanucleus.connectionPoolingType=BONECP" may generate intermittent "No such lock
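
The note in question points at the metastore connection pool. A minimal hive-site.xml sketch of the commonly suggested workaround, switching the pool type away from BONECP (DBCP here is an assumption, not something confirmed in this thread):

    <property>
      <name>datanucleus.connectionPoolingType</name>
      <value>DBCP</value>
    </property>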

Re: Copying all Hive tables from Prod to UAT

2016-05-26 Thread Mich Talebzadeh
That is a good point, Jörn, with regard to JDBC and Hive data. I believe you can use JDBC to get compressed data from an Oracle or Sybase database, because decompression happens at the time of data access, much like using a sqlplus or isql tool. However, it is worth trying what happens when one acces
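
As a rough sketch of the JDBC pull being discussed, assuming Spark 1.6 with an Oracle JDBC driver on the classpath (the URL, credentials, and table names below are placeholders):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="jdbc-copy-sketch")
    sqlContext = SQLContext(sc)

    # Pull the source table over JDBC; the driver handles decompression
    # at access time, as noted above.
    df = (sqlContext.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//prod-host:1521/ORCL")  # placeholder
          .option("dbtable", "SCHEMA.SOURCE_TABLE")                  # placeholder
          .option("user", "app_user")                                # placeholder
          .option("password", "app_password")                        # placeholder
          .load())

    # Land the copy in the UAT Hive warehouse
    df.write.saveAsTable("uat_db.source_table_copy")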

Re: How to run large Hive queries in PySpark 1.2.1

2016-05-26 Thread Jörn Franke
Both are outdated versions; usually one can support you better if you upgrade to the newest. Firewall could be an issue here. > On 26 May 2016, at 10:11, Nikolay Voronchikhin > wrote: > > Hi PySpark users, > > We need to be able to run large Hive queries in PySpark 1.2.1. Users are > runni

Fwd: How to run large Hive queries in PySpark 1.2.1

2016-05-26 Thread Nikolay Voronchikhin
Hi PySpark users, We need to be able to run large Hive queries in PySpark 1.2.1. Users are running PySpark on an edge node and submit jobs to a cluster that allocates YARN resources to the clients. We are using MapR as the Hadoop distribution on top of Hive 0.13 and Spark 1.2.1. Currently, our
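
For context, the basic HiveContext pattern being used from PySpark 1.2.1 looks roughly like this (a sketch; the database, table, and query are placeholders, and it assumes Spark was built with Hive support):

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="large-hive-query-sketch")
    hive_ctx = HiveContext(sc)

    # In Spark 1.2.x this returns a SchemaRDD backed by the Hive query
    result = hive_ctx.sql("SELECT col1, COUNT(*) AS cnt "
                          "FROM some_db.some_table "
                          "GROUP BY col1")

    # Inspect a small sample on the driver instead of collect()ing
    # a large result set
    for row in result.take(10):
        print(row)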