Re: How to run large Hive queries in PySpark 1.2.1

2016-05-26 Thread Nikolay Voronchikhin
Hi Jörn, We will be upgrading to MapR 5.1, Hive 1.2, and Spark 1.6.1 at the end of June. In the meantime, can this still be done with these versions? There is no firewall issue, since the edge nodes and cluster nodes are hosted in the same location with the same NFS mount. On Thu, May 26,

Re: Hive and using Pooled Connections

2016-05-26 Thread Mich Talebzadeh
Thanks Alan. My Hive is version 2 (transactional) and its metastore is on Oracle. I saw this note stating: Using Oracle as the Metastore DB and "datanucleus.connectionPoolingType=BONECP" may generate intermittent "No such

Re: Copying all Hive tables from Prod to UAT

2016-05-26 Thread Mich Talebzadeh
That is a good point, Jörn, with regard to JDBC and Hive data. I believe you can use JDBC to get compressed data from an Oracle or Sybase database, because decompression happens at the time of data access, much like using a sqlplus or isql tool. However, it is worth trying what happens when one
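As a rough illustration of the JDBC pull being debated in this thread (not something prescribed by the posters), here is a sketch using the Spark 1.4+ DataFrame JDBC reader, which fits the Spark 1.6.1 upgrade mentioned elsewhere in the digest; the connection URL, credentials, table names, and output path are placeholders, and the Oracle JDBC driver jar is assumed to be on the classpath.

    # Hedged sketch (assumes the Spark 1.4+ DataFrame API): pull a table from
    # Oracle over JDBC and land it in a compressed columnar format.
    # Connection details, table names, and paths are placeholders.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="jdbc-pull-example")
    sqlContext = SQLContext(sc)

    df = (sqlContext.read
          .format("jdbc")
          .option("url", "jdbc:oracle:thin:@//prod-db-host:1521/ORCL")  # placeholder URL
          .option("dbtable", "SRC_SCHEMA.SOURCE_TABLE")                 # placeholder table
          .option("user", "etl_user")
          .option("password", "********")
          .option("driver", "oracle.jdbc.OracleDriver")
          .load())

    # Data crosses the wire as row-by-row JDBC results; compression only comes
    # back once the data is written out in a columnar format such as Parquet.
    df.write.mode("overwrite").parquet("/user/etl/staging/source_table")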

Re: How to run large Hive queries in PySpark 1.2.1

2016-05-26 Thread Jörn Franke
Both are outdated versions; usually one can support you better if you upgrade to the newest. A firewall could be an issue here. > On 26 May 2016, at 10:11, Nikolay Voronchikhin > wrote: > > Hi PySpark users, > > We need to be able to run large Hive queries in PySpark

Fwd: How to run large Hive queries in PySpark 1.2.1

2016-05-26 Thread Nikolay Voronchikhin
Hi PySpark users, We need to be able to run large Hive queries in PySpark 1.2.1. Users run PySpark on an edge node and submit jobs to a cluster that allocates YARN resources to the clients. We are using MapR as the Hadoop distribution, on top of Hive 0.13 and Spark 1.2.1. Currently, our
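For reference, a minimal sketch of issuing a Hive query from PySpark 1.2.1 through HiveContext, assuming hive-site.xml is available on the edge node's classpath; the database, table, column, and output path names are placeholders rather than anything from this thread.

    # Minimal sketch: run a Hive query from PySpark 1.2.1 via HiveContext.
    # All database/table/path names below are placeholders.
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import HiveContext

    conf = SparkConf().setAppName("large-hive-query")
    sc = SparkContext(conf=conf)
    sqlContext = HiveContext(sc)   # picks up hive-site.xml for metastore access

    # In Spark 1.2.x, sql() returns a SchemaRDD; avoid collect() on large
    # results and write the output back to HDFS instead.
    result = sqlContext.sql(
        "SELECT customer_id, SUM(amount) AS total "
        "FROM sales_db.transactions "
        "GROUP BY customer_id")

    result.saveAsParquetFile("/user/etl/output/customer_totals")

Submitted with spark-submit from the edge node, a job like this keeps the heavy lifting on the YARN executors rather than on the driver.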

Re: Copying all Hive tables from Prod to UAT

2016-05-26 Thread Elliot West
Hello, I've been looking at this recently for moving Hive tables from on-premises clusters to the cloud, but the principle should be the same for your use case. If you wish to do this in an automated way, some tools worth considering are: - Hive's built-in replication framework:
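For a rough sense of the manual alternative (not one of the tools listed above), here is a sketch that generates HiveQL EXPORT/IMPORT scripts for a list of tables; the table names, staging path, and script file names are placeholders, and the generated scripts would be run with beeline or the hive CLI on each cluster, with the exported directories copied between clusters (e.g. via distcp) in between.

    # Hedged sketch: generate EXPORT/IMPORT scripts for copying Hive tables
    # from one cluster to another. Table names and paths are placeholders.
    tables = ["sales_db.orders", "sales_db.customers"]  # placeholder table list
    staging = "/tmp/hive_export"                        # placeholder HDFS staging dir

    with open("export_from_prod.hql", "w") as export_script, \
         open("import_into_uat.hql", "w") as import_script:
        for qualified in tables:
            db, table = qualified.split(".")
            path = "{0}/{1}/{2}".format(staging, db, table)
            export_script.write("USE {0};\nEXPORT TABLE {1} TO '{2}';\n".format(db, table, path))
            import_script.write("USE {0};\nIMPORT TABLE {1} FROM '{2}';\n".format(db, table, path))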

Re: Copying all Hive tables from Prod to UAT

2016-05-26 Thread Jörn Franke
Or use Falcon ... I would try to avoid Spark JDBC. JDBC is not designed for these big data bulk operations, e.g. data has to be transferred uncompressed and there is the serialization/deserialization overhead: query result -> protocol -> Java objects -> writing to the specific storage format, etc.