Re: Hive From Spark: Jdbc VS sparkContext

2017-11-22 Thread Nicolas Paris
Hey. Finally I improved the Spark-Hive SQL performance a lot. I had a problem with a topology_script.py that produced huge error traces in the logs and reduced Spark performance in Python mode. I just corrected the python2 scripts to be python3-ready. I had some problem with broadcast variables while jo

Re: Hive From Spark: Jdbc VS sparkContext

2017-11-05 Thread ayan guha
Yes, my thought exactly. Kindly let me know if you need any help to port in pyspark. On Mon, Nov 6, 2017 at 8:54 AM, Nicolas Paris wrote: > Le 05 nov. 2017 à 22:46, ayan guha écrivait : > > Thank you for the clarification. That was my understanding too. However > how to > > provide the upper bou

Re: Hive From Spark: Jdbc VS sparkContext

2017-11-05 Thread Nicolas Paris
Le 05 nov. 2017 à 22:46, ayan guha écrivait : > Thank you for the clarification. That was my understanding too. However how to > provide the upper bound as it changes for every call in real life. For example > it is not required for sqoop.  True. AFAIK sqoop begins with doing a "select min(colu
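The sqoop-style approach discussed above — first fetch the min/max of the split column, then carve the range into strides — maps directly onto the bounds a JDBC reader needs. Below is a hedged Python sketch of the stride logic applied to a numeric partition column, modeled on how Spark's JDBC reader splits the range into per-partition WHERE clauses; the function name and exact rounding are illustrative, not Spark's actual implementation:

```python
def column_partition_predicates(column, lower, upper, num_partitions):
    """Split [lower, upper] on a numeric column into one WHERE clause
    per partition. A sketch modeled on Spark's JDBC partitioning: the
    first slice also catches NULLs, the last slice is open-ended so no
    row outside the guessed bounds is dropped."""
    stride = (upper - lower) // num_partitions  # rows-per-partition step
    preds = []
    current = lower
    for i in range(num_partitions):
        if i == 0:
            preds.append(f"{column} < {current + stride} OR {column} IS NULL")
        elif i == num_partitions - 1:
            preds.append(f"{column} >= {current}")
        else:
            preds.append(f"{column} >= {current} AND {column} < {current + stride}")
        current += stride
    return preds

# Bounds would come from a sqoop-style "SELECT min(id), max(id)" probe.
for p in column_partition_predicates("id", 0, 100, 4):
    print(p)
```

Each predicate becomes one task's query, so the executors pull their slices in parallel instead of everything flowing through the driver.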

Re: Hive From Spark: Jdbc VS sparkContext

2017-11-05 Thread ayan guha
Thank you for the clarification. That was my understanding too. However how to provide the upper bound as it changes for every call in real life. For example it is not required for sqoop. On Mon, 6 Nov 2017 at 8:20 am, Nicolas Paris wrote: > Le 05 nov. 2017 à 22:02, ayan guha écrivait : > > Can

Re: Hive From Spark: Jdbc VS sparkContext

2017-11-05 Thread Nicolas Paris
Le 05 nov. 2017 à 22:02, ayan guha écrivait : > Can you confirm if JDBC DF Reader actually loads all data from source to > driver > memory and then distributes to the executors? apparently yes when not using partition column > And this is true even when a > partition column is provided? No, in

Re: Hive From Spark: Jdbc VS sparkContext

2017-11-05 Thread ayan guha
Hi Can you confirm if JDBC DF Reader actually loads all data from source to driver memory and then distributes to the executors? And this is true even when a partition column is provided? Best Ayan On Mon, Nov 6, 2017 at 3:00 AM, David Hodeffi < david.hode...@niceactimize.com> wrote: > Testing

RE: Hive From Spark: Jdbc VS sparkContext

2017-11-05 Thread David Hodeffi
Testing Spark group e-mail

Re: Hive From Spark: Jdbc VS sparkContext

2017-11-05 Thread Nicolas Paris
Le 05 nov. 2017 à 14:11, Gourav Sengupta écrivait : > thanks a ton for your kind response. Have you used SPARK Session ? I think > that > hiveContext is a very old way of solving things in SPARK, and since then new > algorithms have been introduced in SPARK.  I will give a try out sparkSession.

Re: Hive From Spark: Jdbc VS sparkContext

2017-11-05 Thread Gourav Sengupta
Hi Nicolas, thanks a ton for your kind response. Have you used SPARK Session ? I think that hiveContext is a very old way of solving things in SPARK, and since then new algorithms have been introduced in SPARK. It will be a lot of help, given how kind you have been by sharing your experience, if

Re: Hive From Spark: Jdbc VS sparkContext

2017-11-05 Thread Nicolas Paris
Hi After some testing, I have been quite disappointed with the hiveContext way of accessing hive tables. The main problem is resource allocation: I have tons of users and they get a limited subset of workers. Then this does not allow querying huge datasets, because too little memory is allocated (or maybe I

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-15 Thread Gourav Sengupta
Hi Nicolas, without the hive thrift server, if you try to run a select * on a table which has around 10,000 partitions, SPARK will give you some surprises. PRESTO works fine in these scenarios, and I am sure SPARK community will soon learn from their algorithms. Regards, Gourav On Sun, Oct 15,

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-15 Thread Nicolas Paris
> I do not think that SPARK will automatically determine the partitions. > Actually > it does not automatically determine the partitions. In case a table has a few > million records, it all goes through the driver. Hi Gourav Actually the Spark JDBC driver is able to deal directly with partitions. Spa

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-15 Thread Nicolas Paris
Hi Gourav > what if the table has partitions and sub-partitions? well, this also works with multiple ORC files having the same schema: val people = sqlContext.read.format("orc").load("hdfs://cluster/people*") Am I missing something? > And you do not want to access the entire data? This works for sta

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-15 Thread Gourav Sengupta
Hi Nicolas, what if the table has partitions and sub-partitions? And you do not want to access the entire data? Regards, Gourav On Sun, Oct 15, 2017 at 12:55 PM, Nicolas Paris wrote: > Le 03 oct. 2017 à 20:08, Nicolas Paris écrivait : > > I wonder the differences accessing HIVE tables in two

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-15 Thread Nicolas Paris
Le 03 oct. 2017 à 20:08, Nicolas Paris écrivait : > I wonder the differences accessing HIVE tables in two different ways: > - with jdbc access > - with sparkContext Well there is also a third way to access the hive data from spark: - with direct file access (here ORC format) For example: val sq

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-13 Thread Kabeer Ahmed
My take on this might sound a bit different. Here are few points to consider below: 1. Going through Hive JDBC means that the application is restricted by the # of queries that can be compiled. HS2 can only compile one SQL at a time and if users have bad SQL, it can take a long time just to co

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-13 Thread Nicolas Paris
> In case a table has a few > million records, it all goes through the driver. This sounds clear in JDBC mode: the driver gets all the rows and then spreads the RDD over the executors. I'd say that most use cases deal with SQL to aggregate huge datasets, and retrieve a small amount of rows to be

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-10 Thread Gourav Sengupta
. Regards, Gourav Sengupta On Tue, Oct 10, 2017 at 10:14 PM, weand wrote: > Is Hive from Spark via JDBC working for you? In case it does, I would be > interested in your setup :-) > > We can't get this working. See bug here, especially my last comment: > https://issues.apache.org/jir

RE: Hive From Spark: Jdbc VS sparkContext

2017-10-10 Thread Walia, Reema
To: user@spark.apache.org Subject: Re: Hive From Spark: Jdbc VS sparkContext [ External Email ] Is Hive from Spark via JDBC working for you? In case it does, I would be interested in your setup :-) We can't get this working. See bug here, especially my last comment: https://issues.apach

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-10 Thread weand
Is Hive from Spark via JDBC working for you? In case it does, I would be interested in your setup :-) We can't get this working. See bug here, especially my last comment: https://issues.apache.org/jira/browse/SPARK-21063 Regards Andreas -- Sent from: http://apache-spark-user-list.10015

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-10 Thread ayan guha
That is not correct, IMHO. If I am not wrong, Spark will still load data in the executor, by running some stats on the data itself to identify partitions On Tue, Oct 10, 2017 at 9:23 PM, 郭鹏飞 wrote: > > > On Oct 4, 2017, at 2:08 AM, Nicolas Paris wrote: > > > > Hi > > > > I wonder about the differences between accessing

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-10 Thread 郭鹏飞
> On Oct 4, 2017, at 2:08 AM, Nicolas Paris wrote: > > Hi > > I wonder about the differences between accessing HIVE tables in two different ways: > - with jdbc access > - with sparkContext > > I would say that jdbc is better since it uses HIVE, which is based on > map-reduce / TEZ and then works on disk. > Using spark

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-04 Thread ayan guha
Well the obvious point is security. Ranger and Sentry can secure jdbc endpoints only. For performance aspect, I am equally curious 🤓 On Wed, 4 Oct 2017 at 10:30 pm, Gourav Sengupta wrote: > Hi, > > I am genuinely curious to see whether any one responds to this question. > > Its very hard to shak

Re: Hive From Spark: Jdbc VS sparkContext

2017-10-04 Thread Gourav Sengupta
Hi, I am genuinely curious to see whether anyone responds to this question. It's very hard to shake off Java, OOP and JDBC :) Regards, Gourav Sengupta On Tue, Oct 3, 2017 at 7:08 PM, Nicolas Paris wrote: > Hi > > I wonder about the differences between accessing HIVE tables in two different ways: > - w

Hive From Spark: Jdbc VS sparkContext

2017-10-03 Thread Nicolas Paris
Hi I wonder about the differences between accessing HIVE tables in two different ways: - with jdbc access - with sparkContext I would say that jdbc is better since it uses HIVE, which is based on map-reduce / TEZ and then works on disk. Using spark rdd can lead to memory errors on very huge datasets. Anybody

Re: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-17 Thread Mich Talebzadeh
all.) > > tcp 1 0 53.244.194.223:25612 53.244.194.221:10001 > CLOSE_WAIT - > > > > Thanks > > Anupama > > > > *From:* Mich Talebzadeh [mailto:mich.talebza...@gmail.com] > *Sent:* Saturday, September 17, 2016 12:36 AM > *To:* Gangadhar, A

RE: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-17 Thread anupama . gangadhar
to connect to Hive from Spark (Yarn-Cluster Mode) Is your Hive Thrift Server up and running on port jdbc:hive2://10001? Do the following netstat -alnp |grep 10001 and see whether it is actually running HTH Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view

RE: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-17 Thread anupama . gangadhar
Hi, @Deepak I have used a separate user keytab (not the hadoop services keytab) and am able to connect to Hive via a simple Java program. I am able to connect to Hive from spark-shell as well. However when I submit a spark job using this same keytab, I see the issue. Does the cache have a role to play here? In

Re: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-16 Thread Deepak Sharma
Hi Anupama To me it looks like an issue with the SPN with which you are trying to connect to hive2, i.e. hive@hostname. Are you able to connect to hive from spark-shell? Try getting the ticket using any other user keytab, but not the hadoop services keytab, and then try running the spark submit. Thanks

Re: Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-16 Thread Mich Talebzadeh
On 16 September 2016 at 19:53, wrote: > Hi, > > > > I am trying to connect to Hive from Spark application in Kerberized >

Error trying to connect to Hive from Spark (Yarn-Cluster Mode)

2016-09-16 Thread anupama . gangadhar
Hi, I am trying to connect to Hive from a Spark application in a Kerberized cluster and get the following exception. Spark version is 1.4.1 and Hive is 1.2.1. Outside of spark the connection goes through fine. Am I missing any configuration parameters? java.sql.SQLException: Could not open

Unable To access Hive From Spark

2016-04-15 Thread Amit Singh Hora
Hi All, I am trying to access hive from Spark but getting an exception: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw- Code :- String logFile = "hdfs://hdp23ha/logs"; // Should be som

create table in hive from spark-sql

2015-09-23 Thread Mohit Singh
Probably a noob question. But I am trying to create a hive table using spark-sql. Here is what I am trying to do: hc = HiveContext(sc) hdf = hc.parquetFile(output_path) data_types = hdf.dtypes schema = "(" + " ,".join(map(lambda x: x[0] + " " + x[1], data_types)) +")" hc.sql(" CREATE TABLE IF
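The schema-string construction in the snippet above can be isolated as a small helper. This is a sketch assuming `(name, type)` pairs like those returned by `DataFrame.dtypes`; the sample column names and types are made up:

```python
def hive_schema_string(data_types):
    """Build the '(col1 type1 ,col2 type2 ...)' column clause of a
    CREATE TABLE statement from DataFrame.dtypes-style pairs, using
    the same ' ,' separator as the original snippet."""
    return "(" + " ,".join(name + " " + typ for name, typ in data_types) + ")"

# dtypes as a hypothetical parquet-backed DataFrame might report them
data_types = [("id", "bigint"), ("name", "string"), ("score", "double")]
schema = hive_schema_string(data_types)
print(schema)  # → (id bigint ,name string ,score double)
```

The resulting string can then be interpolated into the `CREATE TABLE IF NOT EXISTS` statement the snippet starts, e.g. `hc.sql("CREATE TABLE IF NOT EXISTS my_table " + schema)`.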

Re: No suitable driver found error, Create table in hive from spark sql

2015-02-19 Thread Todd Nist
n internet. > I updated spark/bin/compute-classpath.sh and added database connector jar > into classpath. > CLASSPATH="$CLASSPATH:/data/mysql-connector-java-5.1.14-bin.jar" > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.

Re: No suitable driver found error, Create table in hive from spark sql

2015-02-18 Thread Dhimant
nabble.com/No-suitable-driver-found-error-Create-table-in-hive-from-spark-sql-tp21714p21715.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

No suitable driver found error, Create table in hive from spark sql

2015-02-18 Thread Dhimant
No suitable driver found error, Create table in hive from spark sql. I am trying to execute following example. SPARKGIT: spark/examples/src/main/scala/org/apache/spark/examples/sql/hive/HiveFromSpark.scala My setup :- hadoop 1.6,spark 1.2, hive 1.0, mysql server (installed via yum install

RE: Hive From Spark

2014-08-25 Thread Andrew Lee
def equals(that: Any): Boolean = { > >>> > if( !that.isInstanceOf[MyRecord] ) > >>> > false > >>> > else { > >>> > val other = that.asInstanceOf[MyRecord] > >>> > this.getWritable == other.getWritable > >>&g

Re: Hive From Spark

2014-08-25 Thread Du Li
t;for >>> Spark job integrating with Hive? >>> How does people use spark-sql? I'm trying to understand the rationale >>>and >>> motivation behind this script, any idea? >>> >>> >>>> Date: Thu, 21 Aug 2014 16:31:08 -0700 >>&g

Re: Hive From Spark

2014-08-22 Thread Du Li
l? Is this more of a debugging tool >>for >> Spark job integrating with Hive? >> How does people use spark-sql? I'm trying to understand the rationale >>and >> motivation behind this script, any idea? >> >> >>> Date: Thu, 21 Aug 2

RE: Hive From Spark

2014-08-22 Thread Jeremy Chambers
: Hive From Spark Hopefully there could be some progress on SPARK-2420. It looks like shading may be the voted solution among downgrading. Any idea when this will happen? Could it happen in Spark 1.1.1 or Spark 1.1.2? By the way, regarding bin/spark-sql? Is this more of a debugging tool for Spark

Re: Hive From Spark

2014-08-22 Thread Marcelo Vanzin
understand the rationale and > motivation behind this script, any idea? > > >> Date: Thu, 21 Aug 2014 16:31:08 -0700 > >> Subject: Re: Hive From Spark >> From: van...@cloudera.com >> To: l...@yahoo-inc.com.invalid >> CC: user@spark.apache.org; u...@spark.

RE: Hive From Spark

2014-08-22 Thread Andrew Lee
with Hive? How does people use spark-sql? I'm trying to understand the rationale and motivation behind this script, any idea? > Date: Thu, 21 Aug 2014 16:31:08 -0700 > Subject: Re: Hive From Spark > From: van...@cloudera.com > To: l...@yahoo-inc.com.invalid > CC: user

Re: Hive From Spark

2014-08-21 Thread Marcelo Vanzin
tes = sc.sequenceFile(path, classOf[NullWritable], > classOf[BytesWritable]).first._2 > assert(rec.getWritable() == bytes) > > sc.stop() > System.clearProperty("spark.driver.port") > } > } > > > From: Andrew Lee > Reply-To: "user@spark.a

Re: Hive From Spark

2014-08-21 Thread Du Li
k.apache.org>> Date: Monday, July 21, 2014 at 10:27 AM To: "user@spark.apache.org", "u...@spark.incubator.apache.org"

RE: Hive From Spark

2014-07-22 Thread Andrew Lee
rd to seeing Hive on Spark to work soon. Please let me know if there's any help or feedback I can provide. Thanks Sean. > From: so...@cloudera.com > Date: Mon, 21 Jul 2014 18:36:10 +0100 > Subject: Re: Hive From Spark > To: user@spark.apache.org > > I haven't seen an

Re: Hive From Spark

2014-07-21 Thread Sean Owen
I haven't seen anyone actively 'unwilling' -- I hope not. See discussion at https://issues.apache.org/jira/browse/SPARK-2420 where I sketch what a downgrade means. I think it just hasn't gotten a looking over. Contrary to what I thought earlier, the conflict does in fact cause problems in theory,

RE: Hive From Spark

2014-07-21 Thread Andrew Lee
Submit.scala:303) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > From: hao.ch...@intel.com > To: user@spark.apache.org; u...@spark.incubator.apache.org > Subject: RE: Hive From Spark > Date: Mon, 21 Jul

RE: Hive From Spark

2014-07-20 Thread Cheng, Hao
u...@spark.incubator.apache.org Subject: RE: Hive From Spark Hi Cheng Hao, Thank you very much for your reply. Basically, the program runs on Spark 1.0.0 and Hive 0.12.0 . Some setups of the environment are done by running "SPARK_HIVE=true sbt/sbt assembly/assembly", including t

Re: Hive From Spark

2014-07-19 Thread Silvio Fiorito
apache-spark-user-list.1001560.n3.nabble.com/Hive-From-Spark-tp10110.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

RE: Hive From Spark

2014-07-18 Thread JiajiaJing
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Hive-From-Spark-tp10110p10215.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

RE: Hive From Spark

2014-07-17 Thread Cheng, Hao
To: u...@spark.incubator.apache.org Subject: Hive From Spark Hello Spark Users, I am new to Spark SQL and now trying to first get the HiveFromSpark example working. However, I got the following error when running HiveFromSpark.scala program. May I get some help on this please? ERROR ME

Hive From Spark

2014-07-17 Thread JiajiaJing
1001560.n3.nabble.com/Hive-From-Spark-tp10110.html Sent from the Apache Spark User List mailing list archive at Nabble.com.