Re: PySpark 2.1 Not instantiating properly

2017-10-20 Thread Jagat Singh
Do you have the winutils binary relevant for your system? This SO post has related information: https://stackoverflow.com/questions/34196302/the-root-scratch-dir-tmp-hive-on-hdfs-should-be-writable-current-permissions On 21 October 2017 at 03:16, Marco Mistroni wrote:
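A minimal sketch of the winutils fix being suggested, assuming a Windows setup with Git Bash; the paths are illustrative and must match your actual Hadoop/winutils location:

```shell
# Point HADOOP_HOME at the directory whose bin\ contains winutils.exe
export HADOOP_HOME=/c/hadoop
export PATH="$PATH:$HADOOP_HOME/bin"
# Make the Hive scratch directory writable (the error from the linked SO post)
winutils.exe chmod -R 777 /tmp/hive
```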

Re: Spark Job trigger in production

2016-07-18 Thread Jagat Singh
You can use the following options: * spark-submit from the shell * some kind of job server (see spark-jobserver for details) * some notebook environment (see Zeppelin, for example) On 18 July 2016 at 17:13, manish jaiswal wrote: > Hi, > > > What is the best approach to trigger
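The spark-submit option above can be sketched as follows; the class name, jar path, and arguments are illustrative placeholders, not from the thread:

```shell
# Submit a packaged job to a YARN cluster from the shell
spark-submit \
  --class com.example.MyJob \
  --master yarn \
  --deploy-mode cluster \
  /path/to/my-job.jar arg1 arg2
```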

Re: Broadcast hash join implementation in Spark

2016-07-09 Thread Jagat Singh
Hi, Please see the property spark.sql.autoBroadcastJoinThreshold here http://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options Thanks, Jagat Singh On Sat, Jul 9, 2016 at 9:50 AM, Lalitha MV <lalitham...@gmail.com> wrote: > Hi, > > 1. What
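A short sketch of using that property, assuming a Spark 1.x-era `sqlContext` as in the rest of this archive; the threshold value is illustrative:

```scala
// Tables smaller than the threshold (in bytes) are broadcast to all
// executors for joins instead of being shuffled.
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", (10 * 1024 * 1024).toString) // 10 MB
// Setting -1 disables broadcast hash joins entirely.
// sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "-1")
```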

Re: spark 1.6.0 connect to hive metastore

2016-02-09 Thread Jagat Singh
Hi, I am using it by telling Spark which Hive version we are using. This is done by setting the following properties: spark.sql.hive.version spark.sql.hive.metastore.jars Thanks On Wed, Feb 10, 2016 at 7:39 AM, Koert Kuipers wrote: > hey thanks. hive-site is on classpath in conf
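An illustrative sketch of passing those properties on the command line; the version number and jar path are assumptions to be matched to your Hive install (the documented metastore version key in this era is `spark.sql.hive.metastore.version`):

```shell
spark-shell \
  --conf spark.sql.hive.metastore.version=1.2.1 \
  --conf spark.sql.hive.metastore.jars=/opt/hive/lib/*
```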

Stop Spark yarn-client job

2015-11-26 Thread Jagat Singh
Hi, What is the correct way to fully stop a Spark job that is running as yarn-client via spark-submit? We are using sc.stop in the code and can see the job still running (in the YARN resource manager) after the final Hive insert is complete. The code flow is: start context, do some work, insert to
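The flow described above can be sketched as follows (Spark 1.x API). The `System.exit` safeguard is a hypothetical addition for the case where non-daemon threads keep the driver JVM alive after `sc.stop()`:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object StopJobSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("stop-demo"))
    try {
      // ... do some work, then the final Hive insert ...
    } finally {
      sc.stop()      // releases the application's YARN resources
      System.exit(0) // hypothetical: force exit if the JVM lingers
    }
  }
}
```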

Re: Spark and Spring Integrations

2015-11-15 Thread Jagat Singh
Not a direct answer to your question, but it might be useful for you to check the Spring XD Spark integration. https://github.com/spring-projects/spring-xd-samples/tree/master/spark-streaming-wordcount-java-processor On Mon, Nov 16, 2015 at 6:14 AM, Muthu Jayakumar wrote: > I

Re: Spark thrift service and Hive impersonation.

2015-10-05 Thread Jagat Singh
Hello Steve, Thanks for the confirmation. Is there any work planned on this? Thanks, Jagat Singh On Wed, Sep 30, 2015 at 9:37 PM, Vinay Shukla <vinayshu...@gmail.com> wrote: > Steve is right, > The Spark thrift server does not propagate end user identity down

Spark thrift service and Hive impersonation.

2015-09-29 Thread Jagat Singh
Hi, I have started the Spark thrift service as the spark user. Does each user need to start their own thrift server to use it? Using beeline I am able to connect to the server and execute show tables; However, when we try to execute some real query it runs as the spark user and HDFS permissions does not
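A sketch of the setup being described; the host, port, and user name are illustrative assumptions:

```shell
# End user connects to the shared Spark Thrift Server via beeline
beeline -u jdbc:hive2://localhost:10000 -n enduser
# show tables;            -- works
# select * from secure_t; -- may fail: the query executes as the 'spark'
#                            service user, so HDFS permission checks use
#                            that identity, not the beeline user's
```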

Re: Spark thrift service and Hive impersonation.

2015-09-29 Thread Jagat Singh
is trying to read as the spark user, with which we started the thrift server. Since the spark user does not have actual read access we get the error. However, beeline is used by the end user, not the spark user, and throws the error. Thanks, Jagat Singh On Wed, Sep 30, 2015 at 11:24 AM, Mohammed Guller <moham...@

Re: Spark 1.5.0 java.lang.OutOfMemoryError: PermGen space

2015-09-12 Thread Jagat Singh
Sorry, to answer your question fully: the job starts tasks, and a few of them fail while some are successful. The failed ones have that PermGen error in the logs. But ultimately the full job is marked failed and the session quits. On Sun, Sep 13, 2015 at 10:48 AM, Jagat Singh <jagatsi...@gmail.com> wrote:

Re: Spark 1.5.0 java.lang.OutOfMemoryError: PermGen space

2015-09-12 Thread Jagat Singh
luster or after ran > some queries? > > Is this in local mode or cluster mode? > > On Fri, Sep 11, 2015 at 3:00 AM, Jagat Singh <jagatsi...@gmail.com> wrote: > > Hi, > > > > We have queries which were running fine on 1.4.1 system. > > > > We are tes

Spark 1.5.0 java.lang.OutOfMemoryError: PermGen space

2015-09-11 Thread Jagat Singh
Hi, We have queries which were running fine on a 1.4.1 system. We are testing the upgrade, and even a simple query like val t1 = sqlContext.sql("select count(*) from table") t1.show works perfectly fine on 1.4.1 but throws an OOM error in 1.5.0. Are there any changes in default memory settings from
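A hypothetical workaround sketch for the PermGen error, assuming Java 7 (where PermGen still exists); the 256m value is illustrative, and these are standard HotSpot flags passed through Spark's extraJavaOptions settings:

```shell
spark-shell \
  --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=256m" \
  --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=256m"
```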

Re: insert Hive table with RDD

2015-03-03 Thread Jagat Singh
Will this recognize the Hive partitions as well, for example inserting into a specific partition of Hive? On Tue, Mar 3, 2015 at 11:42 PM, Cheng, Hao hao.ch...@intel.com wrote: Using the SchemaRDD / DataFrame API via HiveContext. Assume you're using the latest code, something probably like: val hc
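A sketch of inserting into a specific Hive partition via HiveContext SQL (Spark 1.3-era API); the table, partition column, and the DataFrame `df` are illustrative assumptions:

```scala
import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)
// df: a DataFrame holding the rows to insert (assumed to exist)
df.registerTempTable("staging")
hc.sql("""
  INSERT INTO TABLE events PARTITION (dt = '2015-03-03')
  SELECT id, value FROM staging
""")
```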

Spark based ETL pipelines

2015-02-11 Thread Jagat Singh
Hi, I want to work on a use case something like the one below. Just want to know if something similar has already been done which can be reused. The idea is to use Spark for an ETL / Data Science / Streaming pipeline. So when data comes inside the cluster front door we will do the following steps: 1) Upload

Re: Why RDD is not cached?

2014-10-28 Thread Jagat Singh
What setting are you using for persist() or cache()? http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence On Tue, Oct 28, 2014 at 6:18 PM, shahab shahab.mok...@gmail.com wrote: Hi, I have a standalone Spark setup where the executor is set to have 6.3 G memory, as I am
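A sketch of why the storage level matters here; the input path is illustrative:

```scala
import org.apache.spark.storage.StorageLevel

val rdd = sc.textFile("/data/input") // path is an assumption
// cache() is shorthand for persist(StorageLevel.MEMORY_ONLY): partitions
// that don't fit in memory are dropped and recomputed later, which can
// make an RDD look "not cached" in the UI.
rdd.persist(StorageLevel.MEMORY_AND_DISK) // spills to disk instead of dropping
```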

Re: Problem with running LogisticRegression in spark cluster mode

2014-04-09 Thread Jagat Singh
Hi Jenny, How are you packaging your jar? Can you please confirm that you have included the MLlib jar inside the fat jar you have created for your code. libraryDependencies += "org.apache.spark" % "spark-mllib_2.9.3" % "0.8.1-incubating" Thanks, Jagat Singh On Thu, Apr 10, 2014 at 8:05 AM, Jenny
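A hypothetical build.sbt sketch for producing such a fat jar with sbt-assembly; the project name is invented, and the versions mirror the thread's era:

```scala
name := "logreg-job"
scalaVersion := "2.9.3"
libraryDependencies ++= Seq(
  // core is "provided" by the cluster; MLlib is bundled into the fat jar
  "org.apache.spark" % "spark-core_2.9.3"  % "0.8.1-incubating" % "provided",
  "org.apache.spark" % "spark-mllib_2.9.3" % "0.8.1-incubating"
)
// then run `sbt assembly` (assumes the sbt-assembly plugin is configured
// in project/plugins.sbt)
```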