Re: Create an Empty dataframe

2018-07-08 Thread रविशंकर नायर
From Stack Overflow: from pyspark.sql.types import StructType from pyspark.sql.types import StructField from pyspark.sql.types import StringType sc = SparkContext(conf=SparkConf()) spark = SparkSession(sc) # Need to use SparkSession(sc) to createDataFrame schema = StructType([ StructFiel
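A minimal PySpark sketch of the approach this snippet starts to describe: build a StructType schema and pass it to createDataFrame together with an empty list of rows. The column names field1 and field2 are hypothetical, since the quoted code is cut off before its field list, and the session is assumed to be created elsewhere (for example with SparkSession.builder.getOrCreate()).

```python
from typing import List

# Hypothetical column names, chosen only for illustration.
COLUMNS: List[str] = ["field1", "field2"]


def empty_dataframe(spark, columns=COLUMNS):
    """Return a DataFrame with the given nullable string columns and zero rows."""
    # Imported inside the function so this module loads without PySpark installed.
    from pyspark.sql.types import StringType, StructField, StructType

    schema = StructType([StructField(name, StringType(), True) for name in columns])
    # An empty row list plus an explicit schema yields an empty, typed DataFrame.
    return spark.createDataFrame([], schema)
```

With an active session, empty_dataframe(spark).printSchema() should list both columns, and count() on the result should return 0.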

Re: spark-shell gets stuck in ACCEPTED state forever when ran in YARN client mode.

2018-07-08 Thread रविशंकर नायर
Are you able to run a simple MapReduce job on YARN without any issues? If you have any issues: I had this problem on Mac. Use csrutil on the Mac to disable System Integrity Protection. Then add a softlink: sudo ln -s /usr/bin/java/bin/java. Mac versions from El Capitan onward do not allow softlinks in /bin/java.

Re: Spark Structured Streaming is giving error “org.apache.spark.sql.AnalysisException: Inner join between two streaming DataFrames/Datasets is not supported;”

2018-05-15 Thread रविशंकर नायर
Hi Jacek, If we use RDDs instead of DataFrames, can we accomplish the same? I mean, is joining between RDDs allowed in Spark Streaming? Best, Ravi On Sun, May 13, 2018 at 11:18 AM Jacek Laskowski wrote: > Hi, > > The exception message should be self-explanatory and says that you cannot > join

Re: Spark Structured Streaming is giving error “org.apache.spark.sql.AnalysisException: Inner join between two streaming DataFrames/Datasets is not supported;”

2018-05-12 Thread रविशंकर नायर
Perhaps this link might help you. https://stackoverflow.com/questions/48699445/inner-join-not-working-in-dataframe-using-spark-2-1 Best, Passion On Sat, May 12, 2018, 10:57 AM ThomasThomas wrote: > Hi There, > > Our use case is like this. > > We have a nested(multiple) JSON message flowing thr

Spark Streaming for more file types

2018-04-27 Thread रविशंकर नायर
All, I have the following methods in my Scala code, currently executed on demand: val files = sc.binaryFiles("file:///imocks/data/ocr/raw") // The above line takes all PDF files files.map(myconverter(_)).count myconverter signature: def myconverter ( file: (String, org.apach
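For comparison, a rough PySpark sketch of the same on-demand pattern: read every file in a directory as (path, content) pairs, map a converter over them, and count the results. The converter in the message is cut off, so describe_file below is a hypothetical stand-in that only reports each file's size; it does not show the streaming variant the thread asks about.

```python
from typing import Tuple


def describe_file(record: Tuple[str, bytes]) -> Tuple[str, int]:
    """Hypothetical stand-in for the converter: map (path, content) to (path, size in bytes)."""
    path, content = record
    return path, len(content)


def convert_all(sc, directory: str) -> int:
    """Read each file under `directory` whole and count the converted records."""
    # In PySpark, binaryFiles yields an RDD of (path, bytes) pairs.
    files = sc.binaryFiles(directory)
    return files.map(describe_file).count()
```

Calling convert_all(sc, "file:///imocks/data/ocr/raw") with an existing SparkContext would mirror the batch job quoted above.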

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-21 Thread रविशंकर नायर
Excellent. You filled a missing link. Best, Passion On Wed, Mar 21, 2018 at 11:36 PM, Rohit Karlupia wrote: > Hi, > > Happy to announce the availability of Sparklens as open source project. It > helps in understanding the scalability limits of spark applications and > can be a useful guide on

Spark production scenario

2018-03-08 Thread रविशंकर नायर
Hi all, We are going to move to production with an 8-node Spark cluster and request some help with the following. We are running on the YARN cluster manager. That means YARN is installed with SSH between the nodes. When we run a standalone Spark program with spark-submit, YARN initializes a resource manager follow

Re: Spark StreamingContext Question

2018-03-07 Thread रविशंकर नायर
t mean you need to put them all in the same job. You can > (and should) still submit different jobs for different application concerns. > > kind regards, Gerard. > > > > On Wed, Mar 7, 2018 at 4:56 AM, ☼ R Nair (रविशंकर नायर) < > ravishankar.n...@gmail.com> wrote: >

Spark StreamingContext Question

2018-03-06 Thread रविशंकर नायर
Hi all, I understand from the documentation that only one streaming context can be active in a JVM at a time. Hence, in an enterprise cluster, how can we manage multiple users who have many different streaming applications, where one may be ingesting data from Flume, another from Twitter, etc.?

Re: Spark Dataframe and HIVE

2018-02-11 Thread रविशंकर नायर
) >> >> So the table looks good but this needs to be fixed before you can query >> the data in hive. >> >> Thanks >> Deepak >> >> On Sun, Feb 11, 2018 at 1:45 PM, ☼ R Nair (रविशंकर नायर) < >> ravishankar.n...@gmail.com> wrote: >>

Re: Spark Dataframe and HIVE

2018-02-11 Thread रविशंकर नायर
Feb 11, 2018 at 1:38 PM, Deepak Sharma > wrote: > >> Try this in hive: >> alter table mine set location "hdfs://localhost:8020/user/hive/warehouse/mine"; >> >> Thanks >> Deepak >> >> On Sun, Feb 11, 2018 at 1:24 PM, ☼ R Nair (रवि

Re: Spark Dataframe and HIVE

2018-02-11 Thread रविशंकर नायर
>> >> >> >> LinkedIn * >> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >> >> >> >> http://talebzadehmich.wordp

Re: Spark Dataframe and HIVE

2018-02-11 Thread रविशंकर नायर
, ☼ R Nair (रविशंकर नायर) < ravishankar.n...@gmail.com> wrote: > I have created it using Spark SQL. Then I want to retrieve it from Hive. > That's where the issue is. I can still retrieve it from Spark. No problems. > Why is Hive not giving me the data? > > > > On Sun, Feb

Re: Spark Dataframe and HIVE

2018-02-11 Thread रविशंकर नायर
AAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* > > > > http://talebzadehmich.wordpress.com > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any > loss, damage or dest

Re: Spark Dataframe and HIVE

2018-02-10 Thread रविशंकर नायर
\":0}},{\"name\":\"hu_site_deductible\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"hu_site_deductible\",\"scale\":0}},{\"name\":\"fl_site_deductible\",\"type\":

Re: Spark Dataframe and HIVE

2018-02-10 Thread रविशंकर नायर
No, No luck. Thanks On Sun, Feb 11, 2018 at 12:48 AM, Deepak Sharma wrote: > In hive cli: > msck repair table 《table_name》; > > Thanks > Deepak > > On Feb 11, 2018 11:14, "☼ R Nair (रविशंकर नायर)" < > ravishankar.n...@gmail.com> wrote: > >> N

Re: Spark Dataframe and HIVE

2018-02-10 Thread रविशंकर नायर
> > Thanks > Deepak > > On Feb 11, 2018 11:06, "☼ R Nair (रविशंकर नायर)" < > ravishankar.n...@gmail.com> wrote: > >> All, >> >> Thanks for the inputs. Again I am not successful. I think, we need to >> resolve this, as this is a very comm

Re: Spark Dataframe and HIVE

2018-02-10 Thread रविशंकर नायर
\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":\"line\",\"scale\":0}},{\"name\":\"construction\",\"type\":\"string\",\"nullable\":true,\"metadata\":{

Re: Spark Dataframe and HIVE

2018-02-09 Thread रविशंकर नायर
Thanks On Fri, Feb 9, 2018 at 9:49 AM, ☼ R Nair (रविशंकर नायर) < ravishankar.n...@gmail.com> wrote: > All, > > It has been three days continuously I am on this issue. Not getting any > clue. > > Environment: Spark 2.2.x, all configurations are correct. hive-site.xml is

Spark Dataframe and HIVE

2018-02-09 Thread रविशंकर नायर
All, I have been stuck on this issue for three days straight and have no clue. Environment: Spark 2.2.x, all configurations are correct. hive-site.xml is in Spark's conf. 1) Step 1: I created a data frame DF1 by reading a CSV file. 2) Did manipulations on DF1. The resulting frame is passion_df.

Re: Spark Tuning Tool

2018-01-23 Thread रविशंकर नायर
Very good job, in fact a missing link has been addressed. Any plan to port it to GitHub? I would like to contribute. Best, RS On Tue, Jan 23, 2018 at 12:01 AM, Rohit Karlupia wrote: > Hi, > > I have been working on making the performance tuning of spark applications > bit easier. We have just

Re: Save hive table from spark in hive 2.1.0

2017-12-10 Thread रविशंकर नायर
Hi, Good try. As you can see, when you run the upgrade using schematool, there is a duplicate column error. Can you please look at the generated script and edit it to avoid the duplicate column? Not sure why the Hive guys made it complicated; I faced the same issues as you. Can anyone else give a clean and b

Re: Save hive table from spark in hive 2.1.1

2017-12-09 Thread रविशंकर नायर
There is an option in hive-site.xml to ignore metadata validation. Try setting the property below to false. Hive schematool can also help. hive.metastore.schema.verification true Best, Ravion On Dec 9, 2017 5:56 PM, "konu" wrote: I am using Spark 2.2 with scala, hive 2.1.1 and zeppelin o
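The property mentioned above lives in hive-site.xml as a name/value pair inside a property element. As a small sketch, the helper below renders that element with the value switched to false; the function name hive_property is made up for this illustration, and the output still has to be placed inside the file's configuration element by hand.

```python
import xml.etree.ElementTree as ET


def hive_property(name: str, value: str) -> str:
    """Render one <property> element in the hive-site.xml layout."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


# The setting suggested in the reply: turn metastore schema verification off.
snippet = hive_property("hive.metastore.schema.verification", "false")
print(snippet)
```

Turning verification off only skips the metastore schema version check; it does not upgrade the schema, which is what schematool is for.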

Re: Question on using pseudo columns in spark jdbc options

2017-12-07 Thread रविशंकर नायर
first question - could you just check and provide us the answer? :) > > Cheers, > Tomasz > > 2017-12-03 7:39 GMT+01:00 ☼ R Nair (रविशंकर नायर) < > ravishankar.n...@gmail.com>: > >> Hi all, >> >> I am using a query to fetch data from MYSQL as follows: >

Question on using pseudo columns in spark jdbc options

2017-12-02 Thread रविशंकर नायर
Hi all, I am using a query to fetch data from MySQL as follows: var df = spark.read.format("jdbc").option("url", "jdbc:mysql://10.0.0.192:3306/retail_db").option("driver", "com.mysql.jdbc.Driver").option("user", "retail_dba").option("password", "cloudera").option("dbtable", "orders").optio
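The remaining options in the message are cut off, so the sketch below does not try to guess them. It only restates the visible options in PySpark and shows one general mechanism that may be relevant: the JDBC source accepts anything valid in a FROM clause as dbtable, including a parenthesized subquery with an alias, so a computed column can be produced on the database side. The helper names and the example query are hypothetical.

```python
from typing import Dict


def jdbc_options(dbtable: str) -> Dict[str, str]:
    """Connection options copied from the message; only `dbtable` varies."""
    return {
        "url": "jdbc:mysql://10.0.0.192:3306/retail_db",
        "driver": "com.mysql.jdbc.Driver",
        "user": "retail_dba",
        "password": "cloudera",
        "dbtable": dbtable,
    }


def as_subquery(sql: str, alias: str) -> str:
    """Wrap a SELECT so it can be passed as `dbtable`; MySQL requires the alias."""
    return f"({sql}) AS {alias}"


def read_table(spark, dbtable: str):
    """Load a table or an aliased subquery through the JDBC data source."""
    return spark.read.format("jdbc").options(**jdbc_options(dbtable)).load()
```

For example, read_table(spark, as_subquery("SELECT * FROM orders", "o")) would load the same rows as reading the orders table directly.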

sqlContext vs spark.

2017-02-03 Thread रविशंकर नायर
All, In Spark 1.6.0, we used val jdbcDF = sqlContext.read.format(-) for creating a data frame through JDBC. In Spark 2.1.x, we have seen this is val jdbcDF = *spark*.read.format(-) Does that mean we should not be using sqlContext going forward? Also, we see that sqlContext is not auto

No Reducer scenarios

2017-01-29 Thread रविशंकर नायर
Dear all, 1) When we don't set the reducer class in the driver program, IdentityReducer is invoked. 2) When we set setNumReduceTasks(0), no reducer, not even IdentityReducer, is invoked. Now, in the second scenario, we observed that the output is in part-m-xx format (instead of part-r-xx format), which sho
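The naming difference described above can be sketched as a tiny function: with zero reduce tasks the job is map-only and each mapper writes its own output file, so the files carry an m marker; otherwise the reducers write them and the marker is r. The function name and the five-digit padding reflect the usual new-API (mapreduce) file naming and are stated here as an assumption about the setup in the message.

```python
def output_file_name(num_reduce_tasks: int, task_index: int) -> str:
    """Name of the output file written by one task of a MapReduce job."""
    # Map-only jobs (zero reducers) write mapper output directly: part-m-NNNNN.
    # Jobs with reducers write reducer output instead: part-r-NNNNN.
    kind = "m" if num_reduce_tasks == 0 else "r"
    return f"part-{kind}-{task_index:05d}"


print(output_file_name(0, 0))  # map-only job, first mapper
print(output_file_name(2, 1))  # two reducers, second reducer
```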

Re: Dataframe caching

2017-01-20 Thread रविशंकर नायर
Thanks, Will look into this. Best regards, Ravion -- Forwarded message -- From: "Muthu Jayakumar" Date: Jan 20, 2017 10:56 AM Subject: Re: Dataframe caching To: "☼ R Nair (रविशंकर नायर)" Cc: "user@spark.apache.org" I guess, this may help in your c

Dataframe caching

2017-01-20 Thread रविशंकर नायर
Dear all, Here is a requirement I am thinking of implementing in Spark core. Please let me know if this is possible, and kindly provide your thoughts. A user executes a query to fetch 1 million records from, let's say, a database. We let the user store this as a dataframe, partitioned across the

Re: Spark build failure with com.oracle:ojdbc6:jar:11.2.0.1.0

2016-05-02 Thread रविशंकर नायर
Oracle jdbc is not part of Maven repository, are you keeping a downloaded file in your local repo? Best, RS On May 2, 2016 8:51 PM, "Hien Luu" wrote: > Hi all, > > I am running into a build problem with com.oracle:ojdbc6:jar:11.2.0.1.0. > It kept getting "Operation timed out" while building Spa

New Spark User Group in Florida

2016-03-06 Thread रविशंकर नायर
Hi Organizer, We have just started a new user group for Spark in Florida. Can you please add this entry to the Spark community? Thanks Florida Spark Meetup Best regards, R Nair.