Re: Yarn containers getting killed, error 52, multiple joins

2017-04-13 Thread Chen, Mingrui
1.5TB is incredibly high. It doesn't seem to be a configuration problem. Could you paste the code snippet doing the loop and join task on the dataset? Best regards,

Yarn containers getting killed, error 52, multiple joins

2017-04-13 Thread rachmaninovquartet
Hi, I have a Spark 1.6.2 app (tested previously on 2.0.0 as well). It requires a ton of memory (1.5TB) for a small dataset (~500MB). The memory usage seems to jump when I loop through and inner join to make the dataset 12 times as wide. The app goes down during or after this loop, when I try …
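
A minimal PySpark sketch of the pattern being described, with hypothetical paths and column names: each inner join in the loop extends the logical plan, so plan size and shuffle state compound across the 12 iterations unless each intermediate result is materialized.

    # Hypothetical sketch of the pattern described above: iteratively
    # inner-joining suffixed copies of a small frame onto itself makes it
    # ~12x as wide, but every join also extends the logical plan.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="wide-join-demo")
    sqlContext = SQLContext(sc)

    df = sqlContext.read.parquet("/data/small_dataset")  # hypothetical ~500MB input with an "id" column
    wide = df
    for i in range(12):
        suffixed = df.select([df["id"]] + [df[c].alias("%s_%d" % (c, i))
                                           for c in df.columns if c != "id"])
        wide = wide.join(suffixed, on="id", how="inner")
        # Materializing each intermediate (a parquet roundtrip, since
        # DataFrame.checkpoint does not exist on Spark 1.6) truncates the
        # lineage so the plan stops compounding:
        wide.write.mode("overwrite").parquet("/tmp/wide_%d" % i)
        wide = sqlContext.read.parquet("/tmp/wide_%d" % i)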

Fwd: ERROR Dropping SparkListenerEvent

2017-04-13 Thread Patrick Gomes
Hey all, I was wondering if anyone could point me to where to start debugging the following error: ERROR Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of the SparkListeners is too slow and cannot keep up with the rate at which tasks are being started by …
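
One mitigation worth trying (an assumption about this setup, not a confirmed fix): the listener bus queue size is configurable on Spark 1.x/2.0, so enlarging it gives a briefly slow SparkListener more headroom; a listener that is permanently slower than the task rate still needs to be fixed or removed.

    # Untested sketch: the listener bus queue holds 10000 events by default
    # on Spark 1.x/2.0; enlarging it buys headroom for a briefly slow
    # SparkListener, but cannot save one that is permanently too slow.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("listener-queue-demo")
            .set("spark.scheduler.listenerbus.eventqueue.size", "100000"))
    sc = SparkContext(conf=conf)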

how to master cache and checkpoint for pyspark

2017-04-13 Thread issues solution
Hi, can I ask you for a complete example where you use UDFs multiple times, one after another, and then cache your data frame, or where you checkpoint the dataframe at the appropriate steps (cache or checkpoint)? Thanks
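
A minimal sketch of what is being asked for, with hypothetical column and UDF names: two UDFs applied one after another, then cache() so later actions reuse the computed frame instead of re-running the UDF chain.

    # Minimal sketch with hypothetical names: chain two UDFs, then cache.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    sc = SparkContext(appName="udf-cache-demo")
    sqlContext = SQLContext(sc)
    df = sqlContext.createDataFrame([("a", 1), ("b", 2)], ["key", "value"])

    upper_udf = F.udf(lambda s: s.upper() if s is not None else None, StringType())
    tag_udf = F.udf(lambda s: "tag_" + s if s is not None else None, StringType())

    df = df.withColumn("key", upper_udf(F.col("key")))
    df = df.withColumn("key", tag_udf(F.col("key")))

    df.cache()   # keep the UDF output in memory
    df.count()   # first action materializes the cache
    df.show()    # later actions read from the cache instead of re-running the UDFs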

Number of columns in data frame

2017-04-13 Thread issues solution
Hi, what is the number of columns that Spark can handle without fuss? Regards

How to correct code after java.lang.StackOverflowError

2017-04-13 Thread issues solution
Hi, I wonder if we have a solution to correct code after getting a stack overflow error. I mean you have df <- transformation 1, df <- transformation 2, df <- transformation 3, df <- transformation 4, ..., df <- transformation n, and then df <- transformation n+1 gets a stack overflow error. How …
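
One workaround, assuming Spark 1.6 where DataFrame.checkpoint does not exist yet: materialize the frame every N transformations (for example via a parquet roundtrip) so the logical plan restarts from a fresh source instead of growing until the planner overflows the stack. `transformations`, `sqlContext`, and the paths below are hypothetical.

    # Sketch of one workaround (Spark 1.6, no DataFrame.checkpoint):
    # materialize every N steps to keep the logical plan short.
    def truncate_plan(sqlContext, df, path):
        """Write df out and read it back, returning a frame with a short plan."""
        df.write.mode("overwrite").parquet(path)
        return sqlContext.read.parquet(path)

    N = 50  # hypothetical batch size
    for i, step in enumerate(transformations):  # transformations: list of df -> df functions
        df = step(df)
        if (i + 1) % N == 0:
            df = truncate_plan(sqlContext, df, "/tmp/plan_%d" % i)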

commons.lang3.time incompatible

2017-04-13 Thread Mars Xu
Hi users, I got this error "java.io.InvalidClassException: org.apache.commons.lang3.time.FastDateParser; local class incompatible: stream classdesc serialVersionUID = 3, local class serialVersionUID = 2" when running a Spark application that reads from and writes to a CSV file. My Spark …
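
The serialVersionUID mismatch usually means the driver and executors are loading two different commons-lang3 jars. One possible mitigation (an assumption about this setup, not a confirmed fix) is to prefer the application's jars on both sides; shading commons-lang3 inside the application jar is the more robust option.

    # Untested sketch: prefer the application's jars on both driver and
    # executors so the same commons-lang3 version is loaded on each side.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("classpath-demo")
            .set("spark.driver.userClassPathFirst", "true")
            .set("spark.executor.userClassPathFirst", "true"))
    sc = SparkContext(conf=conf)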

Re: why we can't apply udf on rdd ???

2017-04-13 Thread Andrés Ivaldi
Hi, what Spark version are you using? Did you register the UDF? How are you using the UDF? Does the UDF support that data type as a parameter? What I do with Spark 2.0 is: create the UDF for each data type I need, register the UDF with the sparkContext, and use the UDF over a DataFrame, not an RDD; you can convert it …
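
A PySpark analogue of those steps, with hypothetical names: define the UDF with its return type, register it, convert the RDD to a DataFrame, and apply the UDF there, since UDFs belong to the DataFrame/SQL layer and cannot be evaluated on a bare RDD.

    # PySpark analogue with hypothetical names: define, register, convert,
    # then apply the UDF on the DataFrame rather than the RDD.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.sql import functions as F
    from pyspark.sql.types import IntegerType

    sc = SparkContext(appName="udf-demo")
    sqlContext = SQLContext(sc)

    double_it = F.udf(lambda x: x * 2 if x is not None else None, IntegerType())
    sqlContext.registerFunction("double_it", lambda x: x * 2, IntegerType())  # SQL-side registration

    rdd = sc.parallelize([(1,), (2,), (3,)])
    df = sqlContext.createDataFrame(rdd, ["n"])          # convert the RDD first
    df.select(double_it(F.col("n")).alias("n2")).show()  # apply the UDF on the DataFrame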

how to use checkpoint correctly with udf

2017-04-13 Thread issues solution
Hi, can someone explain to me how I can use checkpoint in PySpark, not in Scala? I have a lot of UDFs to apply on a large data frame, and I don't understand how I can use checkpoint to break lineage and prevent java.lang.StackOverflowError. Regards
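
On PySpark 1.6 checkpointing is exposed on RDDs rather than DataFrames, so one possible pattern (a sketch, not the only way) is to checkpoint the RDD underneath the frame and rebuild the DataFrame from it:

    # One possible pattern on PySpark 1.6, where DataFrame.checkpoint does
    # not exist: checkpoint the underlying RDD, then rebuild the frame,
    # discarding the lineage built up by the UDF chain.
    sc.setCheckpointDir("/tmp/spark_checkpoints")    # set once, before checkpointing

    rdd = df.rdd                    # the RDD of Rows under the DataFrame
    rdd.checkpoint()                # mark it for checkpointing
    rdd.count()                     # an action forces the checkpoint to run
    df = sqlContext.createDataFrame(rdd, df.schema)  # fresh frame, short lineage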

Re: checkpoint

2017-04-13 Thread ayan guha
Looks like your UDF expects numeric data but you are sending string type. I suggest casting to numeric. On Thu, 13 Apr 2017 at 7:03 pm, issues solution wrote: …
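
A sketch of the suggested fix; the "amount" column and udf_numeric are hypothetical stand-ins:

    # Cast the string column to a numeric type before the UDF sees it.
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType

    df = df.withColumn("amount", F.col("amount").cast(DoubleType()))  # string -> numeric
    df = df.withColumn("amount", udf_numeric(F.col("amount")))        # UDF now receives numbers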

why we can't apply udf on rdd ???

2017-04-13 Thread issues solution
Hi, what is the origin of this error? java.lang.UnsupportedOperationException: Cannot evaluate expression: PythonUDF#Grappra(input[410, StringType]) Regards

Hive Context and SQL Context interoperability

2017-04-13 Thread Deepak Sharma
Hi All, I have registered temp tables using the hive context and the sql context. Now when I try to join these 2 temp tables, one of them complains about not being found. Is there any setting or option so that the tables in these 2 different contexts are visible to each other? -- Thanks Deepak
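
Temp tables are registered in the catalog of the context that created them, so two separate contexts cannot see each other's tables. Since HiveContext extends SQLContext, one way out (a sketch with hypothetical tables) is to register both frames in a single HiveContext:

    # Sketch with hypothetical tables: register both frames in one
    # HiveContext so they share a catalog and can be joined.
    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="context-demo")
    hc = HiveContext(sc)

    df_a = hc.createDataFrame([(1, "x")], ["id", "a_val"])
    df_b = hc.createDataFrame([(1, "y")], ["id", "b_val"])
    df_a.registerTempTable("table_a")
    df_b.registerTempTable("table_b")
    hc.sql("SELECT * FROM table_a a JOIN table_b b ON a.id = b.id").show()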

checkpoint

2017-04-13 Thread issues solution
Hi, I am new to Spark and I want to ask you what is wrong with checkpoint on PySpark 1.6.0. I don't understand what happens when I try to use it on a dataframe: dfTotaleNormalize24 = dfTotaleNormalize23.select([i if i not in listrapcot else udf_Grappra(F.col(i)).alias(i) for i in …

Re: Why dataframe can be more efficient than dataset?

2017-04-13 Thread DB Tsai
There is a JIRA and a prototype which analyzes the JVM bytecode in the black box and converts the closures into Catalyst expressions. https://issues.apache.org/jira/browse/SPARK-14083 This could potentially address the issue discussed here. Sincerely, DB Tsai
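
An illustration of the trade-off the thread is about, in PySpark rather than the Scala Dataset API and not the SPARK-14083 prototype itself: a closure or UDF is a black box to Catalyst, while an equivalent column expression can be analyzed and optimized. `df` here is a hypothetical frame with an "age" column.

    # Illustration only: the UDF is opaque to the optimizer; the column
    # expression lets Catalyst see and optimize the predicate.
    from pyspark.sql import functions as F
    from pyspark.sql.types import BooleanType

    is_adult = F.udf(lambda age: age > 21, BooleanType())

    opaque = df.filter(is_adult(F.col("age")))   # black box to the optimizer
    optimized = df.filter(F.col("age") > 21)     # Catalyst sees the predicate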

unsubscribe

2017-04-13 Thread tian zhang