Lifecycle of a map function

2020-04-07 Thread Vadim Vararu
Hi all, I'm trying to understand the lifecycle of a map function in a Spark/YARN context. My understanding is that the function is instantiated on the master and then passed to each executor (serialized/deserialized). What I'd like to confirm is that the function is initiali
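A minimal Scala experiment (not from the thread; names are illustrative) that makes this lifecycle visible: the closure object is constructed once on the driver, then serialized and deserialized into each executor before the map runs there:

  class Tagger extends Serializable {
    val constructedOn = java.net.InetAddress.getLocalHost.getHostName   // evaluated at construction, on the driver
    def tag(x: Int): (String, String, Int) =
      (constructedOn, java.net.InetAddress.getLocalHost.getHostName, x) // second field is evaluated where map runs
  }
  val tagger = new Tagger                              // instantiated on the driver
  sc.parallelize(1 to 4).map(tagger.tag).collect()
  // Every tuple carries the driver's hostname in the first slot (serialized field state)
  // and the executor's hostname in the second (where the deserialized copy actually executes).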

Pyspark error when converting string to timestamp in map function

2018-08-17 Thread Keith Chapman
Hi all, I'm trying to create a dataframe enforcing a schema so that I can write it to a parquet file. The schema has timestamps and I get an error with pyspark. The following is a snippet of code that exhibits the problem, df = sqlctx.range(1000) schema = StructType([StructField('a', TimestampTyp

Re: Issue with map function in Spark 2.2.0

2018-04-11 Thread ayan guha
As the error clearly says, the column FL Date has a different format than you are expecting. Modify your date format mask appropriately. On Wed, 11 Apr 2018 at 5:12 pm, @Nandan@ wrote: > Hi , > I am not able to use .map function in Spark. > > My codes are as below :- > > *1) Cre

Issue with map function in Spark 2.2.0

2018-04-11 Thread @Nandan@
Hi, I am not able to use the .map function in Spark. My code is as below: *1) Create Parse function:-* from datetime import datetime from collections import namedtuple fields = ('date','airline','flightnum','origin','dest','dep'

Resource manage inside map function

2018-03-30 Thread Huiliang Zhang
Hi, I have a Spark job which needs to access HBase inside a mapToPair function. The issue is that I do not want to connect to HBase and close the connection for every record. As I understand it, PairFunction is not designed to manage resources with setup() and close(), like a Hadoop reader or writer. Does s
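A minimal Scala sketch of the usual workaround, partition-level setup and teardown via mapPartitions, so one HBase connection serves all records of a partition (the table name, key type, and per-record work are illustrative, not from the thread):

  import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
  import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}

  rdd.mapPartitions { keys =>                                        // assumed: an RDD of String row keys
    val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())  // one connection per partition
    val table = conn.getTable(TableName.valueOf("my_table"))          // hypothetical table
    val out = keys.map { key =>
      val result = table.get(new Get(key.getBytes("UTF-8")))
      (key, !result.isEmpty)                                          // hypothetical per-record lookup
    }.toList                                                          // force evaluation before closing
    table.close(); conn.close()
    out.iterator
  }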

Re: Spark SQL within a DStream map function

2017-06-16 Thread Burak Yavuz
}) > > JavaEsSparkStreaming.saveToEs(parsedMetadataAndRows, > "index/type" // ... > > > But it appears I cannot get access to the sqlContext when I run this on > the spark cluster because that code is executing in the executor not in the > driver. >

Spark SQL within a DStream map function

2017-06-16 Thread Mike Hugo
"index/type" // ... But it appears I cannot get access to the sqlContext when I run this on the spark cluster because that code is executing in the executor not in the driver. Is there a way I can access or create a SqlContext to be able to pull the file down from S3 in my map function?

Re: create column with map function apply to dataframe

2017-04-14 Thread Ankur Srivastava
If I understand your question, you should look at withColumn in the DataFrame API. df.withColumn("len", len("l")) Thanks Ankur On Fri, Apr 14, 2017 at 6:07 AM, issues solution wrote: > Hi , > how you can create column inside map function > > > lik
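A sketch of the same idea in Scala (the original question was PySpark): the built-in length function avoids a map entirely, assuming a string column named "l":

  import org.apache.spark.sql.functions.{col, length}
  val withLen = df.withColumn("len", length(col("l")))
  withLen.show()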

create column with map function apply to dataframe

2017-04-14 Thread issues solution
Hi, how can you create a column inside a map function, like this: df.map(lambda l: len(l)), but instead of returning an RDD, create the column inside the dataframe?

Re: Reference External Variables in Map Function (Inner class)

2016-12-19 Thread mbayebabacar
Hello Marcelo, Finally, what was the solution? I face the same problem. Thank you -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Reference-External-Variables-in-Map-Function-Inner-class-tp11990p28237.html Sent from the Apache Spark User List mailing list

Re: Why does the foreachPartition function make duplicate invocations of the map function for every message? (Spark 2.0.2)

2016-12-16 Thread Cody Koeninger
Please post a minimal complete code example of what you are talking about On Thu, Dec 15, 2016 at 6:00 PM, Michael Nguyen wrote: > I have the following sequence of Spark Java API calls (Spark 2.0.2): > > Kafka stream that is processed via a map function, which returns the string >

Why does the foreachPartition function make duplicate invocations of the map function for every message? (Spark 2.0.2)

2016-12-15 Thread Michael Nguyen
I have the following sequence of Spark Java API calls (Spark 2.0.2): 1. A Kafka stream that is processed via a map function, which returns the string value from tuple2._2() for JavaDStream, as in return tuple2._2(); 2. The returned JavaDStream is then processed by foreachPartition
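The thread is cut off above, so the actual cause isn't shown; one common reason for a map function running twice per message is that the mapped stream feeds more than one action without being persisted, in which case the whole lineage, including the map, is recomputed. A Scala sketch under that assumption, with illustrative names:

  val values = kafkaStream.map(_._2)      // the map under discussion
  values.cache()                          // without this, every action below replays the map
  values.foreachRDD(_.foreachPartition(_.foreach(record => process(record))))   // hypothetical sink
  values.print()                          // any second use of `values` would otherwise recompute it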

Re: [DataFrames] map function - 2.0

2016-12-15 Thread Michael Armbrust
e of when using > this for production applications. Also is there a "non-experimental" way of > using map function on Dataframe in Spark 2.0 > > Thanks, > Ninad >

[DataFrames] map function - 2.0

2016-12-15 Thread Ninad Shringarpure
"non-experimental" way of using map function on Dataframe in Spark 2.0 Thanks, Ninad

Re: How to return a case class in map function?

2016-11-03 Thread Yan Facai
version of Spark are you running? Have you tried >> 2.0.2? >> >> On Wed, Nov 2, 2016 at 12:01 AM, 颜发才(Yan Facai) wrote: >> >>> Hi, all. >>> When I use a case class as return value in map function, spark always >>> raise a ClassCastException. >

Re: How to return a case class in map function?

2016-11-03 Thread Yan Facai
>> Hi, all. >> When I use a case class as the return value in a map function, Spark always >> raises a ClassCastException. >> >> I wrote a demo, like: >> >> scala> case class Record(key: Int, value: String) >> >> scala> case class ID(key: Int) >

Re: How to return a case class in map function?

2016-11-02 Thread Michael Armbrust
That's a bug. Which version of Spark are you running? Have you tried 2.0.2? On Wed, Nov 2, 2016 at 12:01 AM, 颜发才(Yan Facai) wrote: > Hi, all. > When I use a case class as the return value in a map function, Spark always > raises a ClassCastException. > > I wrote a demo, like: >

How to return a case class in map function?

2016-11-02 Thread Yan Facai
Hi, all. When I use a case class as the return value in a map function, Spark always raises a ClassCastException. I wrote a demo, like: scala> case class Record(key: Int, value: String) scala> case class ID(key: Int) scala> val df = Seq(Record(1, "a"), Record(2, "b")
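A sketch of how this normally looks on Spark 2.x with Dataset encoders (assuming the issue Michael calls a bug above does not affect your version; spark is the shell's SparkSession):

  case class Record(key: Int, value: String)
  case class ID(key: Int)

  import spark.implicits._
  val ds  = Seq(Record(1, "a"), Record(2, "b")).toDS()
  val ids = ds.map(r => ID(r.key))    // needs an implicit Encoder[ID], provided by spark.implicits._
  ids.show()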

Re: Is there a way to dynamically load files [parquet or csv] in the map function?

2016-07-08 Thread Deepak Sharma
Yes. You can do something like this: .map(x=>mapfunction(x)) Thanks Deepak On 9 Jul 2016 9:22 am, "charles li" wrote: > > hi, guys, is there a way to dynamic load files within the map function. > > i.e. > > Can I code as b

Is there a way to dynamically load files [parquet or csv] in the map function?

2016-07-08 Thread charles li
hi, guys, is there a way to dynamically load files within the map function? i.e., can I code as below: thanks a lot. -- *___* Quant | Engineer | Boy *___* *blog*: http://litaotao.github.io *github*: www.github.com/litaotao

Re: DataFrame Find/Filter Based on Input - Inside Map function

2015-07-02 Thread ayan guha
ted as below >> >> options.put("dbtable", "(select * from user) as account"); >> DataFrame accountRdd = >> sqlContext.read().format("jdbc").options(options).load(); >> >> and i have another RDD which contains login name and i want to find the

Re: DataFrame Find/Filter Based on Input - Inside Map function

2015-07-01 Thread Mailing List
t("jdbc").options(options).load(); >> >> and i have another RDD which contains login name and i want to find the >> userid from above DF RDD and return it >> >> Not sure how can i do that as when i apply a map function and say filter on >> DF i get Null pointor exception. >> >> Please help.

Re: DataFrame Find/Filter Based on Input - Inside Map function

2015-07-01 Thread ayan guha
Context.read().format("jdbc").options(options).load(); > > and i have another RDD which contains login name and i want to find the > userid from above DF RDD and return it > > Not sure how I can do that, as when I apply a map function and say filter > on DF I get a NullPointerException. > > Please help. > >

DataFrame Find/Filter Based on Input - Inside Map function

2015-07-01 Thread Ashish Soni
e userid from above DF RDD and return it. Not sure how I can do that, as when I apply a map function and say filter on the DF I get a NullPointerException. Please help.
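The replies above are truncated; the usual fix is to express the lookup as a join rather than touching the DataFrame inside map, since DataFrames and the SQLContext cannot be used from executor-side functions. A Scala sketch with assumed column names (the thread's code is Java):

  import sqlContext.implicits._
  // accountDf: the JDBC-backed DataFrame loaded in the thread ("select * from user")
  val loginsDf = loginRdd.map(Tuple1(_)).toDF("login")                        // assumed: RDD of login names
  val userIds  = loginsDf.join(accountDf, "login").select("login", "userid")  // assumed column names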

RE: Can a map function return null

2015-04-19 Thread Evo Eftimov
In fact you can return “NULL” from your initial map and hence not resort to Optional at all From: Evo Eftimov [mailto:evo.efti...@isecc.com] Sent: Sunday, April 19, 2015 9:48 PM To: 'Steve Lewis' Cc: 'Olivier Girardot'; 'user@spark.apache.org' Subject: RE:

RE: Can a map function return null

2015-04-19 Thread Evo Eftimov
Spark exception THEN as far as I am concerned, chess-mate From: Steve Lewis [mailto:lordjoe2...@gmail.com] Sent: Sunday, April 19, 2015 8:16 PM To: Evo Eftimov Cc: Olivier Girardot; user@spark.apache.org Subject: Re: Can a map function return null So you imagine something like this

Re: Can a map function return null

2015-04-19 Thread Steve Lewis
t from Samsung Mobile > > > Original message > From: Olivier Girardot > Date:2015/04/18 22:04 (GMT+00:00) > To: Steve Lewis ,user@spark.apache.org > Subject: Re: Can a map function return null > > You can return an RDD with null values inside, and afterward

Re: Can a map function return null

2015-04-19 Thread Evo Eftimov
Sent from Samsung Mobile Original message From: Olivier Girardot Date:2015/04/18 22:04 (GMT+00:00) To: Steve Lewis ,user@spark.apache.org Subject: Re: Can a map function return null You can return an RDD with null values inside, and afterwards filter on "item != null
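The suggestion quoted just above, as a compact Scala sketch (the thread's code is Java; isUsed and transform are the thread's hypothetical helpers):

  val words = original.map(s => if (isUsed(s)) transform(s) else null)   // nulls mark "nothing to emit"
  val kept  = words.filter(_ != null)                                    // drop them afterwards
  // Equivalent without nulls: original.flatMap(s => if (isUsed(s)) Some(transform(s)) else None)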

Re: Can a map function return null

2015-04-18 Thread Olivier Girardot
false > } > }); > > My question is can I do a map returning the transformed data and null if > nothing is to be returned. as shown below - what does a Spark do with a map > function returning null > > JavaRDD words = original.map(new MapFunction() { >

Can a map function return null

2015-04-18 Thread Steve Lewis
t.add(transform(s)); return ret; // contains 0 items if isUsed is false } }); My question is: can I do a map returning the transformed data, and null if nothing is to be returned, as shown below - what does Spark do with a map function returning null? Ja

Re: using sparkContext from within a map function (from spark streaming app)

2015-03-08 Thread Sean Owen
ss.mapToPair(JavaDStreamLike.scala:146) > at > org.apache.spark.streaming.api.java.JavaPairDStream.mapToPair(JavaPairDStream.scala:46) > at streamReader.App.main(App.java:66) > > Is using the sparkContext from inside a map function wrong ? > > This is the code we are using: > S

using sparkContext from within a map function (from spark streaming app)

2015-03-08 Thread Daniel Haviv
at org.apache.spark.streaming.api.java.JavaDStreamLike$class.mapToPair(JavaDStreamLike.scala:146) at org.apache.spark.streaming.api.java.JavaPairDStream.mapToPair(JavaPairDStream.scala:46) at streamReader.App.main(App.java:66) Is using the sparkContext from inside a map function wro

Using sparkContext inside a map function

2015-03-08 Thread danielil
at org.apache.spark.streaming.api.java.JavaDStreamLike$class.mapToPair(JavaDStreamLike.scala:146) at org.apache.spark.streaming.api.java.JavaPairDStream.mapToPair(JavaPairDStream.scala:46) at streamReader.App.main(App.java:66) Is using the sparkContext from inside a ma

Re: Manually trigger RDD map function without action

2015-01-12 Thread Sven Krasser
> -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Manually-trigger-RDD-map-function-without-action-tp21094p21110.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > --

Re: Manually trigger RDD map function without action

2015-01-12 Thread Kevin Jung
ion is the most efficient way. And the winner is foreach(a => a), in line with everyone's expectations. Collect can cause an OOM on the driver, and count is much slower than the others. Thanks all. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Manually-trigge
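The winning approach from the thread, as a short Scala sketch (expensiveTransform is illustrative). If the result will be reused later, cache it first so the work is not repeated:

  val mapped = rdd.map(expensiveTransform)   // lazy: nothing has executed yet
  mapped.foreach(a => a)                     // cheapest action per the thread: forces evaluation,
                                             // ships nothing back to the driver (unlike collect)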

Re: Manually trigger RDD map function without action

2015-01-12 Thread Cody Koeninger
regards >> Kevin >> >> -- >> If you reply to this email, your message will be added to the >> discussion below: >> >> http://apache-spark-user-list.1001560.n3.nabble.com/Manually-trigger-RDD-map-functi

Re: Manually trigger RDD map function without action

2015-01-12 Thread kevinkim
e will be added to the discussion > below: > > http://apache-spark-user-list.1001560.n3.nabble.com/Manually-trigger-RDD-map-function-without-action-tp21094.html > To unsubscribe from Apache Spark User List, click here > <http://apache-spark-user-list.1001560.n3.nabble.com/template/Na

Manually trigger RDD map function without action

2015-01-12 Thread Kevin Jung
Hi all, Is there an efficient way to trigger RDD transformations? I'm now using the count action to achieve this. Best regards Kevin -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Manually-trigger-RDD-map-function-without-action-tp21094.html Sent from the A

Re: map function

2014-12-04 Thread Yifan LI
my Windows Phone > From: Yifan LI <mailto:iamyifa...@gmail.com> > Sent: 04/12/2014 09:27 > To: user@spark.apache.org <mailto:user@spark.apache.org> > Subject: map function > > Hi, > > I have an RDD like below: > (1, (10, 20)) > (2, (30, 40, 10)) > (3,

R: map function

2014-12-04 Thread Paolo Platter
apache.org<mailto:user@spark.apache.org> Subject: map function Hi, I have an RDD like below: (1, (10, 20)) (2, (30, 40, 10)) (3, (30)) … Is there any way to map it to this: (10,1) (20,1) (30,2) (40,2) (10,2) (30,3) … generally, each element might be mapped to multiple output elements. Thanks in advance! Best, Yifan LI

Re: map function

2014-12-04 Thread Mark Hamstra
rdd.flatMap { case (k, coll) => coll.map { elem => (elem, k) } } On Thu, Dec 4, 2014 at 1:26 AM, Yifan LI wrote: > Hi, > > I have a RDD like below: > (1, (10, 20)) > (2, (30, 40, 10)) > (3, (30)) > … > > Is there any way to map it to this: > (10,1) > (20,1) > (30,2) > (40,2) > (10,2) > (30,3) >

map function

2014-12-04 Thread Yifan LI
Hi, I have an RDD like below: (1, (10, 20)) (2, (30, 40, 10)) (3, (30)) … Is there any way to map it to this: (10,1) (20,1) (30,2) (40,2) (10,2) (30,3) … generally, each element might be mapped to multiple output elements. Thanks in advance! Best, Yifan LI
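Mark Hamstra's flatMap answer above, applied to this sample data as a complete sketch:

  val rdd = sc.parallelize(Seq(
    (1, Seq(10, 20)),
    (2, Seq(30, 40, 10)),
    (3, Seq(30))))
  rdd.flatMap { case (k, coll) => coll.map(elem => (elem, k)) }.collect()
  // Array((10,1), (20,1), (30,2), (40,2), (10,2), (30,3))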

Re: Keep state inside map function

2014-10-28 Thread Koert Kuipers
doing cleanup in an iterator like that assumes the iterator always gets fully read, which is not necessarily the case (for example, RDD.take does not). Instead I would use mapPartitionsWithContext, in which case you can write a function of the form f: (TaskContext, Iterator[T]) => Iterator[U] now
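A sketch of the same concern handled via TaskContext on newer Spark versions, where mapPartitionsWithContext (a DeveloperApi) has since been removed; the connection and per-record helpers are illustrative:

  import org.apache.spark.TaskContext

  rdd.mapPartitions { iter =>
    val conn = openConnection()            // hypothetical per-partition setup
    // Runs when the task finishes, even if the iterator is never fully consumed (e.g. rdd.take):
    TaskContext.get().addTaskCompletionListener((_: TaskContext) => conn.close())
    iter.map(record => process(record, conn))   // hypothetical per-record work
  }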

Re: How to use multiple threads in an RDD map function?

2014-09-27 Thread Sean Owen
any idea to increase the concurrency ? > > Thanks for all > > > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-use-multi-thread-in-RDD-map-function-tp15286.html > Sent from the Apache Spark User List mailing l

Re: How to use multiple threads in an RDD map function?

2014-09-27 Thread myasuka
ark-submit, there are two options which may be helpful > to your problem, they are "--total-executor-cores NUM" (standalone and > mesos only), "--executor-cores" (yarn only) > > > qinwei From: myasuka Date: 2014-09-28 11:44 To: user Subject: How to use multi > thre

Re: How to use multiple threads in an RDD map function?

2014-09-27 Thread qinwei
in the options of spark-submit, there are two options that may be helpful for your problem: "--total-executor-cores NUM" (standalone and Mesos only) and "--executor-cores" (YARN only). qinwei From: myasuka Date: 2014-09-28 11:44 To: user Subject: How to use mult
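The same knobs expressed as configuration properties, a sketch: --executor-cores corresponds to spark.executor.cores and --total-executor-cores to spark.cores.max, so more map tasks run concurrently instead of spawning threads inside a single task:

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("multi-core-map")          // illustrative
    .set("spark.executor.cores", "4")      // cores per executor (more concurrent tasks each)
    .set("spark.cores.max", "16")          // total cores for the app (standalone/Mesos)
  val sc = new SparkContext(conf)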

How to use multiple threads in an RDD map function?

2014-09-27 Thread myasuka
D transformation. Is there anyone who can provide some example code using Java multi-threading in an RDD transformation, or any idea to increase the concurrency? Thanks for all -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-use-multi-thread-in-

Re: Access file name in map function

2014-09-26 Thread Cheng Lian
On 9/26/14 6:45 PM, Shekhar Bansal wrote: Hi In one of our use cases, the filename contains a timestamp and we have to append it to the record for aggregation. How can I access the filename in a map function? Thanks!

Access file name in map function

2014-09-26 Thread Shekhar Bansal
Hi In one of our use cases, the filename contains a timestamp and we have to append it to the record for aggregation. How can I access the filename in a map function? Thanks!
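The reply above is truncated; one common approach (a sketch with an assumed path layout) is wholeTextFiles, which pairs each file's path with its content so the filename or timestamp can be attached to every record. Note it reads each file as a single record, so it suits many small files rather than a few huge ones:

  sc.wholeTextFiles("hdfs:///data/input/*.csv")            // hypothetical input path
    .flatMap { case (path, content) =>
      val ts = path.split("/").last.stripSuffix(".csv")    // assumed: timestamp encoded in the file name
      content.split("\n").map(line => (ts, line))
    }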

Re: Reference External Variables in Map Function (Inner class)

2014-08-12 Thread Sunny Khatri
> >> I don't think static members are going to be serialized in the > >> closure? the instance of Parse will be looking at its local > >> SampleOuterClass, which is maybe not initialized on the remote JVM. > >> > >> On Tue, Aug 12, 2014 at 6:02 PM, Su

Re: Reference External Variables in Map Function (Inner class)

2014-08-12 Thread Marcelo Vanzin
lass, which is maybe not initialized on the remote JVM. >> >> On Tue, Aug 12, 2014 at 6:02 PM, Sunny Khatri >> wrote: >> > I have a class defining an inner static class (map function). The inner >> > class tries to refer the variable instantiated in th

Re: Reference External Variables in Map Function (Inner class)

2014-08-12 Thread Sunny Khatri
looking at its local > SampleOuterClass, which is maybe not initialized on the remote JVM. > > On Tue, Aug 12, 2014 at 6:02 PM, Sunny Khatri > wrote: > > I have a class defining an inner static class (map function). The inner > > class tries to refer the variable instant

Re: Reference External Variables in Map Function (Inner class)

2014-08-12 Thread Sean Owen
I don't think static members are going to be serialized in the closure? The instance of Parse will be looking at its local SampleOuterClass, which is maybe not initialized on the remote JVM. On Tue, Aug 12, 2014 at 6:02 PM, Sunny Khatri wrote: > I have a class defining an inner static cl

Reference External Variables in Map Function (Inner class)

2014-08-12 Thread Sunny Khatri
I have a class defining an inner static class (map function). The inner class tries to refer to a variable instantiated in the outer class, which results in a NullPointerException. Sample code as follows: class SampleOuterClass { private static ArrayList someVariable
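Sean's diagnosis above points at the fix: don't rely on a static field that only exists on the driver JVM; capture the value explicitly (or broadcast it) so it travels with the serialized closure. A Scala sketch with illustrative names (the thread's code is Java):

  val someVariable: Seq[String] = loadValues()         // hypothetical, built on the driver
  val bc = sc.broadcast(someVariable)                  // or simply close over someVariable directly
  val parsed = rdd.map(line => parse(line, bc.value))  // parse is a hypothetical pure function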

Re: Keep state inside map function

2014-07-31 Thread Sean Owen
On Thu, Jul 31, 2014 at 2:11 AM, Tobias Pfeiffer wrote: > rdd.mapPartitions { partition => >// Some setup code here >val result = partition.map(yourfunction) > >// Some cleanup code here >result > } Yes, I realized that after I hit send. You definitely have to store and return the

Re: Keep state inside map function

2014-07-30 Thread Tobias Pfeiffer
Hi, On Thu, Jul 31, 2014 at 2:23 AM, Sean Owen wrote: > > ... you can run setup code before mapping a bunch of records, and > after, like so: > > rdd.mapPartitions { partition => >// Some setup code here >partition.map(yourfunction) >// Some cleanup code here > } > Please be careful

Re: Keep state inside map function

2014-07-30 Thread Kevin
't share state across Mappers, or Mappers and Reducers in > Hadoop. (At least there was no direct way.) Same here. But you can > maintain state across many map calls. > > On Wed, Jul 30, 2014 at 6:07 PM, Kevin wrote: > > Hi, > > > > Is it possible to maintain state

Re: Keep state inside map function

2014-07-30 Thread Sean Owen
nside a Spark map function? With Hadoop > MapReduce, Mappers and Reducers are classes that can have their own state > using instance variables. Can this be done with Spark? Are there any > examples? > > Most examples I have seen do a simple operation on the value passed into the > map fun

Re: Keep state inside map function

2014-07-30 Thread aaronjosephs
use mapPartitions to get functionality equivalent to Hadoop's -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Keep-state-inside-map-function-tp10968p10969.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Keep state inside map function

2014-07-30 Thread Kevin
Hi, Is it possible to maintain state inside a Spark map function? With Hadoop MapReduce, Mappers and Reducers are classes that can have their own state using instance variables. Can this be done with Spark? Are there any examples? Most examples I have seen do a simple operation on the value
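A small Scala sketch of the mapPartitions pattern suggested in the replies: state that would live in a Hadoop Mapper's instance variables becomes a local variable scoped to the partition (names are illustrative):

  rdd.mapPartitions { records =>
    var seen = 0L                 // per-partition "instance state"
    records.map { record =>
      seen += 1
      (record, seen)              // e.g. tag each record with its position within the partition
    }
  }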

Re: Map Function does not seem to be executing over RDD

2014-07-09 Thread Yana Kadiyska
Does this line println("Retuning "+string) from the hash function print what you expect? If you're not seeing that output in the executor log I'd also put some debug statements in "case other", since your match in the "interesting" case is conditioned on if( fieldsList.contains(index)) -- maybe th

Map Function does not seem to be executing over RDD

2014-07-09 Thread Raza Rehman
Hello everyone, I am having a problem with a simple Scala/Spark code in which I am trying to replace certain fields in a CSV with their hashes. class DSV (var line:String="",fieldsList:Seq[Int], var delimiter:String=",") extends Serializable { def hash(s:String):String={

Access original filename in a map function

2014-03-18 Thread Uri Laserson
Hi spark-folk, I have a directory full of files that I want to process using PySpark. There is some necessary metadata in the filename that I would love to attach to each record in that file. Using Java MapReduce, I would access ((FileSplit) context.getInputSplit()).getPath().getName() in the s

Create a new object in pyspark map function

2014-02-28 Thread Kaden(Xiaozhe) Wang
Hi all, I am trying to create a new object in the map function, but PySpark reports a lot of errors. Is it legal to do so? Here is my code: class Node(object): def __init__(self, A, B, C): self.A = A self.B = B self.C = C def make_vertex(pair): A, (B, C) = pair