Hi all,
I'm trying to understand the lifecycle of a map function in a Spark/YARN context. My understanding is that the function is instantiated on the driver and then passed to each executor (serialized/deserialized).
What I'd like to confirm is that the function is initiali…
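For reference, a minimal Scala sketch of how to observe this in an application (the class name and message are illustrative): the constructor runs once on the driver, and deserialization on the executors does not re-run it.

    import org.apache.spark.sql.SparkSession

    class Suffixer(suffix: String) extends (String => String) with Serializable {
      println("Suffixer constructed")        // runs once, on the driver
      def apply(s: String): String = s + suffix
    }

    object Lifecycle {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("lifecycle-sketch").getOrCreate()
        val f = new Suffixer("!")            // instantiated here, driver side
        // f is serialized with each task and deserialized on the executors;
        // Java deserialization does not invoke the constructor, so the println is not repeated there
        spark.sparkContext.parallelize(Seq("a", "b"), 2).map(f).collect().foreach(println)
        spark.stop()
      }
    }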
Hi all,
I'm trying to create a dataframe enforcing a schema so that I can write it
to a parquet file. The schema has timestamps and I get an error with
pyspark. The following is a snippet of code that exhibits the problem:
df = sqlctx.range(1000)
schema = StructType([StructField('a', TimestampType(), …
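The snippet above is PySpark; for comparison, a minimal Scala sketch of the same round trip, assuming a spark-shell session where spark and sc are available (the output path is illustrative). The usual pitfall is that the column values must be real timestamp objects, not strings, once the schema says TimestampType.

    import java.sql.Timestamp
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StructField, StructType, TimestampType}

    val schema = StructType(Seq(StructField("a", TimestampType, nullable = true)))
    val rows = Seq(Row(new Timestamp(System.currentTimeMillis())))
    val df = spark.createDataFrame(sc.parallelize(rows), schema)
    df.write.mode("overwrite").parquet("/tmp/ts_demo.parquet")   // illustrative path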
As the error says clearly, the column FL Date has a different format than you are expecting. Modify your date format mask appropriately.
On Wed, 11 Apr 2018 at 5:12 pm, @Nandan@
wrote:
Hi,
I am not able to use the .map function in Spark.
My code is as below:
1) Create parse function:
from datetime import datetime
from collections import namedtuple
fields = ('date','airline','flightnum','origin','dest','dep'…
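As the reply above notes, the fix is usually just making the format mask match what is actually in the file. A tiny Scala illustration of the idea (the pattern and sample value are made up):

    import java.time.LocalDate
    import java.time.format.DateTimeFormatter

    val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd")   // must mirror the file's real date layout
    LocalDate.parse("2018-04-11", fmt)                    // a mismatch throws DateTimeParseException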
Hi,
I have a Spark job which needs to access HBase inside a mapToPair function. The question is that I do not want to connect to HBase and close the connection on every call.
As I understand it, PairFunction is not designed to manage resources with setup() and close(), like a Hadoop reader and writer.
Does s…
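PairFunction indeed has no setup()/close() hooks; the usual workaround is to drop down to mapPartitions (mapPartitionsToPair in the Java API) and open one connection per partition. A Scala sketch, assuming a spark-shell session with sc available, using a stand-in connection type where the real HBase client would go:

    val keys = sc.parallelize(Seq("row1", "row2", "row3"), 2)
    val enriched = keys.mapPartitions { iter =>
      // stand-in for e.g. an HBase connection built from a partition-local configuration
      class FakeConnection { def get(k: String): String = "value-for-" + k; def close(): Unit = () }
      val conn = new FakeConnection                      // opened once per partition, not per record
      val out = iter.map(k => (k, conn.get(k))).toList   // materialize before closing
      conn.close()
      out.iterator
    }
    enriched.collect().foreach(println)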
JavaEsSparkStreaming.saveToEs(parsedMetadataAndRows,
    "index/type" // ...
But it appears I cannot get access to the sqlContext when I run this on the
spark cluster because that code is executing in the executor not in the
driver.
Is there a way I can access or create a SqlContext to be able to pull the
file down from S3 in my map function?
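The SQLContext exists only on the driver, so it cannot be used from code running in map. One common alternative, sketched below in Scala with a made-up S3 path and assuming a spark-shell session (spark and sc available), is to read the file once on the driver and broadcast its contents:

    // on the driver: read the lookup file once (path is illustrative)
    val lookup = spark.read.textFile("s3a://my-bucket/lookup.txt").collect().toSet
    val lookupB = sc.broadcast(lookup)

    // executor-side code touches only the broadcast value, never the SQLContext
    val events = sc.parallelize(Seq("a", "b"))
    val flagged = events.map(x => (x, lookupB.value.contains(x)))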
If I understand your question, you should look at withColumn in the DataFrame API (length here is pyspark.sql.functions.length):
df.withColumn("len", length("l"))
Thanks
Ankur
On Fri, Apr 14, 2017 at 6:07 AM, issues solution
wrote:
Hi,
How can you create a column inside a map function, like this:
df.map(lambda l: len(l))
but instead of returning an RDD, create the column inside the data frame?
Hello Marcelo,
Finally, what was the solution? I am facing the same problem.
Thank you
Please post a minimal complete code example of what you are talking about
On Thu, Dec 15, 2016 at 6:00 PM, Michael Nguyen
wrote:
I have the following sequence of Spark Java API calls (Spark 2.0.2):
1. A Kafka stream that is processed via a map function, which returns the string value from tuple2._2() for a JavaDStream, as in
   return tuple2._2();
2. The returned JavaDStream is then processed by foreachPartition
…e of when using this for production applications. Also, is there a "non-experimental" way of using the map function on a DataFrame in Spark 2.0?
Thanks,
Ninad
That's a bug. Which version of Spark are you running? Have you tried 2.0.2?
On Wed, Nov 2, 2016 at 12:01 AM, 颜发才(Yan Facai) wrote:
Hi, all.
When I use a case class as the return value in a map function, Spark always raises a ClassCastException.
I wrote a demo, like:
scala> case class Record(key: Int, value: String)
scala> case class ID(key: Int)
scala> val df = Seq(Record(1, "a"), Record(2, "b")…
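For reference, this is the shape of code that is normally expected to work once the encoder issue is out of the way (a sketch of a spark-shell session on a later 2.x release, where spark.implicits._ is in scope automatically):

    scala> case class Record(key: Int, value: String)
    scala> case class ID(key: Int)
    scala> val ds = Seq(Record(1, "a"), Record(2, "b")).toDS()
    scala> ds.map(r => ID(r.key)).collect()   // uses the derived case-class encoder, no Row casts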
Yes. You can do something like this:
.map(x => mapfunction(x))
Thanks
Deepak
On 9 Jul 2016 9:22 am, "charles li" wrote:
Hi guys, is there a way to dynamically load files within the map function?
i.e.
Can I code as below:
Thanks a lot.
--
Quant | Engineer | Boy
blog: http://litaotao.github.io
github: www.github.com/litaotao
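One way to do this kind of thing, sketched in Scala with a made-up path and assuming sc is available: ship the file with sc.addFile and open it lazily on the executors via SparkFiles.get.

    import org.apache.spark.SparkFiles

    sc.addFile("hdfs:///data/lookup.txt")              // illustrative path
    val result = sc.parallelize(1 to 10).mapPartitions { iter =>
      val lines = scala.io.Source.fromFile(SparkFiles.get("lookup.txt")).getLines().toSet
      iter.map(x => (x, lines.contains(x.toString)))
    }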
…ted as below:
options.put("dbtable", "(select * from user) as account");
DataFrame accountRdd = sqlContext.read().format("jdbc").options(options).load();
and I have another RDD which contains login names, and I want to find the userid from the above DataFrame and return it.
Not sure how I can do that, as when I apply a map function and, say, filter on the DF, I get a NullPointerException.
Please help.
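A DataFrame (and the SQLContext behind it) cannot be referenced from inside another RDD's map; that is what produces the NullPointerException. The usual fix is to turn the login names into a DataFrame and join. A Scala sketch with made-up column names, assuming a spark-shell session:

    import spark.implicits._

    // stand-ins for the JDBC-backed DataFrame and the login-name RDD
    val accounts = Seq((1, "alice"), (2, "bob")).toDF("userid", "login")
    val logins   = sc.parallelize(Seq("alice")).toDF("login")

    val userIds = logins.join(accounts, "login").select("userid")
    userIds.show()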
In fact you can return “NULL” from your initial map and hence not resort to
Optional at all
From: Evo Eftimov [mailto:evo.efti...@isecc.com]
Sent: Sunday, April 19, 2015 9:48 PM
To: 'Steve Lewis'
Cc: 'Olivier Girardot'; 'user@spark.apache.org'
Subject: RE:
Spark exception THEN
as far as I am concerned, chess-mate
From: Steve Lewis [mailto:lordjoe2...@gmail.com]
Sent: Sunday, April 19, 2015 8:16 PM
To: Evo Eftimov
Cc: Olivier Girardot; user@spark.apache.org
Subject: Re: Can a map function return null
So you imagine something like this
Sent from Samsung Mobile
Original message
From: Olivier Girardot
Date: 2015/04/18 22:04 (GMT+00:00)
To: Steve Lewis, user@spark.apache.org
Subject: Re: Can a map function return null
You can return an RDD with null values inside, and afterwards filter on "item != null".
…ret.add(transform(s));
        return ret; // contains 0 items if isUsed is false
    }
});
My question is: can I do a map returning the transformed data, and null if nothing is to be returned, as shown below? What does Spark do with a map function returning null?
JavaRDD words = original.map(new MapFunction() {
…
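A pattern that sidesteps the null question entirely is to return an Option (or an empty collection) and flatMap; the filter-out-nulls approach suggested above also works. A small Scala sketch, assuming sc:

    val original = sc.parallelize(Seq("keep me", "", "also keep"))

    // return an Option and let flatMap drop the Nones
    val words = original.flatMap { s =>
      if (s.nonEmpty) Some(s.toUpperCase) else None
    }

    // or, as suggested above: allow nulls, then filter them out
    val words2 = original.map(s => if (s.nonEmpty) s.toUpperCase else null).filter(_ != null)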
at org.apache.spark.streaming.api.java.JavaDStreamLike$class.mapToPair(JavaDStreamLike.scala:146)
at org.apache.spark.streaming.api.java.JavaPairDStream.mapToPair(JavaPairDStream.scala:46)
at streamReader.App.main(App.java:66)
Is using the sparkContext from inside a map function wrong?
This is the code we are using:
…
…action is the most efficient way, and the winner is foreach(a => a), as everyone expected. collect can cause an OOM on the driver, and count is much slower than the others. Thanks all.
Hi all
Is there an efficient way to trigger RDD transformations? I'm currently using the count action to achieve this.
Best regards
Kevin
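For reference, a sketch of the pattern the thread converged on, assuming sc (the mapped function is a stand-in):

    val rdd = sc.parallelize(1 to 1000)
    val mapped = rdd.map { x => /* side-effecting work here */ x * 2 }

    // a no-op foreach (equivalent to the thread's foreach(a => a)) forces the lineage on the
    // executors without shipping results back; count() does extra aggregation, and collect()
    // risks an OOM on the driver
    mapped.foreach(_ => ())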
rdd.flatMap { case (k, coll) => coll.map { elem => (elem, k) } }
On Thu, Dec 4, 2014 at 1:26 AM, Yifan LI wrote:
Hi,
I have an RDD like below:
(1, (10, 20))
(2, (30, 40, 10))
(3, (30))
…
Is there any way to map it to this:
(10,1)
(20,1)
(30,2)
(40,2)
(10,2)
(30,3)
…
In general, each element might be mapped to multiple output elements.
Thanks in advance!
Best,
Yifan LI
Doing cleanup in an iterator like that assumes the iterator always gets fully read, which is not necessarily the case (for example, RDD.take does not). Instead I would use mapPartitionsWithContext, in which case you can write a function of the form
f: (TaskContext, Iterator[T]) => Iterator[U]
now…
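In newer Spark versions the same effect is usually obtained with TaskContext inside a plain mapPartitions; a sketch (Spark 2.4+ listener signature), assuming sc, with the resource as a stand-in:

    import org.apache.spark.TaskContext

    val rdd = sc.parallelize(1 to 100, 4)
    val result = rdd.mapPartitions { iter =>
      val resource = new java.io.ByteArrayOutputStream()   // stand-in for a connection/file handle
      TaskContext.get().addTaskCompletionListener[Unit] { _ =>
        resource.close()   // runs when the task ends, even if the iterator is never fully read
      }
      iter.map(x => x * 2)
    }
    result.take(5)   // take() does not drain every partition, yet the listener still fires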
In the options of spark-submit, there are two options which may be helpful to your problem; they are "--total-executor-cores NUM" (standalone and mesos only) and "--executor-cores" (yarn only).
qinwei
From: myasuka
Date: 2014-09-28 11:44
To: user
Subject: How to use multi thread in RDD map function
…RDD transformation.
Can anyone provide some example code for using Java multi-threading in an RDD transformation, or any idea to increase the concurrency?
Thanks for all
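Besides giving each executor more cores via those flags, application-level threading inside a task is also possible; a Scala sketch using a local thread pool within mapPartitions, assuming sc (the per-record work is a stand-in):

    import java.util.concurrent.Executors
    import scala.concurrent.duration.Duration
    import scala.concurrent.{Await, ExecutionContext, Future}

    val rdd = sc.parallelize(1 to 100, 4)
    val result = rdd.mapPartitions { iter =>
      val pool = Executors.newFixedThreadPool(4)
      implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
      val items   = iter.toVector                    // materialize the partition
      val futures = items.map(x => Future(x * 2))    // stand-in for slow per-record work
      val out     = Await.result(Future.sequence(futures), Duration.Inf)
      pool.shutdown()
      out.iterator
    }
    result.count()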
Hi
In one of our use cases the filename contains a timestamp, and we have to append it to the record for aggregation.
How can I access the filename in a map function?
Thanks!
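In current Spark versions there are built-in ways to get this without writing a custom InputFormat; a Scala sketch with an illustrative path, assuming spark and sc:

    import org.apache.spark.sql.functions.input_file_name

    // DataFrame route: attach the source file name to every record
    val withFile = spark.read.text("/data/incoming/*.log")        // illustrative path
      .withColumn("source_file", input_file_name())

    // RDD route: wholeTextFiles yields (path, content) pairs, suitable for smallish files
    val byFile = sc.wholeTextFiles("/data/incoming/*.log")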
I don't think static members are going to be serialized in the closure? The instance of Parse will be looking at its local SampleOuterClass, which is maybe not initialized on the remote JVM.
On Tue, Aug 12, 2014 at 6:02 PM, Sunny Khatri wrote:
I have a class defining an inner static class (the map function). The inner class tries to refer to a variable instantiated in the outer class, which results in a NullPointerException. Sample code as follows:
class SampleOuterClass {
    private static ArrayList someVariable
On Thu, Jul 31, 2014 at 2:11 AM, Tobias Pfeiffer wrote:
> rdd.mapPartitions { partition =>
>   // Some setup code here
>   val result = partition.map(yourfunction)
>   // Some cleanup code here
>   result
> }
Yes, I realized that after I hit send. You definitely have to store
and return the
Hi,
On Thu, Jul 31, 2014 at 2:23 AM, Sean Owen wrote:
>
> ... you can run setup code before mapping a bunch of records, and
> after, like so:
>
> rdd.mapPartitions { partition =>
>   // Some setup code here
>   partition.map(yourfunction)
>   // Some cleanup code here
> }
>
Please be careful
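The caveat being hinted at: partition.map(...) is lazy, so cleanup written after it runs before any record has actually been processed. One simple (if memory-hungry) fix is to force the work first; a sketch assuming sc, with the writer as a stand-in resource (see the TaskContext listener sketch earlier in this digest for a variant that tolerates partially read iterators):

    val rdd = sc.parallelize(1 to 10, 2)
    val doubled = rdd.mapPartitions { partition =>
      val writer = new java.io.StringWriter()                                // stand-in for setup
      val out = partition.map { x => writer.write(x + " "); x * 2 }.toList   // .toList forces the work
      writer.close()                                                         // safe: fully consumed
      out.iterator
    }
    doubled.collect()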
…n't share state across Mappers, or Mappers and Reducers, in Hadoop. (At least there was no direct way.) Same here. But you can maintain state across many map calls.
On Wed, Jul 30, 2014 at 6:07 PM, Kevin wrote:
Use mapPartitions to get the equivalent functionality to Hadoop.
Hi,
Is it possible to maintain state inside a Spark map function? With Hadoop MapReduce, Mappers and Reducers are classes that can have their own state using instance variables. Can this be done with Spark? Are there any examples?
Most examples I have seen do a simple operation on the value passed into the map function.
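A sketch of what "state across many map calls" looks like in practice with mapPartitions, assuming sc; the counter plays the role of a Mapper's instance variable:

    val rdd = sc.parallelize(Seq("a", "b", "c", "d"), 2)
    val withRunningIndex = rdd.mapPartitions { iter =>
      var seen = 0L                       // lives for the whole partition, like a Mapper field
      iter.map { record =>
        seen += 1
        (record, seen)
      }
    }
    withRunningIndex.collect()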
Does this line, println("Retuning " + string), from the hash function print what you expect? If you're not seeing that output in the executor log, I'd also put some debug statements in "case other", since your match in the "interesting" case is conditioned on if (fieldsList.contains(index)) -- maybe th…
Hello everyone,
I am having some problems with a simple Scala/Spark code in which I am trying to replace certain fields in a CSV with their hashes.
class DSV (var line: String = "", fieldsList: Seq[Int], var delimiter: String = ",") extends Serializable {
    def hash(s: String): String = {
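For comparison, a compact Scala sketch of the same idea (hashing a chosen set of fields in each delimited line), with made-up indexes and sample data, assuming sc:

    import java.security.MessageDigest

    def sha256(s: String): String =
      MessageDigest.getInstance("SHA-256").digest(s.getBytes("UTF-8")).map("%02x".format(_)).mkString

    val fieldsList = Set(1, 3)                                   // illustrative column indexes
    val lines = sc.parallelize(Seq("alice,secret,NY,4111", "bob,hunter2,LA,4222"))
    val masked = lines.map { line =>
      line.split(",", -1).zipWithIndex.map { case (field, idx) =>
        if (fieldsList.contains(idx)) sha256(field) else field
      }.mkString(",")
    }
    masked.collect().foreach(println)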
Hi spark-folk,
I have a directory full of files that I want to process using PySpark.
There is some necessary metadata in the filename that I would love to
attach to each record in that file. Using Java MapReduce, I would access
((FileSplit) context.getInputSplit()).getPath().getName()
in the s
Hi all,
I am trying to create a new object in the map function, but pyspark reports a lot of errors. Is it legal to do so? Here is my code:
class Node(object):
    def __init__(self, A, B, C):
        self.A = A
        self.B = B
        self.C = C
def make_vertex(pair):
    A, (B, C) = pair