} {$pii} {$content-type} {$srctitle} {$document-type} {$document-subtype} {$publication-date} {$article-title} {$issn} {$isbn}
{$lang} {$tables} return
xml-to-json($retval)
Darin.
On Tuesday, March 13, 2018, 8:52:42 AM
This Stack Overflow question may help:
https://stackoverflow.com/questions/42641573/why-does-memory-usage-of-spark-worker-increases-with-time/42642233#42642233
--
View this message in context:
I added this code in the foreachRDD block.
```
rdd.persist(StorageLevel.MEMORY_AND_DISK)
```
The exception no longer occurs. But the Spark Streaming UI now shows many
dead executors.
```
User class threw exception: org.apache.spark.SparkException: Job aborted due
to stage failure: Task 21 in stage 1194.0
Hi,
I get this exception after the streaming program has run for some hours.
```
*User class threw exception: org.apache.spark.SparkException: Job aborted
due to stage failure: Task 21 in stage 1194.0 failed 4 times, most recent
failure: Lost task 21.3 in stage 1194.0 (TID 2475, 2.dev3, executor 66):
I think your settings are not taking effect.
Try adding `ulimit -n 10240` to spark-env.sh.
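A minimal sketch of the spark-env.sh change (the limit value is illustrative; the OS hard limit, e.g. in /etc/security/limits.conf, must also allow it):

```shell
# spark-env.sh -- raise the per-process open-file limit before the
# Spark daemons start. 10240 is illustrative; size it to the number
# of shuffle files your jobs actually create.
ulimit -n 10240
```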
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-too-many-open-files-although-ulimit-set-to-1048576-tp28490p28491.html
Sent from the Apache Spark User List mailing list archive
ner
I get
Option[org.apache.spark.Partitioner] = None
I would have thought it would be HashPartitioner.
Does anyone know why this would be None and not HashPartitioner?
Thanks.
Darin.
be appreciated.
Thanks.
Darin.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-find-the-partitioner-for-a-Dataset-tp27672.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Thanks.
Darin.
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
he size of my cluster) but I also want to clarify
my understanding of whole-stage code generation.
Any thought/suggestions would be appreciated. Also, if anyone has found good
resources that further explain the details of the DAG and whole-stage code
generation,
is that it
returns a string. So, you have to be a little creative when returning multiple
values (such as delimiting the values with a special character and then
splitting on this delimiter).
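A minimal sketch of that delimiter workaround in plain Java (the class name and the choice of the unit-separator character are assumptions, not from the original thread):

```java
// Pack multiple values into the single string the evaluator returns,
// then split on a character that cannot appear in the data. The ASCII
// unit separator '\u001F' is used here as an illustrative choice.
public class DelimiterDemo {
    static final String SEP = "\u001F";

    // Join values into one delimited string.
    static String pack(String... values) {
        return String.join(SEP, values);
    }

    // Split back into fields; limit -1 keeps trailing empty fields.
    static String[] unpack(String packed) {
        return packed.split(SEP, -1);
    }

    public static void main(String[] args) {
        String packed = pack("doc-123", "2016-03-11", "");
        String[] parts = unpack(packed);
        System.out.println(parts.length); // 3
        System.out.println(parts[0]);     // doc-123
    }
}
```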
Darin.
From: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com>
To: Darin McBeath &
, or xslt to transform, extract, or
filter.
As mentioned below, you want to initialize the parser in a mapPartitions call
(one of the examples shows this).
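The point of the mapPartitions pattern is to pay the parser's setup cost once per partition instead of once per record. A minimal sketch of that idea in plain Java (the `Parser` class is hypothetical; in Spark the iterator would come from the partition):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PerPartitionInit {
    // Hypothetical parser with an expensive constructor.
    static class Parser {
        static int constructed = 0;
        Parser() { constructed++; }
        int parse(String rec) { return rec.length(); }
    }

    // Mirrors mapPartitions: one Parser per partition iterator,
    // reused for every record in that partition.
    static List<Integer> mapPartition(Iterator<String> records) {
        Parser parser = new Parser(); // initialized once per partition
        List<Integer> out = new ArrayList<>();
        while (records.hasNext()) {
            out.add(parser.parse(records.next()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> result =
            mapPartition(Arrays.asList("<a/>", "<bb/>").iterator());
        System.out.println(result);             // [4, 5]
        System.out.println(Parser.constructed); // 1
    }
}
```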
Hope this is helpful.
Darin.
From: Hyukjin Kwon <gurwls...@gmail.com>
To: Jörn Franke <jornfra...@
use cases (such as the one I'm using above).
Am I doing something wrong or is this to be expected?
Thanks.
Darin.
repartition
should work or if this is a bug. Thanks Jacek for starting to dig into this.
Darin.
- Original Message -
From: Darin McBeath <ddmcbe...@yahoo.com.INVALID>
To: Jacek Laskowski <ja...@japila.pl>
Cc: user <user@spark.apache.org>
Sent: Friday, March 11, 2016 1
rviceInit call");
log.info("SimpleStorageServiceInit call arg1: "+ arg1);
log.info("SimpleStorageServiceInit call arg2:"+ arg2);
log.info("SimpleStorageServiceInit call arg3: "+ arg3);
SimpleStorageService.init(this.arg1, this.arg2, this.arg3);
}
}
___
dies.
Darin.
From: Jacek Laskowski <ja...@japila.pl>
To: Darin McBeath <ddmcbe...@yahoo.com>
Cc: user <user@spark.apache.org>
Sent: Friday, March 11, 2016 1:24 PM
Subject: Re: spark 1.6 foreachPartition only appears to be running on one
executor
-submit.
Thanks.
Darin.
of my files can have embedded
newlines). But, I wonder how either of these would behave if I passed
literally a million (or more) 'filenames'.
Before I spend time exploring, I wanted to seek some input.
Any thoughts would be appreciated.
Darin
to view the application history
for both jobs.
Has anyone else noticed this issue? Any suggestions?
Thanks.
Darin.
Thanks.
I already set the following in spark-defaults.conf so I don't think that is
going to fix my problem.
spark.eventLog.dir file:///root/spark/applicationHistory
spark.eventLog.enabled true
I suspect my problem must be something else.
Darin.
From: Don
e new getInstance function and some more information on
the various features.
Darin.
From: Darin McBeath <ddmcbe...@yahoo.com.INVALID>
To: "user@spark.apache.org" <user@spark.apache.org>
Sent: Tuesday, December 1, 2015 11:51 AM
Subject: Re: Turning off D
) or you could have it find 'local' versions (on
the workers or in S3 and then cache them locally for performance).
I will post an update when the code has been adjusted.
Darin.
- Original Message -
From: Shivalik <shivalik.malho...@outlook.com>
To: user@spark.apache.org
Sent: T
or.getInstance(xpath,namespaces)
recsIter.map(rec => proc.evaluateString(rec._2).toInt)
}).sum
There is more documentation on the spark-xml-utils github site. Let me know if
the documentation is not clear or if you have any questions.
Darin.
__
http://www.meetup.com/Cincinnati-Apache-Spark-Meetup/
Thanks.
Darin.
, EntityType: String,
CustomerId: String, EntityURI: String, NumDocs: Long)
val entities = sc.textFile("s3n://darin/Entities.csv")
val entitiesArr = entities.map(v => v.split('|'))
val dfEntity = entitiesArr.map(arr => Entity(arr(0).toLong, arr(1).toLong,
arr(2), arr(3), arr(4), arr(5).toLong)).toDF
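For reference, the same row parse in plain Java needs the pipe escaped, because `String.split` takes a regex (the Scala `split(Char)` overload above quotes the character automatically). A minimal sketch with made-up values:

```java
public class EntityRow {
    // Split one '|'-delimited row; '|' is a regex alternation
    // metacharacter, so it must be escaped.
    static String[] fields(String line) {
        return line.split("\\|");
    }

    public static void main(String[] args) {
        String line = "1|2|Person|cust-9|http://example.org/e/1|42";
        String[] f = fields(line);
        long id = Long.parseLong(f[0]);
        long numDocs = Long.parseLong(f[5]);
        System.out.println(id + " " + numDocs); // 1 42
    }
}
```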
.
Thanks.
Darin.
is for hadoop2.0.0 (and I think Cloudera).
Is there a way that I can force the install of the same assembly to the
cluster that comes with the 1.2 download of spark?
Thanks.
Darin.
Thanks for your quick reply. Yes, that would be fine. I would rather wait/use
the optimal approach as opposed to hacking some one-off solution.
Darin.
From: Kostas Sakellis kos...@cloudera.com
To: Darin McBeath ddmcbe...@yahoo.com
Cc: User user
a partition that I might over count, but
perhaps that is an acceptable trade-off.
I'm guessing that others have run into this before so I would like to learn
from the experience of others and how they have addressed this.
Thanks.
Darin
);
If anyone has any tips for what I should look into it would be appreciated.
Thanks.
Darin.
Just to close the loop in case anyone runs into the same problem I had.
By setting --hadoop-major-version=2 when using the ec2 scripts, everything
worked fine.
Darin.
- Original Message -
From: Darin McBeath ddmcbe...@yahoo.com.INVALID
To: Mingyu Kim m...@palantir.com; Aaron Davidson
is an interface of type
org.apache.hadoop.mapred.JobContext.
Is there something obvious that I might be doing wrong (or messed up in the
translation from Scala to Java) or something I should look into? I'm using
Spark 1.2 with hadoop 2.4.
Thanks.
Darin.
From
it and post a response.
- Original Message -
From: Mingyu Kim m...@palantir.com
To: Darin McBeath ddmcbe...@yahoo.com; Aaron Davidson ilike...@gmail.com
Cc: user@spark.apache.org user@spark.apache.org
Sent: Monday, February 23, 2015 3:06 PM
Subject: Re: Which OutputCommitter to use for S3?
Cool
1.2 on a stand-alone cluster on ec2. To get the counts
for the records, I'm using the .count() for the RDD.
Thanks.
Darin.
Thanks Imran. That's exactly what I needed to know.
Darin.
From: Imran Rashid iras...@cloudera.com
To: Darin McBeath ddmcbe...@yahoo.com
Cc: User user@spark.apache.org
Sent: Tuesday, February 17, 2015 8:35 PM
Subject: Re: How do you get the partitioner
());
log.info("Number of baseline records: " + baselinePairRDD.count());
Both hsfBaselinePairRDD and baselinePairRDD have 1024 partitions.
Any insights would be appreciated.
I'm using Spark 1.2.0 in a stand-alone cluster.
Darin
really want to do in the first place.
Thanks again for your insights.
Darin.
From: Imran Rashid iras...@cloudera.com
To: Darin McBeath ddmcbe...@yahoo.com
Cc: User user@spark.apache.org
Sent: Tuesday, February 17, 2015 3:29 PM
Subject: Re: MapValues and Shuffle
of getting this information.
I'm curious if anyone has any suggestions for how I might go about finding how
an RDD is partitioned in a Java program.
Thanks.
Darin.
0.3 s 323.4 MB
If anyone has some suggestions please let me know. I've tried playing around
with various configuration options but I've found nothing yet that will fix the
underlying issue.
Thanks.
Darin
,30);
From: Sven Krasser kras...@gmail.com
To: Darin McBeath ddmcbe...@yahoo.com
Cc: User user@spark.apache.org
Sent: Friday, January 23, 2015 5:12 PM
Subject: Re: Problems saving a large RDD (1 TB) to S3 as a sequence file
Hey Darin,
Are you running
about is why there is even any 'shuffle read' when
constructing the baselinePairRDD. If anyone could shed some light on this it
would be appreciated.
Thanks.
Darin.
Take a look at the O'Reilly Learning Spark (Early Release) book. I've found
this very useful.
Darin.
From: Saurabh Agrawal saurabh.agra...@markit.com
To: user@spark.apache.org user@spark.apache.org
Sent: Thursday, November 20, 2014 9:04 AM
Subject: Please help me get started on Apache
be appreciated.
Thanks.
Darin.
Here is where I'm setting the 'timeout' in my spark job.
SparkConf conf = new SparkConf().setAppName("SparkSync Application")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.rdd.compress", "true")
    .set
for /mnt/spark and
/mnt2/spark do not exist.
Am I missing something? Has anyone else noticed this?
A colleague started a cluster (using the ec2 scripts) with m3.xlarge
machines, and both /mnt/spark and /mnt2/spark directories were created.
Thanks.
Darin.
Assume the following where both updatePairRDD and deletePairRDD are both
HashPartitioned. Before the union, each one of these has 512 partitions. The
new created updateDeletePairRDD has 1024 partitions. Is this the
general/expected behavior for a union (the number of partitions to double)?
. But, based on other logging statements
throughout my application, I don't believe this is the case.
Thanks.
Darin.
14/11/10 22:35:27 INFO scheduler.DAGScheduler: Failed to run count at SparkSync.java:785
14/11/10 22:35:27 WARN scheduler.TaskSetManager: Lost task 0.3 in stage 40.0 (TID 10674, ip-10
Let me know if you are interested in participating in a meet up in Cincinnati,
OH to discuss Apache Spark.
We currently have 4-5 different companies expressing interest but would like a
few more.
Darin.
-shell as well as from a Java application. Feel free
to use, contribute, and/or let us know how this library can be improved. Let
me know if you have any questions.
Darin.
.
Thanks.
Darin.
OK. After reading some documentation, it would appear the issue is the default
number of partitions for a join (200).
After doing something like the following, I was able to change the value.
From: Darin McBeath ddmcbe...@yahoo.com.INVALID
To: User user@spark.apache.org
Sent: Wednesday
Sorry, hit the send key a bit too early.
Anyway, this is the code I set.
sqlContext.sql("set spark.sql.shuffle.partitions=10");
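For reference, the same default can be set statically rather than per session; a minimal sketch, assuming spark-defaults.conf is in use (the value 10 just mirrors the runtime setting above):

```shell
# spark-defaults.conf -- static equivalent of the runtime "set" call
spark.sql.shuffle.partitions 10
```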
From: Darin McBeath ddmcbe...@yahoo.com
To: Darin McBeath ddmcbe...@yahoo.com; User user@spark.apache.org
Sent: Wednesday, October 29, 2014 2:47 PM
Subject: Re
with XPathProcessor.init. I have code in place to
make sure this is not an issue. But, I was wondering if there is a better way
to accomplish something like this.
Thanks.
Darin.
using 297MB for this
RDD (when it was only 156MB in S3). I get that there could be some differences
between the serialized storage format and what is then used in memory, but I'm
curious as to whether I'm missing something and/or should be doing things
differently.
Thanks.
Darin.
ec2 startup script. But, that is purely a guess on my
part.
I'm wondering if anyone else has noticed this issue and if so has a workaround.
Thanks.
Darin.
within spark-shell, this really
isn't an option (unless I'm missing something).
Thanks.
Darin.
to this problem.
Thanks.
Darin.
java.lang.NoSuchMethodError:
org.apache.http.impl.conn.DefaultClientConnectionOperator.&lt;init&gt;(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
at
org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator
.
Darin.
to executors)
[root@ip-10-233-73-204 spark]$ ./bin/spark-shell --jars lib/uber-SparkUtils-0.1.jar
## Bring in the sequence file (2 million records)
scala> val xmlKeyPair = sc.sequenceFile[String,String]("s3n://darin/xml/part*").cache()
## Test values against an xpath expression (need to import the class
=.08 -z us-east-1e --worker-instances=2
my-cluster
From: Daniel Siegmann daniel.siegm...@velos.io
To: Darin McBeath ddmcbe...@yahoo.com
Cc: Daniel Siegmann daniel.siegm...@velos.io; user@spark.apache.org
user@spark.apache.org
Sent: Thursday, July 31, 2014 10:04
for the running application but this had no effect. Perhaps this is ignored
for a 'filter' and the default is the total number of cores available.
I'm fairly new with Spark so maybe I'm just missing or misunderstanding
something fundamental. Any help would be appreciated.
Thanks.
Darin.
the documentation states). What
would I want that value to be based on my configuration below? Or, would I
leave that alone?
From: Daniel Siegmann daniel.siegm...@velos.io
To: user@spark.apache.org; Darin McBeath ddmcbe...@yahoo.com
Sent: Wednesday, July 30, 2014