and it fixed the problem.
On Wed, Aug 3, 2016 at 10:04 AM, Utkarsh Sengar wrote:
After an upgrade from 1.5.1 to 2.0, one of the tasks never completes and
keeps spilling data to disk over time.
long count = resultRdd.count();
LOG.info("TOTAL in resultRdd: " + count);
resultRdd has a rather complex structure:
JavaPairRDD>
resultRdd = myRdd
I don't think it's a related problem, although setting
"spark.sql.warehouse.dir"=/tmp in the Spark config fixed it.
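For reference, a workaround like the one mentioned can be passed at submit time (a sketch; the class and jar names are placeholders, and /tmp is just the value that worked here):

```shell
spark-submit \
  --conf spark.sql.warehouse.dir=/tmp \
  --class com.example.MyJob \
  myjob.jar
```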
On Tue, Aug 2, 2016 at 5:02 PM, Utkarsh Sengar wrote:
> Do we have a workaround for this problem?
> Can I overwrite that using some config?
>
> On Tue, A
eeds an update.
>
> On Tue, Aug 2, 2016 at 4:47 PM, Utkarsh Sengar wrote:
Upgraded to Spark 2.0 and tried to load a model:
LogisticRegressionModel model = LogisticRegressionModel.load(sc.sc(),
"s3a://cruncher/c/models/lr/");
Getting this error: Exception in thread "main"
java.lang.IllegalArgumentException: Wrong FS: file://spark-warehouse,
expected: file:///
Full stacktr
We are intermittently getting this error when Spark tries to load data from
S3:
Caused by: sun.security.provider.certpath.SunCertPathBuilderException:
unable to find valid certification path to requested target.
https://gist.githubusercontent.com/utkarsh2012/1c4cd2dc82c20c6f389b783927371bd7/raw/a1b
g Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Thu, Jun 16, 2016 at 6:45 AM, Utkarsh Sengar wrote:
This SO question was asked about a year ago.
http://stackoverflow.com/questions/31799755/how-to-deal-with-tasks-running-too-long-comparing-to-others-in-job-in-yarn-cli
I answered this question with a suggestion to try speculation but it
doesn't quite do what the OP expects. I have been running into t
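For anyone finding this later, the speculation knobs being referred to look roughly like this (a sketch; the values are illustrative, not from the thread):

```shell
spark-submit \
  --conf spark.speculation=true \
  --conf spark.speculation.multiplier=1.5 \
  --conf spark.speculation.quantile=0.9 \
  ...
```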
The problem turned out to be corrupt Parquet data; the error from Spark was
a bit misleading though.
On Mon, Feb 8, 2016 at 3:41 PM, Utkarsh Sengar wrote:
I am storing a model in s3 in this path:
"bucket_name/p1/models/lr/20160204_0410PM/ser" and the structure of the
saved dir looks like this:
1. bucket_name/p1/models/lr/20160204_0410PM/ser/data -> _SUCCESS,
_metadata, _common_metadata
and part-r-0-ebd3dc3c-1f2c-45a3-8793-c8f0cb8e7d01.gz.parquet
.
-Utkarsh
On Mon, Feb 1, 2016 at 3:40 PM, Holden Karau wrote:
> I wouldn't use accumulators for things which could get large; they can
> become kind of a bottleneck. Do you have a lot of string messages you want
> to bring back or only a few?
>
> On Mon, Feb 1, 2016 at 3:2
I am trying to debug code executed on executors by logging. Even when I add
log4j's LOG.info(..) inside .map() I don't see it in the Mesos task logs on
the corresponding slaves.
It's inefficient anyway to keep checking multiple slaves for logs.
One way to deal with this is to push logs to a central loc
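One hedged sketch of the file-appender approach (the appender type and paths are assumptions, not from the thread): a log4j.properties shipped to each executor, e.g. via --files:

```properties
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/spark/executor.log
log4j.appender.file.MaxFileSize=50MB
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```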
error I shared in my previous email.
Not sure what's going on; has anyone successfully used JPMML to import a
PMML file in Spark?
On Wed, Dec 9, 2015 at 11:01 AM, Utkarsh Sengar wrote:
I am trying to load a PMML file in a Spark job. Instantiate it only once
and pass it to the executors. But I get a NotSerializableException for
org.xml.sax.helpers.LocatorImpl, which is used inside JPMML.
I have this class Prediction.java:
public class Prediction implements Serializable {
priva
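A common workaround for this kind of NotSerializableException (a sketch, not from this thread; the Evaluator class below is a made-up stand-in for the JPMML object) is to keep only serializable state in the class, mark the non-serializable object transient, and rebuild it lazily on each executor:

```java
import java.io.Serializable;

// Made-up stand-in for a non-serializable object such as JPMML's internals.
class Evaluator {
    double evaluate(double x) { return x * 2.0; }
}

class Prediction implements Serializable {
    private final String modelPath;        // serializable state only
    private transient Evaluator evaluator; // never serialized

    Prediction(String modelPath) { this.modelPath = modelPath; }

    // Rebuilt lazily after deserialization on the executor.
    private Evaluator evaluator() {
        if (evaluator == null) {
            evaluator = new Evaluator(); // real code would load from modelPath
        }
        return evaluator;
    }

    double predict(double x) { return evaluator().evaluate(x); }
}
```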
Hi Tim,
Any way I can provide more info on this?
On Thu, Oct 1, 2015 at 4:21 PM, Utkarsh Sengar wrote:
> Not sure what you mean by that; I shared the data which I see in the Spark UI.
> Can you point me to a location where I can precisely get the data you need?
>
> When I run the
ode look like?
>
> Tim
>
> On Thu, Oct 1, 2015 at 4:05 PM, Utkarsh Sengar wrote:
Bumping it up, it's not really a blocking issue.
But fine-grained mode eats up an uncertain number of resources in Mesos and
launches tons of tasks, so I would prefer using the coarse-grained mode if
only it didn't run out of memory.
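For reference, the coarse-grained configuration under discussion looks like this (a sketch; the memory value is illustrative):

```shell
spark-submit \
  --conf spark.mesos.coarse=true \
  --conf spark.executor.memory=4g \
  ...
```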
Thanks,
-Utkarsh
On Mon, Sep 28, 2015 at 2:24 PM, Utkarsh S
; And when the job is running could you open the Spark webui and get stats
> about the heap size and other java settings?
>
> Tim
>
> On Thu, Sep 24, 2015 at 10:56 PM, Utkarsh Sengar wrote:
Bumping this one up, any suggestions on the stacktrace?
spark.mesos.coarse=true is not working and the driver crashed with the
error.
On Wed, Sep 23, 2015 at 3:29 PM, Utkarsh Sengar wrote:
> Missed to do a reply-all.
>
> Tim,
>
> spark.mesos.coarse = true doesn't work an
originally set coarse to false but then to true? Or is
> it the other way around?
>
> Also what's the exception/stack trace when the driver crashed?
>
> Coarse-grained mode pre-starts all the Spark executor backends, so it has
> the least overhead compared to fine-grained. T
If the broadcast variable doesn't fit in memory, I think it is not the right
fit for you.
You can think about fitting it into an RDD as a tuple with the other data
you are working on.
Say you are working on RDD (rdd in your case), run a map/reduce
to convert it to RDD> so now you have
relevant data from the
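The keyed-join idea above, sketched with plain Java maps instead of Spark RDDs (names are illustrative): join the large lookup data with the working data by key rather than broadcasting it:

```java
import java.util.HashMap;
import java.util.Map;

class KeyedJoin {
    // Inner-join two key/value datasets by key, the way an RDD join would,
    // instead of shipping one of them to every worker as a broadcast variable.
    static Map<String, String[]> join(Map<String, String> left, Map<String, String> right) {
        Map<String, String[]> out = new HashMap<>();
        for (Map.Entry<String, String> e : left.entrySet()) {
            String r = right.get(e.getKey());
            if (r != null) {
                out.put(e.getKey(), new String[] { e.getValue(), r });
            }
        }
        return out;
    }
}
```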
We are using "spark-1.4.1-bin-hadoop2.4" on Mesos (not EMR) with S3 to read
and write data and haven't noticed any inconsistencies with it, so 1
(mostly) and 2 definitely should not be a problem.
Regarding 3, are you setting the file system impl in spark config?
sparkContext.hadoopConfiguration().
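i.e., something along these lines (a sketch; the s3a keys shown are the commonly used Hadoop properties, so verify them against your Hadoop version):

```java
// Inside the driver setup, before reading from s3a:// paths.
sparkContext.hadoopConfiguration().set("fs.s3a.impl",
    "org.apache.hadoop.fs.s3a.S3AFileSystem");
sparkContext.hadoopConfiguration().set("fs.s3a.access.key", "<access-key>");
sparkContext.hadoopConfiguration().set("fs.s3a.secret.key", "<secret-key>");
```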
I am running Spark 1.4.1 on Mesos.
The Spark job does a "cartesian" of 4 RDDs (aRdd, bRdd, cRdd, dRdd) of size
100, 100, 7 and 1 respectively. Let's call it productRDD.
Creation of "aRdd" needs data pull from multiple data sources, merging it
and creating a tuple of JavaRdd, finally aRDD looks some
ect is, but it's plausible that the process of creating an array
> of 6.5 million of them is causing you to run out of memory.
>
> I think the reason you don't see anything in the executor logs is that the
> exception is occurring before the work is tasked to the executors.
I am trying to run this, a basic mapToPair and then count() to trigger an
action.
4 executors are launched but I don't see any relevant logs on those
executors.
It looks like the driver is pulling all the data and it runs out of
memory, the dataset is big, so it won't fit on 1 machine.
So wha
I am working on code which uses executor service to parallelize tasks
(think machine learning computations done over a small dataset over and over
again).
My goal is to execute some code as fast as possible, multiple times and
store the result somewhere (total executions will be on the order of 100M
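The pattern described, in plain Java (a sketch; the squaring task is a placeholder for the real computation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelCompute {
    // Run many small independent computations on a fixed-size thread pool
    // and collect the results in input order.
    static List<Integer> runAll(List<Integer> inputs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int x : inputs) {
                final int v = x;
                futures.add(pool.submit(() -> v * v)); // placeholder computation
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```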
PM, Utkarsh Sengar
> wrote:
> > So do I need to manually copy these 2 jars on my spark executors?
>
> Yes. I can think of a way to work around that if you're using YARN,
> but not with other cluster managers.
>
> On Tue, Aug 25, 2015 at 10:51 AM, Marcelo Vanzin wrote:
So do I need to manually copy these 2 jars on my spark executors?
On Tue, Aug 25, 2015 at 10:51 AM, Marcelo Vanzin wrote:
> On Tue, Aug 25, 2015 at 10:48 AM, Utkarsh Sengar wrote:
> > Now I am going to try it out on our mesos cluster.
> > I assumed "spark.executor.ex
I assumed "spark.executor.extraClassPath" takes CSV jars the way
"--jars" takes them, but it should be ":"-separated like a regular
classpath.
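In other words (paths hypothetical):

```shell
# --jars is comma-separated; extraClassPath is colon-separated
spark-submit \
  --jars /opt/libs/a.jar,/opt/libs/b.jar \
  --conf spark.executor.extraClassPath=/opt/libs/a.jar:/opt/libs/b.jar \
  ...
```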
Thanks for your help!
-Utkarsh
On Mon, Aug 24, 2015 at 5:05 PM, Utkarsh Sengar wrote:
> I get the same error even when I set the SPARK_CLASSPA
, Utkarsh Sengar
wrote:
> I assumed that's the case because of the error I got and the
> documentation, which says: "Extra classpath entries to append to the
> classpath of the driver."
>
> This is where I stand now:
>
> org.apache
able.logging.Log.(Log.java:31)
... 16 more
Thanks,
-Utkarsh
On Mon, Aug 24, 2015 at 4:11 PM, Marcelo Vanzin wrote:
> On Mon, Aug 24, 2015 at 3:58 PM, Utkarsh Sengar wrote:
> > That didn't work since "extraClassPath" flag was still appending the
> jars at
> If you use "spark.driver.extraClassPath" and
> "spark.executor.extraClassPath" to add the jar, it should take
> precedence over the log4j binding embedded in the Spark assembly.
>
>
> On Mon, Aug 24, 2015 at 3:15 PM, Utkarsh Sengar wrote:
>
> That being said, that message is not an error, it's more of a noisy
> warning. I'd expect slf4j to use the first binding available - in your
> case, logback-classic. Is that not the case?
>
>
> On Mon, Aug 24, 2015 at 2:50 PM, Utkarsh Sengar wrote:
> > Continu
Continuing this discussion:
http://apache-spark-user-list.1001560.n3.nabble.com/same-log4j-slf4j-error-in-spark-9-1-td5592.html
I am getting this error when I use logback-classic.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:.m2/repository/ch/qos/logback/l