On Mon, Sep 8, 2014 at 9:35 AM, Dimension Data, LLC.
wrote:
>user$ pyspark [some-options] --driver-java-options
> spark.yarn.jar=hdfs://namenode:8020/path/to/spark-assembly-*.jar
This command line does not look correct. "spark.yarn.jar" is not a JVM
command line option. You most probably need
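A hedged sketch of the alternative (assuming the spark.yarn.jar property is set as Spark configuration, e.g. in conf/spark-defaults.conf, rather than passed as a JVM option; the HDFS path is the one from the quoted command):

# conf/spark-defaults.conf
spark.yarn.jar    hdfs://namenode:8020/path/to/spark-assembly-*.jar

user$ pyspark [some-options] --master yarn-client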
On Mon, Sep 8, 2014 at 10:00 AM, Dimension Data, LLC. <
subscripti...@didata.us> wrote:
> user$ export MASTER=local[nn] # Run spark shell on LOCAL CPU threads.
> user$ pyspark [someOptions] --driver-java-options -Dspark.*XYZ*.jar='
> /usr/lib/spark/assembly/lib/spark-assembly-*.jar'
>
> My questi
On Mon, Sep 8, 2014 at 11:52 AM, Dimension Data, LLC. <
subscripti...@didata.us> wrote:
> So just to clarify for me: When specifying 'spark.yarn.jar' as I did
> above, even if I don't use HDFS to create a
> RDD (e.g. do something simple like: 'sc.parallelize(range(100))'), it is
> still necessary
On Mon, Sep 8, 2014 at 3:54 PM, Dimension Data, LLC. <
subscripti...@didata.us> wrote:
> You're probably right about the above because, as seen *below* for
> pyspark (but probably for other Spark
> applications too), once '-Dspark.master=[yarn-client|yarn-cluster]' is
> specified, the app invocat
Yes, that's how file: URLs are interpreted everywhere in Spark. (It's also
explained in the link to the docs I posted earlier.)
The second interpretation below is "local:" URLs in Spark, but that doesn't
work with Yarn on Spark 1.0 (so it won't work with CDH 5.1 and older
either).
On Mon, Sep 8,
This has all the symptoms of Yarn killing your executors due to them
exceeding their memory limits. Could you check your RM/NM logs to see
if that's the case?
(The error was because of an executor at
domU-12-31-39-0B-F1-D1.compute-1.internal, so you can check that NM's
log file.)
If that's the ca
Hi,
Yes, this is a problem, and I'm not aware of any simple workarounds
(or complex one for that matter). There are people working to fix
this, you can follow progress here:
https://issues.apache.org/jira/browse/SPARK-1239
On Tue, Sep 9, 2014 at 2:54 PM, jbeynon wrote:
> I'm running on Yarn with
Your executor is exiting or crashing unexpectedly:
On Tue, Sep 9, 2014 at 3:13 PM, Penny Espinoza
wrote:
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
> code from container container_1410224367331_0006_01_03 is : 1
> 2014-09-09 21:47:26,345 WARN
> org.apache.hadoo
You're using "hadoopConf", a Configuration object, in your closure.
That type is not serializable.
You can use " -Dsun.io.serialization.extendedDebugInfo=true" to debug
serialization issues.
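For example, a sketch of turning that on for the driver (class and jar names are placeholders):

$ spark-submit --class com.example.MyApp \
    --driver-java-options "-Dsun.io.serialization.extendedDebugInfo=true" \
    my-app.jar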
On Wed, Sep 10, 2014 at 8:23 AM, Sarath Chandra
wrote:
> Thanks Sean.
> Please find attached my code. Let
Yes please pretty please. This is really annoying.
On Sun, Sep 7, 2014 at 6:31 AM, Ognen Duzlevski
wrote:
>
> I keep getting below reply every time I send a message to the Spark user
> list? Can this person be taken off the list by powers that be?
> Thanks!
> Ognen
>
> Forwarded Message
On Mon, Sep 8, 2014 at 11:15 PM, Sean Owen wrote:
> This structure is not specific to Hadoop, but in theory works in any
> JAR file. You can put JARs in JARs and refer to them with Class-Path
> entries in META-INF/MANIFEST.MF.
Funny that you mention that, since someone internally asked the same
q
On Wed, Sep 10, 2014 at 3:44 PM, Sean Owen wrote:
> What's the Hadoop jar structure in question then? Is it something special
> like a WAR file? I confess I had never heard of this so thought this was
> about generic JAR stuff.
What I've been told (and Steve's e-mail alludes to) is that you can
p
On Wed, Sep 10, 2014 at 3:48 PM, Steve Lewis wrote:
> In modern projects there are a bazillion dependencies - when I use Hadoop I
> just put them in a lib directory in the jar - If I have a project that
> depends on 50 jars I need a way to deliver them to Spark - maybe wordcount
> can be written w
Yes, what Sandy said.
On top of that, I would suggest filing a bug for a new command line
argument for spark-submit to make the launcher process exit cleanly as
soon as a cluster job starts successfully. That can be helpful for
code that launches Spark jobs but monitors the job through different
m
Hi chinchu,
Where does the code trying to read the file run? Is it running on the
driver or on some executor?
If it's running on the driver, in yarn-cluster mode, the file should
have been copied to the application's work directory before the driver
is started. So hopefully just doing "new FileIn
You'll need to look at the driver output to have a better idea of
what's going on. You can use "yarn logs --applicationId blah" after
your app is finished (e.g. by killing it) to look at it.
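For example (the application ID below is a placeholder; copy the real one from the RM UI or from the submission output):

$ yarn logs -applicationId application_1410224367331_0006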
My guess is that your cluster doesn't have enough resources available
to service the container request you'
t.
>
>
>
>
> On Wed, Sep 24, 2014 at 11:37 PM, Marcelo Vanzin
> wrote:
>>
>> You'll need to look at the driver output to have a better idea of
>> what's going on. You can use "yarn logs --applicationId blah" after
>> your app is fin
n I'm not able to view the UI, how can I
> change it in Cloudera?
>
>
>
>
> On Thu, Sep 25, 2014 at 12:04 AM, Marcelo Vanzin
> wrote:
>>
>> You need to use the command line yarn application that I mentioned
>> ("yarn logs"). You can't look at the log
Sounds like "spark-01" is not resolving correctly on your machine (or
is the wrong address). Can you ping "spark-01" and does that reach the
VM where you set up the Spark Master?
On Wed, Sep 24, 2014 at 1:12 PM, danilopds wrote:
> Hello,
> I'm learning about Spark Streaming and I'm really excited
Hmmm, you might be suffering from SPARK-1719.
Not sure what the proper workaround is, but it sounds like your native
libs are not in any of the "standard" lib directories; one workaround
might be to copy them there, or add their location to /etc/ld.so.conf
(I'm assuming Linux).
On Thu, Sep 25, 20
Then I think it's time for you to look at the Spark Master logs...
On Thu, Sep 25, 2014 at 7:51 AM, danilopds wrote:
> Hi Marcelo,
>
> Yes, I can ping "spark-01" and I also include the IP and host in my file
> /etc/hosts.
> My VM can ping the local machine too.
>
>
>
> --
> View this message in c
On Thu, Sep 25, 2014 at 8:55 AM, jamborta wrote:
> I am running spark with the default settings in yarn client mode. For some
> reason yarn always allocates three containers to the application (wondering
> where it is set?), and only uses two of them.
The default number of executors in Yarn mode is two; you can change it
with spark-submit's --num-executors option. The third container is the
Yarn application master.
You can pass the HDFS location of those extra jars in the spark-submit
"--jars" argument. Spark will take care of using Yarn's distributed
cache to make them available to the executors. Note that you may need
to provide the full hdfs URL (not just the path, since that will be
interpreted as a local path).
  --archives ARCHIVES         Comma separated list of archives to be extracted into
                              the working directory of each executor.
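A minimal sketch of the --jars approach (the HDFS paths are placeholders):

$ spark-submit --master yarn-client \
    --jars hdfs://namenode:8020/libs/dep1.jar,hdfs://namenode:8020/libs/dep2.jar \
    --class com.example.MyApp my-app.jar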
On Thu, Sep 25, 2014 at 2:20 PM, Tamas Jambor wrote:
> Thank you.
>
> Where is the number of containers set?
>
> On Thu, Sep 25, 2014 at 7:17 PM,
I assume you did those things in all machines, not just on the machine
launching the job?
I've seen that workaround used successfully (well, actually, they
copied the library to /usr/lib or something, but same idea).
On Thu, Sep 25, 2014 at 7:45 PM, taqilabon wrote:
> You're right, I'm suffering
You can't set the driver memory programmatically in client mode. In
that mode, the same JVM is running the driver, so you can't modify
command line options anymore when initializing the SparkContext.
(And you can't really start cluster mode apps that way, so the only
way to set this is through t
up in a
few different contexts, but I don't think there's an "official"
solution yet.)
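For reference, a sketch of setting driver memory at submit time instead (the value is arbitrary):

$ spark-submit --driver-memory 4g --class com.example.MyApp my-app.jar

or in conf/spark-defaults.conf:

spark.driver.memory    4g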
On Wed, Oct 1, 2014 at 9:59 AM, Tamas Jambor wrote:
> thanks Marcelo.
>
> What's the reason it is not possible in cluster mode, either?
>
> On Wed, Oct 1, 2014 at 5:42 P
No, you can't instantiate a SparkContext to start apps in cluster mode.
For Yarn, for example, you'd have to call directly into
org.apache.spark.deploy.yarn.Client; that class will tell the Yarn
cluster to launch the driver for you and then instantiate the
SparkContext.
On Wed, Oct 1, 2014 at 10:
You may want to take a look at this PR:
https://github.com/apache/spark/pull/1558
Long story short: while not a terrible idea to show running
applications, your particular case should be solved differently.
Applications are responsible for calling "SparkContext.stop()" at the
end of their run, cur
Hi Anurag,
Spark SQL (from the Spark standard distribution / sources) currently
requires Hive 0.12; as you mention, CDH4 has Hive 0.10, so that's not
gonna work.
CDH 5.2 ships with Spark 1.1.0 and is modified so that Spark SQL can
talk to the Hive 0.13.1 that is also bundled with CDH, so if that'
Hi Greg,
I'm not sure exactly what it is that you're trying to achieve, but I'm
pretty sure those variables are not supposed to be set by users. You
should take a look at the documentation for
"spark.driver.extraClassPath" and "spark.driver.extraLibraryPath", and
the equivalent options for executo
Hi Eric,
Check the "Debugging Your Application" section at:
http://spark.apache.org/docs/latest/running-on-yarn.html
Long story short: upload your log4j.properties using the "--files"
argument of spark-submit.
(Mental note: we could make the log level configurable via a system property...)
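A sketch of that (assuming your config is a file called log4j.properties in the current directory):

$ spark-submit --master yarn-cluster \
    --files log4j.properties \
    --class com.example.MyApp my-app.jar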
On
Hi Philip,
The assemblies are part of the CDH distribution. You can get them here:
http://www.cloudera.com/content/cloudera/en/downloads/cdh/cdh-5-2-0.html
As of Spark 1.1 (and, thus, CDH 5.2), assemblies are not published to
maven repositories anymore (you can see commit [1] for details).
[1] h
This error is not fatal, since Spark will retry on a different port..
but this might be a problem, for different reasons, if somehow your
code is trying to instantiate multiple SparkContexts.
I assume "nn.SimpleNeuralNetwork" is part of your application, and
since it seems to be instantiating a ne
On top of what Andrew said, you shouldn't need to manually add the
mllib jar to your jobs; it's already included in the Spark assembly
jar.
On Thu, Oct 16, 2014 at 11:51 PM, eric wong wrote:
> Hi,
>
> i using the comma separated style for submit multiple jar files in the
> follow shell but it doe
Hi Ashwin,
Let me try to answer to the best of my knowledge.
On Wed, Oct 22, 2014 at 11:47 AM, Ashwin Shankar
wrote:
> Here are my questions :
> 1. Sharing spark context : How exactly multiple users can share the cluster
> using same spark
> context ?
That's not something you might want to
On Wed, Oct 22, 2014 at 2:17 PM, Ashwin Shankar
wrote:
>> That's not something you might want to do usually. In general, a
>> SparkContext maps to a user application
>
> My question was basically this. In this page in the official doc, under
> "Scheduling within an application" section, it talks a
ext to share the same resource or 2)
> add dynamic resource management for Yarn mode is very much wanted.
>
> Jianshi
>
> On Thu, Oct 23, 2014 at 5:36 AM, Marcelo Vanzin wrote:
>>
>> On Wed, Oct 22, 2014 at 2:17 PM, Ashwin Shankar
>> wrote:
>> >> That
Hello there,
This is more of a question for the cdh-users list, but in any case...
In CDH 5.1 we skipped packaging of the Hive module in SparkSQL. That
has been fixed in CDH 5.2, so if it's possible for you I'd recommend
upgrading.
On Thu, Oct 23, 2014 at 2:53 PM, nitinkak001 wrote:
> I am tryin
On Thu, Oct 23, 2014 at 3:40 PM, ankits wrote:
> 2014-10-23 15:39:50,845 ERROR [] Exception in task 1.0 in stage 1.0 (TID 1)
> java.io.IOException: org.apache.thrift.protocol.TProtocolException:
This looks like an exception that's happening on an executor and just
being reported in the driver's
Your assessment is mostly correct. I think the only thing I'd reword is
the comment about splitting the data, since Spark itself doesn't do
that, but read on.
On Thu, Oct 23, 2014 at 6:12 PM, matan wrote:
> In case I nailed it, how then does it handle a distributed hdfs file? does
> it pull all of
Actually, if you don't call SparkContext.stop(), the event log
information that is used by the history server will be incomplete, and
your application will never show up in the history server's UI.
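A minimal Scala sketch of guarding against that (the names here are made up, not from the thread):

import org.apache.spark.{SparkConf, SparkContext}

object MyApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MyApp"))
    try {
      sc.parallelize(1 to 100).count()  // the actual job
    } finally {
      sc.stop()  // flushes the event log so the history server can show the app
    }
  }
}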
If you don't use that functionality, then you're probably ok not
calling it as long as your applicat
On Mon, Oct 27, 2014 at 7:37 PM, buring wrote:
> Here is the error log; I abstract it as follows:
> INFO [binaryTest---main]: before first
> WARN [org.apache.spark.scheduler.TaskSetManager---Result resolver
> thread-0]: Lost task 0.0 in stage 0.0 (TID 0, spark-dev136):
> org.xerial.snappy.SnappyEr
Hello,
CDH 5.1.3 ships with a version of Hive that's not entirely the same as
the Hive Spark 1.1 supports. So when building your custom Spark, you
should make sure you change all the dependency versions to point to
the CDH versions.
IIRC Spark depends on org.spark-project.hive:0.12.0, you'd have
I haven't tried scala:cc, but you can ask maven to just build a
particular sub-project. For example:
mvn -pl :spark-examples_2.10 compile
On Sat, Nov 15, 2014 at 5:31 PM, Yiming (John) Zhang wrote:
> Hi,
>
>
>
> I have already successfully compile and run spark examples. My problem is
> that i
Can you check in your RM's web UI how much of each resource does Yarn
think you have available? You can also check that in the Yarn
configuration directly.
Perhaps it's not configured to use all of the available resources. (If
it was set up with Cloudera Manager, CM will reserve some room for
daem
Hi Anson,
We've seen this error when incompatible classes are used in the driver
and executors (e.g., same class name, but the classes are different
and thus the serialized data is different). This can happen for
example if you're including some 3rd party libraries in your app's
jar, or changing t
CDH's version of Spark, not trying to run an Apache Spark release on
top of CDH, right? (If that's the case, then we could probably move
this conversation to cdh-us...@cloudera.org, since it would be
CDH-specific.)
> On Wed Nov 19 2014 at 4:52:51 PM Marcelo Vanzin wrote:
>>
>
Hi Yiming,
On Wed, Nov 19, 2014 at 5:35 PM, Yiming (John) Zhang wrote:
> Thank you for your reply. I was wondering whether there is a method of
> reusing locally-built components without installing them? That is, if I have
> successfully built the spark project as a whole, how should I configur
Check the "--files" argument in the output "spark-submit -h".
On Thu, Nov 20, 2014 at 7:51 AM, Matt Narrell wrote:
> How do I configure the files to be uploaded to YARN containers. So far, I’ve
> only seen "--conf spark.yarn.jar=hdfs://….” which allows me to specify the
> HDFS location of the
Hi Tobias,
With the current Yarn code, packaging the configuration in your app's
jar and adding the "-Dlog4j.configuration=log4jConf.xml" argument to
the extraJavaOptions configs should work.
That's not the recommended way to get it to work, though, since this
behavior may change in the future.
Do you expect to be able to use the spark context on the remote task?
If you do, that won't work. You'll need to rethink what it is you're
trying to do, since SparkContext is not serializable and it doesn't
make sense to make it so. If you don't, you could mark the field as
@transient.
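A Scala sketch of the @transient option (a minimal, hypothetical class, not code from the thread):

import org.apache.spark.SparkContext

class MyJob(sparkContext: SparkContext) extends Serializable {
  @transient private val sc = sparkContext  // skipped during serialization; only usable on the driver
  val factor = 2                            // regular fields are serialized with captured instances
  def run(): Array[Int] = sc.parallelize(1 to 10).map(_ * factor).collect()
}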
But the tw
Hello,
On Mon, Nov 24, 2014 at 12:07 PM, aecc wrote:
> This is the stacktrace:
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task not
> serializable: java.io.NotSerializableException: $iwC$$iwC$$iwC$$iwC$AAA
> - field (class "$iwC$$iwC$$iwC$$iwC", name: "aaa", typ
On Mon, Nov 24, 2014 at 1:56 PM, aecc wrote:
> I checked sqlContext, they use it in the same way I would like to use my
> class, they make the class Serializable with transient. Does this affects
> somehow the whole pipeline of data moving? I mean, will I get performance
> issues when doing this b
That's an interesting question for which I do not know the answer.
Probably a question for someone with more knowledge of the internals
of the shell interpreter...
On Mon, Nov 24, 2014 at 2:19 PM, aecc wrote:
> Ok, great, I'm gonna do it that way, thanks :). However I still don't
> understand
Hello,
What exactly are you trying to see? Workers don't generate any events
that would be logged by enabling that config option. Workers generate
logs, and those are captured and saved to disk by the cluster manager,
generally, without you having to do anything.
On Mon, Nov 24, 2014 at 7:46 PM,
On Tue, Dec 2, 2014 at 11:22 AM, Judy Nash
wrote:
> Any suggestion on how can user with custom Hadoop jar solve this issue?
You'll need to include all the dependencies for that custom Hadoop jar
to the classpath. Those will include Guava (which is not included in
its original form as part of the
mllib while calling it from examples
> project?
>
> Thanks & Regards,
> Meethu M
>
>
> On Monday, 24 November 2014 3:33 PM, Yiming (John) Zhang
> wrote:
>
>
> Thank you, Marcelo and Sean, "mvn install" is a good answer for my demands.
>
> -Original Message-
hat a sub-project depends on?
>>>
>>> i rather avoid "mvn install" since this creates a local maven repo. i have
>>> been stung by that before (spend a day trying to do something and got weird
>>> errors because some toy version i once build was stuck in my
Hello,
In CDH 5.2 you need to manually add Hive classes to the classpath of
your Spark job if you want to use the Hive integration. Also, be aware
that since Spark 1.1 doesn't really support the version of Hive
shipped with CDH 5.2, this combination is to be considered extremely
experimental.
On
Hello,
What do you mean by "app that uses 2 cores and 8G of RAM"?
Spark apps generally involve multiple processes. The command line
options you used affect only one of them (the driver). You may want to
take a look at similar configuration for executors. Also, check the
documentation: http://spar
t developing it as a public API, but mostly for internal Hive use.
It can give you a few ideas, though. Also, SPARK-3215.
On Thu, Dec 11, 2014 at 5:41 PM, Marcelo Vanzin wrote:
> Hi Manoj,
>
> I'm not aware of any public projects that do something like that,
> except for the Ooya
Hi Manoj,
I'm not aware of any public projects that do something like that,
except for the Ooyala server which you say doesn't cover your needs.
We've been playing with something like that inside Hive, though:
On Thu, Dec 11, 2014 at 5:33 PM, Manoj Samel wrote:
> Hi,
>
> If spark based services
Hi,
This is a question more suited for cdh-us...@cloudera.org, since it's
probably CDH-specific. In the meantime, check the following:
- if you're using Yarn, check that you've also updated the copy of the
Spark assembly in HDFS (especially if you're using CM to manage
things)
- make sure all JDK
Hi Anton,
That could solve some of the issues (I've played with that a little
bit). But there are still some areas where this would be sub-optimal,
because Spark still uses system properties in some places and those
are global, not per-class loader.
(SparkSubmit is the biggest offender here, but
On Fri, Dec 19, 2014 at 4:05 PM, Haopu Wang wrote:
> My application doesn't depend on hadoop-client directly.
>
> It only depends on spark-core_2.10 which depends on hadoop-client 1.0.4.
> This can be checked by Maven repository at
> http://mvnrepository.com/artifact/org.apache.spark/spark-core_2
How many cores / memory do you have available per NodeManager, and how
many cores / memory are you requesting for your job?
Remember that in Yarn mode, Spark launches "num executors + 1"
containers. The extra container, by default, reserves 1 core and about
1g of memory (more if running in cluster
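For reference, a hypothetical request where the arithmetic is explicit, two 4g executors plus the extra ~1g, 1-core container:

$ spark-submit --master yarn-client \
    --num-executors 2 --executor-cores 2 --executor-memory 4g \
    --class com.example.MyApp my-app.jar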
If you don't specify your own log4j.properties, Spark will load the
default one (from
core/src/main/resources/org/apache/spark/log4j-defaults.properties,
which ends up being packaged with the Spark assembly).
You can easily override the config file if you want to, though; check
the "Debugging" sec
Spark doesn't really shade akka; it pulls a different build (kept
under the "org.spark-project.akka" group and, I assume, with some
build-time differences from upstream akka?), but all classes are still
in the original location.
The upgrade is a little more unfortunate than just changing akka,
sin
The Spark code generates the log directory with "770" permissions. On
top of that you need to make sure of two things:
- all directories up to /apps/spark/historyserver/logs/ are readable
by the user running the history server
- the user running the history server belongs to the group that owns
/a
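A sketch of checking and fixing that with filesystem commands (the group name "spark" is only an example; use whatever group the history server user belongs to):

$ hadoop fs -ls /apps/spark/historyserver          # check the owner and group of "logs"
$ hadoop fs -chgrp -R spark /apps/spark/historyserver/logs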
This particular case shouldn't cause problems since both of those
libraries are java-only (the scala version appended there is just for
helping the build scripts).
But it does look weird, so it would be nice to fix it.
On Wed, Jan 7, 2015 at 12:25 AM, Aniket Bhatnagar
wrote:
> It seems that spar
This could be caused by many things, including wrong configuration. Hard
to tell with just the info you provided.
Is there any reason why you want to use your own Spark instead of the
one shipped with CDH? CDH 5.3 has Spark 1.2, so unless you really need
to run Spark 1.1, you should be better off wi
apr and the user
> that runs Spark in our case is a unix ID called mapr (in the mapr group).
> Therefore, this can't read my job event logs as shown above.
>
>
> Thanks,
> Michael
>
>
> -Original Message-
> From: Marcelo Vanzin [mailto:van...@clo
Nevermind my last e-mail. HDFS complains about not understanding "3777"...
On Thu, Jan 8, 2015 at 9:46 AM, Marcelo Vanzin wrote:
> Hmm. Can you set the permissions of "/apps/spark/historyserver/logs"
> to 3777? I'm not sure HDFS respects the group id bit, but it
Sorry for the noise; but I just remembered you're actually using MapR
(and not HDFS), so maybe the "3777" trick could work...
On Thu, Jan 8, 2015 at 10:32 AM, Marcelo Vanzin wrote:
> Nevermind my last e-mail. HDFS complains about not understanding "3777"...
>
&
Just to add to Sandy's comment, check your client configuration
(generally in /etc/spark/conf). If you're using CM, you may need to
run the "Deploy Client Configuration" command on the cluster to update
the configs to match the new version of CDH.
On Thu, Jan 8, 2015 at 11:38 AM, Sandy Ryza wrote
Disclaimer: this seems more of a CDH question, I'd suggest sending
these to the CDH mailing list in the future.
CDH 5.2 actually has Spark 1.1. It comes with SparkSQL built-in, but
it does not include the thrift server because of incompatibilities
with the CDH version of Hive. To use Hive support,
Disclaimer: CDH questions are better handled at cdh-us...@cloudera.org.
But the question I'd like to ask is: why do you need your own Spark
build? What's wrong with CDH's Spark that it doesn't work for you?
On Thu, Jan 8, 2015 at 3:01 PM, freedafeng wrote:
> Could anyone come up with your experi
On Thu, Jan 8, 2015 at 3:33 PM, freedafeng wrote:
> I installed the custom as a standalone mode as normal. The master and slaves
> started successfully.
> However, I got error when I ran a job. It seems to me from the error message
> the some library was compiled against hadoop1, but my spark was
Hi Manoj,
As long as you're logged in (i.e. you've run kinit), everything should
just work. You can run "klist" to make sure you're logged in.
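For example (the principal and keytab are placeholders):

$ kinit alice@EXAMPLE.COM        # or: kinit -kt /path/to/alice.keytab alice@EXAMPLE.COM
$ klist                          # verify there is a valid ticket
$ spark-submit --master yarn-cluster --class com.example.MyApp my-app.jar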
On Thu, Jan 8, 2015 at 3:49 PM, Manoj Samel wrote:
> Hi,
>
> For running spark 1.2 on Hadoop cluster with Kerberos, what spark
> configurations are requi
On Thu, Jan 8, 2015 at 4:09 PM, Manoj Samel wrote:
> Some old communication (Oct 14) says Spark is not certified with Kerberos.
> Can someone comment on this aspect ?
Spark standalone doesn't support kerberos. Spark running on top of
Yarn works fine with kerberos.
--
Marcelo
--
I ran this with CDH 5.2 without a problem (sorry don't have 5.3
readily available at the moment):
$ HBASE='/opt/cloudera/parcels/CDH/lib/hbase/\*'
$ spark-submit --driver-class-path $HBASE --conf
"spark.executor.extraClassPath=$HBASE" --master yarn --class
org.apache.spark.examples.HBaseTest
/opt/
Hi Alessandro,
You can look for a log line like this in your driver's output:
15/01/12 10:51:01 INFO storage.DiskBlockManager: Created local
directory at
/data/yarn/nm/usercache/systest/appcache/application_1421081007635_0002/spark-local-20150112105101-4f3d
If you're deploying your application i
Short answer: yes.
Take a look at: http://spark.apache.org/docs/latest/running-on-yarn.html
Look for "memoryOverhead".
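A hedged example for Spark on Yarn (the overhead value is in megabytes and arbitrary here; tune it for your job):

$ spark-submit --master yarn-cluster \
    --executor-memory 4g \
    --conf spark.yarn.executor.memoryOverhead=1024 \
    --class com.example.MyApp my-app.jar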
On Mon, Jan 12, 2015 at 2:06 PM, Michael Albert
wrote:
> Greetings!
>
> My executors apparently are being terminated because they are
> "running beyond physical memory limits"
As the error message says...
On Wed, Jan 14, 2015 at 3:14 PM, freedafeng wrote:
> Error: Cluster deploy mode is currently not supported for python
> applications.
Use "yarn-client" instead of "yarn-cluster" for pyspark apps.
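For example (the script name is a placeholder):

$ spark-submit --master yarn-client my_app.py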
--
Marcelo
You're specifying the queue in the spark-submit command line:
--queue thequeue
Are you sure that queue exists?
On Thu, Jan 15, 2015 at 11:23 AM, Manoj Samel wrote:
> Hi,
>
> Setup is as follows
>
> Hadoop Cluster 2.3.0 (CDH5.0)
> - Namenode HA
> - Resource manager HA
> - Secured with Kerbero
Hi Kane,
What's the complete command line you're using to submit the app? Where
do you expect these options to appear?
On Fri, Jan 16, 2015 at 11:12 AM, Kane Kim wrote:
> I want to add some java options when submitting application:
> --conf "spark.executor.extraJavaOptions=-XX:+UnlockCommercialF
Hi Kane,
Here's the command line you sent me privately:
./spark-1.2.0-bin-hadoop2.4/bin/spark-submit --class
SimpleApp --conf
"spark.executor.extraJavaOptions=-XX:+UnlockCommercialFeatures
-XX:+FlightRecorder" --master local simpleapp.jar ./test.log
You're running the app in "local" mode. In tha
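The likely direction (a sketch, not necessarily the exact answer): in local mode there are no separate executor JVMs, so the options need to go to the driver instead:

$ spark-submit --class SimpleApp \
    --driver-java-options "-XX:+UnlockCommercialFeatures -XX:+FlightRecorder" \
    --master local simpleapp.jar ./test.log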
Hi Francis,
This might be a long shot, but do you happen to have built spark on an
encrypted home dir?
(I was running into the same error when I was doing that. Rebuilding
on an unencrypted disk fixed the issue. This is a known issue /
limitation with ecryptfs. It's weird that the build doesn't f
Hi Ian,
When you run your packaged application, are you adding its jar file to
the SparkContext (by calling the addJar() method)?
That will distribute the code to all the worker nodes. The failure
you're seeing seems to indicate the worker nodes do not have access to
your code.
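For reference, a minimal sketch of that call (the jar path is a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("MyApp"))
sc.addJar("/path/to/your-application.jar")  // ships the jar to the worker nodes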
On Mon, Apr 14, 2
Hi Joe,
If you cache rdd1 but not rdd2, any time you need rdd2's result, it
will have to be computed. It will use rdd1's cached data, but it will
have to compute its result again.
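In code (for example in the spark-shell, with made-up RDDs), the difference looks like:

val rdd1 = sc.parallelize(1 to 1000000).map(_ * 2).cache()  // materialized in memory on first action
val rdd2 = rdd1.filter(_ % 3 == 0)                          // not cached
rdd2.count()  // computes rdd2 from rdd1's cached blocks
rdd2.count()  // rdd1 is read from cache again, but rdd2's filter is re-run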
On Mon, Apr 14, 2014 at 5:32 AM, Joe L wrote:
> Hi I am trying to cache 2Gbyte data and to implement the following
> other code running except for that. (BTW, I don't use in my
> code above... I just removed it for security purposes.)
>
> Thanks,
>
> Ian
>
>
>
> On Mon, Apr 14, 2014 at 12:45 PM, Marcelo Vanzin
> wrote:
>>
>> Hi Ian,
>>
>> When
Hi Sung,
On Fri, Apr 18, 2014 at 5:11 PM, Sung Hwan Chung
wrote:
> while (true) {
> rdd.map((row : Array[Double]) => {
> row[numCols - 1] = computeSomething(row)
> }).reduce(...)
> }
>
> If it fails at some point, I'd imagine that the intermediate info being
> stored in row[numCols - 1] w
Hi Sung,
On Mon, Apr 21, 2014 at 10:52 AM, Sung Hwan Chung
wrote:
> The goal is to keep an intermediate value per row in memory, which would
> allow faster subsequent computations. I.e., computeSomething would depend on
> the previous value from the previous computation.
I think the fundamental
Hi Joe,
On Mon, Apr 21, 2014 at 11:23 AM, Joe L wrote:
> And, I haven't gotten any answers to my questions.
One thing that might explain that is that, at least for me, all (and I
mean *all*) of your messages are ending up in my GMail spam folder,
complaining that GMail can't verify that it real
Hi Ken,
On Mon, Apr 21, 2014 at 1:39 PM, Williams, Ken
wrote:
> I haven't figured out how to let the hostname default to the host mentioned
> in our /etc/hadoop/conf/hdfs-site.xml like the Hadoop command-line tools do,
> but that's not so important.
Try adding "/etc/hadoop/conf" to SPARK_CLASSPATH.
Hi,
One thing you can do is set the spark version your project depends on
to "1.0.0-SNAPSHOT" (make sure it matches the version of Spark you're
building); then before building your project, run "sbt publishLocal"
on the Spark tree.
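A sketch of the two steps (the version string must match the Spark tree you built):

# in the Spark source tree
$ sbt publishLocal

# in your own project's build.sbt
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0-SNAPSHOT"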
On Wed, Apr 30, 2014 at 12:11 AM, wxhsdp wrote:
> i fixed it.
>
Have you tried making A extend Serializable?
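i.e., something along these lines (assuming A is the class captured by the closure):

class A(val x: Int) extends Serializable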
On Thu, May 1, 2014 at 3:47 PM, SK wrote:
> Hi,
>
> I have the following code structure. I compiles ok, but at runtime it aborts
> with the error:
> Exception in thread "main" org.apache.spark.SparkException: Job aborted:
> Task not serializable: java
Hi Kristoffer,
You're correct that CDH5 only supports up to Java 7 at the moment. But
Yarn apps do not run in the same JVM as Yarn itself (and I believe MR1
doesn't either), so it might be possible to pass arguments in a way
that tells Yarn to launch the application master / executors with the
Jav
Is that true? I believe that API Chanwit is talking about requires
explicitly asking for files to be cached in HDFS.
Spark automatically benefits from the kernel's page cache (i.e. if
some block is in the kernel's page cache, it will be read more
quickly). But the explicit HDFS cache is a differen