Re: Where is yarn-shuffle.jar in maven?

2016-12-13 Thread Marcelo Vanzin
https://mvnrepository.com/artifact/org.apache.spark/spark-network-yarn_2.11/2.0.2 On Mon, Dec 12, 2016 at 9:56 PM, Neal Yin wrote: > Hi, > > For dynamic allocation feature, I need spark-xxx-yarn-shuffle.jar. In my > local spark build, I can see it. But in maven central, I

Re: how can I set the log configuration file for spark history server ?

2016-12-09 Thread Marcelo Vanzin
(-dev) Just configure your log4j.properties in $SPARK_HOME/conf (or set a custom $SPARK_CONF_DIR for the history server). On Thu, Dec 8, 2016 at 7:20 PM, John Fang wrote: > ./start-history-server.sh > starting org.apache.spark.deploy.history.HistoryServer, logging
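
A minimal sketch of that setup (the conf directory, log level, and appender settings are illustrative; the property names are standard log4j 1.x, which Spark reads from conf/log4j.properties):

    # $SPARK_CONF_DIR/log4j.properties
    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

    # Point the history server at a dedicated conf dir:
    SPARK_CONF_DIR=/etc/spark-history/conf ./sbin/start-history-server.sh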

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Marcelo Vanzin
Sure - I wanted to check with admin before sharing. I’ve attached it now, > does this help? > > Many thanks again, > > G > > > >> On 8 Dec 2016, at 20:18, Marcelo Vanzin <van...@cloudera.com> wrote: >> >> Then you probably have a configuration error

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Marcelo Vanzin
> I can run the SparkPi test script. The main difference between it and my > application is that it doesn’t access HDFS. > >> On 8 Dec 2016, at 18:43, Marcelo Vanzin <van...@cloudera.com> wrote: >> >> On Wed, Dec 7, 2016 at 11:54 PM, Gerard Casey <gerardhughca...

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Marcelo Vanzin
On Wed, Dec 7, 2016 at 11:54 PM, Gerard Casey wrote: > To be specific, where exactly should spark.authenticate be set to true? spark.authenticate has nothing to do with kerberos. It's for authentication between different Spark processes belonging to the same app. --
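
By way of illustration (on YARN the shared secret is generated and distributed automatically; in other deploy modes spark.authenticate.secret would also have to be set):

    spark-submit --master yarn --conf spark.authenticate=true ...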

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Marcelo Vanzin
have a configuration issue somewhere. On Wed, Dec 7, 2016 at 1:09 PM, Gerard Casey <gerardhughca...@gmail.com> wrote: > Thanks. > > I’ve checked the TGT, principal and key tab. Where to next?! > >> On 7 Dec 2016, at 22:03, Marcelo Vanzin <van...@cloudera.com> wrote: &

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Marcelo Vanzin
On Wed, Dec 7, 2016 at 12:15 PM, Gerard Casey wrote: > Can anyone point me to a tutorial or a run through of how to use Spark with > Kerberos? This is proving to be quite confusing. Most search results on the > topic point to what needs inputted at the point of `sparks

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Marcelo Vanzin
…sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSu

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Marcelo Vanzin
There's generally an exception in these cases, and you haven't posted it, so it's hard to tell you what's wrong. The most probable cause, without the extra information the exception provides, is that you're using the wrong Hadoop configuration when submitting the job to YARN. On Mon, Dec 5, 2016

Re: Does the delegator map task of SparkLauncher need to stay alive until Spark job finishes ?

2016-11-15 Thread Marcelo Vanzin
On Tue, Nov 15, 2016 at 5:57 PM, Elkhan Dadashov wrote: > This is confusing in the sense that, the client needs to stay alive for > Spark Job to finish successfully. > > Actually the client can die or finish (in Yarn-cluster mode), and the spark > job will successfully

Re: Correct SparkLauncher usage

2016-11-10 Thread Marcelo Vanzin

Re: Correct SparkLauncher usage

2016-11-10 Thread Marcelo Vanzin
On Thu, Nov 10, 2016 at 2:43 PM, Mohammad Tariq wrote: > @Override > public void stateChanged(SparkAppHandle handle) { > System.out.println("Spark App Id [" + handle.getAppId() + "]. State [" + > handle.getState() + "]"); > while(!handle.getState().isFinal()) {
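
The snippet above polls getState() in a loop inside the callback itself. A minimal sketch of the usual listener-plus-latch pattern instead (the app path, main class, and master value are illustrative, not from the thread):

    import java.util.concurrent.CountDownLatch;

    import org.apache.spark.launcher.SparkAppHandle;
    import org.apache.spark.launcher.SparkLauncher;

    public class LauncherExample {
      public static void main(String[] args) throws Exception {
        final CountDownLatch done = new CountDownLatch(1);
        SparkAppHandle handle = new SparkLauncher()
            .setAppResource("/path/to/app.jar")   // illustrative path
            .setMainClass("com.example.MyApp")    // illustrative class
            .setMaster("yarn-cluster")
            .startApplication(new SparkAppHandle.Listener() {
              @Override
              public void stateChanged(SparkAppHandle h) {
                System.out.println("State: " + h.getState());
                if (h.getState().isFinal()) {
                  done.countDown();  // signal instead of looping inside the callback
                }
              }
              @Override
              public void infoChanged(SparkAppHandle h) { }
            });
        done.await();  // block here, outside the listener thread
        System.out.println("Final state: " + handle.getState());
      }
    }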

Re: Correct SparkLauncher usage

2016-11-07 Thread Marcelo Vanzin

Re: Correct SparkLauncher usage

2016-11-07 Thread Marcelo Vanzin
On Mon, Nov 7, 2016 at 3:29 PM, Mohammad Tariq wrote: > I have been trying to use SparkLauncher.startApplication() to launch a Spark > app from within java code, but unable to do so. However, same piece of code > is working if I use SparkLauncher.launch(). > > Here are the

Re: SparkLauncer 2.0.1 version working incosistently in yarn-client mode

2016-11-07 Thread Marcelo Vanzin
On Sat, Nov 5, 2016 at 2:54 AM, Elkhan Dadashov wrote: > while (appHandle.getState() == null || !appHandle.getState().isFinal()) { > if (appHandle.getState() != null) { > log.info("while: Spark job state is : " + appHandle.getState()); > if

Re: Delegation Token renewal in yarn-cluster

2016-11-04 Thread Marcelo Vanzin
On Fri, Nov 4, 2016 at 1:57 AM, Zsolt Tóth wrote: > This was what confused me in the first place. Why does Spark ask for new > tokens based on the renew-interval instead of the max-lifetime? It could be just a harmless bug, since tokens have a "getMaxDate()" method

Re: Delegation Token renewal in yarn-cluster

2016-11-03 Thread Marcelo Vanzin
On Thu, Nov 3, 2016 at 3:47 PM, Zsolt Tóth wrote: > What is the purpose of the delegation token renewal (the one that is done > automatically by Hadoop libraries, after 1 day by default)? It seems that it > always happens (every day) until the token expires, no matter

Re: Delegation Token renewal in yarn-cluster

2016-11-03 Thread Marcelo Vanzin
> manager somehow automatically renews the delegation tokens for my > application? > > 2016-11-03 21:34 GMT+01:00 Marcelo Vanzin <van...@cloudera.com>: >> >> Sounds like your test was set up incorrectly. The default TTL for >> tokens is 7 days. Did you ch

Re: Delegation Token renewal in yarn-cluster

2016-11-03 Thread Marcelo Vanzin
Sounds like your test was set up incorrectly. The default TTL for tokens is 7 days. Did you change that in the HDFS config? The issue definitely exists and people definitely have run into it. So if you're not hitting it, it's most definitely an issue with your test configuration. On Thu, Nov 3,
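
The two HDFS settings involved, shown with their stock defaults in milliseconds, for reference:

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.namenode.delegation.token.renew-interval</name>
      <value>86400000</value>    <!-- 1 day: how often the token must be renewed -->
    </property>
    <property>
      <name>dfs.namenode.delegation.token.max-lifetime</name>
      <value>604800000</value>   <!-- 7 days: hard expiry, the TTL mentioned above -->
    </property>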

Re: Can i get callback notification on Spark job completion ?

2016-10-28 Thread Marcelo Vanzin
On Fri, Oct 28, 2016 at 11:14 AM, Elkhan Dadashov wrote: > But if the map task will finish before the Spark job finishes, that means > SparkLauncher will go away. if the SparkLauncher handle goes away, then I > lose the ability to track the app's state, right ? > > I'm

Re: Can i get callback notification on Spark job completion ?

2016-10-28 Thread Marcelo Vanzin
If you look at the "startApplication" method it takes listeners as parameters. On Fri, Oct 28, 2016 at 10:23 AM, Elkhan Dadashov wrote: > Hi, > > I know that we can use SparkAppHandle (introduced in SparkLauncher version >>=1.6), and lt the delegator map task stay alive

Re: Does the delegator map task of SparkLauncher need to stay alive until Spark job finishes ?

2016-10-18 Thread Marcelo Vanzin
On Tue, Oct 18, 2016 at 3:01 PM, Elkhan Dadashov wrote: > Does my map task need to wait until Spark job finishes ? No... > Or is there any way, my map task finishes after launching Spark job, and I > can still query and get status of Spark job outside of map task (or

Re: Add sqldriver.jar to Spark 1.6.0 executors

2016-09-14 Thread Marcelo Vanzin
Use: spark-submit --jars /path/sqldriver.jar --conf spark.driver.extraClassPath=sqldriver.jar --conf spark.executor.extraClassPath=sqldriver.jar In client mode the driver's classpath needs to point to the full path, not just the name. On Wed, Sep 14, 2016 at 5:42 AM, Kevin Tran
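
Spelled out as a full command (the application jar and class are illustrative; note the full path on the driver side in client mode, as the reply says):

    spark-submit \
      --jars /path/sqldriver.jar \
      --conf spark.driver.extraClassPath=/path/sqldriver.jar \
      --conf spark.executor.extraClassPath=sqldriver.jar \
      --class com.example.Main myapp.jar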

Re: Spark 2.0.0 won't let you create a new SparkContext?

2016-09-13 Thread Marcelo Vanzin
You're running spark-shell. It already creates a SparkContext for you and makes it available in a variable called "sc". If you want to change the config of spark-shell's context, you need to use command line options. (Or stop the existing context first, although I'm not sure how well that will
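
For example (the flags and values are illustrative):

    spark-shell --master yarn --executor-memory 2g --conf spark.sql.shuffle.partitions=400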

Re: YARN memory overhead settings

2016-09-06 Thread Marcelo Vanzin
It kinda depends on the application. Certain compression libraries, in particular, are kinda lax with their use of off-heap buffers, so if you configure executors to use many cores you might end up with higher usage than the default configuration. Then there are also things like PARQUET-118. In

Re: Spark launcher handle and listener not giving state

2016-08-29 Thread Marcelo Vanzin
You haven't said which version of Spark you are using. The state API only works if the underlying Spark version is also 1.6 or later. On Mon, Aug 29, 2016 at 4:36 PM, ckanth99 wrote: > Hi All, > > I have a web application which will submit spark jobs on Cloudera spark >

Re: spark-jdbc impala with kerberos using yarn-client

2016-08-24 Thread Marcelo Vanzin
I believe the Impala JDBC driver is mostly the same as the Hive driver, but I could be wrong. In any case, the right place to ask that question is the Impala groups (see http://impala.apache.org/). On a side note, it is a little odd that you're trying to read data from Impala using JDBC, instead

Re: spark historyserver backwards compatible

2016-08-05 Thread Marcelo Vanzin
Yes, the 2.0 history server should be backwards compatible. On Fri, Aug 5, 2016 at 2:14 PM, Koert Kuipers wrote: > we have spark 1.5.x, 1.6.x and 2.0.0 job running on yarn > > but yarn can have only one spark history server. > > what to do? is it safe to use the spark 2

Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Marcelo Vanzin
On Fri, Aug 5, 2016 at 9:53 AM, Carlo.Allocca wrote: > <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_2.10</artifactId> <version>2.0.0</version> <type>jar</type> </dependency> > <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-sql_2.10</artifactId> <version>2.0.0</version> >

Re: 2.0.0 packages for twitter streaming, flume and other connectors

2016-08-03 Thread Marcelo Vanzin
The Flume connector is still available from Spark: http://search.maven.org/#artifactdetails%7Corg.apache.spark%7Cspark-streaming-flume-assembly_2.11%7C2.0.0%7Cjar Many of the others have indeed been removed from Spark, and can be found at the Apache Bahir project: http://bahir.apache.org/ I

Re: spark run shell On yarn

2016-07-28 Thread Marcelo Vanzin
…solved !! > But this is a bug? > === > Name: cen sujun > Mobile: 13067874572 > Mail: ce...@lotuseed.com > > On 29 Jul 2016, at 08:19, Marcelo Vanzin <van...@cloudera.com> wrote: > > spark.hadoop.yarn.timeline-service.enabled=false

Re: spark run shell On yarn

2016-07-28 Thread Marcelo Vanzin
You can probably do that in Spark's conf too: spark.hadoop.yarn.timeline-service.enabled=false On Thu, Jul 28, 2016 at 5:13 PM, Jeff Zhang wrote: > One workaround is disable timeline in yarn-site, > > set yarn.timeline-service.enabled as false in yarn-site.xml > > On Thu, Jul
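
Any property prefixed with spark.hadoop. is copied into the Hadoop Configuration, so the workaround can live in spark-defaults.conf or be passed per submission:

    # spark-defaults.conf
    spark.hadoop.yarn.timeline-service.enabled  false

    # or on the command line:
    spark-submit --conf spark.hadoop.yarn.timeline-service.enabled=false ...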

Re: Silly question about Yarn client vs Yarn cluster modes...

2016-06-22 Thread Marcelo Vanzin
On Wed, Jun 22, 2016 at 1:32 PM, Mich Talebzadeh wrote: > Does it also depend on the number of Spark nodes involved in choosing which > way to go? Not really. -- Marcelo - To unsubscribe, e-mail:

Re: Silly question about Yarn client vs Yarn cluster modes...

2016-06-22 Thread Marcelo Vanzin
Trying to keep the answer short and simple... On Wed, Jun 22, 2016 at 1:19 PM, Michael Segel wrote: > But this gets to the question… what are the real differences between client > and cluster modes? > What are the pros/cons and use cases where one has advantages over

Re: Spark 2.0 on YARN - Files in config archive not ending up on executor classpath

2016-06-20 Thread Marcelo Vanzin
It doesn't hurt to have a bug tracking it, in case anyone else has time to look at it before I do. On Mon, Jun 20, 2016 at 1:20 PM, Jonathan Kelly <jonathaka...@gmail.com> wrote: > Thanks for the confirmation! Shall I cut a JIRA issue? > > On Mon, Jun 20, 2016 at 10:42 AM Marc

Re: Spark 2.0 on YARN - Files in config archive not ending up on executor classpath

2016-06-20 Thread Marcelo Vanzin
I just tried this locally and can see the wrong behavior you mention. I'm running a somewhat old build of 2.0, but I'll take a look. On Mon, Jun 20, 2016 at 7:04 AM, Jonathan Kelly wrote: > Does anybody have any thoughts on this? > > On Fri, Jun 17, 2016 at 6:36 PM

Re: Apache Spark security.NosuchAlgorithm exception on changing from java 7 to java 8

2016-06-06 Thread Marcelo Vanzin
…Thank you Marcelo. I don't know how to remove it. Could you please tell me > how I can remove that configuration? > > On Mon, Jun 6, 2016 at 5:04 PM, Marcelo Vanzin <van...@cloudera.com> wrote: >> >> This sounds like your default Spark configuration has an >> "

Re: Apache Spark security.NosuchAlgorithm exception on changing from java 7 to java 8

2016-06-06 Thread Marcelo Vanzin
This sounds like your default Spark configuration has an "enabledAlgorithms" config in the SSL settings, and that is listing an algorithm name that is not available in jdk8. Either remove that configuration (to use the JDK's default algorithm list), or change it so that it lists algorithms

Re: Unable to set ContextClassLoader in spark shell

2016-06-06 Thread Marcelo Vanzin
On Mon, Jun 6, 2016 at 4:22 AM, shengzhixia wrote: > In my previous Java project I can change class loader without problem. Could > I know why the above method couldn't change class loader in spark shell? > Any way I can achieve it? The spark-shell for Scala 2.10 will

Re: Spark job is failing with kerberos error while creating hive context in yarn-cluster mode (through spark-submit)

2016-05-23 Thread Marcelo Vanzin
On Mon, May 23, 2016 at 4:41 AM, Chandraprakash Bhagtani wrote: > I am passing hive-site.xml through --files option. You need hive-site-xml in Spark's classpath too. Easiest way is to copy / symlink hive-site.xml in your Spark's conf directory. -- Marcelo
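
E.g. (the source path is illustrative for a typical Hadoop distribution layout):

    ln -s /etc/hive/conf/hive-site.xml $SPARK_HOME/conf/hive-site.xml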

Re: Can not set spark dynamic resource allocation

2016-05-20 Thread Marcelo Vanzin
Hi Weifeng, That's the Spark event log, not the YARN application log. You get the latter using the "yarn logs" command. On Fri, May 20, 2016 at 1:14 PM, Cui, Weifeng wrote: > Here is the application log for this spark job. > > http://pastebin.com/2UJS9L4e > > > > Thanks, >
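
E.g. (the application id is illustrative):

    yarn logs -applicationId application_1463502628721_0001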

Re: Starting executor without a master

2016-05-19 Thread Marcelo Vanzin
On Thu, May 19, 2016 at 6:06 PM, Mathieu Longtin wrote: > I'm looking to bypass the master entirely. I manage the workers outside of > Spark. So I want to start the driver, the start workers that connect > directly to the driver. It should be possible to do that if you

Re: Starting executor without a master

2016-05-19 Thread Marcelo Vanzin
Hi Mathieu, There's nothing like that in Spark currently. For that, you'd need a new cluster manager implementation that knows how to start executors in those remote machines (e.g. by running ssh or something). In the current master there's an interface you can implement to try that if you

Re: SLF4J binding error while running Spark using YARN as Cluster Manager

2016-05-18 Thread Marcelo Vanzin
Hi Anubhav, This is happening because you're trying to use the configuration generated for CDH with upstream Spark. The CDH configuration will add extra needed jars that we don't include in our build of Spark, so you'll end up getting duplicate classes. You can either try to use a different

Re: How to use the spark submit script / capability

2016-05-15 Thread Marcelo Vanzin
…https://issues.apache.org/jira/secure/ViewProfile.jspa?name=zjffdu> added > a comment - 26/Nov/15 08:15 > > Marcelo Vanzin > <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=vanzin> Is > there any user document about it ? I didn't find it on the spark official > site. If this

Re: How to use the spark submit script / capability

2016-05-15 Thread Marcelo Vanzin
…Stephen Boesch <java...@gmail.com> wrote: > > There is a committed PR from Marcelo Vanzin addressing that capability: > > https://github.com/apache/spark/pull/3916/files > > Is there any documentation on how to use this? The PR itself has two > comments asking for the docs t

Re: How to transform a JSON string into a Java HashMap<> java.io.NotSerializableException

2016-05-11 Thread Marcelo Vanzin
Is the class mentioned in the exception below the parent class of the anonymous "Function" class you're creating? If so, you may need to make it serializable. Or make your function a proper "standalone" class (either a nested static class or a top-level one). On Wed, May 11, 2016 at 3:55 PM,
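
A hedged sketch of the standalone-class option, assuming Jackson is used for the JSON parsing (the class name and target type are illustrative):

    import java.util.HashMap;

    import org.apache.spark.api.java.function.Function;

    import com.fasterxml.jackson.core.type.TypeReference;
    import com.fasterxml.jackson.databind.ObjectMapper;

    // Top-level (or static nested) class: it captures no reference to a
    // non-serializable enclosing instance, unlike an anonymous inner class.
    public class JsonToMap implements Function<String, HashMap<String, String>> {
      @Override
      public HashMap<String, String> call(String json) throws Exception {
        // Build the mapper inside call() so the function stays trivially serializable.
        ObjectMapper mapper = new ObjectMapper();
        return mapper.readValue(json, new TypeReference<HashMap<String, String>>() {});
      }
    }

A transformation would then use it as rdd.map(new JsonToMap()).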

Re: spark 2.0 issue with yarn?

2016-05-09 Thread Marcelo Vanzin
On Mon, May 9, 2016 at 3:34 PM, Matt Cheah wrote: > @Marcelo: Interesting - why would this manifest on the YARN-client side > though (as Spark is the client to YARN in this case)? Spark as a client > shouldn’t care about what auxiliary services are on the YARN cluster. The

Re: spark 2.0 issue with yarn?

2016-05-09 Thread Marcelo Vanzin
Hi Jesse, On Mon, May 9, 2016 at 2:52 PM, Jesse F Chen wrote: > Sean - thanks. definitely related to SPARK-12154. > Is there a way to continue use Jersey 1 for existing working environment? The error you're getting is because of a third-party extension that tries to talk to

Re: Redirect from yarn to spark history server

2016-05-02 Thread Marcelo Vanzin
See http://spark.apache.org/docs/latest/running-on-yarn.html, especially the parts that talk about spark.yarn.historyServer.address. On Mon, May 2, 2016 at 2:14 PM, satish saley wrote: > > > Hello, > > I am running pyspark job using yarn-cluster mode. I can see spark job
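
Roughly, the settings involved (host names and paths are illustrative):

    # On the submitting side / spark-defaults.conf:
    spark.eventLog.enabled            true
    spark.eventLog.dir                hdfs:///user/spark/applicationHistory
    spark.yarn.historyServer.address  historyhost:18080

    # On the history server:
    spark.history.fs.logDirectory     hdfs:///user/spark/applicationHistory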

Re: Which jar file has import org.apache.spark.internal.Logging

2016-04-22 Thread Marcelo Vanzin
On Fri, Apr 22, 2016 at 10:38 AM, Mich Talebzadeh wrote: > I am trying to test Spark with CEP and I have been shown a sample here >

Re: Which jar file has import org.apache.spark.internal.Logging

2016-04-22 Thread Marcelo Vanzin
Sorry, I've been looking at this thread and the related ones and one thing I still don't understand is: why are you trying to use internal Spark classes like Logging and SparkFunSuite in your code? Unless you're writing code that lives inside Spark, you really shouldn't be trying to reference

Re: Error with --files

2016-04-14 Thread Marcelo Vanzin
On Thu, Apr 14, 2016 at 2:14 PM, Benjamin Zaitlen wrote: >> spark-submit --master yarn-cluster /home/ubuntu/test_spark.py --files >> /home/ubuntu/localtest.txt#appSees.txt --files should come before the path to your python script. Otherwise it's just passed as arguments to
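
I.e., the original command reordered:

    spark-submit --master yarn-cluster \
      --files /home/ubuntu/localtest.txt#appSees.txt \
      /home/ubuntu/test_spark.py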

Re: Spark 1.6.0 - token renew failure

2016-04-14 Thread Marcelo Vanzin
You can set "spark.yarn.security.tokens.hive.enabled=false" in your config, although your app won't work if you actually need Hive delegation tokens. On Thu, Apr 14, 2016 at 12:21 AM, Luca Rea wrote: > Hi Jeff, > > > > Thank you for your support, I’ve removed

Re: Thread-safety of a SparkListener

2016-04-01 Thread Marcelo Vanzin
On Fri, Apr 1, 2016 at 9:23 AM, Truong Duc Kien wrote: > I need to gather some metrics using a SparkListener. Does the callback > methods need to thread-safe or they are always call from the same thread ? The callbacks are all fired on the same thread. Just be careful

Re: spark shuffle service on yarn

2016-03-21 Thread Marcelo Vanzin
If you use any shuffle service before 2.0 it should be compatible with all previous releases. The 2.0 version has currently an incompatibility that we should probably patch before releasing 2.0, to support this kind of use case (among others). On Fri, Mar 18, 2016 at 7:25 PM, Koert Kuipers

Re: SparkConf does not work for spark.driver.memory

2016-02-18 Thread Marcelo Vanzin
On Thu, Feb 18, 2016 at 10:26 AM, wgtmac wrote: > In the code, I did following: > val sc = new SparkContext(new > SparkConf().setAppName("test").set("spark.driver.memory", "4g")) You can't set the driver memory like this, in any deploy mode. When that code runs, the driver is
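
The driver JVM is already running by the time that code executes, so the setting has to be applied at launch time, e.g.:

    spark-submit --driver-memory 4g ...

    # or in spark-defaults.conf:
    spark.driver.memory  4g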

Re: Help needed in deleting a message posted in Spark User List

2016-02-05 Thread Marcelo Vanzin
You don't... just send a new one. On Fri, Feb 5, 2016 at 9:33 AM, swetha kasireddy wrote: > Hi, > > I want to edit/delete a message posted in Spark User List. How do I do that? > > Thanks! -- Marcelo

Re: Spark 1.5.2 Yarn Application Master - resiliencey

2016-02-03 Thread Marcelo Vanzin
Without the exact error from the driver that caused the job to restart, it's hard to tell. But a simple way to improve things is to install the Spark shuffle service on the YARN nodes, so that even if an executor crashes, its shuffle output is still available to other executors. On Wed, Feb 3,

Re: Spark 1.5.2 Yarn Application Master - resiliencey

2016-02-03 Thread Marcelo Vanzin
…resource-allocation > > > > On Wed, Feb 3, 2016 at 11:50 AM, Marcelo Vanzin <van...@cloudera.com> > wrote: > >> Without the exact error from the driver that caused the job to restart, >> it's hard to tell. But a simple way to improve things is to install the >&

Re: Re: --driver-java-options not support multiple JVM configuration ?

2016-01-21 Thread Marcelo Vanzin
> Unrecognized VM option > 'newsize=2096m,-XX:MaxPermSize=512m,-XX:+PrintGCDetails,-XX:+PrintGCTimeStamps,-XX:+UseParNewGC,-XX:+UseConcMarkSweepGC,-XX:CMSInitiatingOccupancyFraction=80,-XX:GCTimeLimit=5,-XX:GCHeapFreeLimit=95' > > > > > From: Marcelo Vanzin > Date: 2016

Re: Spark Yarn executor memory overhead content

2016-01-21 Thread Marcelo Vanzin
On Thu, Jan 21, 2016 at 5:42 AM, Olivier Devoisin wrote: > The documentation states that it contains VM overheads, interned strings and > other native overheads. However it's really vague. It's intentionally vague, because it's "everything that is not Java

Re: --driver-java-options not support multiple JVM configuration ?

2016-01-20 Thread Marcelo Vanzin
On Wed, Jan 20, 2016 at 7:38 PM, our...@cnsuning.com wrote: > --driver-java-options $sparkdriverextraJavaOptions \ You need quotes around "$sparkdriverextraJavaOptions". -- Marcelo - To unsubscribe,
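
I.e., quote the variable so the options survive word-splitting, and separate the JVM flags with spaces rather than commas, which is what produces the "Unrecognized VM option" error quoted above (the flag values are illustrative):

    sparkdriverextraJavaOptions="-XX:+UseConcMarkSweepGC -XX:MaxPermSize=512m -XX:+PrintGCDetails"
    spark-submit --driver-java-options "$sparkdriverextraJavaOptions" ...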

Re: strange behavior in spark yarn-client mode

2016-01-14 Thread Marcelo Vanzin
On Thu, Jan 14, 2016 at 10:17 AM, Sanjeev Verma wrote: > now it spawn a single executors with 1060M size, I am not able to understand > why this time it executes executors with 1G+overhead not 2G what I > specified. Where are you looking for the memory size for the

Re: strange behavior in spark yarn-client mode

2016-01-14 Thread Marcelo Vanzin
> I am looking into the web ui of spark application master(tab executors). > > On Fri, Jan 15, 2016 at 12:08 AM, Marcelo Vanzin <van...@cloudera.com> > wrote: >> >> On Thu, Jan 14, 2016 at 10:17 AM, Sanjeev Verma >> <sanjeev.verm...@gmail.com> wrote: >> &

Re: yarn-client: SparkSubmitDriverBootstrapper not found in yarn client mode (1.6.0)

2016-01-13 Thread Marcelo Vanzin
SparkSubmitDriverBootstrapper was removed back in Spark 1.4, so it seems you have a mixed bag of 1.3 / 1.6 in your path / classpath, and things are failing because of that. On Wed, Jan 13, 2016 at 9:31 AM, Lin Zhao wrote: > My job runs fine in yarn cluster mode but I have reason to

Re: What should be the ideal value(unit) for spark.memory.offheap.size

2016-01-06 Thread Marcelo Vanzin
Try "git grep -i spark.memory.offheap.size"... On Wed, Jan 6, 2016 at 2:45 PM, Ted Yu wrote: > Maybe I looked in the wrong files - I searched *.scala and *.java files (in > latest Spark 1.6.0 RC) for '.offheap.' but didn't find the config. > > Can someone enlighten me ? > >

Re: problem building spark on centos

2016-01-06 Thread Marcelo Vanzin
If you're trying to compile against Scala 2.11, you're missing "-Dscala-2.11" in that command. On Wed, Jan 6, 2016 at 12:27 PM, Jade Liu wrote: > Hi, Todd: > > Thanks for your suggestion. Yes I did run the ./dev/change-scala-version.sh > 2.11 script when using scala version
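
So the pair of steps would be, roughly:

    ./dev/change-scala-version.sh 2.11
    ./build/mvn -Dscala-2.11 -DskipTests clean package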

Re: Monitor Job on Yarn

2016-01-04 Thread Marcelo Vanzin
You should be looking at the YARN RM web ui to monitor YARN applications; that will have a link to the Spark application's UI, along with other YARN-related information. Also, if you run the app in client mode, it might be easier to debug it until you know it's running properly (since you'll see

Re: Spark REST API shows Error 503 Service Unavailable

2015-12-17 Thread Marcelo Vanzin
Hi Prateek, Are you using CDH 5.5 by any chance? We fixed this bug in an upcoming patch. Unfortunately there's no workaround at the moment... it doesn't affect upstream Spark either. On Fri, Dec 11, 2015 at 2:05 PM, prateek arora wrote: > > > Hi > > I am trying to

Re: Spark REST API shows Error 503 Service Unavailable

2015-12-17 Thread Marcelo Vanzin
On Thu, Dec 17, 2015 at 3:31 PM, Vikram Kone wrote: > No we are using standard spark w/ datastax cassandra. I'm able to see some > json when I do http://10.1.40.16:7080/json/v1/applications > but getting the following errors when I do >

Re: spark.authenticate=true YARN mode doesn't work

2015-12-07 Thread Marcelo Vanzin
…I'm attaching all container logs. Can you please take a look at it when you > get a chance. > > Thanks > Prasad > > On Sat, Dec 5, 2015 at 2:30 PM, Marcelo Vanzin <van...@cloudera.com> wrote: >> >> On Fri, Dec 4, 2015 at 5:47 PM, prasadreddy <alle.re...@gma

Re: spark.authenticate=true YARN mode doesn't work

2015-12-05 Thread Marcelo Vanzin
Hi Prasad, please reply to the list so that others can benefit / help. On Sat, Dec 5, 2015 at 4:06 PM, Prasad Reddy wrote: > Have you had a chance to try this authentication for any of your projects > earlier. Yes, we run with authenticate=true by default. It works fine.

Re: spark.authenticate=true YARN mode doesn't work

2015-12-05 Thread Marcelo Vanzin
On Fri, Dec 4, 2015 at 5:47 PM, prasadreddy wrote: > I am running Spark YARN and trying to enable authentication by setting > spark.authenticate=true. After enable authentication I am not able to Run > Spark word count or any other programs. Define "I am not able to run".

Re: Any clue on this error, Exception in thread "main" java.lang.NoSuchFieldError: SPARK_RPC_CLIENT_CONNECT_TIMEOUT

2015-12-03 Thread Marcelo Vanzin
(bcc: user@spark, since this is Hive code.) You're probably including unneeded Spark jars in Hive's classpath somehow. Either the whole assembly or spark-hive, both of which will contain Hive classes, and in this case contain old versions that conflict with the version of Hive you're running. On

Re: Any clue on this error, Exception in thread "main" java.lang.NoSuchFieldError: SPARK_RPC_CLIENT_CONNECT_TIMEOUT

2015-12-03 Thread Marcelo Vanzin
On Thu, Dec 3, 2015 at 10:32 AM, Mich Talebzadeh wrote: > hduser@rhes564::/usr/lib/spark/logs> hive --version > SLF4J: Found binding in > [jar:file:/usr/lib/spark/lib/spark-assembly-1.3.0-hadoop2.4.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] As I suggested before, you

Re: ClassLoader resources on executor

2015-12-02 Thread Marcelo Vanzin
On Tue, Dec 1, 2015 at 12:45 PM, Charles Allen wrote: > Is there a way to pass configuration file resources to be resolvable through > the classloader? Not in general. If you're using YARN, you can cheat and use "spark.yarn.dist.files" which will place those files
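
E.g. (the file path is illustrative); the listed files end up in each YARN container's working directory:

    spark-submit --conf spark.yarn.dist.files=/local/path/app-config.xml ...
    # readable on executors as ./app-config.xml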

Re: Question about yarn-cluster mode and spark.driver.allowMultipleContexts

2015-12-02 Thread Marcelo Vanzin
On Tue, Dec 1, 2015 at 9:43 PM, Anfernee Xu wrote: > But I have a single server(JVM) that is creating SparkContext, are you > saying Spark supports multiple SparkContext in the same JVM? Could you > please clarify on this? I'm confused. Nothing you said so far requires

Re: Question about yarn-cluster mode and spark.driver.allowMultipleContexts

2015-12-01 Thread Marcelo Vanzin
On Tue, Dec 1, 2015 at 3:32 PM, Anfernee Xu wrote: > I have a long running backend server where I will create a short-lived Spark > job in response to each user request, base on the fact that by default > multiple Spark Context cannot be created in the same JVM, looks like

Re: Port Control for YARN-Aware Spark

2015-11-23 Thread Marcelo Vanzin
On Mon, Nov 23, 2015 at 6:24 PM, gpriestley wrote: > Questions I have are: > 1) How does the spark.yarn.am.port relate to defined ports within Spark > (driver, executor, block manager, etc.)? > 2) Does the spark.yarn.am.port parameter only relate to the spark >

Re: Anybody hit this issue in spark shell?

2015-11-09 Thread Marcelo Vanzin
We've had this in the past when using "@VisibleForTesting" in classes that for some reason the shell tries to process. QueryExecution.scala seems to use that annotation and that was added recently, so that's probably the issue. BTW, if anyone knows how Scala can find a reference to the original

Re: Anybody hit this issue in spark shell?

2015-11-09 Thread Marcelo Vanzin
On Mon, Nov 9, 2015 at 5:54 PM, Ted Yu <yuzhih...@gmail.com> wrote: > If there is no option to let shell skip processing @VisibleForTesting , > should the annotation be dropped ? That's what we did last time this showed up. > On Mon, Nov 9, 2015 at 5:50 PM, Marcelo Vanzin <v

Re: Guava ClassLoading Issue When Using Different Hive Metastore Version

2015-11-05 Thread Marcelo Vanzin
On Thu, Nov 5, 2015 at 3:41 PM, Joey Paskhay wrote: > We verified the Guava libraries are in the huge list of the included jars, > but we saw that in the > org.apache.spark.sql.hive.client.IsolatedClientLoader.isSharedClass method > it seems to assume that *all*

Re: Is the resources specified in configuration shared by all jobs?

2015-11-04 Thread Marcelo Vanzin
Resources belong to the application, not each job, so the latter. On Wed, Nov 4, 2015 at 9:24 AM, Nisrina Luthfiyati wrote: > Hi all, > > I'm running some spark jobs in java on top of YARN by submitting one > application jar that starts multiple jobs. > My question

Re: Spark dynamic allocation config

2015-11-03 Thread Marcelo Vanzin
Hi, your question is really CM-related and not Spark-related, so I'm bcc'ing the list and will reply separately. On Tue, Nov 3, 2015 at 11:08 AM, billou2k wrote: > Hi, > Sorry this is probably a silly question but > I have a standard CDH 5.4.2 config with Spark 1.3 and

Re: [Yarn] How to set user in ContainerLaunchContext?

2015-11-02 Thread Marcelo Vanzin
You can try the "--proxy-user" command line argument for spark-submit. That requires that your RM configuration allows the user running your AM to "proxy" other users. And I'm not completely sure it works without Kerberos. See:

Re: [Spark-SQL]: Unable to propagate hadoop configuration after SparkContext is initialized

2015-10-27 Thread Marcelo Vanzin
On Tue, Oct 27, 2015 at 10:43 AM, Jerry Lam wrote: > Anyone experiences issues in setting hadoop configurations after > SparkContext is initialized? I'm using Spark 1.5.1. > > I'm trying to use s3a which requires access and secret key set into hadoop > configuration. I tried

Re: [Spark-SQL]: Unable to propagate hadoop configuration after SparkContext is initialized

2015-10-27 Thread Marcelo Vanzin
Best Regards, > > Jerry > > > On Tue, Oct 27, 2015 at 2:05 PM, Marcelo Vanzin <van...@cloudera.com> wrote: >> >> On Tue, Oct 27, 2015 at 10:43 AM, Jerry Lam <chiling...@gmail.com> wrote: >> > Anyone experiences issues in setting hadoop configurations

Re: Programmatically connect to remote YARN in yarn-client mode

2015-10-14 Thread Marcelo Vanzin
On Wed, Oct 14, 2015 at 10:01 AM, Florian Kaspar wrote: > we are working on a project running on Spark. Currently we connect to a > remote Spark-Cluster in Standalone mode to obtain the SparkContext using > > new JavaSparkContext(new >

Re: Programmatically connect to remote YARN in yarn-client mode

2015-10-14 Thread Marcelo Vanzin
On Wed, Oct 14, 2015 at 10:29 AM, Florian Kaspar wrote: > so it is possible to simply copy the YARN configuration from the remote > cluster to the local machine (assuming, the local machine can resolve the > YARN host etc.) and just letting Spark do the rest? > Yes,

Re: Spark shuffle service does not work in stand alone

2015-10-13 Thread Marcelo Vanzin
It would probably be more helpful if you looked for the executor error and posted it. The screenshot you posted is the driver exception caused by the task failure, which is not terribly useful. On Tue, Oct 13, 2015 at 7:23 AM, wrote: > Has anyone tried shuffle

Re: Spark shuffle service does not work in stand alone

2015-10-13 Thread Marcelo Vanzin
…io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528) > > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) > > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) > >

Re: compatibility issue with Jersey2

2015-10-07 Thread Marcelo Vanzin
…(SparkSubmit.scala:193) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > > On 6 October 2015 at 16:20, Marcelo Vanzin <van...@cloudera.com> wrote: >> >> On Tue, Oct

Re: compatibility issue with Jersey2

2015-10-06 Thread Marcelo Vanzin
On Tue, Oct 6, 2015 at 12:04 PM, Gary Ogden wrote: > But we run unit tests differently in our build environment, which is > throwing the error. It's setup like this: > > I suspect this is what you were referring to when you said I have a problem? Yes, that is what I was

Re: compatibility issue with Jersey2

2015-10-06 Thread Marcelo Vanzin
On Tue, Oct 6, 2015 at 5:57 AM, oggie wrote: > We have a Java app written with spark 1.3.1. That app also uses Jersey 2.9 > client to make external calls. We see spark 1.4.1 uses Jersey 1.9. How is this app deployed? If it's run via spark-submit, you could use

Re: How does FAIR job scheduler work in Standalone cluster mode?

2015-10-02 Thread Marcelo Vanzin
You're mixing app scheduling in the cluster manager (your [1] link) with job scheduling within an app (your [2] link). They're independent things. On Fri, Oct 2, 2015 at 2:22 PM, Jacek Laskowski wrote: > Hi, > > The docs in Resource Scheduling [1] says: > >> The standalone

Re: How does FAIR job scheduler work in Standalone cluster mode?

2015-10-02 Thread Marcelo Vanzin
On Fri, Oct 2, 2015 at 5:29 PM, Jacek Laskowski wrote: >> The standalone cluster mode currently only supports a simple FIFO scheduler >> across applications. > > is correct or not? :( I think so. But, because they're different things, that does not mean you cannot use a fair

Re: Pyspark: "Error: No main class set in JAR; please specify one with --class"

2015-10-01 Thread Marcelo Vanzin
How are you running the actual application? I find it slightly odd that you're setting PYSPARK_SUBMIT_ARGS directly; that's supposed to be an internal env variable used by Spark. You'd normally pass those parameters in the spark-submit (or pyspark) command line. On Thu, Oct 1, 2015 at 8:56 AM,
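
I.e., rather than exporting PYSPARK_SUBMIT_ARGS, something like (flags illustrative):

    spark-submit --master yarn --num-executors 4 my_app.py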

Re: sc.parallelize with defaultParallelism=1

2015-09-30 Thread Marcelo Vanzin
If you want to process the data locally, why do you need to use sc.parallelize? Store the data in regular Scala collections and use their methods to process them (they have pretty much the same set of methods as Spark RDDs). Then when you're happy, finally use Spark to process the pre-processed

Re: Where are logs for Spark Kafka Yarn on Cloudera

2015-09-29 Thread Marcelo Vanzin
(-dev@) Try using the "yarn logs" command to read logs for finished applications. You can also browse the RM UI to find more information about the applications you ran. On Mon, Sep 28, 2015 at 11:37 PM, Rachana Srivastava wrote: > Hello all, > > > > I am
