Re: Spark on Cloudera Configuration (Scheduler Mode = FAIR)

2017-07-21 Thread Marcelo Vanzin
On Fri, Jul 21, 2017 at 5:00 AM, Gokula Krishnan D wrote: > Is there any way we can set up the scheduler mode at the Spark cluster level > besides the application (SC) level? That's called the cluster (or resource) manager. E.g., configure separate queues in YARN with a maximum number of resources for each

Re: how to set the assignee in JIRA please?

2017-07-24 Thread Marcelo Vanzin
We don't generally set assignees. Submit a PR on github and the PR will be linked on JIRA; if your PR is merged, then the bug is assigned to you. On Mon, Jul 24, 2017 at 5:57 PM, 萝卜丝炒饭 <1427357...@qq.com> wrote: > Hi all, > If I want to do some work about an issue registered in JIRA, how to set t

Re: how to set the assignee in JIRA please?

2017-07-24 Thread Marcelo Vanzin
On Mon, Jul 24, 2017 at 6:04 PM, Hyukjin Kwon wrote: > However, I see some JIRAs are assigned to someone time to time. Were those > mistakes or would you mind if I ask when someone is assigned? I'm not sure if there are any guidelines of when to assign; since there has been an agreement that bugs

Re: running spark application compiled with 1.6 on spark 2.1 cluster

2017-07-27 Thread Marcelo Vanzin
On Wed, Jul 26, 2017 at 10:45 PM, satishl wrote: > is this a supported scenario - i.e., can I run app compiled with spark 1.6 > on a 2.+ spark cluster? In general, no. -- Marcelo - To unsubscribe e-mail: user-unsubscr...@spark

Re: Spark2.1 installation issue

2017-07-27 Thread Marcelo Vanzin
Hello, This is a CDH-specific issue, please use the Cloudera forums / support line instead of the Apache group. On Thu, Jul 27, 2017 at 10:54 AM, Vikash Kumar wrote: > I have installed spark2 parcel through cloudera CDH 12.0. I see some issue > there. Look like it didn't got configured properly.

Re: --jars from spark-submit on master on YARN don't get added properly to the executors - ClassNotFoundException

2017-08-09 Thread Marcelo Vanzin
Jars distributed using --jars are not added to the system classpath, so log4j cannot see them. To work around that, you need to manually add the *name* of the jar to the driver and executor classpaths: spark.driver.extraClassPath=some.jar spark.executor.extraClassPath=some.jar In client mode you should use
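A sketch of the cluster-mode form (paths and names below are placeholders); in client mode the driver side instead needs a local path, e.g. via --driver-class-path:

    spark-submit --master yarn --deploy-mode cluster \
      --jars /local/path/some.jar \
      --conf spark.driver.extraClassPath=some.jar \
      --conf spark.executor.extraClassPath=some.jar \
      myapp.jar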

Re: HDFS or NFS as a cache?

2017-10-02 Thread Marcelo Vanzin
You don't need to collect data in the driver to save it. The code in the original question doesn't use "collect()", so it's actually doing a distributed write. On Mon, Oct 2, 2017 at 11:26 AM, JG Perrin wrote: > Steve, > > > > If I refer to the collect() API, it says “Running collect requires mo

Re: Does the builtin hive jars talk of spark to HiveMetaStore(2.1) without any issues?

2017-11-09 Thread Marcelo Vanzin
I'd recommend against using the built-in jars for a different version of Hive. You don't need to build your own Spark; just set spark.sql.hive.metastore.jars / spark.sql.hive.metastore.version (see documentation). On Thu, Nov 9, 2017 at 2:10 AM, yaooqinn wrote: > Hi, all > The builtin hive versio
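For example (version and path are placeholders; spark.sql.hive.metastore.jars also accepts "builtin" or "maven"):

    spark.sql.hive.metastore.version=2.1.1
    spark.sql.hive.metastore.jars=/opt/hive-2.1/lib/*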

Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-05 Thread Marcelo Vanzin
SparkLauncher operates at a different layer than Spark applications. It doesn't know about executors or driver or anything, just whether the Spark application was started or not. So it doesn't work for your case. The best option for your case is to install a SparkListener and monitor events. But t
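For reference, a minimal sketch of installing a listener inside the application (the printed fields are just illustrative):

    import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationStart, SparkListenerExecutorAdded}

    // Hypothetical listener that reacts to the events of interest
    class StateListener extends SparkListener {
      override def onApplicationStart(e: SparkListenerApplicationStart): Unit =
        println(s"app started: ${e.appName}")
      override def onExecutorAdded(e: SparkListenerExecutorAdded): Unit =
        println(s"executor added: ${e.executorId}")
    }

    sc.addSparkListener(new StateListener())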

Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-05 Thread Marcelo Vanzin
On Tue, Dec 5, 2017 at 12:43 PM, bsikander wrote: > 2) If I use context.addSparkListener, I can customize the listener but then > I miss the onApplicationStart event. Also, I don't know the Spark's logic to > changing the state of application from WAITING -> RUNNING. I'm not sure I follow you her

Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-07 Thread Marcelo Vanzin
On Thu, Dec 7, 2017 at 11:40 AM, bsikander wrote: > For example, if an application wanted 4 executors > (spark.executor.instances=4) but the spark cluster can only provide 1 > executor. This means that I will only receive 1 onExecutorAdded event. Will > the application state change to RUNNING (eve

Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-07 Thread Marcelo Vanzin
That's the Spark Master's view of the application. I don't know exactly what it means in the different run modes, I'm more familiar with YARN. But I wouldn't be surprised if, as with others, it mostly tracks the driver's state. On Thu, Dec 7, 2017 at 12:06 PM, bsikander wrote: >

Re: Loading a spark dataframe column into T-Digest using java

2017-12-11 Thread Marcelo Vanzin
The closure in your "foreach" loop runs in a remote executor, not the local JVM, so it's updating its own copy of the t-digest instance. The one on the driver side is never touched. On Sun, Dec 10, 2017 at 10:27 PM, Himasha de Silva wrote: > Hi, > > I want to load a spark dataframe column into T-D
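A simplified illustration of the same pitfall using a plain variable instead of a t-digest (assumes df is a DataFrame whose first column is numeric):

    var total = 0.0
    df.rdd.foreach(row => total += row.getDouble(0)) // mutates executor-side copies only
    println(total)                                   // still 0.0 on the driver
    // Aggregation-style operations merge executor-side partial results back on the driver:
    val sum = df.rdd.map(_.getDouble(0)).reduce(_ + _)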

Re: Why do I see five attempts on my Spark application

2017-12-13 Thread Marcelo Vanzin
On Wed, Dec 13, 2017 at 11:21 AM, Toy wrote: > I'm wondering why am I seeing 5 attempts for my Spark application? Does Spark > application restart itself? It restarts itself if it fails (up to a limit that can be configured either per Spark application or globally in YARN). -- Marcelo --
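For example, the per-application cap can be lowered via spark.yarn.maxAppAttempts (the global cap is yarn.resourcemanager.am.max-attempts in yarn-site.xml):

    spark.yarn.maxAppAttempts=1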

Re: flatMap() returning large class

2017-12-14 Thread Marcelo Vanzin
This sounds like something mapPartitions should be able to do, not sure if there's an easier way. On Thu, Dec 14, 2017 at 10:20 AM, Don Drake wrote: > I'm looking for some advice when I have a flatMap on a Dataset that is > creating and returning a sequence of a new case class > (Seq[BigDataStruc
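A sketch of the mapPartitions pattern (BigDataStructure comes from the original question; buildHelper and expand are hypothetical stand-ins):

    val result = ds.rdd.mapPartitions { rows =>
      val helper = buildHelper()           // created once per partition, not once per row
      rows.flatMap(r => helper.expand(r))  // returns a Seq[BigDataStructure] per input row
    }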

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread Marcelo Vanzin
On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge wrote: > I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is > spark-env.sh sourced when starting the Spark AM container or the executor > container? No, it's not. -- Marcelo --

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread Marcelo Vanzin
3, 2018 at 9:59 AM, Marcelo Vanzin wrote: >> >> On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge wrote: >> > I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is >> > spark-env.sh sourced when starting the Spark AM container or the >> > executo

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-04 Thread Marcelo Vanzin
On Wed, Jan 3, 2018 at 8:18 PM, John Zhuge wrote: > Something like: > > Note: When running Spark on YARN, environment variables for the executors > need to be set using the spark.yarn.executorEnv.[EnvironmentVariableName] > property in your conf/spark-defaults.conf file or on the command line. > E
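For example (variable name and value are placeholders; spark.yarn.appMasterEnv covers the AM/driver side in cluster mode):

    spark-submit --master yarn \
      --conf spark.yarn.executorEnv.MY_VAR=some-value \
      --conf spark.yarn.appMasterEnv.MY_VAR=some-value \
      myapp.jar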

Re: [spark-sql] Custom Query Execution listener via conf properties

2018-02-16 Thread Marcelo Vanzin
According to https://issues.apache.org/jira/browse/SPARK-19558 this feature was added in 2.3. On Fri, Feb 16, 2018 at 12:43 AM, kurian vs wrote: > Hi, > > I was trying to create a custom Query execution listener by extending the > org.apache.spark.sql.util.QueryExecutionListener class. My custom
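With 2.3+, the listener class can be registered through configuration, e.g. (class name is a placeholder):

    spark.sql.queryExecutionListeners=com.example.MyQueryExecutionListener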

Re: How to run spark shell using YARN

2018-03-12 Thread Marcelo Vanzin
That's not an error, just a warning. The docs [1] have more info about the config options mentioned in that message. [1] http://spark.apache.org/docs/latest/running-on-yarn.html On Mon, Mar 12, 2018 at 4:42 PM, kant kodali wrote: > Hi All, > > I am trying to use YARN for the very first time. I b

Re: How to run spark shell using YARN

2018-03-12 Thread Marcelo Vanzin
auth.Subject.doAs(Subject.java:422) > 18/03/13 00:19:13 INFO LineBufferedStream: stdout: at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > 18/03/13 00:19:13 INFO LineBufferedStream: stdout: at > org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043

Re: Accessing a file that was passed via --files to spark submit

2018-03-19 Thread Marcelo Vanzin
From spark-submit -h: --files FILES Comma-separated list of files to be placed in the working directory of each executor. File paths of these files in executors can be accessed via SparkFiles.get(fileName). On Sun, Mar 18,
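For example, after submitting with --files /local/path/lookup.json (a placeholder name), executor-side code can resolve the local copy:

    import org.apache.spark.SparkFiles
    val path = SparkFiles.get("lookup.json")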

Re: HadoopDelegationTokenProvider

2018-03-21 Thread Marcelo Vanzin
They should be available in the current user. UserGroupInformation.getCurrentUser().getCredentials() On Wed, Mar 21, 2018 at 7:32 AM, Jorge Machado wrote: > Hey spark group, > > I want to create a Delegation Token Provider for Accumulo I have One > Question: > > How can I get the token that I ad

Re: Spark logs compression

2018-03-26 Thread Marcelo Vanzin
Log compression is a client setting. Doing that will make new apps write event logs in compressed format. The SHS doesn't compress existing logs. On Mon, Mar 26, 2018 at 9:17 AM, Fawze Abujaber wrote: > Hi All, > > I'm trying to compress the logs at the Spark history server, I added > spark.eventLog
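i.e., something like the following in spark-defaults.conf on the submitting host (affects new applications only):

    spark.eventLog.enabled=true
    spark.eventLog.compress=true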

Re: Spark logs compression

2018-03-26 Thread Marcelo Vanzin
but I don’t , do I > need to perform restart to spark or Yarn? > > On Mon, 26 Mar 2018 at 19:53 Marcelo Vanzin wrote: >> >> Log compression is a client setting. Doing that will make new apps >> write event logs in compressed format. >> >> The SHS doesn't compr

Re: Spark logs compression

2018-03-26 Thread Marcelo Vanzin
t; > On Mon, 26 Mar 2018 at 20:05 Marcelo Vanzin wrote: >> >> If the spark-defaults.conf file in the machine where you're starting >> the Spark app has that config, then that's all that should be needed. >> >> On Mon, Mar 26, 2018 at 10:02 AM, Fawze Abuja

Re: Spark logs compression

2018-03-26 Thread Marcelo Vanzin
story/application_1522085988298_0002.snappy On Mon, Mar 26, 2018 at 10:48 AM, Fawze Abujaber wrote: > I distributed this config to all the nodes cross the cluster and with no > success, new spark logs still uncompressed. > > On Mon, Mar 26, 2018 at 8:12 PM, Marcelo Vanzin wrote: >> &

Re: Spark logs compression

2018-03-26 Thread Marcelo Vanzin
On Mon, Mar 26, 2018 at 11:01 AM, Fawze Abujaber wrote: > Weird, I just ran spark-shell and its log is compressed but my spark jobs > that are scheduled using oozie are not getting compressed. Ah, then it's probably a problem with how Oozie is generating the config for the Spark job. Given your env i

Re: Local dirs

2018-03-26 Thread Marcelo Vanzin
On Mon, Mar 26, 2018 at 1:08 PM, Gauthier Feuillen wrote: > Is there a way to change this value without changing yarn-site.xml ? No. Local dirs are defined by the NodeManager, and Spark cannot override them. -- Marcelo - To un

Re: all spark settings end up being system properties

2018-03-30 Thread Marcelo Vanzin
Why: it's part historical, part "how else would you do it". SparkConf needs to read properties passed on the command line, but SparkConf is something that user code instantiates, so we can't easily make it read data from arbitrary locations. You could use thread locals and other tricks, but user c

Re: Spark on Kubernetes (minikube) 2.3 fails with class not found exception

2018-04-10 Thread Marcelo Vanzin
This is the problem: > :/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar;/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar Seems like some code is confusing things when mixing OSes. It's using the Windows separator when building a command line to be run on a Linux host. On Tue, Apr 1

Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Marcelo Vanzin
There are two things you're doing wrong here: On Thu, Apr 12, 2018 at 6:32 PM, jb44 wrote: > Then I can add the alluxio client library like so: > sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT) First one, you can't modify JVM configuration after it has already started.

Re: Spark launcher listener not getting invoked k8s Spark 2.3

2018-04-30 Thread Marcelo Vanzin
g on > k8 but listener is not getting invoked > > > On Monday, April 30, 2018, Marcelo Vanzin wrote: >> >> I'm pretty sure this feature hasn't been implemented for the k8s backend. >> >> On Mon, Apr 30, 2018 at 4:51 PM, purna m wrote: >> > HI

Re: Spark UI Source Code

2018-05-07 Thread Marcelo Vanzin
On Mon, May 7, 2018 at 1:44 AM, Anshi Shrivastava wrote: > I've found a KVStore wrapper which stores all the metrics in a LevelDb > store. This KVStore wrapper is available as a spark-dependency but we cannot > access the metrics directly from spark since they are all private. I'm not sure what i

Re: Guava dependency issue

2018-05-08 Thread Marcelo Vanzin
Using a custom Guava version with Spark is not that simple. Spark shades Guava, but a lot of libraries Spark uses do not - the main one being all of the Hadoop ones, and they need a quite old Guava. So you have two options: shade/relocate Guava in your application, or use spark.{driver|executor}.u
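The second option, spelled out (the first option would instead relocate Guava with something like the maven-shade-plugin in your own build):

    spark.driver.userClassPathFirst=true
    spark.executor.userClassPathFirst=true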

Re: Spark UI Source Code

2018-05-09 Thread Marcelo Vanzin
spark). Is there a way to fetch data from this > KVStore (which uses levelDb for storage) and filter it on basis on > timestamp? > > Thanks, > Anshi > > On Mon, May 7, 2018 at 9:51 PM, Marcelo Vanzin [via Apache Spark User List] > wrote: >> >> On Mon, May 7,

Re: Submit many spark applications

2018-05-16 Thread Marcelo Vanzin
You can either: - set spark.yarn.submit.waitAppCompletion=false, which will make spark-submit go away once the app starts in cluster mode. - use the (new in 2.3) InProcessLauncher class + some custom Java code to submit all the apps from the same "launcher" process. On Wed, May 16, 2018 at 1:45 P
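A minimal sketch of the InProcessLauncher route (resource, class and master are placeholders):

    import org.apache.spark.launcher.{InProcessLauncher, SparkAppHandle}

    val handle: SparkAppHandle = new InProcessLauncher()
      .setAppResource("/path/to/app.jar")
      .setMainClass("com.example.MyApp")
      .setMaster("yarn")
      .setDeployMode("cluster")
      .startApplication()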

Re: Encounter 'Could not find or load main class' error when submitting spark job on kubernetes

2018-05-22 Thread Marcelo Vanzin
On Tue, May 22, 2018 at 12:45 AM, Makoto Hashimoto wrote: > local:///usr/local/oss/spark-2.3.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.0.jar Is that the path of the jar inside your docker image? The default image puts that in /opt/spark IIRC. -- Marcelo

Re: Submit many spark applications

2018-05-23 Thread Marcelo Vanzin
On Wed, May 23, 2018 at 12:04 PM, raksja wrote: > So InProcessLauncher wouldn't use the native memory, so will it overload the > mem of parent process? It will still use "native memory" (since the parent process will still use memory), just less of it. But yes, it will use more memory in the parent

Re: Submit many spark applications

2018-05-25 Thread Marcelo Vanzin
That's what Spark uses. On Fri, May 25, 2018 at 10:09 AM, raksja wrote: > thanks for the reply. > > Have you tried submit a spark job directly to Yarn using YarnClient. > https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/yarn/client/api/YarnClient.html > > Not sure whether its performan

Re: Submit many spark applications

2018-05-25 Thread Marcelo Vanzin
On Fri, May 25, 2018 at 10:18 AM, raksja wrote: > InProcessLauncher would just start a subprocess as you mentioned earlier. No. As the name says, it runs things in the same process. -- Marcelo - To unsubscribe e-mail: user-uns

Re: Submit many spark applications

2018-05-25 Thread Marcelo Vanzin
I already gave my recommendation in my very first reply to this thread... On Fri, May 25, 2018 at 10:23 AM, raksja wrote: > ok, when to use what? > do you have any recommendation? > > > > -- > Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ > >

Re: [SparkLauncher] stateChanged event not received in standalone cluster mode

2018-06-06 Thread Marcelo Vanzin
That feature has not been implemented yet. https://issues.apache.org/jira/browse/SPARK-11033 On Wed, Jun 6, 2018 at 5:18 AM, Behroz Sikander wrote: > I have a client application which launches multiple jobs in Spark Cluster > using SparkLauncher. I am using Standalone cluster mode. Launching jobs

[ANNOUNCE] Announcing Apache Spark 2.3.1

2018-06-11 Thread Marcelo Vanzin
We are happy to announce the availability of Spark 2.3.1! Apache Spark 2.3.1 is a maintenance release, based on the branch-2.3 maintenance branch of Spark. We strongly recommend all 2.3.x users to upgrade to this stable release. To download Spark 2.3.1, head over to the download page: http://spar

Re: Spark user classpath setting

2018-06-14 Thread Marcelo Vanzin
I only know of a way to do that with YARN. You can distribute the jar files using "--files" and add just their names (not the full path) to the "extraClassPath" configs. You don't need "userClassPathFirst" in that case. On Thu, Jun 14, 2018 at 1:28 PM, Arjun kr wrote: > Hi All, > > > I am trying
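A sketch of that approach on YARN (file names are placeholders; note the bare names on the classpath configs):

    spark-submit --master yarn \
      --files /local/dir/dep1.jar,/local/dir/dep2.jar \
      --conf spark.driver.extraClassPath=dep1.jar:dep2.jar \
      --conf spark.executor.extraClassPath=dep1.jar:dep2.jar \
      myapp.jar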

Re: Issue upgrading to Spark 2.3.1 (Maintenance Release)

2018-06-15 Thread Marcelo Vanzin
I'm not familiar with PyCharm. But if you can run "pyspark" from the command line and not hit this, then this might be an issue with PyCharm or your environment - e.g. having an old version of the pyspark code around, or maybe PyCharm itself might need to be updated. On Thu, Jun 14, 2018 at 10:01

Re: deploy-mode cluster. FileNotFoundException

2018-09-05 Thread Marcelo Vanzin
See SPARK-4160. Long story short: you need to upload the files and jars to some shared storage (like HDFS) manually. On Wed, Sep 5, 2018 at 2:17 AM Guillermo Ortiz Fernández wrote: > > I'm using standalone cluster and the final command I'm trying is: > spark-submit --verbose --deploy-mode cluster
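A sketch of the workaround (paths, class and master URL are placeholders):

    hdfs dfs -put myapp.jar /user/me/jars/
    spark-submit --master spark://master:7077 --deploy-mode cluster \
      --class com.example.MyApp hdfs:///user/me/jars/myapp.jar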

Re: Specifying different version of pyspark.zip and py4j files on worker nodes with Spark pre-installed

2018-10-04 Thread Marcelo Vanzin
Normally the version of Spark installed on the cluster does not matter, since Spark is uploaded from your gateway machine to YARN by default. You probably have some configuration (in spark-defaults.conf) that tells YARN to use a cached copy. Get rid of that configuration, and you can use whatever

Re: Specifying different version of pyspark.zip and py4j files on worker nodes with Spark pre-installed

2018-10-04 Thread Marcelo Vanzin
.bashrc , for the > pre-installed 2.2.1 path. > > I don't want to make any changes to worker node configuration, so any way to > override the order? > > Jianshi > > On Fri, Oct 5, 2018 at 12:11 AM Marcelo Vanzin wrote: >> >> Normally the version of Spark in

Re: Specifying different version of pyspark.zip and py4j files on worker nodes with Spark pre-installed

2018-10-05 Thread Marcelo Vanzin
ps://github.com/apache/spark/blob/88e7e87bd5c052e10f52d4bb97a9d78f5b524128/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala#L31 >> > >> > The code shows Spark will try to find the path if SPARK_HOME is specified. >> > And on my worker node, SPARK_HOME is s

Re: kerberos auth for MS SQL server jdbc driver

2018-10-15 Thread Marcelo Vanzin
Spark only does Kerberos authentication on the driver. For executors it currently only supports Hadoop's delegation tokens for Kerberos. To use something that does not support delegation tokens you have to manually manage the Kerberos login in your code that runs in executors, which might be trick

Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-20 Thread Marcelo Vanzin
On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown wrote: > I recently upgraded to spark 2.3.1 I have had these same settings in my spark > submit script, which worked on 2.0.2, and according to the documentation > appear to not have changed: > > spark.ui.retainedTasks=1 > spark.ui.retainedStages=1 >

Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-22 Thread Marcelo Vanzin
Just tried on 2.3.2 and worked fine for me. UI had a single job and a single stage (+ the tasks related to that single stage), same thing in memory (checked with jvisualvm). On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin wrote: > > On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown > wro

Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-24 Thread Marcelo Vanzin
; import scala.concurrent._ > scala> import scala.concurrent.ExecutionContext.Implicits.global > scala> for (i <- 0 until 5) { Future { println(sc.parallelize(0 until > i).collect.length) } } > > On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin wrote: >> >> Just tried on 2.3.2 and wor

Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-25 Thread Marcelo Vanzin
as my production application continues to submit jobs every once in a while, > the issue persists. > > On Wed, Oct 24, 2018 at 5:05 PM Marcelo Vanzin wrote: >> >> When you say many jobs at once, what ballpark are you talking about? >> >> The code in 2.3+ does try

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Marcelo Vanzin
+user@ >> -- Forwarded message - >> From: Wenchen Fan >> Date: Thu, Nov 8, 2018 at 10:55 PM >> Subject: [ANNOUNCE] Announcing Apache Spark 2.4.0 >> To: Spark dev list >> >> >> Hi all, >> >> Apache Spark 2.4.0 is the fifth release in the 2.x line. This release adds >> Barrier Exe

Re: Custom Metric Sink on Executor Always ClassNotFound

2018-12-20 Thread Marcelo Vanzin
First, it's really weird to use "org.apache.spark" for a class that is not in Spark. For executors, the jar file of the sink needs to be in the system classpath; the application jar is not in the system classpath, so that does not work. There are different ways for you to get it there, most of the

Re: How to reissue a delegated token after max lifetime passes for a spark streaming application on a Kerberized cluster

2019-01-03 Thread Marcelo Vanzin
If you are using the principal / keytab params, Spark should create tokens as needed. If it's not, something else is going wrong, and only looking at full logs for the app would help. On Wed, Jan 2, 2019 at 5:09 PM Ali Nazemian wrote: > > Hi, > > We are using a headless keytab to run our long-runn
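For reference, a sketch of the relevant options (principal and keytab path are placeholders):

    spark-submit --master yarn --deploy-mode cluster \
      --principal myuser@EXAMPLE.COM \
      --keytab /path/to/myuser.keytab \
      myapp.jar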

Re: How to reissue a delegated token after max lifetime passes for a spark streaming application on a Kerberized cluster

2019-01-03 Thread Marcelo Vanzin
ing with “kms-dt”. > > Anyone knows why this is happening ? Any suggestion to make it working > with KMS ? > > Thanks > > *Paolo Platter* > > *CTO* > > E-mail: p

Re: How to force-quit a Spark application?

2019-01-15 Thread Marcelo Vanzin
You should check the active threads in your app. Since your pool uses non-daemon threads, that will prevent the app from exiting. spark.stop() should have stopped the Spark jobs in other threads, at least. But if something is blocking one of those threads, or if something is creating a non-daemon
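One common fix is to build the pool from daemon threads so it cannot keep the JVM alive; a sketch:

    import java.util.concurrent.{Executors, ThreadFactory}

    val daemonFactory = new ThreadFactory {
      override def newThread(r: Runnable): Thread = {
        val t = new Thread(r)
        t.setDaemon(true)   // daemon threads don't block JVM exit
        t
      }
    }
    val pool = Executors.newFixedThreadPool(4, daemonFactory)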

Re: How to force-quit a Spark application?

2019-01-16 Thread Marcelo Vanzin
ed many ways to > exit the spark (e.g., System.exit()), but failed. Is there an explicit way to > shutdown all the alive threads in the spark application and then quit > afterwards? > > > On Tue, Jan 15, 2019 at 2:38 PM Marcelo Vanzin wrote: >> >> You should check th

Re: How to force-quit a Spark application?

2019-01-16 Thread Marcelo Vanzin
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > "

Re: How to force-quit a Spark application?

2019-01-24 Thread Marcelo Vanzin
Hi, On Tue, Jan 22, 2019 at 11:30 AM Pola Yao wrote: > "Thread-1" #19 prio=5 os_prio=0 tid=0x7f9b6828e800 nid=0x77cb waiting on > condition [0x7f9a123e3000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0

Re: Multiple context in one Driver

2019-03-14 Thread Marcelo Vanzin
It doesn't work (except if you're extremely lucky), it will eat your lunch and will also kick your dog. And it's not even going to be an option in the next version of Spark. On Wed, Mar 13, 2019 at 11:38 PM Ido Friedman wrote: > > Hi, > > I am researching the use of multiple sparkcontext in one

Re: RPC timeout error for AES based encryption between driver and executor

2019-03-26 Thread Marcelo Vanzin
I don't think "spark.authenticate" works properly with k8s in 2.4 (which would make it impossible to enable encryption since it requires authentication). I'm pretty sure I fixed it in master, though. On Tue, Mar 26, 2019 at 2:29 AM Sinha, Breeta (Nokia - IN/Bangalore) wrote: > > Hi All, > > > > W

Re: spark.submit.deployMode: cluster

2019-03-26 Thread Marcelo Vanzin
If you're not using spark-submit, then that option does nothing. If by "context creation API" you mean "new SparkContext()" or an equivalent, then you're explicitly creating the driver inside your application. On Tue, Mar 26, 2019 at 1:56 PM Pat Ferrel wrote: > > I have a server that starts a Sp

Re: Is it possible to obtain the full command to be invoked by SparkLauncher?

2019-04-24 Thread Marcelo Vanzin
Setting the SPARK_PRINT_LAUNCH_COMMAND env variable to 1 in the launcher env will make Spark code print the command to stderr. Not optimal but I think it's the only current option. On Wed, Apr 24, 2019 at 1:55 PM Jeff Evans wrote: > > The org.apache.spark.launcher.SparkLauncher is used to constru
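From the shell that would be SPARK_PRINT_LAUNCH_COMMAND=1 spark-submit ...; with the launcher API, a sketch using the environment-map constructor:

    import org.apache.spark.launcher.SparkLauncher
    import scala.collection.JavaConverters._

    // The child process will print its full command line to stderr
    val env = Map("SPARK_PRINT_LAUNCH_COMMAND" -> "1").asJava
    val launcher = new SparkLauncher(env)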

Re: Is it possible to obtain the full command to be invoked by SparkLauncher?

2019-04-24 Thread Marcelo Vanzin
BTW the SparkLauncher API has hooks to capture the stderr of the spark-submit process into the logging system of the parent process. Check the API javadocs since it's been forever since I looked at that. On Wed, Apr 24, 2019 at 1:58 PM Marcelo Vanzin wrote: > >

Re: Issues with Apache Spark tgz file

2019-12-30 Thread Marcelo Vanzin
That first URL is not the file. It's a web page with links to the file in different mirrors. I just looked at the actual file in one of the mirrors and it looks fine. On Mon, Dec 30, 2019 at 1:34 PM rsinghania wrote: > > Hi, > > I'm trying to open the file > https://www.apache.org/dyn/closer.lua/

Re: Discourse: A proposed alternative to the Spark User list

2015-01-22 Thread Marcelo Vanzin
On Thu, Jan 22, 2015 at 10:21 AM, Sean Owen wrote: > I think a Spark site would have a lot less traffic. One annoyance is > that people can't figure out when to post on SO vs Data Science vs > Cross Validated. Another is that a lot of the discussions we see on the Spark users list would be closed

Re: Spark on Windows 2008 R2 serv er does not work

2015-01-28 Thread Marcelo Vanzin
https://issues.apache.org/jira/browse/SPARK-2356 Take a look through the comments, there are some workarounds listed there. On Wed, Jan 28, 2015 at 1:40 PM, Wang, Ningjun (LNG-NPV) wrote: > Has anybody successfully install and run spark-1.2.0 on windows 2008 R2 or > windows 7? How do you get tha

Re: Spark SQL - Unable to use Hive UDF because of ClassNotFoundException

2015-01-30 Thread Marcelo Vanzin
Hi Capitão, Since you're using CDH, your question is probably more appropriate for the cdh-u...@cloudera.org list. The problem you're seeing is most probably an artifact of the way CDH is currently packaged. You have to add Hive jars manually to your Spark app's classpath if you want to use the Hi

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Marcelo Vanzin
Hi Corey, When you run on Yarn, Yarn's libraries are placed in the classpath, and they have precedence over your app's. So, with Spark 1.2, you'll get Guava 11 in your classpath (with Spark 1.1 and earlier you'd get Guava 14 from Spark, so still a problem for you). Right now, the option Markus me

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Marcelo Vanzin
Hi Koert, On Wed, Feb 4, 2015 at 11:35 AM, Koert Kuipers wrote: > do i understand it correctly that on yarn the customer jars are truly > placed before the yarn and spark jars on classpath? meaning at container > construction time, on the same classloader? that would be great news for me. > i

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Marcelo Vanzin
Hi Corey, On Wed, Feb 4, 2015 at 12:44 PM, Corey Nolet wrote: >> Another suggestion is to build Spark by yourself. > > I'm having trouble seeing what you mean here, Marcelo. Guava is already > shaded to a different package for the 1.2.0 release. It shouldn't be causing > conflicts. That wasn't m

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Marcelo Vanzin
On Wed, Feb 4, 2015 at 1:12 PM, Koert Kuipers wrote: > about putting stuff on classpath before spark or yarn... yeah you can shoot > yourself in the foot with it, but since the container is isolated it should > be ok, no? we have been using HADOOP_USER_CLASSPATH_FIRST forever with great > success.

Re: Will Spark serialize an entire Object or just the method referred in an object?

2015-02-09 Thread Marcelo Vanzin
`func1` and `func2` never get serialized. They must exist on the other end in the form of a class loaded by the JVM. What gets serialized is an instance of a particular closure (the argument to your "map" function). That's a separate class. The instance of that class that is serialized contains re

Re: Will Spark serialize an entire Object or just the method referred in an object?

2015-02-10 Thread Marcelo Vanzin
instance we would > load a new instance containing the func1 and func2 from jars that are > already cached into local nodes? > > Thanks, > Yitong > > 2015-02-09 14:35 GMT-08:00 Marcelo Vanzin : > >> `func1` and `func2` never get serialized. They must exist on the othe

Re: How to log using log4j to local file system inside a Spark application that runs on YARN?

2015-02-11 Thread Marcelo Vanzin
For Yarn, you need to upload your log4j.properties separately from your app's jar, because of some internal issues that are too boring to explain here. :-) Basically: spark-submit --master yarn --files log4j.properties blah blah blah Having to keep it outside your app jar is sub-optimal, and I

Re: Class loading issue, spark.files.userClassPathFirst doesn't seem to be working

2015-02-18 Thread Marcelo Vanzin
Hello, On Tue, Feb 17, 2015 at 8:53 PM, dgoldenberg wrote: > I've tried setting spark.files.userClassPathFirst to true in SparkConf in my > program, also setting it to true in $SPARK-HOME/conf/spark-defaults.conf as Is the code in question running on the driver or in some executor? spark.files.

Re: issue Running Spark Job on Yarn Cluster

2015-02-19 Thread Marcelo Vanzin
You'll need to look at your application's logs. You can use "yarn logs --applicationId [id]" to see them. On Wed, Feb 18, 2015 at 2:39 AM, sachin Singh wrote: > Hi, > I want to run my spark Job in Hadoop yarn Cluster mode, > I am using below command - > spark-submit --master yarn-cluster --driver

Re: output worker stdout to one place

2015-02-20 Thread Marcelo Vanzin
Hi Anny, You could play with creating your own log4j.properties that will write the output somewhere else (e.g. to some remote mount, or remote syslog). Sorry, but I don't have an example handy. Alternatively, if you can use Yarn, it will collect all logs after the job is finished and make them a

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-25 Thread Marcelo Vanzin
Guava is not in Spark. (Well, long version: it's in Spark but it's relocated to a different package except for some special classes leaked through the public API.) If your app needs Guava, it needs to package Guava with it (e.g. by using maven-shade-plugin, or using "--jars" if only executors use

Re: Spark excludes "fastutil" dependencies we need

2015-02-26 Thread Marcelo Vanzin
On Wed, Feb 25, 2015 at 8:42 PM, Jim Kleckner wrote: > So, should the userClassPathFirst flag work and there is a bug? Sorry for jumping in the middle of conversation (and probably missing some of it), but note that this option applies only to executors. If you're trying to use the class in your

Re: Error: no snappyjava in java.library.path

2015-02-26 Thread Marcelo Vanzin
Hi Dan, This is a CDH issue, so I'd recommend using cdh-u...@cloudera.org for those questions. This is an issue that was fixed in recent CM 5.3 updates; if you're not using CM, or want a workaround, you can manually configure "spark.driver.extraLibraryPath" and "spark.executor.extraLibraryPath" to in
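For example (the path below is a typical CDH native-library location and may differ on your nodes):

    spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
    spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native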

Re: Is SPARK_CLASSPATH really deprecated?

2015-02-26 Thread Marcelo Vanzin
SPARK_CLASSPATH is definitely deprecated, but my understanding is that spark.executor.extraClassPath is not, so maybe the documentation needs fixing. I'll let someone who might know otherwise comment, though. On Thu, Feb 26, 2015 at 2:43 PM, Kannan Rajah wrote: > SparkConf.scala logs a warning s

Re: Is SPARK_CLASSPATH really deprecated?

2015-02-26 Thread Marcelo Vanzin
On Thu, Feb 26, 2015 at 5:12 PM, Kannan Rajah wrote: > Also, I would like to know if there is a localization overhead when we use > spark.executor.extraClassPath. Again, in the case of hbase, these jars would > be typically available on all nodes. So there is no need to localize them > from the no

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Marcelo Vanzin
Broadcast.scala:177) > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090) > ... 19 more > > root eror ====== > Caused by: java.lang.ClassNotFoundException: > com.google.common.collect.HashBiMap > at java.net.URLClassLoader$1.run(URL

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Marcelo Vanzin
On Fri, Feb 27, 2015 at 1:30 PM, Pat Ferrel wrote: > @Marcelo do you mean by modifying spark.executor.extraClassPath on all > workers, that didn’t seem to work? That's an app configuration, not a worker configuration, so if you're trying to set it on the worker configuration it will definitely no

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Marcelo Vanzin
On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel wrote: > I changed in the spark master conf, which is also the only worker. I added a > path to the jar that has guava in it. Still can’t find the class. Sorry, I'm still confused about what config you're changing. I'm suggesting using: spark-submit -

Re: Is SPARK_CLASSPATH really deprecated?

2015-03-02 Thread Marcelo Vanzin
ll let us add logic inside get_hbase_jars_for_cp function to pick the >> right version hbase jars. There could be multiple versions installed on the >> node. >> >> >> >> -- >> Kannan >> >> On Thu, Feb 26, 2015 at 6:08 PM, Marcelo Vanzin wrote: &

Re: Spark UI and running spark-submit with --master yarn

2015-03-02 Thread Marcelo Vanzin
What are you calling ""? In yarn-cluster mode, the driver is running somewhere in your cluster, not on the machine where you run spark-submit. The easiest way to get to the Spark UI when using Yarn is to use the Yarn RM's web UI. That will give you a link to the application's UI regardless of whet

Re: Spark UI and running spark-submit with --master yarn

2015-03-02 Thread Marcelo Vanzin
> What am I missing here ? > Thanks a lot for the help > -AJ > > > On Mon, Mar 2, 2015 at 3:50 PM, Marcelo Vanzin wrote: >> >> What are you calling ""? In yarn-cluster mode, the driver >> is running somewhere in your cluster, not on the machine where

Re: Spark UI and running spark-submit with --master yarn

2015-03-02 Thread Marcelo Vanzin
com:9026 shows > me all the applications. > Do I have to do anything for the port 8088 or whatever I am seeing at 9026 > port is good .Attached is screenshot . > Thanks > AJ > > On Mon, Mar 2, 2015 at 4:24 PM, Marcelo Vanzin wrote: >> >> That's the R

Re: Spark Monitoring UI for Hadoop Yarn Cluster

2015-03-03 Thread Marcelo Vanzin
Spark applications shown in the RM's UI should have an "Application Master" link when they're running. That takes you to the Spark UI for that application where you can see all the information you're looking for. If you're running a history server and add "spark.yarn.historyServer.address" to your
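For example, in spark-defaults.conf (host, port and directory are placeholders):

    spark.eventLog.enabled=true
    spark.eventLog.dir=hdfs:///user/spark/applicationHistory
    spark.yarn.historyServer.address=shs-host:18080
    spark.history.fs.logDirectory=hdfs:///user/spark/applicationHistory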

Re: ImportError: No module named iter ... (on CDH5 v1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch) ...

2015-03-03 Thread Marcelo Vanzin
Weird python errors like this generally mean you have different versions of python in the nodes of your cluster. Can you check that? On Tue, Mar 3, 2015 at 4:21 PM, subscripti...@prismalytics.io wrote: > Hi Friends: > > We noticed the following in 'pyspark' happens when running in distributed > S

Re: Spark Monitoring UI for Hadoop Yarn Cluster

2015-03-04 Thread Marcelo Vanzin
On Wed, Mar 4, 2015 at 10:08 AM, Srini Karri wrote: > spark.executor.extraClassPath > D:\\Apache\\spark-1.2.1-bin-hadoop2\\spark-1.2.1-bin-hadoop2.4\\bin\\classes > spark.eventLog.dir > D:/Apache/spark-1.2.1-bin-hadoop2/spark-1.2.1-bin-hadoop2.4/bin/tmp/spark-events > spark.history.fs.logDirectory

Re: Issues with maven dependencies for version 1.2.0 but not version 1.1.0

2015-03-04 Thread Marcelo Vanzin
Seems like someone set up "m2.mines.com" as a mirror in your pom file or ~/.m2/settings.xml, and it doesn't mirror Spark 1.2 (or does but is in a messed up state). On Wed, Mar 4, 2015 at 3:49 PM, kpeng1 wrote: > Hi All, > > I am currently having problem with the maven dependencies for version 1.2

Re: Issues with maven dependencies for version 1.2.0 but not version 1.1.0

2015-03-04 Thread Marcelo Vanzin
a mirror, but 1.1.0 works >>> properly, while 1.2.0 does not. I suspect there is crc in the 1.2.0 pom >>> file. >>> >>> On Wed, Mar 4, 2015 at 4:10 PM, Marcelo Vanzin >>> wrote: >>>> >>>> Seems like someone set up "m

Re: Spark Build with Hadoop 2.6, yarn - encounter java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer

2015-03-05 Thread Marcelo Vanzin
It seems from the excerpt below that your cluster is set up to use the Yarn ATS, and the code is failing in that path. I think you'll need to apply the following patch to your Spark sources if you want this to work: https://github.com/apache/spark/pull/3938 On Thu, Mar 5, 2015 at 10:04 AM, Todd N
