Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-23 Thread Aaron Kern
+1

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-22 Thread Aaron Davidson
+1 On Tue, Dec 22, 2015 at 7:01 PM, Josh Rosen wrote: > +1 > > On Tue, Dec 22, 2015 at 7:00 PM, Jeff Zhang wrote: > >> +1 >> >> On Wed, Dec 23, 2015 at 7:36 AM, Mark Hamstra >> wrote: >> >>> +1 >>> >>> On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust < >>> mich...@databricks.com> wrote: >>>

Re: Update to Spark Mesos docs possibly? LIBPROCESS_IP needs to be set for client mode

2015-12-16 Thread Aaron
Wrt the PR, sure, let me update the documentation; I'll send it out shortly. My fork is on GitHub... is the PR from there OK? Cheers, Aaron On Wed, Dec 16, 2015 at 11:33 AM, Timothy Chen wrote: > Yes, if you want to manually override what IP to use to be contacted by the > master

Re: Update to Spark Mesos docs possibly? LIBPROCESS_IP needs to be set for client mode

2015-12-16 Thread Aaron
t, some kind of documentation about this possible issue would have saved me some time. Cheers, Aaron On Wed, Dec 16, 2015 at 11:07 AM, Aaron wrote: > Found this thread that talked about it to help understand it better: > > https://mail-archives.apache.org/mod_mbox/mesos-user/201507.mbox/%3

Re: Update to Spark Mesos docs possibly? LIBPROCESS_IP needs to be set for client mode

2015-12-16 Thread Aaron
tname/ip in mesos configuration - see Nikolaos' answer > Cheers, Aaron On Wed, Dec 16, 2015 at 11:00 AM, Iulian Dragoș wrote: > Hi Aaron, > > I never had to use that variable. What is it for? > > On Wed, Dec 16, 2015 at 2:00 PM, Aaron wrote: >> >> In going through runnin

Re: Update to Spark Mesos docs possibly? LIBPROCESS_IP needs to be set for client mode

2015-12-16 Thread Aaron
On Wed, Dec 16, 2015 at 8:00 AM, Aaron wrote: > In going through running various Spark jobs, both Spark 1.5.2 and the > new Spark 1.6 SNAPSHOTs, on a Mesos cluster (currently 0.25), we > noticed that in order to run the Spark shells (both python and > scala), we needed to set the LIB

Update to Spark Mesos docs possibly? LIBPROCESS_IP needs to be set for client mode

2015-12-16 Thread Aaron
the Spark on Mesos docs should be updated, under the Client Mode section, to include setting this environment variable? Cheers, Aaron
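
A minimal sketch of the guard being suggested (hypothetical names, not code from the thread): in Mesos client mode, LIBPROCESS_IP must point at an address the Mesos master can route back to, so a driver can fail fast when it is missing.

object LibprocessCheck {
  // Hypothetical guard: abort before creating the SparkContext if the
  // variable is unset, rather than hanging on unreachable master callbacks.
  def requireLibprocessIp(): Unit =
    if (sys.env.get("LIBPROCESS_IP").forall(_.isEmpty))
      sys.error("LIBPROCESS_IP is not set; export the driver host's routable IP first")
}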

-Phive-thriftserver when compiling for use in pyspark and JDBC connections

2015-07-21 Thread Aaron
w the hive-thriftserver module plays into this type of interaction. Thanks in advance. Cheers, Aaron

Re: External Shuffle service over yarn

2015-06-26 Thread Aaron Davidson
A second advantage is that it allows individual Executors to go into GC pause (or even crash) and still allow other Executors to read shuffle data and make progress, which tends to improve stability of memory-intensive jobs. On Thu, Jun 25, 2015 at 11:42 PM, Sandy Ryza wrote: > Hi Yash, > > One
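
For reference, enabling the service is a one-line configuration change; a minimal sketch (the YARN NodeManager must also be configured to host the auxiliary service, which is not shown here):

import org.apache.spark.SparkConf

// Executors then fetch shuffle blocks from the node-local shuffle service
// instead of from the executor that wrote them, so a GC-paused or dead
// executor no longer blocks readers.
val conf = new SparkConf().set("spark.shuffle.service.enabled", "true")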

Re: hadoop input/output format advanced control

2015-03-25 Thread Aaron Davidson
Should we mention that you should synchronize on HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK to avoid a possible race condition in cloning Hadoop Configuration objects prior to Hadoop 2.7.0? :) On Wed, Mar 25, 2015 at 7:16 PM, Patrick Wendell wrote: > Great - that's even easier. Maybe we could ha
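
The workaround being referenced, as a sketch (the lock object below stands in for HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK; before Hadoop 2.7.0 the Configuration copy constructor was not safe against concurrent instantiation):

import org.apache.hadoop.conf.Configuration

object ConfCloneLock  // stand-in for the real global lock

// Clone under a single lock so no other thread is constructing a
// Configuration while this copy walks shared internal state.
def cloneConf(conf: Configuration): Configuration =
  ConfCloneLock.synchronized(new Configuration(conf))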

Re: enum-like types in Spark

2015-03-23 Thread Aaron Davidson
The only issue I knew of with Java enums was that they do not appear in the Scala documentation. On Mon, Mar 23, 2015 at 1:46 PM, Sean Owen wrote: > Yeah the fully realized #4, which gets back the ability to use it in > switch statements (? in Scala but not Java?) does end up being kind of > hug

Re: Block Transfer Service encryption support

2015-03-16 Thread Aaron Davidson
Out of curiosity, why could we not use Netty's SslHandler injected into the TransportContext pipeline? On Mon, Mar 16, 2015 at 7:56 PM, turp1twin wrote: > Hey Patrick, > > Sorry for the delay, I was at Elastic{ON} last week and well, my day job > has > been keeping me busy... I went ahead and op
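
For readers unfamiliar with the suggestion: Netty's SslHandler encrypts a channel when placed at the front of its pipeline. A minimal sketch of the idea (not Spark's actual transport code):

import io.netty.channel.ChannelInitializer
import io.netty.channel.socket.SocketChannel
import io.netty.handler.ssl.SslContext

class SslChannelInitializer(sslCtx: SslContext) extends ChannelInitializer[SocketChannel] {
  override def initChannel(ch: SocketChannel): Unit =
    // Prepend TLS so application-level handlers only ever see decrypted bytes.
    ch.pipeline().addFirst("ssl", sslCtx.newHandler(ch.alloc()))
}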

Re: enum-like types in Spark

2015-03-16 Thread Aaron Davidson
>>>> sealed abstract class StorageLevel // cannot be a trait >>>> >>>> object StorageLevel { >>>> private[this] case object _MemoryOnly extends StorageLevel >>>> final val MemoryOnly: StorageLevel = _MemoryOnly >>

Re: enum-like types in Spark

2015-03-09 Thread Aaron Davidson
is the > > >> minimal code I found to make everything show up correctly in both > > >> Scala and Java: > > >> > > >> sealed abstract class StorageLevel // cannot be a trait > > >> > > >> object StorageLevel { > > >> private[this] case obje

Re: Which OutputCommitter to use for S3?

2015-03-05 Thread Aaron Davidson
one runs into the same problem I had. > >> > >> By setting --hadoop-major-version=2 when using the ec2 scripts, > >> everything worked fine. > >> > >> Darin. > >> > >> > >> - Original Message - > >> From: Darin McBeath &

Re: enum-like types in Spark

2015-03-04 Thread Aaron Davidson
agree with Aaron's suggestion. > > > > - Patrick > > > > On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson > wrote: > >> I'm cool with #4 as well, but make sure we dictate that the values > should > >> be defined within an object with the sa

Re: enum-like types in Spark

2015-03-04 Thread Aaron Davidson
I'm cool with #4 as well, but make sure we dictate that the values should be defined within an object with the same name as the enumeration (like we do for StorageLevel). Otherwise we may pollute a higher namespace. e.g. we SHOULD do: trait StorageLevel object StorageLevel { case object MemoryO
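
Spelled out, the pattern looks like this (the extra values are illustrative):

sealed trait StorageLevel
object StorageLevel {
  // A same-named companion object keeps the values under StorageLevel.*
  // instead of polluting the enclosing package namespace.
  case object MemoryOnly extends StorageLevel
  case object DiskOnly extends StorageLevel
  case object MemoryAndDisk extends StorageLevel
}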

Re: [Performance] Possible regression in rdd.take()?

2015-02-18 Thread Aaron Davidson
You might be seeing the result of this patch: https://github.com/apache/spark/commit/d069c5d9d2f6ce06389ca2ddf0b3ae4db72c5797 which was introduced in 1.1.1. This patch disabled the ability for take() to run without launching a Spark job, which means that the latency is significantly increased for

Re: Custom Cluster Managers / Standalone Recovery Mode in Spark

2015-02-01 Thread Aaron Davidson
For the specific question of supplementing Standalone Mode with a custom leader election protocol, this was actually already committed in master and will be available in Spark 1.3: https://github.com/apache/spark/pull/771/files You can specify spark.deploy.recoveryMode = "CUSTOM" and spark.deploy
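
A sketch of wiring it up (the factory class name is a placeholder; it must implement the recovery-mode factory interface added by that pull request):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.deploy.recoveryMode", "CUSTOM")
  .set("spark.deploy.recoveryMode.factory", "com.example.MyRecoveryModeFactory")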

Re: Semantics of LGTM

2015-01-17 Thread Aaron Davidson
I think I've seen something like +2 = "strong LGTM" and +1 = "weak LGTM; someone else should review" before. It's nice to have a shortcut which isn't a sentence when talking about weaker forms of LGTM. On Sat, Jan 17, 2015 at 6:59 PM, wrote: > I think clarifying these semantics is definitely wor

Re: OutOfMemoryError on parquet SnappyDecompressor

2014-09-23 Thread Aaron Davidson
This may be related: https://github.com/Parquet/parquet-mr/issues/211 Perhaps if we change our configuration settings for Parquet it would get better, but the performance characteristics of Snappy are pretty bad here under some circumstances. On Tue, Sep 23, 2014 at 10:13 AM, Cody Koeninger wrot
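
The Parquet settings in question are presumably the block and page sizes, which bound how much data a decompressor must buffer per column chunk. A hedged sketch (values are illustrative, not recommendations; sc is an existing SparkContext):

// Smaller row groups and pages mean smaller Snappy buffers at read time.
sc.hadoopConfiguration.setInt("parquet.block.size", 64 * 1024 * 1024)
sc.hadoopConfiguration.setInt("parquet.page.size", 1024 * 1024)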

Re: Configuring Spark Memory

2014-07-24 Thread Aaron Davidson
artin Goodson wrote: > Great - thanks for the clarification Aaron. The offer stands for me to > write some documentation and an example that covers this without leaving > *any* room for ambiguity. > > > > > -- > Martin Goodson | VP Data Science > (0)20 3397 1240 >

Re: Configuring Spark Memory

2014-07-24 Thread Aaron Davidson
ne mode? This is after having closely >>>>> read the documentation several times: >>>>> >>>>> *http://spark.apache.org/docs/latest/configuration.html >>>>> <http://spark.apache.org/docs/latest/configuration.html>* >>>>&g

Re: ec2 clusters launched at 9fe693b5b6 are broken (?)

2014-07-14 Thread Aaron Davidson
This one is typically due to a mismatch between the Hadoop versions -- i.e., Spark is compiled against 1.0.4 but is running with 2.3.0 in the classpath, or something like that. Not certain why you're seeing this with spark-ec2, but I'm assuming this is related to the issues you posted in a separate

Re: Profiling Spark tests with YourKit (or something else)

2014-07-14 Thread Aaron Davidson
age - > > From: "Aaron Davidson" > > To: dev@spark.apache.org > > Sent: Monday, July 14, 2014 5:21:10 PM > > Subject: Re: Profiling Spark tests with YourKit (or something else) > > > > Out of curiosity, what problems are you seeing with Utils.getCallSite?

Re: better compression codecs for shuffle blocks?

2014-07-14 Thread Aaron Davidson
One of the core problems here is the number of open streams we have, which is (# cores * # reduce partitions), which can easily climb into the tens of thousands for large jobs. This is a more general problem that we are planning on fixing for our largest shuffles, as even moderate buffer sizes can
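
Back-of-the-envelope arithmetic for that claim, with illustrative numbers:

val cores = 16               // concurrent tasks per executor
val reducePartitions = 2000
val bufferBytes = 64 * 1024  // per open compression stream
// 16 * 2000 = 32,000 open streams, roughly 2 GB of buffers on one executor:
val totalBytes = cores.toLong * reducePartitions * bufferBytes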

Re: Reproducible deadlock in 1.0.1, possibly related to Spark-1097

2014-07-14 Thread Aaron Davidson
that tries > to > > >> > mutate the configuration, then I could see how we might still have > the > > >> > ConcurrentModificationException. > > >> > > > >> > I looked at your patch for HADOOP-10456 and the only example you > give > &g

Re: Profiling Spark tests with YourKit (or something else)

2014-07-14 Thread Aaron Davidson
Out of curiosity, what problems are you seeing with Utils.getCallSite? On Mon, Jul 14, 2014 at 2:59 PM, Will Benton wrote: > Thanks, Matei; I have also had some success with jmap and friends and will > probably just stick with them! > > > best, > wb > > > - Original Message - > > From:

Re: Reproducible deadlock in 1.0.1, possibly related to Spark-1097

2014-07-14 Thread Aaron Davidson
The full jstack would still be useful, but our current working theory is that this is due to the fact that Configuration#loadDefaults goes through every Configuration object that was ever created (via Configuration.REGISTRY) and locks it, thus introducing a dependency from new Configuration to old,

Re: ExecutorState.LOADING?

2014-07-09 Thread Aaron Davidson
Agreed that the behavior of the Master killing off an Application when Executors from the same set of nodes repeatedly die is silly. This can also strike if a single node enters a state where any Executor created on it quickly dies (e.g., a block device becomes faulty). This prevents the Applicatio

Re: on shark, is tachyon less efficient than memory_only cache strategy ?

2014-07-08 Thread Aaron Davidson
Shark's in-memory format is already serialized (it's compressed and column-based). On Tue, Jul 8, 2014 at 9:50 AM, Mridul Muralidharan wrote: > You are ignoring serde costs :-) > > - Mridul > > On Tue, Jul 8, 2014 at 8:48 PM, Aaron Davidson wrote: > > Tachyon

Re: on shark, is tachyon less efficient than memory_only cache strategy ?

2014-07-08 Thread Aaron Davidson
Tachyon should only be marginally less performant than memory_only, because we mmap the data from Tachyon's ramdisk. We do not have to, say, transfer the data over a pipe from Tachyon; we can directly read from the buffers in the same way that Shark reads from its in-memory columnar format. On T

Re: task always lost

2014-07-03 Thread Aaron Davidson
e is the log: > >> > >> E0702 10:32:07.599364 14915 slave.cpp:2686] Failed to unmonitor > container > >> for executor 20140616-104524-1694607552-5050-26919-1 of framework > >> 20140702-102939-1694607552-5050-14846-: Not monitored > >> > >> > >

Re: Pass parameters to RDD functions

2014-07-03 Thread Aaron Davidson
Either Serializable works; Scala's Serializable extends Java's (originally intended as a common interface for people who didn't want to run Scala on a JVM). Class fields require the class to be serialized along with the object to access them. If you declared "val n" inside a method's scope instead, though, we
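
An illustration of the distinction (names are mine, not from the thread):

class Job(val n: Int) {
  def scaleField(rdd: org.apache.spark.rdd.RDD[Int]) =
    rdd.map(_ * n)       // references a field, so the closure captures `this`
  def scaleLocal(rdd: org.apache.spark.rdd.RDD[Int]) = {
    val localN = n       // copy into method scope first
    rdd.map(_ * localN)  // closure captures only the Int, not the Job instance
  }
}

The first method requires Job itself to be serializable; the second does not.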

Re: task always lost

2014-07-01 Thread Aaron Davidson
Can you post the logs from any of the dying executors? On Tue, Jul 1, 2014 at 1:25 AM, qingyang li wrote: > i am using mesos0.19 and spark0.9.0 , the mesos cluster is started, when I > using spark-shell to submit one job, the tasks always lost. here is the > log: > -- > 14/07/01 16:24

Re: Eliminate copy while sending data : any Akka experts here ?

2014-06-30 Thread Aaron Davidson
I don't know of any way to avoid Akka doing a copy, but I would like to mention that it's on the priority list to piggy-back only the map statuses relevant to a particular map task on the task itself, thus reducing the total amount of data sent over the wire by a factor of N for N physical machines

Re: Why does spark REPL not embed scala REPL?

2014-05-30 Thread Aaron Davidson
There's some discussion here as well on just using the Scala REPL for 2.11: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-on-Scala-2-11-td6506.html#a6523 Matei's response mentions the features we needed to change from the Scala REPL (class-based wrappers and where to output the g

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Aaron Davidson
In Spark 0.9.0 and 0.9.1, we stopped using the FileSystem cache correctly, and we just recently resumed using it in 1.0 (and in 0.9.2) when this issue was fixed: https://issues.apache.org/jira/browse/SPARK-1676 Prior to this fix, each Spark task created and cached its own FileSystems due to a bug
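
What the cache buys you, as a sketch: FileSystem.get returns one shared instance per (scheme, authority, user), so correct use avoids re-creating connections per task the way the bug did.

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

val conf = new Configuration()
val fs1 = FileSystem.get(new URI("hdfs://namenode:8020/"), conf)
val fs2 = FileSystem.get(new URI("hdfs://namenode:8020/"), conf)
assert(fs1 eq fs2)  // same cached object within one JVM and user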

Re: (test)

2014-05-16 Thread Aaron Davidson
No. Only 3 of the responses. On Fri, May 16, 2014 at 10:38 AM, Nishkam Ravi wrote: > Yes. > > > On Fri, May 16, 2014 at 8:40 AM, DB Tsai wrote: > > > Yes. > > On May 16, 2014 8:39 AM, "Andrew Or" wrote: > > > > > Apache has been having some problems lately. Do you guys see this > > message? >

Re: [VOTE] Release Apache Spark 1.0.0 (rc7)

2014-05-16 Thread Aaron Davidson
It was, but due to the apache infra issues, some may not have received the email yet... On Fri, May 16, 2014 at 10:48 AM, Henry Saputra wrote: > Hi Patrick, > > Just want to make sure that VOTE for rc6 also cancelled? > > > Thanks, > > Henry > > On Thu, May 15, 2014 at 1:15 AM, Patrick Wendell >

Re: Tests failed after assembling the latest code from github

2014-04-14 Thread Aaron Davidson
I reverted my changes. The test result is the same. > > > > I touched the ReplSuite.scala file (using the touch command), the test order > is reversed, same as the very beginning. And the output is also the > same. (The result in my first post.) > > > > > > -- > > Ye Xianjin > > Sent

Re: Tests failed after assembling the latest code from github

2014-04-14 Thread Aaron Davidson
This may have something to do with running the tests on a Mac, as there is a lot of File/URI/URL stuff going on in that test which may just have happened to work if run on a Linux system (like Jenkins). Note that this suite was added relatively recently: https://github.com/apache/spark/pull/217 O

Re: Contributing to Spark

2014-04-08 Thread Aaron Davidson
Matei's link seems to point to a specific starter project as part of the starter list, but here is the list itself: https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20labels%20%3D%20Starter%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened) On Mon, Apr 7, 20

Re: spark config params conventions

2014-03-12 Thread Aaron Davidson
One solution for Typesafe Config is to use "spark.speculation" = true. Typesafe Config will recognize the quoted key as a single string rather than a path, so the name will actually be "\"spark.speculation\"", and you need to handle this contingency when passing the config options to Spark (stripping the quotes from
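
A sketch of the round trip (assuming the standard com.typesafe.config API):

import com.typesafe.config.ConfigFactory

// Quoting the whole key makes it one path element containing dots,
// rather than the nested path spark { speculation = true }.
val config = ConfigFactory.parseString(""" "spark.speculation" = true """)
val flag = config.getBoolean("\"spark.speculation\"")
// Strip the surrounding quotes from the key before handing it to SparkConf.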

Re: spark config params conventions

2014-03-12 Thread Aaron Davidson
Should we try to deprecate these types of configs for 1.0.0? We can start by accepting both and giving a warning if you use the old one, and then actually remove them in the next minor release. I think "spark.speculation.enabled=true" is better than "spark.speculation=true", and if we decide to use

Re: Github emails

2014-02-24 Thread Aaron Davidson
By the way, we still need to get our JIRAs migrated over to the Apache system. Unrelated, just... saying. On Mon, Feb 24, 2014 at 10:55 PM, Matei Zaharia wrote: > This is probably a snafu because we had a GitHub hook that was sending > messages to d...@spark.incubator.apache.org, and that list w