ashok34...@yahoo.com.INVALID wrote:
Is it possible to use a Spark Docker image built on GCP on AWS, without
rebuilding it from scratch on AWS?
I am using the Spark image from Bitnami for running on k8s.
And yes, it's deployed by Helm.
--
https://kenpeng.pages.dev/
We use Spark with NFS as the data store, mainly using Dr. Jeremy Freeman’s Thunder framework. Works very well (and I see HUGE throughput on the storage system during loads). I haven’t seen (or heard from the devs/users) a need for HDFS or S3.
—Ken
On Aug 25, 2016, at 8:02 PM
Hi Deepak,
Yes, that’s about the size of it. The Spark job isn’t filling the disk by any stretch of the imagination; in fact, the only thing writing to the disk from Spark in some of these instances is the logging.
Thanks,
—Ken
On Jun 16, 2016, at 12:17 PM
attempted to run with 15 cores out of 16 and 25GB of RAM out of 128. He still lost nodes.
4. He’s currently running storage benchmarking tests, which consist mainly of shuffles.
Thanks!
Ken
On Jun 16, 2016, at 8:00 AM, Deepak Goel <deic...@gmail.com> wrote:
I am no expert, but some
Has anyone seen anything like this? Any ideas where to look next?
Thanks,
Ken
accordingly.
Thanks!
—Ken
On Apr 3, 2016, at 11:06 AM, Yong Zhang <java8...@hotmail.com> wrote:
In standalone mode, it applies to the Driver JVM process heap size.
You should consider giving it enough memory space in standalone mode, because:
1) Any data you bring back to the driver (e.g., via collect()) has to fit in its heap
both 256GB nodes and 128GB nodes available for use as the
driver)
Thanks,
Ken
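A minimal sketch of the driver-heap point above, in pyspark (path, app name, and sizes are invented, not from the thread). The driver JVM has to hold whatever an action brings back, and its heap must be sized before the JVM launches, e.g. spark-submit --driver-memory 16g job.py:

from pyspark import SparkContext

sc = SparkContext(appName="driver-heap-demo")
counts = (sc.textFile("hdfs:///data/events")           # invented input path
            .map(lambda line: (line.split(",")[0], 1))
            .reduceByKey(lambda a, b: a + b))

top20 = counts.takeOrdered(20, key=lambda kv: -kv[1])  # bounded result: driver-safe
# counts.collect()  # unbounded: this is what exhausts the driver heap

The point: prefer bounded actions like take/takeOrdered when result sizes are unknown, since collect() lands entirely in the driver heap.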
cluster work as far as failed workers.
Thanks again,
—Ken
On Mar 26, 2016, at 4:08 PM, Sven Krasser <kras...@gmail.com> wrote:
My understanding is that the spark.executor.cores setting controls the number of worker threads in the executor JVM. Each worker thread then talks to a Python worker process forked by pyspark.daemon.
threads?
Thanks!
Ken
On Mar 25, 2016, at 9:10 PM, Sven Krasser <kras...@gmail.com> wrote:
Hey Ken,
I also frequently see more pyspark daemons than the configured concurrency, often a low multiple. (There was an issue pre-1.3.0 that caused this to be quite a bit larger.)
own, driving the load up. I’m hoping someone has seen something like this.
—Ken
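For reference, a hedged sketch of the knobs this thread touches, with purely illustrative values (not a recommendation, and whether these tame the daemon count on a given version is exactly what the thread is asking):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .set("spark.executor.cores", "15")          # task threads per executor JVM
        .set("spark.python.worker.reuse", "true")   # reuse pyspark daemons across tasks
        .set("spark.python.worker.memory", "512m")) # per-Python-worker limit before spilling
sc = SparkContext(conf=conf)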
On Mar 21, 2016, at 3:07 PM, Carlile, Ken <carli...@janelia.hhmi.org> wrote:
No further input on this? I discovered today that the pyspark.daemon threadcount was actually 48, which makes a little more sense (at least it’s a multiple of 16), and it seems to be happening at reduce and collect portions of the code.
—Ken
On Mar 17, 2016, at 10:51 AM, Carlile
—Ken
On Mar 17, 2016, at 10:50 AM, Ted Yu <yuzhih...@gmail.com> wrote:
I took a look at docs/configuration.md
Though I didn't find an answer to your first question, I think the following pertains to your second one:
spark.python.worker.memory
30GB to play
with, assuming there is no overhead outside the JVM’s 90GB heap (ha ha.)
Thanks,
Ken Carlile
Sr. Unix Engineer
HHMI/Janelia Research Campus
571-209-4363
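The arithmetic behind that "30GB to play with", as a back-of-envelope sketch; the 8GB OS reserve is my assumption, the other numbers are from the thread:

node_ram_gb = 128       # physical RAM per node
executor_heap_gb = 90   # JVM heap
os_reserve_gb = 8       # assumed OS/overhead reserve
python_budget_gb = node_ram_gb - executor_heap_gb - os_reserve_gb  # ~30 GB for Python workers
per_worker_gb = python_budget_gb / 16                              # 16 concurrent tasks
print(per_worker_gb)    # ~1.9, so e.g. spark.python.worker.memory=1500m leaves headroom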
shuffling via a merge join?
I know that Flink supports this, but its JDBC support is pretty lacking in
general.
Thanks,
Ken
java version as 1.8, but I just got the
same error with invalid source release: 1.8 instead of 1.7.
My java -version and javac -version both report 1.8.0_45, and I have the
JAVA_HOME env set. Anyone have any ideas?
Incidentally, building 2.0.0 from source worked fine…
Thanks,
Ken
today:
https://unscrupulousmodifier.wordpress.com/2015/07/20/running-spark-as-a-job-on-a-grid-engine-hpc-cluster-part-1
—Ken
> On Dec 21, 2015, at 4:00 PM, MegaLearn wrote:
>
> How do you start the Spark daemon, directly?
> https://issues.apache.org/jira/browse/SPARK-11570
Dani, this appears to be addressed in SPARK-5567, scheduled for Spark 1.5.0.
Ken
On May 21, 2015, at 11:12 PM, user-digest-h...@spark.apache.org wrote:
> From: Dani Qiu
> Subject: LDA prediction on new document
> Date: May 21, 2015 at 8:48:40 PM PDT
> To: user@spark.apache.org
> From: Williams, Ken <ken.willi...@windlogics.com>
> Date: Thursday, March 19, 2015 at 10:59 AM
> To: Spark list <user@spark.apache.org>
> Subject: JAVA_HOME problem with upgrade to 1.3.0
>
> […]
> Finally, I go and check the YARN app master logs.
export JAVA_HOME=/usr/jdk64/jdk1.6.0_31
-Ken
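The export above fixes JAVA_HOME at the Hadoop config level; a hedged per-application alternative is to inject it through Spark's YARN environment properties (the property names are documented hooks, but the JDK path here just mirrors the export and may not match your nodes):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .set("spark.yarn.appMasterEnv.JAVA_HOME", "/usr/jdk64/jdk1.6.0_31")  # app master env
        .set("spark.executorEnv.JAVA_HOME", "/usr/jdk64/jdk1.6.0_31"))       # executor env
sc = SparkContext(conf=conf)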
/bin/bash: {{JAVA_HOME}}/bin/java: No such file or directory
Log Type: stdout
Log Length: 0
I’m not sure how to interpret that – is '{{JAVA_HOME}}' a literal (including
the brackets) that’s somehow making it into a script? Is this coming from the
I am using Spark SQL from Hive table with Parquet SerDe. Most queries are
executed from Spark's JDBC Thrift server. Is there more efficient way to
access/query data? For example, using saveAsParquetFile() and parquetFile()
to save/load Parquet data and run queries directly?
Thanks,
Ken
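A minimal sketch of the save/load path named in the question, using the 1.x-era pyspark API (table name and output path invented):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="parquet-roundtrip")
sqlContext = HiveContext(sc)

events = sqlContext.sql("SELECT * FROM events")      # read the Hive table via its SerDe
events.saveAsParquetFile("/data/events_parquet")     # one-time materialization as Parquet

pq = sqlContext.parquetFile("/data/events_parquet")  # native columnar reads from here on
pq.registerTempTable("events_pq")
print(sqlContext.sql("SELECT COUNT(*) FROM events_pq").collect())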
Thanks Akhil.
So the Spark worker node doesn't need access to the metastore to run Hive
queries? If so, which component accesses the metastore?
With Hive, the Hive CLI accesses the metastore before submitting M/R jobs.
Thanks,
Ken
Does a Spark worker node need access to Hive's metastore if part of a job
contains Hive queries?
Thanks,
Ken
I am using Spark's Thrift server to connect to Hive and issue queries over
JDBC. Is there a way to cache a table in Spark via a JDBC call?
Thanks,
Ken
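One hedged answer: the Thrift server speaks the HiveServer2 protocol, so any HS2-compatible client can issue Spark SQL's CACHE TABLE statement over the same connection. A sketch assuming the PyHive client is installed (host, port, and table name invented):

from pyhive import hive  # assumed client; any HiveServer2-compatible driver should work

conn = hive.connect(host="thrift-server-host", port=10000)
cur = conn.cursor()
cur.execute("CACHE TABLE events")             # pin the table in executor memory
cur.execute("SELECT COUNT(*) FROM events")    # subsequent queries hit the cached copy
print(cur.fetchall())
cur.execute("UNCACHE TABLE events")           # release when finished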
What is the best way to run Hive queries in 1.0.2? In my case, Hive queries
will be invoked from a middle-tier webapp. I am thinking of using the Hive JDBC
driver.
Thanks,
Ken
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Wednesday, August 20, 2014 9:38 AM
To: Tam, Ken K
Cc: user
Is Spark SQL Thrift Server part of the 1.0.2 release? If not, which release is
the target?
Thanks,
Ken
en.apache.org/maven2/org/apache/hadoop/hadoop-yarn-server/2.2.0/hadoop-yarn-server-2.2.0.jar
-Ken
From: Shivaram Venkataraman [mailto:shiva...@eecs.berkeley.edu]
Sent: Friday, April 25, 2014 4:31 PM
To: user@spark.apache.org
Subject: Re: Build times for Spark
Are you by any chance building this
No, I haven’t done any config for SBT. Is there somewhere you might be able to
point me toward for how to do that?
-Ken
From: Josh Rosen [mailto:rosenvi...@gmail.com]
Sent: Friday, April 25, 2014 3:27 PM
To: user@spark.apache.org
Subject: Re: Build times for Spark
Did you configure SBT to use
wallclock time (88 minutes of
CPU time). After that, I did 'SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true
sbt/sbt assembly' and that took 25 minutes wallclock, 73 minutes CPU.
Is that typical? Or does that indicate some setup problem in my environment?
--
Ken Williams, Senior Research Scientist
> -----Original Message-----
> From: Marcelo Vanzin [mailto:van...@cloudera.com]
> Hi Ken,
>
> On Mon, Apr 21, 2014 at 1:39 PM, Williams, Ken
> wrote:
> > I haven't figured out how to let the hostname default to the host
> > mentioned in our /etc/hadoop/conf/hdfs-site.xml
like the Hadoop command-line tools do, but
that's not so important.
-Ken
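For what it's worth, a hedged sketch of the usual fix: export HADOOP_CONF_DIR (e.g. /etc/hadoop/conf) before launching Spark so the default filesystem from the Hadoop config applies, after which the hostname can be dropped from the URI (paths invented):

from pyspark import SparkContext

# Assumes HADOOP_CONF_DIR=/etc/hadoop/conf was exported before launch.
sc = SparkContext(appName="hdfs-default-fs")
explicit = sc.textFile("hdfs://namenode:8020/user/ken/input.txt")  # fully qualified
implicit = sc.textFile("hdfs:///user/ken/input.txt")               # uses the configured default FS
print(implicit.count())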
> -----Original Message-----
> From: Williams, Ken [mailto:ken.willi...@windlogics.com]
> Sent: Monday, April 21, 2014 2:04 PM
> To: Spark list
> Subject: Problem connecting to HDFS in Spark shell
>
>
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
... 41 more
Is this recognizable to anyone as a build problem, a config problem, or
anything else? Failing that, is there any way to get more information about
where in the process it's failing?
Sorry, I forgot to mention this is spark-0.9.1 and shark-0.9.1.
Ken
On Thursday, April 10, 2014 9:02 AM, Ken Ellinwood wrote:
14/04/10 08:00:42 INFO AppClient$ClientActor: Executor added: app-20140410080041-0017/9 on worker-20140409145028-ken-VirtualBox-39159 (ken-VirtualBox:39159) with 4 cores
14/04/10 08:00:42 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140410080041-0017/9 on hostPort ken-VirtualBox:39159