Re: Portability of Docker images built on different cloud platforms

2023-04-05 Thread Ken Peng
ashok34...@yahoo.com.INVALID wrote: Is it possible to use a Spark Docker image built on GCP on AWS without rebuilding it from scratch on AWS? A Docker image is tied to its CPU architecture, not to the cloud it was built on, so the same image should run in both places. I am using the Spark image from Bitnami for running on k8s. And yes, it's deployed by Helm. -- https://kenpeng.pages.dev/

Re: What do I lose if I run Spark without using HDFS or ZooKeeper?

2016-08-26 Thread Carlile, Ken
We use Spark with NFS as the data store, mainly using Dr. Jeremy Freeman’s Thunder framework. Works very well (and I see HUGE throughput on the storage system during loads). I haven’t seen (or heard from the devs/users) a need for HDFS or S3. —Ken On Aug 25, 2016, at 8:02 PM
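
A minimal sketch of this kind of setup, assuming every worker mounts the same NFS export at a shared path (the path below is hypothetical): with a shared POSIX filesystem, plain file:// URIs work and neither HDFS nor ZooKeeper is needed for data loading.

    # Minimal sketch; /nfs/shared is a hypothetical mount point that must
    # be visible at the same path on every worker node.
    from pyspark import SparkContext

    sc = SparkContext(appName="nfs-read-sketch")
    # file:// reads from the local (here: NFS-mounted) filesystem on each
    # worker instead of a distributed filesystem.
    lines = sc.textFile("file:///nfs/shared/data/records.txt")
    print(lines.count())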

Re: Spark crashes worker nodes with multiple application starts

2016-06-16 Thread Carlile, Ken
Hi Deepak, Yes, that's about the size of it. The Spark job isn't filling the disk by any stretch of the imagination; in fact, the only thing writing to the disk from Spark in some of these instances is the logging. Thanks, —Ken On Jun 16, 2016, at 12:17 PM

Re: Spark crashes worker nodes with multiple application starts

2016-06-16 Thread Carlile, Ken
attempted to run with 15 cores out of 16 and 25GB of RAM out of 128. He still lost nodes.  4. He’s currently running storage benchmarking tests, which consist mainly of shuffles.  Thanks! Ken On Jun 16, 2016, at 8:00 AM, Deepak Goel <deic...@gmail.com> wrote: I am no expert, but some

Spark crashes worker nodes with multiple application starts

2016-06-16 Thread Carlile, Ken
anyone seen anything like this? Any ideas where to look next? Thanks, Ken

Re: spark.driver.memory meaning

2016-04-03 Thread Carlile, Ken
accordingly. Thanks! —Ken On Apr 3, 2016, at 11:06 AM, Yong Zhang <java8...@hotmail.com> wrote: In standalone mode, it applies to the driver JVM process heap size. You should give it enough memory in standalone mode because: 1) Any data you bring back
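
A sketch of the point being made, with illustrative values: spark.driver.memory sizes the driver JVM heap, and in client mode it must be fixed before that JVM launches, i.e. via spark-submit or spark-defaults.conf rather than from inside the application.

    # Illustrative sketch. Set the driver heap before the JVM starts, e.g.:
    #   spark-submit --driver-memory 24g my_app.py
    # (setting spark.driver.memory from inside this script would be too
    # late in client mode, since the driver JVM is already running).
    from pyspark import SparkContext

    sc = SparkContext(appName="driver-memory-sketch")
    # Everything collect() returns is materialized in the driver heap,
    # which is why an undersized driver fails on large collects.
    sample = sc.parallelize(range(1000)).collect()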

spark.driver.memory meaning

2016-04-03 Thread Carlile, Ken
both 256GB nodes and 128GB nodes available for use as the driver) Thanks, Ken

Re: Limit pyspark.daemon threads

2016-03-26 Thread Carlile, Ken
cluster work as far as failed workers.  Thanks again,  —Ken On Mar 26, 2016, at 4:08 PM, Sven Krasser <kras...@gmail.com> wrote: My understanding is that the spark.executor.cores setting controls the number of worker threads in the executor in the JVM. Each worker
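
A hedged configuration sketch of the setting described above (the value is hypothetical): spark.executor.cores bounds the concurrent task threads in each executor JVM, and since each running Python task is typically served by one pyspark.daemon worker, it also bounds the daemon count under normal operation.

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("executor-cores-sketch")
            # Upper bound on concurrent task threads per executor JVM; each
            # running Python task is usually backed by one pyspark.daemon
            # worker process.
            .set("spark.executor.cores", "8"))
    sc = SparkContext(conf=conf)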

Re: Limit pyspark.daemon threads

2016-03-26 Thread Carlile, Ken
threads?  Thanks! Ken On Mar 25, 2016, at 9:10 PM, Sven Krasser <kras...@gmail.com> wrote: Hey Ken, I also frequently see more pyspark daemons than configured concurrency, often it's a low multiple. (There was an issue pre-1.3.0 that caused this to be quite a

Re: Limit pyspark.daemon threads

2016-03-25 Thread Carlile, Ken
own, driving the load up. I’m hoping someone has seen something like this.  —Ken On Mar 21, 2016, at 3:07 PM, Carlile, Ken <carli...@janelia.hhmi.org> wrote: No further input on this? I discovered today that the pyspark.daemon threadcount was actually 48, which makes a littl

Re: Limit pyspark.daemon threads

2016-03-21 Thread Carlile, Ken
No further input on this? I discovered today that the pyspark.daemon thread count was actually 48, which makes a little more sense (at least it's a multiple of 16), and it seems to happen during the reduce and collect portions of the code. —Ken On Mar 17, 2016, at 10:51 AM, Carlile

Re: Limit pyspark.daemon threads

2016-03-18 Thread Carlile, Ken
—Ken On Mar 17, 2016, at 10:50 AM, Ted Yu <yuzhih...@gmail.com> wrote: I took a look at docs/configuration.md. Though I didn't find an answer to your first question, I think the following pertains to your second: spark.python.worker.mem
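
For reference, a hedged sketch of that setting with a hypothetical value: spark.python.worker.memory caps how much memory each pyspark.daemon worker uses for aggregation before spilling to disk; it limits memory per worker, not the number of workers.

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("python-worker-memory-sketch")
            # Per-worker aggregation memory before spilling to disk; a cap
            # on memory use, not on the number of daemon processes.
            .set("spark.python.worker.memory", "2g"))
    sc = SparkContext(conf=conf)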

Limit pyspark.daemon threads

2016-03-18 Thread Carlile, Ken
30GB to play with, assuming there is no overhead outside the JVM’s 90GB heap (ha ha.) Thanks, Ken Carlile Sr. Unix Engineer HHMI/Janelia Research Campus 571-209-4363

merge join already sorted data?

2016-02-25 Thread Ken Geis
shuffling via a merge join? I know that Flink supports this, but its JDBC support is pretty lacking in general. Thanks, Ken
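
For context, a sketch of the kind of partitioned JDBC read involved (connection details and table are hypothetical): Spark can parallelize the scan, but in the releases current to this thread it does not carry the source's sort order into the planner, so joining two pre-sorted tables still plans a shuffle.

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="jdbc-merge-join-sketch")
    sqlContext = SQLContext(sc)

    # Partitioned scan of a hypothetical orders table, 16 parallel readers.
    orders = sqlContext.read.jdbc(
        url="jdbc:postgresql://db.example.com/shop",
        table="orders",
        column="id", lowerBound=0, upperBound=1000000, numPartitions=16,
        properties={"user": "spark", "password": "secret"})

    # Even if the table is stored sorted on the join key, the planner does
    # not know that, so a join against another such table still shuffles.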

building spark 1.6.0 fails

2016-01-28 Thread Carlile, Ken
java version as 1.8, but I just got the same error with invalid source release: 1.8 instead of 1.7. My java -version and javac -version both report 1.8.0_45, and I have the JAVA_HOME env set. Anyone have any ideas? Incidentally, building 2.0.0 from source worked fine… Thanks, Ken

Re: Application Detail UI change

2015-12-21 Thread Carlile, Ken
today: https://unscrupulousmodifier.wordpress.com/2015/07/20/running-spark-as-a-job-on-a-grid-engine-hpc-cluster-part-1 —Ken > On Dec 21, 2015, at 4:00 PM, MegaLearn wrote: > > How do you start the Spark daemon, directly? > https://issues.apache.org/jira/browse/SPARK-11570 > >

Re: LDA prediction on new document

2015-05-21 Thread Ken Geis
Dani, this appears to be addressed in SPARK-5567, scheduled for Spark 1.5.0. Ken On May 21, 2015, at 11:12 PM, user-digest-h...@spark.apache.org wrote: > From: Dani Qiu > Subject: LDA prediction on new document > Date: May 21, 2015 at 8:48:40 PM PDT > To: user@spark.apache.or
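
For orientation, a toy MLlib LDA training sketch, as the Python API looked around the 1.4/1.5 line (data and parameters are hypothetical); scoring documents that were not in the training corpus is the piece SPARK-5567 tracked for Spark 1.5.0.

    from pyspark import SparkContext
    from pyspark.mllib.clustering import LDA
    from pyspark.mllib.linalg import Vectors

    sc = SparkContext(appName="lda-sketch")
    # Corpus of (document id, term-count vector) pairs -- toy data.
    corpus = sc.parallelize([
        [0, Vectors.dense([1.0, 2.0, 0.0])],
        [1, Vectors.dense([0.0, 3.0, 1.0])],
    ])
    model = LDA.train(corpus, k=2)
    print(model.topicsMatrix())  # term-by-topic matrix for the training set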

Re: JAVA_HOME problem with upgrade to 1.3.0

2015-03-23 Thread Williams, Ken
> From: Williams, Ken <ken.willi...@windlogics.com> > Date: Thursday, March 19, 2015 at 10:59 AM > To: Spark list <user@spark.apache.org> > Subject: JAVA_HOME problem with upgrade to 1.3.0 > > […] > Finally, I go and check the YARN app mast

Re: JAVA_HOME problem with upgrade to 1.3.0

2015-03-19 Thread Williams, Ken
export JAVA_HOME=/usr/jdk64/jdk1.6.0_31 -Ken
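
For later readers, a hedged sketch of setting the same thing through Spark configuration instead of spark-env.sh (the JDK path is hypothetical and must exist on every node): on YARN, both the application master container and the executor containers need a JAVA_HOME that resolves.

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("yarn-client")
            # Environment for the YARN application master container:
            .set("spark.yarn.appMasterEnv.JAVA_HOME", "/usr/jdk64/jdk1.7.0_45")
            # Environment for the executor containers:
            .set("spark.executorEnv.JAVA_HOME", "/usr/jdk64/jdk1.7.0_45"))
    sc = SparkContext(conf=conf)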

JAVA_HOME problem with upgrade to 1.3.0

2015-03-19 Thread Williams, Ken
/bin/bash: {{JAVA_HOME}}/bin/java: No such file or directory Log Type: stdout Log Length: 0 I’m not sure how to interpret that – is '{{JAVA_HOME}}' a literal (including the brackets) that’s somehow making it into a script? Is this coming from the

Data Source for Spark SQL

2014-11-25 Thread ken
I am using Spark SQL from a Hive table with the Parquet SerDe. Most queries are executed via Spark's JDBC Thrift server. Is there a more efficient way to access/query the data? For example, using saveAsParquetFile() and parquetFile() to save/load the Parquet data and run queries against it directly? Thanks, Ken
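
A sketch of the round trip the question asks about, in the Spark 1.x API of the period (table and path are hypothetical): export the Hive table to Parquet once, then query those files directly and skip the Hive SerDe.

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="parquet-sketch")
    sqlContext = HiveContext(sc)

    # One-time export of a hypothetical Hive table to plain Parquet.
    events = sqlContext.sql("SELECT * FROM hive_events")
    events.saveAsParquetFile("hdfs:///warehouse/events_pq")

    # Later queries read the Parquet files directly.
    pq = sqlContext.parquetFile("hdfs:///warehouse/events_pq")
    pq.registerTempTable("events_pq")
    print(sqlContext.sql("SELECT COUNT(*) FROM events_pq").collect())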

Re: Spark Worker node accessing Hive metastore

2014-10-29 Thread ken
Thanks Akhil. So the Spark worker node doesn't need access to the metastore to run Hive queries? If so, which component accesses the metastore? For Hive, the Hive CLI accesses the metastore before submitting M/R jobs. Thanks, Ken

Spark Worker node accessing Hive metastore

2014-10-24 Thread ken
Does a Spark worker node need access to Hive's metastore if part of a job contains Hive queries? Thanks, Ken

cache table with JDBC

2014-08-22 Thread ken
I am using Spark's Thrift server to connect to Hive and using JDBC to issue queries. Is there a way to cache a table in Spark via a JDBC call? Thanks, Ken
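
A hedged sketch of one way to do it, assuming a Python client such as PyHive (host, port, and table name are hypothetical): the Thrift server accepts Spark SQL's CACHE TABLE statement over the same JDBC/ODBC channel as ordinary queries.

    # Sketch using PyHive against the Spark SQL Thrift server; connection
    # details and table name are hypothetical.
    from pyhive import hive

    conn = hive.connect(host="thrift-server.example.com", port=10000)
    cur = conn.cursor()
    cur.execute("CACHE TABLE page_views")           # pin the table in memory
    cur.execute("SELECT COUNT(*) FROM page_views")  # now served from cache
    print(cur.fetchall())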

RE: Is Spark SQL Thrift Server part of the 1.0.2 release

2014-08-20 Thread Tam, Ken K
What is the best way to run Hive queries in 1.0.2? In my case, Hive queries will be invoked from a middle-tier webapp. I am thinking of using the Hive JDBC driver. Thanks, Ken From: Michael Armbrust [mailto:mich...@databricks.com] Sent: Wednesday, August 20, 2014 9:38 AM To: Tam, Ken K Cc: user

Is Spark SQL Thrift Server part of the 1.0.2 release

2014-08-20 Thread Tam, Ken K
Is Spark SQL Thrift Server part of the 1.0.2 release? If not, which release is the target? Thanks, Ken

RE: Build times for Spark

2014-04-25 Thread Williams, Ken
en.apache.org/maven2/org/apache/hadoop/hadoop-yarn-server/2.2.0/hadoop-yarn-server-2.2.0.jar -Ken From: Shivaram Venkataraman [mailto:shiva...@eecs.berkeley.edu] Sent: Friday, April 25, 2014 4:31 PM To: user@spark.apache.org Subject: Re: Build times for Spark Are you by any chance building this

RE: Build times for Spark

2014-04-25 Thread Williams, Ken
No, I haven't done any config for SBT. Can you point me to instructions for doing that? -Ken From: Josh Rosen [mailto:rosenvi...@gmail.com] Sent: Friday, April 25, 2014 3:27 PM To: user@spark.apache.org Subject: Re: Build times for Spark Did you configure SBT to use

Build times for Spark

2014-04-25 Thread Williams, Ken
wallclock time (88 minutes of CPU time). After that, I did 'SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly' and that took 25 minutes wallclock, 73 minutes CPU. Is that typical? Or does that indicate some setup problem in my environment? -- Ken Williams, Senior Research Sc

RE: Problem connecting to HDFS in Spark shell

2014-04-21 Thread Williams, Ken
> -Original Message- > From: Marcelo Vanzin [mailto:van...@cloudera.com] > Hi Ken, > > On Mon, Apr 21, 2014 at 1:39 PM, Williams, Ken > wrote: > > I haven't figured out how to let the hostname default to the host > mentioned in our /etc/hadoop/conf/hdfs-si

RE: Problem connecting to HDFS in Spark shell

2014-04-21 Thread Williams, Ken
e the Hadoop command-line tools do, but that's not so important. -Ken > -Original Message- > From: Williams, Ken [mailto:ken.willi...@windlogics.com] > Sent: Monday, April 21, 2014 2:04 PM > To: Spark list > Subject: Problem connecting to HDFS in Spark shell > >
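
A sketch of the behavior under discussion (path hypothetical): once HADOOP_CONF_DIR points Spark at the cluster's core-site.xml/hdfs-site.xml, fs.defaultFS supplies the scheme and namenode, so unqualified paths resolve the same way they do for the Hadoop CLI.

    # Sketch, assuming HADOOP_CONF_DIR=/etc/hadoop/conf is exported in the
    # environment that launches Spark, so fs.defaultFS is picked up.
    from pyspark import SparkContext

    sc = SparkContext(appName="default-fs-sketch")
    # No hdfs://host:port prefix needed; the default filesystem from
    # core-site.xml applies, as with `hadoop fs -cat /user/ken/input.txt`.
    rdd = sc.textFile("/user/ken/input.txt")
    print(rdd.count())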

Problem connecting to HDFS in Spark shell

2014-04-21 Thread Williams, Ken
ructor.newInstance(Constructor.java:526) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129) ... 41 more Is this recognizable to anyone as a build problem, or a config problem, or anything? Failing that, any way to get more information about where in the process it's faili

Re: /bin/java not found: JAVA_HOME ignored launching shark executor

2014-04-10 Thread Ken Ellinwood
Sorry, I forgot to mention this is spark-0.9.1 and shark-0.9.1. Ken On Thursday, April 10, 2014 9:02 AM, Ken Ellinwood wrote: 14/04/10 08:00:42 INFO AppClient$ClientActor: Executor added: app-20140410080041-0017/9 on worker-20140409145028-ken- VirtualBox-39159 (ken-VirtualBox:39159) with

/bin/java not found: JAVA_HOME ignored launching shark executor

2014-04-10 Thread Ken Ellinwood
14/04/10 08:00:42 INFO AppClient$ClientActor: Executor added: app-20140410080041-0017/9 on worker-20140409145028-ken- VirtualBox-39159 (ken-VirtualBox:39159) with 4 cores 14/04/10 08:00:42 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140410080041-0017/9 on hostPort ken
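
For anyone hitting the same {{JAVA_HOME}} placeholder problem, a hedged sketch of one workaround on a standalone cluster (master URL and JDK path are hypothetical): pass JAVA_HOME to the executors through the application's own configuration, in addition to setting it in each worker's spark-env.sh.

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("spark://master.example.com:7077")
            .setAppName("executor-java-home-sketch")
            # The JDK path must exist on every worker node.
            .setExecutorEnv("JAVA_HOME", "/usr/lib/jvm/java-7-openjdk-amd64"))
    sc = SparkContext(conf=conf)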