Encounter 'Could not find or load main class' error when submitting spark job on kubernetes

2018-05-22 Thread Makoto Hashimoto
Hi, I am trying to run a Spark job on Kubernetes. Running the job locally works fine, as follows: $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[4] examples/jars/spark-examples_2.11-2.3.0.jar 100 .. 2018-05-20 21:49:02 INFO DAGScheduler:54 - Job 0 finished: reduce a

Re: learning Spark

2017-12-05 Thread makoto
This gitbook explains Spark components in detail: 'Mastering Apache Spark 2' https://www.gitbook.com/book/jaceklaskowski/mastering-apache-spark/details 2017-12-04 12:48 GMT+09:00 Manuel Sopena Ballesteros <manuel...@garvan.org.au>: > Dear Spark community, > > > > Is there any resource (book

Re: pyspark configuration with Juyter

2017-11-04 Thread makoto
I set up the environment variables in my ~/.bashrc as follows: export PYSPARK_PYTHON=/usr/local/oss/anaconda3/bin/python3.6 export PYTHONPATH=$(ls -a ${SPARK_HOME}/python/lib/py4j-*-src.zip):${SPARK_HOME}/python:$PYTHONPATH export PYSPARK_DRIVER_PYTHON=jupyter export PYSPARK_DRIVER_PYTHON_OPTS='noteboo

Re: Fwd: Dose pyspark supports python3.6?

2017-11-01 Thread makoto
I'm not sure whether pyspark officially supports Python 3.6, but pyspark and Python 3.6 are working together in my environment. I found the following issue, and it seems to be already resolved. https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-19019 2017/11/02 11:54 AM "Jun Shi" : Dear spark develop

count exceed int.MaxValue

2017-08-08 Thread makoto
Hello, I'd like to count more than Int.MaxValue elements, but I encountered the following error. scala> val rdd = sc.parallelize(1L to Int.MaxValue*2.toLong) rdd: org.apache.spark.rdd.RDD[Long] = ParallelCollectionRDD[28] at parallelize at :24 scala> rdd.count java.lang.IllegalArgumentException: More than

Re: akka disassociated on GC

2014-07-22 Thread Makoto Yui
Hi Xiangrui, With your treeAggregate and broadcast patches, the evaluation completed successfully. I expect that these patches will be merged in the next major release (v1.1?). Without them, it would be hard to use mllib for a large dataset. Thanks, Makoto (2014/07/16 15:05

Re: akka disassociated on GC

2014-07-16 Thread Makoto Yui
lem is lurking behind even though the consumed memory size is reduced by treeAggregate. Best, Makoto

akka disassociated on GC

2014-07-15 Thread Makoto Yui
l.org/message/p2i34frtf4iusdfn Are there any preferred configurations or workarounds for this issue? Thanks, Makoto [The error log of the driver] 14/07/14 18:11:32 INFO scheduler.TaskSetManager: Serialized task 4.0:117 as 25300254 bytes in 35 ms 666.108

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-19 Thread Makoto Yui
Xiangrui, (2014/06/19 23:43), Xiangrui Meng wrote: It is because the frame size is not set correctly in executor backend. see spark-1112. We are going to fix it in v1.0.1. Did you try the treeAggregate? Not yet. I will wait for the v1.0.1 release. Thanks, Makoto

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-19 Thread Makoto Yui
node. It took about 7.6m for aggregation for an iteration. Thanks, Makoto

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-17 Thread Makoto Yui
Hi Xiangrui, (2014/06/18 8:49), Xiangrui Meng wrote: Makoto, dense vectors are used in aggregation. If you have 32 partitions and each one sends a dense vector of size 1,354,731 to the master, then the driver needs 300M+. That may be the problem. It seems that it could cause certain problems
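
As a rough check of the "300M+" figure quoted above, the following sketch multiplies out the numbers from the message. It assumes each dense-vector element is an 8-byte double, which the message does not state; the function name `aggregate_bytes` is hypothetical and only used for illustration.

```python
# Rough estimate of the memory the driver needs to receive one dense
# vector from every partition during aggregation.
NUM_PARTITIONS = 32        # from the quoted message
VECTOR_SIZE = 1_354_731    # from the quoted message
BYTES_PER_ELEMENT = 8      # assumption: 64-bit double per element


def aggregate_bytes(num_partitions, vector_size,
                    bytes_per_element=BYTES_PER_ELEMENT):
    """Total payload in bytes if every partition sends one dense vector."""
    return num_partitions * vector_size * bytes_per_element


total = aggregate_bytes(NUM_PARTITIONS, VECTOR_SIZE)
print(f"{total} bytes = {total / 2**20:.1f} MiB")
```

Under that assumption the payload comes to roughly 331 MiB, which is consistent with the quoted estimate. Object headers and serialization overhead are ignored here.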

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-17 Thread Makoto Yui
value allocated for RDDs in the web UI was not changed by doing as follows: $ SPARK_DRIVER_MEMORY=6g bin/spark-shell I set "-verbose:gc" but full GC (or continuous GCs) does not happen during the aggregate at the driver. Thanks, Makoto

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-17 Thread Makoto Yui
d1 += grad2, loss1 + loss2) }, 2) - Rebuilding Spark is quite a lot of work just to run an evaluation. Thanks, Makoto

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-17 Thread Makoto Yui
am, initialWeightsWithIntercept) --- Thanks, Makoto 2014-06-17 21:32 GMT+09:00 Makoto Yui : > Hello, > > I have been evaluating LogisticRegressionWithSGD of Spark 1.0 MLlib on > Hadoop 0.20.2-cdh3u6 but it does not wor

news20-binary classification with LogisticRegressionWithSGD

2014-06-17 Thread Makoto Yui
/0/g' > news20.random.1000 You can find the datasets at https://dl.dropboxusercontent.com/u/13123103/news20.random.1000 https://dl.dropboxusercontent.com/u/13123103/news20.binary.1000 Thanks, Makoto