Hi,
I am trying to run a Spark job on Kubernetes. Running the same job locally
works fine, as follows:
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi
--master local[4] examples/jars/spark-examples_2.11-2.3.0.jar 100
..
2018-05-20 21:49:02 INFO DAGScheduler:54 - Job 0 finished: reduce a
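Since the local run works, a rough sketch of the equivalent Kubernetes submission on Spark 2.3 follows the same spark-submit form (the API-server address and container image below are placeholders, not values from this message; the jar path uses the local:// scheme because the jar has to be present inside the image):
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master k8s://https://<k8s-apiserver>:<port> \
    --deploy-mode cluster \
    --name spark-pi \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.container.image=<spark-image> \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar 100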
This GitBook explains Spark components in detail.
'Mastering Apache Spark 2'
https://www.gitbook.com/book/jaceklaskowski/mastering-apache-spark/details
2017-12-04 12:48 GMT+09:00 Manuel Sopena Ballesteros <
manuel...@garvan.org.au>:
> Dear Spark community,
>
>
>
> Is there any resource (book
I set up environment variables in my ~/.bashrc as follows:
export PYSPARK_PYTHON=/usr/local/oss/anaconda3/bin/python3.6
export PYTHONPATH=$(ls -a ${SPARK_HOME}/python/lib/py4j-*-src.zip):${SPARK_HOME}/python:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
I'm not sure whether PySpark supports Python 3.6, but PySpark with Python 3.6
is working in my environment.
I found the following issue, and it seems to have already been resolved:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-19019
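As a quick sanity check (not from the original message, just the usual way to confirm which interpreter gets picked up with the exports above):
$ ${PYSPARK_PYTHON} --version   # should report Python 3.6.x
$ pyspark                       # with the driver variables above, this starts the driver inside a Jupyter notebook server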
2017/11/02 11:54 AM, "Jun Shi":
Dear Spark developers,
Hello,
I'd like to count more than Int.MaxValue elements, but I encountered the
following error.
scala> val rdd = sc.parallelize(1L to Int.MaxValue*2.toLong)
rdd: org.apache.spark.rdd.RDD[Long] = ParallelCollectionRDD[28] at
parallelize at <console>:24
scala> rdd.count
java.lang.IllegalArgumentException: More than Int.MaxValue elements.
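The error above is thrown while building the local Scala range, which cannot hold more than Int.MaxValue elements; RDD.count itself returns a Long. A rough workaround sketch, assuming a Spark version that has SparkContext.range (which builds the RDD lazily instead of from a local collection):
scala> val rdd = sc.range(1L, Int.MaxValue * 2L + 1L)  // same values, but no local collection is materialized
scala> rdd.count                                       // count returns a Long, so it can exceed Int.MaxValue
res0: Long = 4294967294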
Hi Xiangrui,
With your treeAggregate and broadcast patches, the evaluation completed
successfully.
I hope these patches get merged in the next major release (v1.1?). Without
them, it would be hard to use MLlib on a large dataset.
Thanks,
Makoto
(2014/07/16 15:05
problem is still lurking behind even though the consumed memory size is
reduced by treeAggregate.
Best,
Makoto
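For readers following this thread, here is a minimal sketch of the treeAggregate pattern being discussed (the vector size is the one mentioned in the thread; the `add` helper is illustrative rather than the actual MLlib code, and treeAggregate only became a plain RDD method in later releases). With depth = 2, partial sums are combined on executors first, so the driver receives only a few dense vectors instead of one per partition:
val n = 1354731                                   // gradient vector size mentioned in this thread
def add(a: Array[Double], b: Array[Double]) = {   // element-wise in-place sum
  var i = 0; while (i < b.length) { a(i) += b(i); i += 1 }; a
}
val grads = sc.parallelize(0 until 32, 32).map(_ => Array.fill(n)(1.0))   // one dense "gradient" per partition
val sum = grads.treeAggregate(new Array[Double](n))(add, add, depth = 2)  // zero vector as the aggregation buffer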
l.org/message/p2i34frtf4iusdfn
Are there any preferred configurations or workarounds for this issue?
Thanks,
Makoto
[The error log of the driver]
14/07/14 18:11:32 INFO scheduler.TaskSetManager: Serialized task 4.0:117
as 25300254 bytes in 35 ms
666.108
Xiangrui,
(2014/06/19 23:43), Xiangrui Meng wrote:
It is because the frame size is not set correctly in the executor backend; see
SPARK-1112. We are going to fix it in v1.0.1. Did you try treeAggregate?
Not yet. I will wait for the v1.0.1 release.
Thanks,
Makoto
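For context, the serialized task in the log above is roughly 24 MB, well beyond the 10 MB default Akka frame size of that era. A minimal sketch of the era-appropriate workaround in a standalone driver program (the application name is a placeholder; spark.akka.frameSize takes a value in MB and was removed once Akka was dropped in Spark 2.x):
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("news20-lr")              // placeholder application name
  .set("spark.akka.frameSize", "128")   // raise the frame size (MB); the failing task was ~25 MB
val sc = new SparkContext(conf)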
node.
It took about 7.6m for aggregation for an iteration.
Thanks,
Makoto
Hi Xiangrui,
(2014/06/18 8:49), Xiangrui Meng wrote:
Makoto, dense vectors are used in the aggregation. If you have 32 partitions
and each one sends a dense vector of size 1,354,731 to the master, then the
driver needs 300 MB+ (1,354,731 doubles × 8 bytes ≈ 10.8 MB per vector, × 32
partitions ≈ 350 MB). That may be the problem.
It seems that it could cause certain problems
value allocated for RDDs in the web UI was not changed by doing the following:
$ SPARK_DRIVER_MEMORY=6g bin/spark-shell
I set "-verbose:gc" but full GC (or continuous GCs) does not happen
during the aggregate at the driver.
Thanks,
Makoto
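A side note for later readers (just a sketch, not something from this thread, and whether spark-shell honors the flag depends on the exact 1.0-era build): the driver heap can also be requested directly on the command line instead of through the environment variable:
$ ./bin/spark-shell --driver-memory 6g   # alternative to SPARK_DRIVER_MEMORY=6g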
d1 += grad2, loss1 + loss2)
}, 2)
-
Rebuilding Spark just to run the evaluation is quite a lot of work.
Thanks,
Makoto
am,
initialWeightsWithIntercept)
---
Thanks,
Makoto
2014-06-17 21:32 GMT+09:00 Makoto Yui :
> Hello,
>
> I have been evaluating LogisticRegressionWithSGD of Spark 1.0 MLlib on
> Hadoop 0.20.2-cdh3u6, but it does not work
/0/g' >
news20.random.1000
You can find the dataset in
https://dl.dropboxusercontent.com/u/13123103/news20.random.1000
https://dl.dropboxusercontent.com/u/13123103/news20.binary.1000
Thanks,
Makoto
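For anyone reproducing this, a minimal training sketch against the libsvm-format sample above (the file path is a placeholder for wherever the data lives, the iteration count is arbitrary, and the loader shown is the plain MLlib entry point from later releases, not Makoto's exact driver code):
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.util.MLUtils

val data  = MLUtils.loadLibSVMFile(sc, "news20.random.1000")  // RDD[LabeledPoint] from a libsvm-format file
val model = LogisticRegressionWithSGD.train(data, 100)        // 100 iterations of SGD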