data size: text file, 315G
cmd:
./spark-submit --class com.spark.test.JavaWordCountWithSave --num-executors 7
--executor-memory 60g --driver-memory 2g --executor-cores 32
--master yarn-client /home/cjs/spark-test.jar hdfs://wordcount/input
hdfs://wordcount/output
code of JavaWordCountWithSave:
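(The attachment with the class itself didn't come through. As a rough illustration of the counting logic such a job performs — minus the Spark API and the HDFS save step — here is a minimal plain-Java sketch; the class and method names are hypothetical, and the real JavaWordCountWithSave would use JavaSparkContext/JavaRDD and write the result back to HDFS:)

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the word-count logic only; not the actual
// JavaWordCountWithSave class from the jar above.
public class WordCountSketch {

    // Split each line on whitespace and tally occurrences of each word.
    static Map<String, Long> countWords(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts =
                countWords(Arrays.asList("to be or not to be", "to be"));
        System.out.println(counts.get("to")); // 3
    }
}
```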
Submitted 12 Spark applications at the same time. The YARN web page shows two
tasks failed.
The cmd:
./spark-submit --class org.apache.spark.examples.JavaWordCount --master
yarn-cluster --executor-memory 2g ../lib/spark-examples_2.10-1.1.0.jar
hdfs://hacluster/bigData
driver log of one of the failed applications:
For JIRA tickets like SPARK-4046
(https://issues.apache.org/jira/browse/SPARK-4046, Incorrect Java example on
site), is there a way to go about fixing those things? It's a trivial fix,
but I'm not seeing that code in the codebase anywhere. Is this something the
admins are going to have to take care of?
The website is hosted on some svn server by ASF and unfortunately it
doesn't have a github mirror, so we will have to manually patch it ...
On Tue, Nov 25, 2014 at 11:12 AM, York, Brennon brennon.y...@capitalone.com
wrote:
For JIRA tickets like SPARK-4046
For the interested, the SVN repo for the site is viewable at
http://svn.apache.org/viewvc/spark/site/ and to check it out, you can
svn co https://svn.apache.org/repos/asf/spark/site
I assume the best process is to make a diff and attach it to the JIRA.
How old school.
On Tue, Nov 25, 2014 at
Hi,
Looks like the latest SparkSQL with Hive 0.12 has a bug in Parquet support.
I got the following exceptions:
org.apache.hadoop.hive.ql.parse.SemanticException: Output Format must
implement HiveOutputFormat, otherwise it should be either
IgnoreKeyTextOutputFormat or SequenceFileOutputFormat
Oh, I found an explanation at
http://cmenguy.github.io/blog/2013/10/30/using-hive-with-parquet-format-in-cdh-4-dot-3/
The error here is a bit misleading, what it really means is that the class
parquet.hive.DeprecatedParquetOutputFormat isn’t in the classpath for Hive.
Sure enough, doing a ls
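(The quoted post trails off here. The check it describes, and the usual fix, look roughly like this — the jar name and paths are hypothetical, typical of a CDH-style layout, and vary by distribution:)

```shell
# Check whether any Parquet Hive jar is on Hive's classpath at all
# (adjust the path for your install):
ls /usr/lib/hive/lib | grep -i parquet

# If nothing shows up, make the jar visible to Hive, e.g. via the
# auxiliary-jars path before starting the CLI:
export HIVE_AUX_JARS_PATH=/usr/lib/hive/lib/parquet-hive-bundle.jar

# or per-session from inside the Hive shell:
#   ADD JAR /usr/lib/hive/lib/parquet-hive-bundle.jar;
```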