java.io.IOException: sendMessageReliably failed without being ACK'd

2014-11-25 Thread xukun
Data size: text file, 315G. Cmd: ./spark-submit --class com.spark.test.JavaWordCountWithSave --num-executors 7 --executor-memory 60g --driver-memory 2g --executor-cores 32 --master yarn-client /home/cjs/spark-test.jar hdfs://wordcount/input hdfs://wordcount/output Code of JavaWordCountWithSave:

java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]

2014-11-25 Thread xukun
Submitted 12 Spark applications at the same time. The YARN web page shows that two tasks failed. The cmd: ./spark-submit --class org.apache.spark.examples.JavaWordCount --master yarn-cluster --executor-memory 2g ../lib/spark-examples_2.10-1.1.0.jar hdfs://hacluster/bigData driver log of one

How to resolve Spark site issues?

2014-11-25 Thread York, Brennon
For JIRA tickets like SPARK-4046 <https://issues.apache.org/jira/browse/SPARK-4046> (Incorrect Java example on site), is there a way to go about fixing those things? It’s a trivial fix, but I’m not seeing that code in the codebase anywhere. Is this something the admins are going to have to take

Re: How to resolve Spark site issues?

2014-11-25 Thread Reynold Xin
The website is hosted on an SVN server run by the ASF and unfortunately it doesn't have a GitHub mirror, so we will have to patch it manually ... On Tue, Nov 25, 2014 at 11:12 AM, York, Brennon brennon.y...@capitalone.com wrote: For JIRA tickets like SPARK-4046

Re: How to resolve Spark site issues?

2014-11-25 Thread Sean Owen
For the interested, the SVN repo for the site is viewable at http://svn.apache.org/viewvc/spark/site/ and to check it out, you can run "svn co https://svn.apache.org/repos/asf/spark/site". I assume the best process is to make a diff and attach it to the JIRA. How old school. On Tue, Nov 25, 2014 at

Re: How to do broadcast join in SparkSQL

2014-11-25 Thread Jianshi Huang
Hi, Looks like the latest SparkSQL with Hive 0.12 has a bug in Parquet support. I got the following exceptions: org.apache.hadoop.hive.ql.parse.SemanticException: Output Format must implement HiveOutputFormat, otherwise it should be either IgnoreKeyTextOutputFormat or SequenceFileOutputFormat

Re: How to do broadcast join in SparkSQL

2014-11-25 Thread Jianshi Huang
Oh, I found an explanation at http://cmenguy.github.io/blog/2013/10/30/using-hive-with-parquet-format-in-cdh-4-dot-3/ The error here is a bit misleading, what it really means is that the class parquet.hive.DeprecatedParquetOutputFormat isn’t in the classpath for Hive. Sure enough, doing a ls