Hi Som,

HdfsWordCount counts words from files you place in the directory named by args(args.length - 1), and it keeps running in an infinite loop until the user presses Ctrl-C.
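For reference, the per-batch computation inside HdfsWordCount is an ordinary word count. Here is a minimal Spark-free sketch of the same flatMap / map / reduce-by-key pipeline on plain Scala collections (the sample lines are made up, and groupBy plus a sum stands in for Spark's reduceByKey):

```scala
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in for the lines Spark would read from the monitored directory.
    val lines = Seq("to be or not to be", "to do")

    val counts: Map[String, Int] = lines
      .flatMap(_.split(" "))                          // split each line into words
      .map(word => (word, 1))                         // pair every word with a count of 1
      .groupBy(_._1)                                  // group the pairs by word
      .map { case (w, ps) => (w, ps.map(_._2).sum) }  // sum the 1s per word

    println(counts("to"))  // "to" appears three times, so this prints 3
  }
}
```

The Spark version is the same shape; the difference is only that the collection is a distributed RDD and reduceByKey does the grouping and summing across partitions.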
Why does the program name have the prefix "Hdfs"? HDFS is the Hadoop Distributed File System. Is it a program that demonstrates HDFS, or streaming?

I am really, really confused by the program ExceptionHandlingTest. What exception handling is being tested? The JVM's throw-new-Exception syntax when a value is greater than 0.75, or is it something meant to test Spark API exception handling?

package org.apache.spark.examples

import org.apache.spark.sql.SparkSession

object ExceptionHandlingTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("ExceptionHandlingTest")
      .getOrCreate()

    spark.sparkContext.parallelize(0 until spark.sparkContext.defaultParallelism).foreach { i =>
      if (math.random > 0.75) {
        throw new Exception("Testing exception handling")
      }
    }

    spark.stop()
  }
}

On Monday, 6 April 2020, Som Lima <somplastic...@gmail.com> wrote:

OK, try this one instead (link below). It has both an exit, which we know is rude and abusive instead of graceful structured programming, and also includes half-hearted user input validation. Do you think millions of Spark users download and test these programs and repeat this rude programming behaviour? I don't think they have any coding rules like the safety-critical software industry, but they do have strict emailing rules. Do you think email rules are far more important than programming rules and guidelines?

https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/clickstream/PageViewStream.scala

On Mon, 6 Apr 2020, 07:04 jane thorpe, <janethor...@aol.com.invalid> wrote:

Hi Som,

Did you know that the simple demo program for reading characters from a file didn't work? Who wrote that simple hello-world type little program?
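On the ExceptionHandlingTest question above: the throw itself is ordinary JVM code, but the point of the example is what Spark does with it. A task that throws fails, may be retried, and eventually fails the job, surfacing on the driver as an exception from foreach(). A minimal Spark-free sketch of the same probabilistic-failure pattern, using scala.util.Try to capture failures (simulateTask and the task count of 8 are made-up stand-ins, not part of the Spark example):

```scala
import scala.util.Try

object ExceptionHandlingSketch {
  // Hypothetical stand-in for the body of the foreach closure in the example:
  // roughly one run in four throws.
  def simulateTask(i: Int): Unit =
    if (math.random > 0.75) throw new Exception("Testing exception handling")

  def main(args: Array[String]): Unit = {
    // Run the "task" once per simulated partition and record success/failure
    // instead of letting the exception propagate.
    val results = (0 until 8).map(i => Try(simulateTask(i)))
    val failures = results.count(_.isFailure)
    println(s"$failures of 8 simulated tasks threw")
  }
}
```

With eight partitions and a 25% failure chance per task, at least one task throws most of the time, which is why running the real example usually ends with a job failure.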
jane thorpe
janethor...@aol.com

-----Original Message-----
From: jane thorpe <janethor...@aol.com>
To: somplasticllc <somplastic...@gmail.com>; user <user@spark.apache.org>
Sent: Fri, 3 Apr 2020 2:44
Subject: Re: HDFS file

hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt

Thanks darling. I tried this and it worked:

hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000

scala> :paste
// Entering paste mode (ctrl-D to finish)

val textFile = sc.textFile("hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://127.0.0.1:9000/hdfs/spark/examples/README7.out")

// Exiting paste mode, now interpreting.

textFile: org.apache.spark.rdd.RDD[String] = hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt MapPartitionsRDD[91] at textFile at <pastie>:27
counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[94] at reduceByKey at <pastie>:30

scala> :quit

jane thorpe
janethor...@aol.com

-----Original Message-----
From: Som Lima <somplastic...@gmail.com>
CC: user <user@spark.apache.org>
Sent: Tue, 31 Mar 2020 23:06
Subject: Re: HDFS file

Hi Jane

Try this example:

https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala

Som

On Tue, 31 Mar 2020, 21:34 jane thorpe, <janethor...@aol.com.invalid> wrote:

hi,

Are there setup instructions on the website for spark-3.0.0-preview2-bin-hadoop2.7? I can run the same program for HDFS format:

val textFile = sc.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")

val textFile = sc.textFile("/data/README.md")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("/data/wordcount")

textFile: org.apache.spark.rdd.RDD[String] = /data/README.md MapPartitionsRDD[23] at textFile at <console>:28
counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[26] at reduceByKey at <console>:31

br
Jane