Re: Spark RuntimeException hadoop output format

Mohit Anchlia Fri, 14 Aug 2015 16:39:10 -0700

I thought prefix meant the output path? What's the purpose of prefix and
where do I specify the path if not in prefix?
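
For what it's worth, the directory names in the listing quoted below (for example /tmp/out-1439495124000.txt) suggest that each batch is written to prefix-<batch time in ms>.suffix, so the prefix would carry the output path and one directory would be created per batch. Here is a minimal plain-Java sketch of that naming rule. batchOutputName is a hypothetical helper written only for illustration, not a Spark API, and the rule itself is an assumption inferred from the listing:

```java
class BatchOutputName {
    // Hypothetical helper (not part of Spark): builds the per-batch output
    // name as "<prefix>-<batchTimeMs>.<suffix>", which is the pattern the
    // HDFS listing in this thread appears to follow.
    static String batchOutputName(String prefix, String suffix, long batchTimeMs) {
        String result = Long.toString(batchTimeMs);
        // A non-empty prefix is placed in front, separated by a dash.
        if (prefix != null && !prefix.isEmpty()) {
            result = prefix + "-" + result;
        }
        // A non-empty suffix is appended after a dot.
        if (suffix != null && !suffix.isEmpty()) {
            result = result + "." + suffix;
        }
        return result;
    }

    public static void main(String[] args) {
        // Prefix "/tmp/out" with suffix "txt" for the first batch in the listing.
        System.out.println(batchOutputName("/tmp/out", "txt", 1439495124000L));
    }
}
```

If that assumption holds, passing an HDFS path such as /tmp/out as the prefix is the way to choose where the output goes, and each drwxr-xr-x entry in the listing is a per-batch directory rather than a data file.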


On Fri, Aug 14, 2015 at 4:36 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Please take a look at JavaPairDStream.scala:
>  def saveAsHadoopFiles[F <: OutputFormat[_, _]](
>       prefix: String,
>       suffix: String,
>       keyClass: Class[_],
>       valueClass: Class[_],
>       outputFormatClass: Class[F]) {
>
> Did you intend to use outputPath as the prefix?
>
> Cheers
>
>
> On Fri, Aug 14, 2015 at 1:36 PM, Mohit Anchlia <mohitanch...@gmail.com>
> wrote:
>
>> Spark 1.3
>>
>> Code:
>>
>> wordCounts.foreachRDD(new Function2<JavaPairRDD<String, Integer>, Time, Void>() {
>>   @Override
>>   public Void call(JavaPairRDD<String, Integer> rdd, Time time) throws IOException {
>>     String counts = "Counts at time " + time + " " + rdd.collect();
>>     System.out.println(counts);
>>     System.out.println("Appending to " + outputFile.getAbsolutePath());
>>     Files.append(counts + "\n", outputFile, Charset.defaultCharset());
>>     return null;
>>   }
>> });
>>
>> wordCounts.saveAsHadoopFiles(outputPath, "txt", Text.class, Text.class,
>>     TextOutputFormat.class);
>>
>>
>> What do I need to check in the namenode log? I see 0-byte files like this:
>>
>>
>> drwxr-xr-x   - ec2-user supergroup          0 2015-08-13 15:45 /tmp/out-1439495124000.txt
>> drwxr-xr-x   - ec2-user supergroup          0 2015-08-13 15:45 /tmp/out-1439495125000.txt
>> drwxr-xr-x   - ec2-user supergroup          0 2015-08-13 15:45 /tmp/out-1439495126000.txt
>> drwxr-xr-x   - ec2-user supergroup          0 2015-08-13 15:45 /tmp/out-1439495127000.txt
>> drwxr-xr-x   - ec2-user supergroup          0 2015-08-13 15:45 /tmp/out-1439495128000.txt
>>
>>
>>
>> However, I also wrote the data to a file on the local file system for
>> verification, and I see the data there:
>>
>>
>> $ ls -ltr !$
>> ls -ltr /tmp/out
>> -rw-r--r-- 1 yarn yarn 5230 Aug 13 15:45 /tmp/out
>>
>>
>> On Fri, Aug 14, 2015 at 6:15 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Which Spark release are you using?
>>>
>>> Can you show us a snippet of your code?
>>>
>>> Have you checked the namenode log?
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Aug 13, 2015, at 10:21 PM, Mohit Anchlia <mohitanch...@gmail.com>
>>> wrote:
>>>
>>> I was able to get this working by using an alternative method; however, I
>>> only see 0-byte files in Hadoop. I've verified that the output does exist
>>> in the logs, but it's missing from HDFS.
>>>
>>> On Thu, Aug 13, 2015 at 10:49 AM, Mohit Anchlia <mohitanch...@gmail.com>
>>> wrote:
>>>
>>>> I have this call, which tries to save to HDFS 2.6:
>>>>
>>>> wordCounts.saveAsNewAPIHadoopFiles("prefix", "txt");
>>>>
>>>> but I am getting the following:
>>>> java.lang.RuntimeException: class scala.runtime.Nothing$ not
>>>> org.apache.hadoop.mapreduce.OutputFormat
>>>>
>>>
>>>
>>
>
