Hey Kristoffer, Couldn't reproduce that in my crunch-demo project against my test cluster:
https://github.com/jwills/crunch-demo/tree/cdh5

So I hate asking dumb questions, but are you running against a CDH5 cluster?

J

On Wed, Jun 11, 2014 at 9:11 AM, Josh Wills wrote:

> That's very odd; let me see if I can reproduce it.
>
> J
>
> On Wed, Jun 11, 2014 at 7:23 AM, Kristoffer Sjögren wrote:
>
>> Hi
>>
>> I'm trying out Crunch on YARN on CDH5 (0.9.0-cdh5.0.0) and get some
>> errors when trying to materialize results (see below). The job itself
>> is super simple.
>>
>> PCollection<String> lines = pipeline.read(new TextFileSource<String>(
>>     new Path("hdfs://....log"), Writables.strings()));
>>
>> lines = lines.parallelDo(new DoFn<String, String>() {
>>   @Override
>>   public void process(String s, Emitter<String> e) {
>>     e.emit(s);
>>   }
>> }, Writables.strings());
>>
>> for (String line : lines.materialize()) {
>>   System.out.println(line);
>> }
>>
>> Seems like there's some kind of sync issue here because I can see the
>> "correct" tmp dir in hdfs. Note that the p index is "p2" in hdfs while
>> the client looks for "p1".
>>
>> -rw-r--r-- 1 kristoffersjogren supergroup 1748 2014-06-11 15:36 /tmp/crunch-134908575/p2/MAP
>> drwxr-xr-x - kristoffersjogren supergroup 0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output
>> -rw-r--r-- 1 kristoffersjogren supergroup 0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/_SUCCESS
>> -rw-r--r-- 1 kristoffersjogren supergroup 42898831 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/out0-m-00000
>> -rw-r--r-- 1 kristoffersjogren supergroup 0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/part-m-00000
>>
>> If I try to write directly to HDFS using the following, the job finishes
>> successfully, but nothing is written.
>>
>> pipeline.write(lines, new TextFileSourceTarget<String>("/user/stoffe",
>>     Writables.strings()), WriteMode.OVERWRITE);
>>
>> Any ideas of what might go wrong?
>>
>> Cheers,
>> -Kristoffer
>>
>> Exception in thread "main" java.lang.RuntimeException: org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found to materialize at: /tmp/crunch-1611606737/p1
>>     at mapred.CrunchJob.<init>(CrunchJob.java:36)
>>     at mapred.tempjobs.DownloadFiles.<init>(DownloadFiles.java:16)
>>     at mapred.tempjobs.DownloadFiles.main(DownloadFiles.java:20)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:483)
>>     at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
>> Caused by: org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found to materialize at: /tmp/crunch-1611606737/p1
>>     at org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:79)
>>     at org.apache.crunch.materialize.MaterializableIterable.iterator(MaterializableIterable.java:69)
>>     at mapred.tempjobs.DownloadFiles.run(DownloadFiles.java:37)
>>     at mapred.CrunchJob.run(CrunchJob.java:96)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>     at mapred.CrunchJob.<init>(CrunchJob.java:34)
>>     ... 7 more
>> Caused by: java.io.IOException: No files found to materialize at: /tmp/crunch-1611606737/p1
>>     at org.apache.crunch.io.CompositePathIterable.create(CompositePathIterable.java:49)
>>     at org.apache.crunch.io.impl.FileSourceImpl.read(FileSourceImpl.java:136)
>>     at org.apache.crunch.io.seq.SeqFileSource.read(SeqFileSource.java:43)
>>     at org.apache.crunch.io.impl.ReadableSourcePathTargetImpl.read(ReadableSourcePathTargetImpl.java:37)
>>     at org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:76)
>>     ... 12 more

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
