Yes, a pseudo-distributed CDH5 cluster, but I realize now that I haven't installed the apt packages for Crunch. I'm using the DistCache to upload crunch-core-0.9.0-cdh5.0.0.jar instead. Does that matter?
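Because the question is which Crunch and Hadoop jars actually end up on the classpath (the apt package, the jar shipped through the DistCache, hadoop-client or hadoop-yarn-client), one quick diagnostic is to ask the JVM where a class was loaded from. This is a minimal sketch that uses only standard `Class`/`ProtectionDomain`/`CodeSource` calls; the class name `ClassOrigin`, its `jarOf` helper, and the three class names probed in `main` are illustrative choices, not part of Crunch or CDH.

```java
import java.net.URL;
import java.security.CodeSource;
import java.security.ProtectionDomain;

/**
 * Hypothetical diagnostic helper: reports which jar (or directory) a class
 * was loaded from, to spot duplicate or mismatched crunch-core / hadoop jars.
 */
public final class ClassOrigin {

  private ClassOrigin() {}

  /** Returns the code-source location of the named class, or a marker string. */
  public static String jarOf(String className) {
    try {
      // Load without running static initializers.
      Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
      ProtectionDomain domain = cls.getProtectionDomain();
      CodeSource source = (domain == null) ? null : domain.getCodeSource();
      URL location = (source == null) ? null : source.getLocation();
      // JDK classes loaded by the bootstrap loader have no code source.
      return (location == null) ? "<bootstrap>" : location.toString();
    } catch (ClassNotFoundException e) {
      return "<not on classpath>";
    } catch (LinkageError e) {
      // e.g. the class was found but one of its dependencies was not.
      return "<failed to load: " + e + ">";
    }
  }

  public static void main(String[] args) {
    // Illustrative defaults; pass other fully qualified class names as arguments.
    String[] names = args.length > 0 ? args : new String[] {
        "org.apache.crunch.Pipeline",
        "org.apache.hadoop.mapreduce.Job",
        "org.apache.hadoop.yarn.client.api.YarnClient"
    };
    for (String name : names) {
      System.out.println(name + " -> " + jarOf(name));
    }
  }
}
```

Running it with the same classpath as the Crunch client shows whether crunch-core and the Hadoop client classes come from the jars you expect; the same `jarOf` call could also be logged from task-side code if the cluster classpath is the suspect.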

One thing I noticed is that you're running hadoop-client-2.3.0-cdh5.0.0, whereas I'm using hadoop-yarn-client-2.3.0-cdh5.0.0. Also, when I try to install Crunch using apt, I see that it depends on hadoop-0.20-mapreduce and hadoop-client. I may be confused, but I thought YARN would be backward compatible with MRv1?

On Wed, Jun 11, 2014 at 6:41 PM, Josh Wills wrote:
> Hey Kristoffer,
>
> Couldn't reproduce that in my crunch-demo project against my test cluster:
>
> https://github.com/jwills/crunch-demo/tree/cdh5
>
> So I hate asking dumb questions, but are you running against a CDH5 cluster?
>
> J
>
>
> On Wed, Jun 11, 2014 at 9:11 AM, Josh Wills wrote:
>>
>> That's very odd; let me see if I can reproduce it.
>>
>> J
>>
>>
>> On Wed, Jun 11, 2014 at 7:23 AM, Kristoffer Sjögren wrote:
>>>
>>> Hi
>>>
>>> I'm trying out Crunch on YARN on CDH5 (0.9.0-cdh5.0.0) and get some
>>> errors when trying to materialize results (see below). The job itself
>>> is super simple.
>>>
>>> PCollection<String> lines = pipeline.read(new TextFileSource<String>(
>>>     new Path("hdfs://....log"), Writables.strings()));
>>>
>>> lines = lines.parallelDo(new DoFn<String, String>() {
>>>   @Override
>>>   public void process(String s, Emitter<String> e) {
>>>     e.emit(s);
>>>   }
>>> }, Writables.strings());
>>>
>>> for (String line : lines.materialize()) {
>>>   System.out.println(line);
>>> }
>>>
>>>
>>> It seems like there's some kind of sync issue here, because I can see the
>>> "correct" tmp dir in HDFS. Note that the p index is "p2" in HDFS while
>>> the client looks for "p1".
>>>
>>> -rw-r--r--   1 kristoffersjogren supergroup       1748 2014-06-11 15:36 /tmp/crunch-134908575/p2/MAP
>>> drwxr-xr-x   - kristoffersjogren supergroup          0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output
>>> -rw-r--r--   1 kristoffersjogren supergroup          0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/_SUCCESS
>>> -rw-r--r--   1 kristoffersjogren supergroup   42898831 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/out0-m-00000
>>> -rw-r--r--   1 kristoffersjogren supergroup          0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/part-m-00000
>>>
>>>
>>> If I try to write directly to HDFS using the following, the job finishes
>>> successfully, but nothing is written?
>>>
>>> pipeline.write(lines, new TextFileSourceTarget<String>("/user/stoffe",
>>>     Writables.strings()), WriteMode.OVERWRITE);
>>>
>>>
>>> Any ideas what might be going wrong?
>>>
>>> Cheers,
>>> -Kristoffer
>>>
>>>
>>>
>>> Exception in thread "main" java.lang.RuntimeException: org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found to materialize at: /tmp/crunch-1611606737/p1
>>>     at mapred.CrunchJob.<init>(CrunchJob.java:36)
>>>     at mapred.tempjobs.DownloadFiles.<init>(DownloadFiles.java:16)
>>>     at mapred.tempjobs.DownloadFiles.main(DownloadFiles.java:20)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:483)
>>>     at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
>>> Caused by: org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found to materialize at: /tmp/crunch-1611606737/p1
>>>     at org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:79)
>>>     at org.apache.crunch.materialize.MaterializableIterable.iterator(MaterializableIterable.java:69)
>>>     at mapred.tempjobs.DownloadFiles.run(DownloadFiles.java:37)
>>>     at mapred.CrunchJob.run(CrunchJob.java:96)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>     at mapred.CrunchJob.<init>(CrunchJob.java:34)
>>>     ... 7 more
>>> Caused by: java.io.IOException: No files found to materialize at: /tmp/crunch-1611606737/p1
>>>     at org.apache.crunch.io.CompositePathIterable.create(CompositePathIterable.java:49)
>>>     at org.apache.crunch.io.impl.FileSourceImpl.read(FileSourceImpl.java:136)
>>>     at org.apache.crunch.io.seq.SeqFileSource.read(SeqFileSource.java:43)
>>>     at org.apache.crunch.io.impl.ReadableSourcePathTargetImpl.read(ReadableSourcePathTargetImpl.java:37)
>>>     at org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:76)
>>>     ... 12 more
>>
>
> --
> Director of Data Science
> Cloudera
> Twitter: @josh_wills
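The exception in the quoted report is raised on the client side (`CompositePathIterable.create`) when it is handed a `p1` path, while the only job output in the `ls` listing sits under `p2/output`. What follows is a local-filesystem analogy of that symptom using `java.nio.file`, not Crunch's implementation: `MaterializeCheck` and `listMaterializableFiles` are hypothetical names, and skipping names that start with `_` or `.` (such as `_SUCCESS`) follows the usual Hadoop convention for bookkeeping files, which is an assumption about what Crunch does at this step. It rebuilds the listed layout in a temp directory and shows the `p1` lookup failing while `p2/output` holds two visible files; it does not explain why the output ended up under `p2`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public final class MaterializeCheck {

  private MaterializeCheck() {}

  /**
   * Hypothetical local analogue of the client-side read check: the directory
   * must exist, and names starting with "_" or "." (e.g. _SUCCESS) are skipped
   * (assumed Hadoop-style hidden-file convention).
   */
  public static List<Path> listMaterializableFiles(Path dir) throws IOException {
    if (!Files.isDirectory(dir)) {
      throw new IOException("No files found to materialize at: " + dir);
    }
    List<Path> files = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
      for (Path p : stream) {
        String name = p.getFileName().toString();
        if (!name.startsWith("_") && !name.startsWith(".")) {
          files.add(p);
        }
      }
    }
    return files;
  }

  public static void main(String[] args) throws IOException {
    // Mirror the layout from the `ls` listing in a local temp directory.
    Path root = Files.createTempDirectory("crunch-");
    Path p2 = Files.createDirectories(root.resolve("p2"));
    Path output = Files.createDirectories(p2.resolve("output"));
    Files.createFile(p2.resolve("MAP"));
    Files.createFile(output.resolve("_SUCCESS"));
    Files.createFile(output.resolve("out0-m-00000"));
    Files.createFile(output.resolve("part-m-00000"));

    System.out.println("p2/output -> " + listMaterializableFiles(output).size() + " visible file(s)");
    try {
      // The client in the report reads p1, which was never created here.
      listMaterializableFiles(root.resolve("p1"));
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```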
