Oh, you're absolutely right. After checking my Maven dependency tree, I see that the MapReduce jars are brought in through a transitive dependency from Crunch.
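For reference, a minimal sketch of the dependency change being discussed, assuming hadoop-client (the artifact the crunch-demo setup is reported to run with, at version 2.3.0-cdh5.0.0) replaces hadoop-yarn-client in the POM. This is an assumption drawn from the thread, not a verified fix:

```xml
<dependency>
  <groupId>org.apache.crunch</groupId>
  <artifactId>crunch-core</artifactId>
  <version>0.9.0-cdh5.0.0</version>
</dependency>
<!-- Assumption: hadoop-client instead of hadoop-yarn-client, so that the
     hadoop-common and MapReduce client jars are declared explicitly rather
     than arriving only transitively through crunch-core. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.3.0-cdh5.0.0</version>
</dependency>
```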
Maybe I got this all wrong, but I thought this was only an API dependency? At runtime YARN does all the scheduling and execution, doesn't it? I'm pretty sure I managed to run jobs on the resource manager without the job tracker installed.

Sorry for going a bit off topic.

On Thu, Jun 12, 2014 at 7:47 PM, Josh Wills <[email protected]> wrote:
> So I don't think using hadoop-yarn-client is right; that doesn't include all
> of the hadoop-common stuff for accessing the filesystem or the mapreduce
> stuff, so I'm honestly surprised the pipeline runs at all (I suppose that
> technically it doesn't?) hadoop-yarn-client is what you would use if you
> were writing a yarn app of your own, w/no mapreduce.
>
>
> On Thu, Jun 12, 2014 at 12:56 AM, Kristoffer Sjögren <[email protected]>
> wrote:
>>
>> Ok, so I got it working now after doing apt install crunch on the name
>> node. Not really sure why it fixed the problem though?
>>
>> And I'm submitting the job using the yarn client with the following
>> dependencies.
>>
>> <dependency>
>>   <groupId>org.apache.crunch</groupId>
>>   <artifactId>crunch-core</artifactId>
>>   <version>0.9.0-cdh5.0.0</version>
>> </dependency>
>> <dependency>
>>   <groupId>org.apache.hadoop</groupId>
>>   <artifactId>hadoop-yarn-client</artifactId>
>>   <version>2.3.0-cdh5.0.0</version>
>> </dependency>
>>
>>
>> On Thu, Jun 12, 2014 at 8:59 AM, Kristoffer Sjögren <[email protected]>
>> wrote:
>> > Yes, a pseudo distributed CDH5, but I realize now that I haven't
>> > installed the apt packages for crunch. I'm using the DistCache to
>> > upload crunch-core-0.9.0-cdh5.0.0.jar instead. Does it matter?
>> >
>> > One thing I noticed is that you're running
>> > hadoop-client-2.3.0-cdh5.0.0 whereas I'm using
>> > hadoop-yarn-client-2.3.0-cdh5.0.0. Also when I try to install crunch
>> > using apt I see that it depends on hadoop-0.20-mapreduce and
>> > hadoop-client.
>> >
>> > I may be confused but I thought that yarn would be backward compatible
>> > with mrv1?
>> >
>> > On Wed, Jun 11, 2014 at 6:41 PM, Josh Wills <[email protected]> wrote:
>> >> Hey Kristoffer,
>> >>
>> >> Couldn't reproduce that in my crunch-demo project against my test
>> >> cluster:
>> >>
>> >> https://github.com/jwills/crunch-demo/tree/cdh5
>> >>
>> >> So I hate asking dumb questions, but are you running against a CDH5
>> >> cluster?
>> >>
>> >> J
>> >>
>> >>
>> >> On Wed, Jun 11, 2014 at 9:11 AM, Josh Wills <[email protected]>
>> >> wrote:
>> >>>
>> >>> That's very odd; let me see if I can reproduce it.
>> >>>
>> >>> J
>> >>>
>> >>>
>> >>> On Wed, Jun 11, 2014 at 7:23 AM, Kristoffer Sjögren <[email protected]>
>> >>> wrote:
>> >>>>
>> >>>> Hi
>> >>>>
>> >>>> I'm trying out Crunch on YARN on CDH5 (0.9.0-cdh5.0.0) and get some
>> >>>> errors when trying to materialize results (see below). The job itself
>> >>>> is super simple.
>> >>>>
>> >>>> PCollection<String> lines = pipeline.read(new TextFileSource<String>(
>> >>>>     new Path("hdfs://....log"), Writables.strings()));
>> >>>>
>> >>>> lines = lines.parallelDo(new DoFn<String, String>() {
>> >>>>   @Override
>> >>>>   public void process(String s, Emitter<String> e) {
>> >>>>     e.emit(s);
>> >>>>   }
>> >>>> }, Writables.strings());
>> >>>>
>> >>>> for (String line : lines.materialize()) {
>> >>>>   System.out.println(line);
>> >>>> }
>> >>>>
>> >>>>
>> >>>> Seems like there's some kind of sync issue here because I can see the
>> >>>> "correct" tmp dir in hdfs. Note that the p index is "p2" in hdfs
>> >>>> while the client looks for "p1".
>> >>>>
>> >>>> -rw-r--r--   1 kristoffersjogren supergroup     1748 2014-06-11 15:36 /tmp/crunch-134908575/p2/MAP
>> >>>> drwxr-xr-x   - kristoffersjogren supergroup        0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output
>> >>>> -rw-r--r--   1 kristoffersjogren supergroup        0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/_SUCCESS
>> >>>> -rw-r--r--   1 kristoffersjogren supergroup 42898831 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/out0-m-00000
>> >>>> -rw-r--r--   1 kristoffersjogren supergroup        0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/part-m-00000
>> >>>>
>> >>>>
>> >>>> If I try to write directly to HDFS using the following, the job
>> >>>> finishes successfully, but nothing is written?
>> >>>>
>> >>>> pipeline.write(lines,
>> >>>>     new TextFileSourceTarget<String>("/user/stoffe", Writables.strings()),
>> >>>>     WriteMode.OVERWRITE);
>> >>>>
>> >>>>
>> >>>> Any ideas of what might go wrong?
>> >>>>
>> >>>> Cheers,
>> >>>> -Kristoffer
>> >>>>
>> >>>>
>> >>>>
>> >>>> Exception in thread "main" java.lang.RuntimeException: org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found to materialize at: /tmp/crunch-1611606737/p1
>> >>>>     at mapred.CrunchJob.<init>(CrunchJob.java:36)
>> >>>>     at mapred.tempjobs.DownloadFiles.<init>(DownloadFiles.java:16)
>> >>>>     at mapred.tempjobs.DownloadFiles.main(DownloadFiles.java:20)
>> >>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> >>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >>>>     at java.lang.reflect.Method.invoke(Method.java:483)
>> >>>>     at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
>> >>>> Caused by: org.apache.crunch.CrunchRuntimeException: java.io.IOException: No files found to materialize at: /tmp/crunch-1611606737/p1
>> >>>>     at org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:79)
>> >>>>     at org.apache.crunch.materialize.MaterializableIterable.iterator(MaterializableIterable.java:69)
>> >>>>     at mapred.tempjobs.DownloadFiles.run(DownloadFiles.java:37)
>> >>>>     at mapred.CrunchJob.run(CrunchJob.java:96)
>> >>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> >>>>     at mapred.CrunchJob.<init>(CrunchJob.java:34)
>> >>>>     ... 7 more
>> >>>> Caused by: java.io.IOException: No files found to materialize at: /tmp/crunch-1611606737/p1
>> >>>>     at org.apache.crunch.io.CompositePathIterable.create(CompositePathIterable.java:49)
>> >>>>     at org.apache.crunch.io.impl.FileSourceImpl.read(FileSourceImpl.java:136)
>> >>>>     at org.apache.crunch.io.seq.SeqFileSource.read(SeqFileSource.java:43)
>> >>>>     at org.apache.crunch.io.impl.ReadableSourcePathTargetImpl.read(ReadableSourcePathTargetImpl.java:37)
>> >>>>     at org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:76)
>> >>>>     ... 12 more
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Director of Data Science
>> >> Cloudera
>> >> Twitter: @josh_wills
>
>
>
>
> --
> Director of Data Science
> Cloudera
> Twitter: @josh_wills
