So I don't think using hadoop-yarn-client is right; it doesn't include the
hadoop-common stuff for accessing the filesystem, or the mapreduce stuff, so
I'm honestly surprised the pipeline runs at all (I suppose that technically it
doesn't?). hadoop-yarn-client is what you would use if you were writing a YARN
app of your own, with no mapreduce.
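
To make that suggestion concrete, here is a sketch of the dependency block
from the mail below with hadoop-yarn-client swapped for hadoop-client, which
transitively pulls in hadoop-common and the MapReduce client libraries.
Versions are copied from the thread; the combination has not been verified
against a live CDH5 cluster.

```xml
<!-- Sketch: crunch-core plus hadoop-client (instead of hadoop-yarn-client).
     hadoop-client brings in hadoop-common (filesystem access) and the
     MapReduce client jars that Crunch pipelines need at submit time.
     Versions as quoted in the thread; not verified here. -->
<dependency>
  <groupId>org.apache.crunch</groupId>
  <artifactId>crunch-core</artifactId>
  <version>0.9.0-cdh5.0.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.3.0-cdh5.0.0</version>
</dependency>
```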
On Thu, Jun 12, 2014 at 12:56 AM, Kristoffer Sjögren <[email protected]> wrote:
> Ok, so I got it working now after doing apt install crunch on the name
> node. Not really sure why it fixed the problem, though?
>
> And I'm submitting the job using the yarn client with the following
> dependencies:
>
> <dependency>
>   <groupId>org.apache.crunch</groupId>
>   <artifactId>crunch-core</artifactId>
>   <version>0.9.0-cdh5.0.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-yarn-client</artifactId>
>   <version>2.3.0-cdh5.0.0</version>
> </dependency>
>
> On Thu, Jun 12, 2014 at 8:59 AM, Kristoffer Sjögren <[email protected]> wrote:
> > Yes, a pseudo-distributed CDH5, but I realize now that I haven't
> > installed the apt packages for crunch. I'm using the DistCache to
> > upload crunch-core-0.9.0-cdh5.0.0.jar instead. Does it matter?
> >
> > One thing I noticed is that you're running
> > hadoop-client-2.3.0-cdh5.0.0 whereas I'm using
> > hadoop-yarn-client-2.3.0-cdh5.0.0. Also, when I try to install crunch
> > using apt, I see that it depends on hadoop-0.20-mapreduce and
> > hadoop-client.
> >
> > I may be confused, but I thought that YARN would be backward compatible
> > with MRv1?
> >
> > On Wed, Jun 11, 2014 at 6:41 PM, Josh Wills <[email protected]> wrote:
> >> Hey Kristoffer,
> >>
> >> Couldn't reproduce that in my crunch-demo project against my test cluster:
> >>
> >> https://github.com/jwills/crunch-demo/tree/cdh5
> >>
> >> So I hate asking dumb questions, but are you running against a CDH5 cluster?
> >>
> >> J
> >>
> >> On Wed, Jun 11, 2014 at 9:11 AM, Josh Wills <[email protected]> wrote:
> >>> That's very odd; let me see if I can reproduce it.
> >>>
> >>> J
> >>>
> >>> On Wed, Jun 11, 2014 at 7:23 AM, Kristoffer Sjögren <[email protected]> wrote:
> >>>> Hi,
> >>>>
> >>>> I'm trying out Crunch on YARN on CDH5 (0.9.0-cdh5.0.0) and get some
> >>>> errors when trying to materialize results (see below). The job itself
> >>>> is super simple:
> >>>>
> >>>> PCollection<String> lines = pipeline.read(new TextFileSource<String>(
> >>>>     new Path("hdfs://....log"), Writables.strings()));
> >>>>
> >>>> lines = lines.parallelDo(new DoFn<String, String>() {
> >>>>   @Override
> >>>>   public void process(String s, Emitter<String> e) {
> >>>>     e.emit(s);
> >>>>   }
> >>>> }, Writables.strings());
> >>>>
> >>>> for (String line : lines.materialize()) {
> >>>>   System.out.println(line);
> >>>> }
> >>>>
> >>>> Seems like there's some kind of sync issue here, because I can see the
> >>>> "correct" tmp dir in HDFS. Note that the p index is "p2" in HDFS while
> >>>> the client looks for "p1".
> >>>>
> >>>> -rw-r--r--   1 kristoffersjogren supergroup      1748 2014-06-11 15:36 /tmp/crunch-134908575/p2/MAP
> >>>> drwxr-xr-x   - kristoffersjogren supergroup         0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output
> >>>> -rw-r--r--   1 kristoffersjogren supergroup         0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/_SUCCESS
> >>>> -rw-r--r--   1 kristoffersjogren supergroup  42898831 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/out0-m-00000
> >>>> -rw-r--r--   1 kristoffersjogren supergroup         0 2014-06-11 15:36 /tmp/crunch-134908575/p2/output/part-m-00000
> >>>>
> >>>> If I try to write directly to HDFS using the following, the job finishes
> >>>> successfully, but nothing is written?
> >>>>
> >>>> pipeline.write(lines, new TextFileSourceTarget<String>("/user/stoffe",
> >>>>     Writables.strings()), WriteMode.OVERWRITE);
> >>>>
> >>>> Any ideas about what might be going wrong?
> >>>>
> >>>> Cheers,
> >>>> -Kristoffer
> >>>>
> >>>> Exception in thread "main" java.lang.RuntimeException:
> >>>> org.apache.crunch.CrunchRuntimeException: java.io.IOException: No
> >>>> files found to materialize at: /tmp/crunch-1611606737/p1
> >>>>     at mapred.CrunchJob.<init>(CrunchJob.java:36)
> >>>>     at mapred.tempjobs.DownloadFiles.<init>(DownloadFiles.java:16)
> >>>>     at mapred.tempjobs.DownloadFiles.main(DownloadFiles.java:20)
> >>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>>>     at java.lang.reflect.Method.invoke(Method.java:483)
> >>>>     at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> >>>> Caused by: org.apache.crunch.CrunchRuntimeException:
> >>>> java.io.IOException: No files found to materialize at:
> >>>> /tmp/crunch-1611606737/p1
> >>>>     at org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:79)
> >>>>     at org.apache.crunch.materialize.MaterializableIterable.iterator(MaterializableIterable.java:69)
> >>>>     at mapred.tempjobs.DownloadFiles.run(DownloadFiles.java:37)
> >>>>     at mapred.CrunchJob.run(CrunchJob.java:96)
> >>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >>>>     at mapred.CrunchJob.<init>(CrunchJob.java:34)
> >>>>     ... 7 more
> >>>> Caused by: java.io.IOException: No files found to materialize at:
> >>>> /tmp/crunch-1611606737/p1
> >>>>     at org.apache.crunch.io.CompositePathIterable.create(CompositePathIterable.java:49)
> >>>>     at org.apache.crunch.io.impl.FileSourceImpl.read(FileSourceImpl.java:136)
> >>>>     at org.apache.crunch.io.seq.SeqFileSource.read(SeqFileSource.java:43)
> >>>>     at org.apache.crunch.io.impl.ReadableSourcePathTargetImpl.read(ReadableSourcePathTargetImpl.java:37)
> >>>>     at org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:76)
> >>>>     ... 12 more
> >>
> >> --
> >> Director of Data Science
> >> Cloudera
> >> Twitter: @josh_wills

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
