Hi Vasil! Thanks a lot for replying with your solution. Hopefully someone else will find it useful. I know that the pi example (amongst others) is in hadoop-mapreduce-examples-2.7.3.jar. I'm sorry, I don't know of a Matrix-vector multiplication example bundled with the Apache Hadoop source, though I'm sure lots of people on GitHub have tried it.
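For reference, the bundled examples can be run straight from the examples jar; running the jar with no program name prints the list of available examples. The relative path below assumes the standard 2.7.3 distribution layout:

```shell
# From the root of the hadoop-2.7.3 distribution.
# With no arguments after the jar, Hadoop prints the list of bundled
# examples (wordcount, pi, grep, ...):
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar

# Estimate pi with 16 map tasks and 1000 samples per map:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 16 1000
```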
Glad it worked for you finally! :-) Regards Ravi

On Thu, Feb 23, 2017 at 5:41 AM, Васил Григоров <vask...@abv.bg> wrote:
> Dear Ravi,
>
> Even though I was unable to understand most of what you suggested I try, due to my lack of experience in the field, one of your suggestions did guide me in the right direction and I was able to solve my error. I decided to share it, as you mentioned that you're adding this conversation to the user mailing list for other people to see in case they run into a similar problem.
>
> It turns out that my Windows username consisting of two words, "Vasil Grigorov", had messed up the paths for the application somewhere because of the space between the words. I thought I had fixed it by setting the HADOOP_IDENT_STRING variable to "Vasil Grigorov" instead of the default %USERNAME%, but that only disregarded my actual username. Since there is no way of changing my Windows username, I decided to make another account called "Vadoop" and tested running the code there. To my surprise, the WordCount code ran with no issue, completing both the Map and Reduce tasks to 100% and giving me the correct output in the output directory. It's a bit annoying that I had to go through all this trouble just because the Hadoop application doesn't escape space characters in usernames, but then again, I don't know how hard that would be to do. Anyway, I really appreciate the help and I hope this helps someone else in the future.
>
> Additionally, I'm about to test out some more examples provided in the Hadoop documentation just to get more familiar with how it works. I have heard about the famous examples of *Matrix-vector multiplication* and *Estimate the value of pi*, but I have been unable to find them myself online. Do you know if the documentation provides those examples and, if so, could you please point me to them? Thank you in advance!
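One plausible mechanism for the space-in-username failure, sketched here with plain java.net.URI (this is my illustration, not Hadoop's actual code path, though Hadoop's Path class is backed by java.net.URI): the space is percent-encoded as %20 when the path goes through a URI, and if the encoded form is later used as a literal local file name, the open fails — which matches the `hadoop-Vasil%20Grigorov` path in the FileNotFoundException quoted further down.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch of the suspected failure mode: java.net.URI percent-encodes
// a space in a path component as %20. If that encoded string is later
// treated as a literal file name, the file is never found, even though
// the directory with the real (unencoded) name exists on disk.
public class SpaceInPath {
    public static void main(String[] args) throws URISyntaxException {
        String localDir = "/tmp/hadoop-Vasil Grigorov/mapred/local";
        // The multi-argument URI constructor quotes illegal characters
        // (including spaces) in the path component.
        URI uri = new URI("file", null, localDir, null);
        System.out.println(uri.getRawPath());
        // prints: /tmp/hadoop-Vasil%20Grigorov/mapred/local
    }
}
```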
>
> Best regards,
> Vasil Grigorov
>
> >-------- Original message --------
> >From: Ravi Prakash ravihad...@gmail.com
> >Subject: Re: WordCount MapReduce error
> >To: Васил Григоров <vask...@abv.bg>, user <user@hadoop.apache.org>
> >Sent: 23.02.2017 02:22
>
> Hi Vasil!
>
> I'm taking the liberty of adding back the user mailing list in the hope that someone in the future may chance on this conversation and find it useful.
>
> Could you please try setting HADOOP_IDENT_STRING="Vasil"? Although I do see https://issues.apache.org/jira/browse/HADOOP-10978 and I'm not sure it was fixed in 2.7.3.
>
> Could you please inspect the OS process that is launched for the Map Task? What user does it run as? On Linux, we have the strace utility that would let me see all the system calls a process makes. Is there something similar on Windows?
>
> If you can ensure only 1 Map Task, you could try setting "mapred.child.java.opts" to "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=1047", then connecting with a remote debugger like Eclipse / jdb and stepping through to see where the failure happens.
>
> That is interesting. I am guessing the MapTask is trying to write intermediate results to "mapreduce.cluster.local.dir", which defaults to "${hadoop.tmp.dir}/mapred/local". hadoop.tmp.dir in turn defaults to "/tmp/hadoop-${user.name}".
>
> Could you please try setting mapreduce.cluster.local.dir (and maybe even hadoop.tmp.dir) to a location without a space? Once that works, you could try narrowing down the problem.
>
> HTH
> Ravi
>
> On Wed, Feb 22, 2017 at 4:02 PM, Васил Григоров <vask...@abv.bg> wrote:
>
> Hello Ravi, thank you for the fast reply.
>
> 1. I did have a problem with my username having a space; however, I solved it by changing *set HADOOP_IDENT_STRING=%USERNAME%* to *set HADOOP_IDENT_STRING="Vasil Grigorov"* in the last line of hadoop-env.cmd.
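Ravi's suggestion to relocate the intermediate directories would look roughly like this. The property names are from his message; the D:/hadoop-tmp location is only an example of a space-free path, not a required one:

```xml
<!-- core-site.xml: move hadoop.tmp.dir to a path without spaces.
     D:/hadoop-tmp is an example location. -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>D:/hadoop-tmp</value>
</property>

<!-- mapred-site.xml: mapreduce.cluster.local.dir defaults to
     ${hadoop.tmp.dir}/mapred/local, so setting hadoop.tmp.dir is
     usually enough; it can also be set explicitly. -->
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>D:/hadoop-tmp/mapred/local</value>
</property>
```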
> I can't change my Windows username, however, so do you know another file where I should specify it?
> 2. I do have a D:\tmp directory and about 500GB of free space on that drive, so space shouldn't be the issue.
> 3. The application has all the required permissions.
>
> Additionally, something I've tested: if I set the number of reduce tasks in the WordCount.java file to 0 (job.setNumReduceTasks(0)), then I get the success files for the Map task in my output directory. So the Map tasks work fine but the Reduce is messing up. Is it possible that my build is somehow incorrect even though it said everything was successfully built?
>
> Thanks again, I really appreciate the help!
>
> >-------- Original message --------
> >From: Ravi Prakash ravihad...@gmail.com
> >Subject: Re: WordCount MapReduce error
> >To: Васил Григоров <vask...@abv.bg>
> >Sent: 22.02.2017 21:36
>
> Hi Vasil!
>
> It seems like the WordCount application is expecting to open the intermediate file but failing. Do you see a directory under D:/tmp/hadoop-Vasil Grigorov/ ? I can think of a few reasons. I'm sorry, I am not familiar with the filesystem on Windows 10.
> 1. Spaces in the file name are not being encoded / decoded properly. Can you try changing your name / username to remove the space?
> 2. There's not enough space in the D:/tmp directory?
> 3. The application does not have the right permissions to create the file.
>
> HTH
> Ravi
>
> On Wed, Feb 22, 2017 at 10:51 AM, Васил Григоров <vask...@abv.bg> wrote:
>
> Hello, I've been trying to run the WordCount example provided on the website on my Windows 10 machine. I have built the latest Hadoop version (2.7.3) successfully and I want to run the code in Local (Standalone) Mode. Thus, I have not specified any configuration apart from setting the JAVA_HOME path in the "hadoop-env.cmd" file. When I try to run the WordCount file, it fails to run the Reduce task but completes the Map tasks.
> I get the following output:
>
> D:\Programs\hadoop-2.7.3-src\hadoop-dist\target\hadoop-2.7.3\WordCount>hadoop jar wc.jar WordCount D:\Programs\hadoop-2.7.3-src\hadoop-dist\target\hadoop-2.7.3\WordCount\input D:\Programs\hadoop-2.7.3-src\hadoop-dist\target\hadoop-2.7.3\WordCount\output
> 17/02/22 18:40:43 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
> 17/02/22 18:40:43 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 17/02/22 18:40:43 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
> 17/02/22 18:40:43 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
> 17/02/22 18:40:44 INFO input.FileInputFormat: Total input paths to process : 2
> 17/02/22 18:40:44 INFO mapreduce.JobSubmitter: number of splits:2
> 17/02/22 18:40:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local334410887_0001
> 17/02/22 18:40:45 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
> 17/02/22 18:40:45 INFO mapreduce.Job: Running job: job_local334410887_0001
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: OutputCommitter set in config null
> 17/02/22 18:40:45 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: Waiting for map tasks
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: Starting task: attempt_local334410887_0001_m_000000_0
> 17/02/22 18:40:45 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 17/02/22 18:40:45 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
> 17/02/22 18:40:45 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@3019d00f
> 17/02/22 18:40:45 INFO mapred.MapTask: Processing split: file:/D:/Programs/hadoop-2.7.3-src/hadoop-dist/target/hadoop-2.7.3/WordCount/input/file02:0+27
> 17/02/22 18:40:45 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
> 17/02/22 18:40:45 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
> 17/02/22 18:40:45 INFO mapred.MapTask: soft limit at 83886080
> 17/02/22 18:40:45 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
> 17/02/22 18:40:45 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
> 17/02/22 18:40:45 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner:
> 17/02/22 18:40:45 INFO mapred.MapTask: Starting flush of map output
> 17/02/22 18:40:45 INFO mapred.MapTask: Spilling map output
> 17/02/22 18:40:45 INFO mapred.MapTask: bufstart = 0; bufend = 44; bufvoid = 104857600
> 17/02/22 18:40:45 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600
> 17/02/22 18:40:45 INFO mapred.MapTask: Finished spill 0
> 17/02/22 18:40:45 INFO mapred.Task: Task:attempt_local334410887_0001_m_000000_0 is done. And is in the process of committing
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: map
> 17/02/22 18:40:45 INFO mapred.Task: Task 'attempt_local334410887_0001_m_000000_0' done.
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: Finishing task: attempt_local334410887_0001_m_000000_0
> 17/02/22 18:40:45 INFO mapred.LocalJobRunner: Starting task: attempt_local334410887_0001_m_000001_0
> 17/02/22 18:40:46 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 17/02/22 18:40:46 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
> 17/02/22 18:40:46 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@39ef3a7
> 17/02/22 18:40:46 INFO mapred.MapTask: Processing split: file:/D:/Programs/hadoop-2.7.3-src/hadoop-dist/target/hadoop-2.7.3/WordCount/input/file01:0+25
> 17/02/22 18:40:46 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
> 17/02/22 18:40:46 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
> 17/02/22 18:40:46 INFO mapred.MapTask: soft limit at 83886080
> 17/02/22 18:40:46 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
> 17/02/22 18:40:46 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
> 17/02/22 18:40:46 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner:
> 17/02/22 18:40:46 INFO mapred.MapTask: Starting flush of map output
> 17/02/22 18:40:46 INFO mapred.MapTask: Spilling map output
> 17/02/22 18:40:46 INFO mapred.MapTask: bufstart = 0; bufend = 42; bufvoid = 104857600
> 17/02/22 18:40:46 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600
> 17/02/22 18:40:46 INFO mapred.MapTask: Finished spill 0
> 17/02/22 18:40:46 INFO mapred.Task: Task:attempt_local334410887_0001_m_000001_0 is done. And is in the process of committing
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner: map
> 17/02/22 18:40:46 INFO mapreduce.Job: Job job_local334410887_0001 running in uber mode : false
> 17/02/22 18:40:46 INFO mapred.Task: Task 'attempt_local334410887_0001_m_000001_0' done.
> 17/02/22 18:40:46 INFO mapreduce.Job:  map 100% reduce 0%
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner: Finishing task: attempt_local334410887_0001_m_000001_0
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner: map task executor complete.
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner: Waiting for reduce tasks
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner: Starting task: attempt_local334410887_0001_r_000000_0
> 17/02/22 18:40:46 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 17/02/22 18:40:46 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
> 17/02/22 18:40:46 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@13ac822f
> 17/02/22 18:40:46 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@6c4d20c4
> 17/02/22 18:40:46 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334338464, maxSingleShuffleLimit=83584616, mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
> 17/02/22 18:40:46 INFO reduce.EventFetcher: attempt_local334410887_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
> 17/02/22 18:40:46 INFO mapred.LocalJobRunner: reduce task executor complete.
> 17/02/22 18:40:46 WARN mapred.LocalJobRunner: job_local334410887_0001
> java.lang.Exception: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in localfetcher#1
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
> Caused by: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in localfetcher#1
>         at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: D:/tmp/hadoop-Vasil%20Grigorov/mapred/local/localRunner/Vasil%20Grigorov/jobcache/job_local334410887_0001/attempt_local334410887_0001_m_000000_0/output/file.out.index
>         at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:200)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
>         at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:156)
>         at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:71)
>         at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
>         at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:57)
>         at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.copyMapOutput(LocalFetcher.java:124)
>         at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.doCopy(LocalFetcher.java:102)
>         at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.run(LocalFetcher.java:85)
> 17/02/22 18:40:47 INFO mapreduce.Job: Job job_local334410887_0001 failed with state FAILED due to: NA
> 17/02/22 18:40:47 INFO mapreduce.Job: Counters: 18
>         File System Counters
>                 FILE: Number of bytes read=1158
>                 FILE: Number of bytes written=591978
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>         Map-Reduce Framework
>                 Map input records=2
>                 Map output records=8
>                 Map output bytes=86
>                 Map output materialized bytes=89
>                 Input split bytes=308
>                 Combine input records=8
>                 Combine output records=6
>                 Spilled Records=6
>                 Failed Shuffles=0
>                 Merged Map outputs=0
>                 GC time elapsed (ms)=0
>                 Total committed heap usage (bytes)=574095360
>         File Input Format Counters
>                 Bytes Read=52
>
> I have followed every tutorial available and looked for a potential solution to the error I get, but I have been unsuccessful. As I mentioned before, I have not set any further configuration in any files because I want to run in Standalone mode, rather than pseudo-distributed or fully distributed mode. I've spent a lot of time and effort to get this far and I've hit a brick wall with this error, so any help would be GREATLY appreciated.
>
> Thank you in advance!