I have configured Hadoop Yarn in a VM with 2048MB of RAM and 1 CPU core.
Then, I configured the max and the min limits of memory to be used
by YARN in |yarn-site.xml|:
|<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>|
Hi,
If we run a job without reduce tasks, the map output is going to be
saved into HDFS. Now, I would like to launch another job that reads the
map output and computes the reduce phase. Is it possible to execute a job
that reads the map output from HDFS and just runs the reduce phase?
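A sketch of how this two-phase setup is usually wired (this is my own
assumption of the shape, not from the thread: |MyMapper| and |MyReducer|
are placeholder class names, and the paths are examples). Setting the
reduce count to zero makes the first job write its map output straight
to HDFS; the second job then uses the default identity |Mapper| plus the
real reducer over that directory:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class TwoPhase {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Phase 1: map only -- zero reducers makes the map output go to HDFS.
        // SequenceFileOutputFormat so the key/value types round-trip cleanly.
        Job mapJob = Job.getInstance(conf, "map-phase");
        mapJob.setJarByClass(TwoPhase.class);
        mapJob.setMapperClass(MyMapper.class);        // placeholder mapper
        mapJob.setNumReduceTasks(0);
        mapJob.setOutputKeyClass(Text.class);
        mapJob.setOutputValueClass(IntWritable.class);
        mapJob.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(mapJob, new Path("/input1"));
        FileOutputFormat.setOutputPath(mapJob, new Path("/mapout"));
        mapJob.waitForCompletion(true);

        // Phase 2: identity map (the default Mapper) + the real reducer,
        // reading the stored map output back from HDFS.
        Job reduceJob = Job.getInstance(conf, "reduce-phase");
        reduceJob.setJarByClass(TwoPhase.class);
        reduceJob.setMapperClass(Mapper.class);       // identity mapper
        reduceJob.setReducerClass(MyReducer.class);   // placeholder reducer
        reduceJob.setOutputKeyClass(Text.class);
        reduceJob.setOutputValueClass(IntWritable.class);
        reduceJob.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(reduceJob, new Path("/mapout"));
        FileOutputFormat.setOutputPath(reduceJob, new Path("/output1"));
        reduceJob.waitForCompletion(true);
    }
}
```

Note the identity-map step still re-shuffles the stored map output, so
this approximates a "reduce only" job rather than resuming the first one.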
I have found the error. It was a bug in my code.
On 09/24/2015 07:47 PM, xeonmailinglist wrote:
No, I am looking in the right place. Here is the output of a map task.
I also thought that the print would come up, but it isn't showing.
That is why I am asking.
[1] Output from one map task
Hi,
1.
I have this example of MapReduce [1], and I want to print info to
stdout and to a log file. It seems that the logs don't print
anything. How can I make my class print these words?
2.
I have also set |yarn-site.xml| to retain logs. Although the
logs are retained
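For the first point, a sketch of what usually works (an assumption on my
part, not from the thread): use commons-logging, which Hadoop itself
uses, and keep in mind that |System.out| goes to the container's stdout
file on the worker node, not to the client terminal.

```java
import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LoggingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final Log LOG = LogFactory.getLog(LoggingMapper.class);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Goes to the task's syslog file under the container log dir.
        LOG.info("processing key " + key);
        // Goes to the task's stdout file, not the client terminal.
        System.out.println("stdout: " + value);
        context.write(new Text(value.toString()), new LongWritable(1));
    }
}
```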
, 2015 at 9:54 AM, xeonmailinglist
<xeonmailingl...@gmail.com> wrote:
Does anyone know the answer to this question about logging in MapReduce?
I can't find an example or an explanation of how to print something
inside user map and reduce functions.
Does anyone know the answer to this question about logging in MapReduce? I can't
find an example or an explanation of how to print something inside user map
and reduce functions.
On 09/24/2015 04:19 PM, xeonmailinglist wrote:
Hi,
1.
I have this example of MapReduce [1], and I want to print info
Hi,
Is there a way to pass system properties to the hadoop jar command?
My example is not working. |System.getProperty("file.path")| gives me
|null|:
|$ hadoop jar medusa-java.jar mywordcount
-Dfile.path=/home/xubuntu/Programs/medusa-2.0/temp/1442932326/job.attributes
/input1 /output1 |
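One likely explanation (my assumption, not confirmed in the thread): a
|-D| option after |hadoop jar Class| is parsed, if the program uses
ToolRunner, into the Hadoop Configuration rather than into JVM system
properties, so |System.getProperty| never sees it. JVM system properties
for the client JVM can be set via the environment instead:

```shell
# -Dkey=value after "hadoop jar" becomes a Hadoop Configuration property
# (or a plain program argument), not a JVM system property, so
# System.getProperty("file.path") stays null.

# To set a real JVM system property for the client JVM, export it first
# (the path below is just the one from the original command):
export HADOOP_CLIENT_OPTS="-Dfile.path=/home/xubuntu/Programs/medusa-2.0/temp/1442932326/job.attributes"
hadoop jar medusa-java.jar mywordcount /input1 /output1
```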
Hi,
I want to execute `hadoop jar myexample.jar` but I want to pass spring
framework jars. How can I add spring framework jars to hadoop classpath?
Thanks,
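Two mechanisms are commonly combined here (paths and the main class name
below are placeholders): |HADOOP_CLASSPATH| affects the client/driver
JVM, while |-libjars| ships jars to the map/reduce task containers
(|-libjars| requires the program to use ToolRunner):

```shell
# Client-side classpath (driver JVM): prepend the Spring jars.
export HADOOP_CLASSPATH=/path/to/spring/*:$HADOOP_CLASSPATH

# Task-side classpath (map/reduce containers): ship the jars with the job.
hadoop jar myexample.jar my.Main \
  -libjars /path/to/spring/spring-core.jar,/path/to/spring/spring-context.jar \
  /input1 /output1
```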
Hi,
Hadoop YARN uses HDFS to store and read data from the filesystem. But
what communication technology is used to transfer data between map and
reduce tasks, and for the node managers to contact the resource manager?
Is all communication point to point?
Thanks,
Can MapReduce jobs be used in applications that accept error rates in
the result, for example applications that calculate approximations of
values? One example of such an application is one that can produce
results whose error is lower than a threshold.
Hi,
In [1] I show a wordcount example. I am trying to pointcut the
invocations of |output.collect| and the method |cleanup| with AOP, but
it is very difficult to do this.
I have tried to set |org.apache.hadoop.mapred.JobConf| and
|org.apache.hadoop.mapreduce.Job;| in beans in order to
I am setting up my wordcount example, which is very similar to the
wordcount example that we find on the Internet.
1.
The MyMap class extends and implements the same classes as the ones
defined in the original wordcount example, but in my case I get the
error “Interface expected
to
read only then you cannot write to it. They must be set to
read/write to copy files.
Cheers
Vern
On Mon, 2015-08-17 at 21:46 +0100, xeonmailinglist wrote:
Hi,
I am trying to understand why I can't copy files between HDFS filesystems
to a directory /input1, but I can copy files
Hi,
I am trying to understand why I can't copy files between HDFS filesystems
to a directory /input1, but I can copy files to the root dir /.
E.g., I have 2 hosts (hadoop-coc-2 and hadoop-coc-3), and I can't copy
files between the dir input1 [1], but I can copy the file to the root
dir [2].
Is it possible to get the hadoop working dir using bash commands?
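As far as I know there is no "pwd" for HDFS (an assumption on my part):
relative paths in |hdfs dfs| commands resolve against the user's home
directory, which defaults to /user/&lt;username&gt;:

```shell
# Lists the working (home) directory: /user/<username> by default.
hdfs dfs -ls

# The filesystem those relative paths live on:
hdfs getconf -confKey fs.defaultFS
```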
I am running Hadoop MRv2 in a cluster with 4 nodes. Java 8 is installed.
I start the resource manager and the node manager normally, but during
the execution the resource manager crashes with the error below. Any
help to solve this? Is it a problem related to the Java heap, or memory?
|
Jul
line for the resource manager ?
Thanks
On Wed, Jul 1, 2015 at 9:38 AM, xeonmailinglist-gmail
<xeonmailingl...@gmail.com> wrote:
I am running the hadoop MRv2 in a cluster with 4 nodes. Java 8 is
installed.
I start the resource manager and the node
Hi,
I have this map class that is accepting input files with a key as
LongWritable and a value of Text.
The input file is in [1]. Here we can see that it contains a key as a
Long (I think) and bytes as the value.
In [2] is my map class. The goal of the map class is to read the
input data,
|<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>/home/xubuntu/Programs/hadoop/logs/history/done</value>
</property>
<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>/home/xubuntu/Programs/hadoop/logs/history/intermediate-done-dir</value>
</property>
|
On 05/28/2015 02:14 PM, xeonmailinglist-gmail wrote:
Hi,
I have
Hi,
I am trying to launch a job that I have configured in Java, but I get
an error related to the containers [1]. I don’t understand why I can’t
submit a job. Why do I get this error? What can I do to fix it?
Thanks,
[1] Log of |logs/yarn-xubuntu-nodemanager-hadoop-coc-1.log|
|15/05/28
.
On 05/28/2015 03:59 PM, xeonmailinglist-gmail wrote:
The error that I got is [1], but I still don’t understand why I get
this error. I couldn’t find more detail about it. Any suggestion?
[1]
|
Application application_1432817967879_0003 failed 2 times due to AM Container
I have found why I couldn’t access the container. I needed to have in
|yarn-site.xml| the property:
|<property>
  <name>yarn.log.server.url</name>
  <value>http://192.168.56.101:19888/jobhistory/logs/</value>
</property>
|
On 05/28/2015 04:01 PM, xeonmailinglist-gmail wrote:
If I click in the logs link, I
Have you checked the link
http://192.168.56.101:9046/proxy/application_1432817967879_0003 ?
You should get some clue from the logs of the 2 attempts.
On Thu, May 28, 2015 at 6:42 AM, xeonmailinglist-gmail
),
new FsPermission(JobSubmissionFiles.JOB_FILE_PERMISSION), splitVersion,
info);
}
|
Thanks,
On 05/18/2015 03:56 PM, xeonmailinglist-gmail wrote:
Shahab, I think so, but Hadoop’s site says |The user@ mailing list
is the preferred mailing list for end-user questions and
discussion
Regards,
Shahab
On Mon, May 18, 2015 at 9:42 AM, xeonmailinglist-gmail
<xeonmailingl...@gmail.com> wrote:
Why Remove?
On 05/18/2015 02:25 PM, Gopy Krishna wrote:
REMOVE
On Mon, May 18, 2015 at 6:54 AM, xeonmailinglist-gmail
<xeonmailingl...@gmail.com> wrote:
Hi,
I am trying to submit a remote job in YARN MapReduce, but I can’t
because I get the error [1]. I don’t have more exceptions in the other logs.
My MapReduce runtime has 1 /ResourceManager/ and 3 /NodeManagers/, and
the HDFS is running properly (all nodes are alive).
I have looked to
I also can't find a good site/book that explains well how to submit
remote jobs. Does anyone know where I can get more useful info?
Forwarded Message
Subject:can't submit remote job
Date: Mon, 18 May 2015 11:54:56 +0100
From: xeonmailinglist-gmail
Hi,
1 - I am trying to launch a remote job, so I have set the following
params in |core-site.xml| [1] on the hosts where MapReduce is running.
The remote user that is launching the job is |xeon|. But I still get the
error [2].
For extra-information, here are the |/tmp| dir [3].
[1]
|
Hi,
I want to run a remote mapreduce job. So, I have created a job
programmatically [1], but when I submit it remotely, I get the error [2].
First, I thought that it was a security issue because the client
username is |xeon| and the remote username is |ubuntu|, but I have
noticed that
Hi,
I have an host |HostA| that has a public and a private IP address [1]. I
want to configure Yarn MapReduce so that it is possible to submit jobs
remotely.
I have put the public IP in the yarn-site.xml [2], but still can’t
launch the resource manager [3]. How can I enable MapReduce to
When I |distcp| between 2 MapReduce runtimes, the job remains in this
state for a long time, and eventually it ends successfully. In other
words, the job is quick to reach this state, and then it gets stuck
there for a long time. I would like to understand why it takes so
much time
Hi,
I have several directories that contain several files of
|SequenceFileOutputFormat| with |org.apache.hadoop.io.Text| as key and
value. I want to merge all these files into one.
I have looked at the join example [1], but it is not working. How do I
merge SequenceFiles?
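One approach (a sketch under my own assumptions, not from the thread:
the input directory name is a placeholder, and keys/values are assumed
to be |Text| as described above) is to read every SequenceFile with
|SequenceFile.Reader| and append everything to one |SequenceFile.Writer|:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class MergeSeqFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(new Path("/merged/part-0")),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(Text.class));
        Text key = new Text();
        Text value = new Text();
        // Placeholder input dir; loop over your several directories as needed.
        for (FileStatus st : fs.listStatus(new Path("/input-seq-dir"))) {
            if (!st.isFile()) continue;
            SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                    SequenceFile.Reader.file(st.getPath()));
            while (reader.next(key, value)) {
                writer.append(key, value);
            }
            reader.close();
        }
        writer.close();
    }
}
```

For very large inputs, an identity MapReduce job with
SequenceFileInputFormat/OutputFormat and one reducer achieves the same
result in parallel.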
I am trying to submit a remote job in mapreduce, but I get the error [1].
I have even set in |hdfs-site.xml| in the remote hadoop the content [2],
and changed permissions [3], but the problem remains.
How can I get rid of this problem?
|2015-04-23 05:57:35,648 WARN
Hi,
I have a MapReduce runtime where I run several jobs concurrently. How
do I configure the job scheduler so that it runs only one job at a
time?
Thanks,
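One way that is commonly suggested (an assumption here, not something
from the thread) is the FairScheduler with |maxRunningApps| set to 1 on
the queue, so queued jobs run strictly one at a time. Select the
scheduler in yarn-site.xml and cap the queue in the allocation file:

```xml
<!-- yarn-site.xml: use the FairScheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

<!-- fair-scheduler.xml (allocation file): at most one running app in the queue -->
<allocations>
  <queue name="default">
    <maxRunningApps>1</maxRunningApps>
  </queue>
</allocations>
```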
Gridmix2 contains the Webdatascan, webdatasort, and monsterquery, but I
don't know what kind of jobs Gridmix3 offers. What kinds of jobs does the
Gridmix3 package have?
Hi
I am trying to understand what
|/src/test/org/apache/hadoop/mapred/GenericMRLoadGenerator.java| does. I
have noticed that the map and reduce functions use this method, but I
don’t understand what it does. What is the purpose of this class? What
is this method doing?
|
protected
Hi,
I would like to run the |Webdatascan| example from Gridmix2, but I can’t
find the classes in MapReduce 2.6. How do I run the gridmix example in YARN?
[1] http://www.programdevelop.com/2458744/
Thanks
Hi,
I have run the command [1] to create compressed data from my Sequence
files that are in the |/user/root/out1| dir, but I got the error [2].
How do I compress data in Hadoop?
[1]
|hadoop jar ./share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -D
mapreduce.job.reduces=30 -D
Hi,
I want to create 3GB of compressed sequence file in Hadoop. How can I do
this?
Thanks,
Hi,
I have created a Mapper class [3] that filters out key-value pairs that
go to a specific partition. When I set the partition class in my code
[1], I get the error in [2], and I don’t understand why this is
happening. Any help to fix this?
[1]
|Configuration conf = cj.getConfiguration();
Hi,
I am trying to see what current tests the Gridmix2 package
runs in MapReduce 2.x. I see that the Gridmix jar
|hadoop-gridmix-2.6.0.jar| no longer has the WebDataSort,
WebDataScan, MonsterQuery, Combiner, and Streaming tests. The current
tests are in [1].
From all of
Hi,
Is there a way to tell which reduce tasks will run which partition? E.g.,
I want reduce task 0 to read partition 0, reduce task 1
to read partition 1, etc...
Thanks,
Hi,
I am trying to understand how |HashPartitioner.java| works. Thus, I ran
a mapreduce job with 5 reducers and 5 input files. I thought that the
output of |getPartition(K2 key, V2 value, int numReduceTasks)| was the
number of the reduce task that |K2| and |V2| will execute on. Is this correct?
What determines which partition will run on which reduce task?
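For reference, HashPartitioner computes the partition as the key's
hashCode masked to non-negative, modulo the number of reduce tasks, and
reduce task i then fetches partition i from every map. A minimal
plain-Java sketch of that logic (String keys here stand in for the
Writable keys Hadoop actually uses):

```java
public class HashPartitionDemo {
    // Mirrors HashPartitioner.getPartition: mask the sign bit, then modulo.
    public static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 5;
        // Every occurrence of the same key lands in the same partition,
        // so one reduce task sees all values for that key.
        int p = getPartition("hello", numReduceTasks);
        System.out.println("partition for 'hello' = " + p);
    }
}
```

So the mapping is deterministic but hash-based: you cannot tell in
advance which keys land in which partition without evaluating the hash.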
On 18-03-2015 09:30, xeonmailinglist-gmail wrote:
Hi,
I am trying to understand how |HashPartitioner.java| works. Thus, I
ran a mapreduce job with 5 reducers and 5 input files. I thought that
the output of |getPartition(K2 key, V2
Hi,
With this configuration in mapreduce (see [1] and [2]), I can’t see the
map and reduce logs of the job when it ends. When I try to look at the
history, I get this error: |Not Found: job_1426267326549_0005|. But if I
list the log dir in HDFS (see [3]), I have some logs about the job,
.
Is there any obstacle from doing this in your map method ?
Regards,
Naga
*From:* xeonmailinglist-gmail [xeonmailingl...@gmail.com]
*Sent:* Thursday, March 12, 2015 22:17
*To:* user@hadoop.apache.org
*Subject:* Fwd: Re: Prune
could use Partitioner.class to solve your problem.
On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail
<xeonmailingl...@gmail.com> wrote:
Hi,
I have this job that has 3 map tasks and 2 reduce tasks. But, I want
to exclude data that will go to reduce task 2
reducer, even if one of them does not get input data.
On 11-03-2015 10:28, xeonmailinglist-gmail wrote:
Hi,
I have this job that has 3 map tasks and 2 reduce tasks. But, I want
to exclude data that will go to reduce task 2. This means that
only reducer 1 will produce data, and the other one
Maybe the correct question is, how can I filter data in mapreduce in Java?
On 11-03-2015 10:36, xeonmailinglist-gmail wrote:
To exclude data going to a specific reducer, should I build a partitioner
that does this? Should I have a map function that checks which reduce
task the output goes to?
Can
Hi,
I have this job that has 3 map tasks and 2 reduce tasks. But, I want to
exclude data that will go to reduce task 2. This means that only
reducer 1 will produce data, and the other one will be empty, or it may
not even execute.
How can I do this in MapReduce?
Example Job Execution
Hi,
I am looking at YARN MapReduce internals, and I would like to know if it
is possible to know which file a map/reduce function is reading or writing,
from inside a map or reduce function defined by the user, or simply by
the client?
Thanks,
Hi,
I am looking at the YARN mapreduce internals to try to understand how
reduce tasks know which partition of the map output they should read,
even when they re-execute after a crash.
I am also looking at the mapreduce source code. Is there any class that
I should look at to try to
() and setup()) contains
information about the input split, such as the file name.
From the top of my head: String fileName = ((FileSplit)
context.getInputSplit()).getPath().getName();
Kai
On 09.03.2015 at 16:39, xeonmailinglist-gmail
<xeonmailingl...@gmail.com> wrote:
Hi,
During a mapreduce execution, there are some temp configuration files
that are created.
I want these temp files not to be removed when a job ends. How do I
configure this?
Thanks
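If the goal is the task-level temp/working files, mapred-site.xml has
preserve options (property names as I recall them from mapred-default.xml
of Hadoop 2.x; verify against your version):

```xml
<property>
  <name>mapreduce.task.files.preserve.failedtasks</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.task.files.preserve.filepattern</name>
  <value>.*</value>
</property>
```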
Hi,
I have configured a federation of HDFS on 2 hosts (hadoop-coc-1 and
hadoop-coc-2). In the configuration, I have set a namespace in each
host, and a single data node (see image). The service is running
properly. You can check the output of the |jps| commands in [1].
The strange part is
Hi,
I was reading about Federation of HDFS, which is possible in YARN
(http://www.devx.com/opensource/enhance-existing-hdfs-architecture-with-hadoop-federation.html),
and I started to wonder if it is possible to have 2 YARN runtimes that
share the same HDFS namespace?
Thanks,
Hi,
1 - I have HDFS running with the WebHDFS protocol. I want to copy data from
the local disk to HDFS, but I get the error below. How do I copy data from
the local disk to HDFS?
|xubuntu@hadoop-coc-1:~/Programs/hadoop$ hdfs dfs -copyFromLocal ~/input1
webhdfs://192.168.56.101:8080/
Java HotSpot(TM)
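A likely cause (my assumption; the thread does not confirm the port): a
|webhdfs://| URI has to point at the NameNode's HTTP port, which
defaults to 50070 in Hadoop 2.x, not 8080, and |dfs.webhdfs.enabled|
must be true in hdfs-site.xml:

```shell
# Assumes dfs.webhdfs.enabled=true and the NameNode HTTP server on the
# default 50070 port (8080 is probably not it).
hdfs dfs -copyFromLocal ~/input1 webhdfs://192.168.56.101:50070/input1
```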
Hi,
I would like to have a mapreduce job that reads input data from 2 HDFS.
Is this possible?
Thanks,
Is there a way to clone an *org.apache.hadoop.mapreduce.Job* that was
created by a user?
-hdfs-.jar into your class path ?
From: xeonmailinglist <xeonmailingl...@gmail.com>
Reply-To: user@hadoop.apache.org
Date: Tuesday, February 24, 2015 at 10:57 AM
To: user
Hi,
I noticed that when we have a mapreduce job with no reduce tasks, YARN
saves the map output in HDFS. I want the job to still save the map
output on the local disk.
In YARN, is it possible to have a mapreduce job that only executes map
tasks (no reduce tasks to execute), and that
Hi,
Is there a way to submit a job using the YARN REST API?
Thanks,
Hi,
I would like to submit a mapreduce job to a remote YARN cluster. Can I
do this in Java, or using a REST API?
Thanks,
Hi,
Is it possible to use SHA-256 or MD5 as the checksum for a file in HDFS?
Thanks,
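As far as I know, HDFS's native |getFileChecksum| returns a CRC32-based
composite checksum, not SHA-256 or MD5. To use those, one option is to
stream the file's bytes client-side and digest them; a minimal sketch of
the digest step over an arbitrary InputStream (for HDFS you would pass
|fs.open(new Path("/file"))| instead, which is the assumed part):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StreamDigest {
    // Digest any InputStream with the named algorithm ("SHA-256" or "MD5").
    public static String hexDigest(InputStream in, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Digest of empty input, just to show the call shape.
        System.out.println(hexDigest(new ByteArrayInputStream(new byte[0]), "SHA-256"));
    }
}
```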
Hi,
1.
I am trying to find ways to serialize and clone a job in MapReduce.
Is it possible?
2.
Is there a way to convert |org.apache.hadoop.mapreduce.Job| into a
xml or json file, and then the opposite?
3. If I use Oozie I can clone a job, by duplicating the job definition
in
I have this map function that is printing something to the system
output, but I can't find the output.
Where does YARN print the logs of the map and reduce functions defined
by the user?
|public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable> {
private
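For reference, the task's stdout ends up in per-container log files on
the worker node, not on the client terminal (the |<...>| ids below are
placeholders to fill in from the job output):

```shell
# While the job runs, each task's stdout/stderr/syslog live on the worker node:
ls $HADOOP_HOME/logs/userlogs/<application_id>/<container_id>/

# With log aggregation enabled, after the job finishes:
yarn logs -applicationId <application_id>
```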
By job, I mean a mapreduce job. I would like to suspend and resume the
mapreduce job while it is executing.
On 18-02-2015 12:10, xeonmailinglist wrote:
Hi,
I want to suspend a job that is in execution when all map tasks
finish, and then resume the job later.
Can I do this in yarn
Hi,
I want to suspend a job that is in execution when all map tasks
finish, and then resume the job later.
Can I do this in YARN? Is there an API for that, or must I use the
command line?
Thanks,
Hi,
I want to execute a job remotely. So, I was thinking of serializing the
org.apache.hadoop.mapreduce.Job class and sending it to a remote component
that I created that launches the job there, or finding a way to transform
the Job class into a configuration file that my remote component will
I want to compile MapReduce of YARN 2.6.
Does anyone have a tutorial on how to compile YARN 2.6?
Thanks
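As I recall from Hadoop's BUILDING.txt (verify against the file shipped
with the source), a source build needs a JDK, Maven, and protoc 2.5.0:

```shell
# From the top of the hadoop-2.6.0-src tree:
mvn package -Pdist -DskipTests -Dtar
# The built distribution lands under hadoop-dist/target/.
```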
Hi,
In YARN 2.6, which classes call the |map|, |combine| and |reduce|
functions created by the client?
I found it.
Thanks anyway.
On 06-02-2015 16:17, xeonmailinglist wrote:
Hi,
In YARN 2.6, which classes call the |map|, |combine| and |reduce|
functions created by the client?
Hi,
I want to list files in HDFS using |FileUtil.listFiles|, but all
I get is IOException errors. The code, error and output are below.
How do I list files in HDFS?
|Exception in thread main java.io.IOException: Invalid directory or I/O error
occurred for dir: /outputmp
|
I have
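One possible explanation (my reading, not confirmed in the thread):
|FileUtil.listFiles| operates on local |java.io.File| paths, which would
make it fail on an HDFS dir. The HDFS way is |FileSystem.listStatus|; a
minimal sketch using the /outputmp path from the error above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHdfs {
    public static void main(String[] args) throws Exception {
        // Resolves against fs.defaultFS, so the Path is an HDFS path.
        FileSystem fs = FileSystem.get(new Configuration());
        for (FileStatus st : fs.listStatus(new Path("/outputmp"))) {
            System.out.println(st.getPath() + (st.isDirectory() ? " (dir)" : ""));
        }
    }
}
```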
Hi,
I am trying to run |distcp| using a Java class, but I get the error
class not found |DistCpOptions|. I have used the argument |-libjars
./share/hadoop/tools/lib/hadoop-distcp-2.6.0.jar| to pass the jar file,
but it seems that is not right. How do I pass the lib properly?
Output:
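A note on the mechanism (my assumption about the cause): |-libjars| only
ships jars to the task classpath, while a class-not-found for
|DistCpOptions| happens in the client JVM, so the jar also has to be on
the client classpath (the driver class name below is a placeholder):

```shell
# Put the distcp jar on the client-side classpath before launching:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:./share/hadoop/tools/lib/hadoop-distcp-2.6.0.jar
hadoop jar myapp.jar my.DistcpDriver ...
```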
I have found the problem. I started to use `webhdfs` and everything is ok.
On 03-02-2015 10:40, xeonmailinglist wrote:
What do you mean by no path is given? Even if I launch this command, I
get the same error…. What path should I put here?
|$ hadoop distcp hdfs://hadoop-coc-1:50070/input1
, xeonmailinglist wrote:
Hi,
I am trying to run |distcp| using a java class, but I get the error of
class not found |DistCpOptions|. I have used the argument |-libjars
./share/hadoop/tools/lib/hadoop-distcp-2.6.0.jar| to pass the jar
file, but it seems that is not right. How I pass the lib
, xeonmailinglist wrote:
Hi,
I am trying to run |distcp| using a java class, but I get the error of
class not found |DistCpOptions|. I have used the argument |-libjars
./share/hadoop/tools/lib/hadoop-distcp-2.6.0.jar| to pass the jar
file, but it seems that is not right. How I pass the lib properly
Hi,
I want this because I want to create a dependency between 2 jobs. The first
job executes the wordcount example, and the second job copies the output of
the wordcount to another HDFS.
Therefore, I want to create a job (job 2) that includes the code to copy
data to another HDFS. The code is below.
.
On 02 Feb 2015, at 20:52, xeonmailinglist
<xeonmailingl...@gmail.com> wrote:
Hi,
I am trying to copy data using |distcp| but I get this error. Both
Hadoop runtimes are working properly. Why is this happening?
|
vagrant@hadoop-coc-1:~/Programs/hadoop$ hadoop
:
|hdfs://hadoop-coc-2:50070/|
No Path is given.
On 02 Feb 2015, at 20:52, xeonmailinglist
<xeonmailingl...@gmail.com> wrote:
Hi,
I am trying to copy data using |distcp| but I get this error. Both
Hadoop runtimes are working properly. Why is this happening
Hi,
I am trying to copy data using |distcp| but I get this error. Both
Hadoop runtimes are working properly. Why is this happening?
|
vagrant@hadoop-coc-1:~/Programs/hadoop$ hadoop distcp
hdfs://hadoop-coc-1:50070/input1 hdfs://hadoop-coc-2:50070/
15/02/02 19:46:37 ERROR tools.DistCp: Invalid
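A likely cause (my assumption, though consistent with the later note
that switching to webhdfs fixed it): 50070 is the NameNode's HTTP port,
while an |hdfs://| URI needs the RPC port from |fs.defaultFS| (commonly
8020 or 9000; 9000 below is an assumption):

```shell
# Either use the NameNode RPC port with the hdfs:// scheme...
hadoop distcp hdfs://hadoop-coc-1:9000/input1 hdfs://hadoop-coc-2:9000/

# ...or keep the HTTP port but switch to the webhdfs:// scheme:
hadoop distcp webhdfs://hadoop-coc-1:50070/input1 webhdfs://hadoop-coc-2:50070/
```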
But can I use distcp inside my job, or do I need to program something that
executes distcp after executing my job?
On 02-02-2015 10:20, Daniel Haviv wrote:
You can use distcp
Daniel
On 2 Feb 2015, at 11:12,