Hi,
The classic option exists to provide backward compatibility for users
wanting to run an MR1 cluster (with JT, etc.).
With the inclusion of YARN and MR2 modes of runtime, Apache Hadoop
removed MR1 services support:
"""
➜ mapred jobtracker
Sorry, the jobtracker command is no longer supported.
Sorry for double posting
Hi!
The services are running on master (NN, JT, TT, DN) and slave (TT, DN)
according to jps. In the WebUIs the slaves are shown as up and
running, I'm getting heartbeats and everything. When running a job, it
completes and logs everything to the command prompt. The only
Thanks for replying.
I'm using the 0.23.3 release as distributed, no previous versions.
So what's the point in documenting a classic option, then, if it is not
available? I thought distributions were self-contained, or at least the
docs don't mention that you need any previous versions.
Wh
What is your 'classic' MapReduce bundle version? 0.23 ships no classic
MapReduce services bundle in it AFAIK, only YARN+(MR2-App).
Whatever version you're trying to use, make sure it is not using the
older HDFS jars.
On Wed, Oct 3, 2012 at 10:13 AM, Alexander Hristov wrote:
> Hi again
>
> Why do
Hi again
Why does it seem to me that everything Hadoop 0.23-related is an uphill
battle? :-(
I'm trying something as simple as running a classic (MapReduce 1) Hadoop
cluster. Here's my configuration:
core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://samplehost.com:9000</value>
</property>
hdfs-site
Hi!
According to the article @YDN*
"The on-node parallelism is controlled by the
mapred.tasktracker.map.tasks.maximum parameter."
[http://developer.yahoo.com/hadoop/tutorial/module4.html]
Also I think it's better to set the min size instead of the max size,
so the algorithm tries to slice
Good points.
I'm not trying to be exhaustive in the discussion of real time systems.
My only intent was to point out the difference between real time and fast
response.
There are lots of real time requirements that do not require a particularly
fast response but the response needs to be on time.
This is reasonable if you have any kind of trends in the ordering of your data
or any computation in the mappers.
You can use a smaller input split to
reduce the load on each individual mapper so that large blocks of records that
take a long time to process are less likely to clog one mapper.
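For anyone following the split-size suggestion, here is a rough sketch of what lowering the maximum split size might look like in a driver using the new (mapreduce) API. The 32 MB figure and the job name are just placeholder values, and as far as I recall the old mapred FileInputFormat only looks at mapred.min.split.size, so whether this property is honoured depends on which InputFormat/API the job uses:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver fragment: cap each input split well below the block
// size so that a run of expensive records is spread over more map tasks
// instead of piling up in one long-running mapper.
Configuration conf = new Configuration();
conf.setLong("mapred.max.split.size", 32L * 1024 * 1024); // 32 MB, example value
Job job = new Job(conf, "spread-heavy-records");          // placeholder job name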
Hello,
I have a small portion of map tasks whose output is much larger than the others
(more spills), so the reducer is mainly waiting for these few map tasks. Is
there a good solution for this problem?
Thank you.
Best,
Huanchen
Owned.
From: Ted Dunning [mailto:tdunn...@maprtech.com]
Sent: Tuesday, October 02, 2012 4:13 PM
To: user@hadoop.apache.org
Subject: Re: HADOOP in Production
On Tue, Oct 2, 2012 at 7:05 PM, Hank Cohen
<hank.co...@altior.com> wrote:
There is an important difference between real time and re
What I guess might be happening is that your data may contain some text
data that Pig is not fully parsing because the data contains characters
that Pig uses as delimiters (i.e. commas and curly brackets). Thus, you can
probably take a look at the data and see if you can find any of the
characters
On Tue, Oct 2, 2012 at 7:05 PM, Hank Cohen wrote:
> There is an important difference between real time and real fast
>
> Real time means that system response must meet a fixed schedule.
> Real fast just means sooner is better.
>
Good thought, but real-time can also include a fixed schedule and a
Bertrand/Mohamed,
You guys are awesome. Thanks a million… Commenting out the Combiner class
in the driver solved the issue.
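In case it helps anyone else who hits the doubled-tag symptom, here is a rough sketch of what that driver change can look like with the old mapred API. MyDriver, MyMap and XmlWrapReduce are made-up names standing in for the real classes:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical driver fragment.
JobConf conf = new JobConf(MyDriver.class);
conf.setMapperClass(MyMap.class);
conf.setReducerClass(XmlWrapReduce.class);
// conf.setCombinerClass(XmlWrapReduce.class);
// ^ removed: a combiner runs the same reduce logic on the map output first,
//   so the wrapping tags were applied once by the combiner and again by the
//   reducer, which is what produced the duplicated tags.
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);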
p.s. I have one more small dilemma.
I am trying to create XML from two files. The input for my 3rd MR job is the
(Text,Text) output from two MapReds. I feed my input to
I only have one big input file.
Shing
From: Bejoy KS
To: user@hadoop.apache.org; Shing Hing Man
Sent: Tuesday, October 2, 2012 6:46 PM
Subject: Re: How to lower the total number of map tasks
Hi Shing
Is your input a single file or set of small files? If
I have done the following.
1) stop-all.sh
2) In mapred-site.xml, added
<property>
  <name>mapred.max.split.size</name>
  <value>134217728</value>
</property>
(dfs.block.size remains unchanged at 67108864)
3) start-all.sh
4) Use hadoop fs -cp src destn, to copy my original file to another hdfs
directory.
5) Run my mapReduce progra
There is an important difference between real time and real fast
Real time means that system response must meet a fixed schedule.
Real fast just means sooner is better.
Real time systems always have hard schedules. The schedule could be in
microseconds to control a laser for making masks for s
Hi Shing
Is your input a single file or a set of small files? If the latter, you need to use
CombineFileInputFormat.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Shing Hing Man
Date: Tue, 2 Oct 2012 10:38:59
To: user@hadoop.apache.org
Reply-To: user@ha
Chris - You are absolutely correct in what I am trying to accomplish -
decrease the number of files going to the maps. Admittedly, I haven't run
through all the suggestions yet today. I hope to do that by day's end.
Thank you and I will give an update later on what worked.
On Tue, Oct 2, 2012 at 1
I have tried
Configuration.setInt("mapred.max.split.size",134217728);
and setting mapred.max.split.size in mapred-site.xml. ( dfs.block.size is left
unchanged at 67108864).
But in the job.xml, I am still getting mapred.map.tasks =242 .
Shing
Fro
Shing
This doesn't change the block size of existing files in HDFS; only new files
written to HDFS will be affected. To get this in effect for old files you need
to re-copy them, at least within HDFS:
hadoop fs -cp src destn.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Origi
I agree with Bertrand. Try disabling the combiner.
Sent from my iPhone
On 2 Oct 2012, at 19:02, Bertrand Dechoux wrote:
> Combiner? And you are only using 'Text' as type?
>
> Please do a real test with a specified input. We can only guess.
>
> Bertrand
>
> On Tue, Oct 2, 2012 at 6:52 PM,
I set the block size using
Configuration.setInt("dfs.block.size",134217728);
I have also set it in mapred-site.xml.
Shing
From: Chris Nauroth
To: user@hadoop.apache.org; Shing Hing Man
Sent: Tuesday, October 2, 2012 6:00 PM
Subject: Re: How to lowe
Sorry for the typo, the property name is mapred.max.split.size.
Also, just for changing the number of map tasks, you don't need to modify the
HDFS block size.
On Tue, Oct 2, 2012 at 10:31 PM, Bejoy Ks wrote:
> Hi
>
> You need to alter the value of mapred.max.split size to a value larger
> than you
Combiner? And you are only using 'Text' as type?
Please do a real test with a specified input. We can only guess.
Bertrand
On Tue, Oct 2, 2012 at 6:52 PM, Chris Nauroth wrote:
> Is there also a Mapper? Is there any chance that logic in the Mapper
> wrapped the values with the tags too, so that
Hi
You need to alter the value of mapred.max.split size to a value larger than
your block size to have fewer map tasks than the default.
On Tue, Oct 2, 2012 at 10:04 PM, Shing Hing Man wrote:
>
>
>
> I am running Hadoop 1.0.3 in Pseudo distributed mode.
> When I submit a map/reduce j
Those numbers make sense, considering 1 map task per block. 16 GB file /
64 MB block size = ~242 map tasks.
When you doubled dfs.block.size, how did you accomplish that? Typically,
the block size is selected at file write time, with a default value from
system configuration used if not specified
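To make the write-time point concrete, here is a small, hypothetical example (path, replication factor and sizes are made up) of creating a file with an explicit block size through the FileSystem API; existing files keep whatever block size they were originally written with:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        long blockSize = 128L * 1024 * 1024; // 128 MB, example value
        // The block size passed here is fixed for this file at creation time.
        FSDataOutputStream out = fs.create(
                new Path("/tmp/example.txt"),             // hypothetical path
                true,                                     // overwrite
                conf.getInt("io.file.buffer.size", 4096), // buffer size
                (short) 3,                                // replication
                blockSize);
        out.writeBytes("hello\n");
        out.close();
        fs.close();
    }
}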
Is there also a Mapper? Is there any chance that logic in the Mapper
wrapped the values with the tags too, so that the records were already
wrapped when they entered the reducer logic?
Thank you,
--Chris
On Tue, Oct 2, 2012 at 9:01 AM, Kartashov, Andy wrote:
> I want:
>
> Key
> Valu
I am running Hadoop 1.0.3 in Pseudo distributed mode.
When I submit a map/reduce job to process a file of size about 16 GB, in
job.xml, I have the following
mapred.map.tasks =242
mapred.min.split.size =0
dfs.block.size = 67108864
I would like to reduce mapred.map.tasks to see if it i
According to hdfs, -lsr is deprecated and -ls -R is to be used instead. In any
case, it doesn't matter as the result is exactly the same.
On 02/10/2012 2:12, Alexander Hristov wrote:
Hello
I'm trying to test the Hadoop archive functionality under 0.23 and I
can't get it working.
I have
I want:
Key
Value1
Value2
I get double tags:
Key
Value1
Value2
Here is my last proposition that also failed in Reduce.
...
public void reduce(...
StringBuilder sb = new StringBuilder();
while (values.hasNext()) {
sb.appen
Great,
Thank you for such detailed information.
By the way, what type of disk controller do you use?
Thanks
Oleg.
On Tue, Oct 2, 2012 at 6:34 AM, Alexander Pivovarov wrote:
> Privet Oleg
>
> Cloudera and Dell setup the following cluster for my company
> Company receives 1.5 TB raw data pe
I haven't tried it but this should also work
hadoop fs -Ddfs.block.size= -cp src dest
Raj
>
> From: Anna Lahoud
>To: user@hadoop.apache.org; bejoy.had...@gmail.com
>Sent: Tuesday, October 2, 2012 7:17 AM
>Subject: Re: File block size use
>
>
>Thank you.
Thank you. I will try today.
On Tue, Oct 2, 2012 at 12:23 AM, Bejoy KS wrote:
> **
> Hi Anna
>
> If you want to increase the block size of existing files, you can use an
> Identity Mapper with no reducer. Set the min and max split sizes to your
> requirement (512Mb). Use SequenceFileInputFormat a
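For readers trying Bejoy's suggestion, a rough sketch of such a map-only rewrite job using the new (mapreduce) API. Only the ~512 MB split size comes from the mail above; the paths, key/value types, and the exact split-size property names are assumptions for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class RewriteWithBiggerSplits {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Aim for ~512 MB per split so each (identity) map task rewrites a
        // large chunk; output files pick up the job's block size setting.
        conf.setLong("mapred.min.split.size", 512L * 1024 * 1024);
        conf.setLong("mapred.max.split.size", 512L * 1024 * 1024);

        Job job = new Job(conf, "rewrite-with-bigger-splits");
        job.setJarByClass(RewriteWithBiggerSplits.class);
        job.setMapperClass(Mapper.class);   // identity mapper
        job.setNumReduceTasks(0);           // map-only job
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(Text.class);  // example key/value types
        job.setOutputValueClass(Text.class);
        SequenceFileInputFormat.addInputPath(job, new Path(args[0]));
        SequenceFileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}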
Ulrich,
It is fine to run the "hadoop dfsadmin -finalizeUpgrade" command to
complete the upgrade, but since you haven't done that for a while, I
expect it may take quite a while to finish, as there'll
now be lots of blocks to process for upgrades. However, there should
be no risk in run
Hi,
Could you clarify your post to show what you expect your code to have
actually printed and what it has printed?
On Tue, Oct 2, 2012 at 7:01 PM, Kartashov, Andy wrote:
> Guys, have been scratching my head for the past couple of days. Why are my
> tags duplicated while the content they wrap a
Hello,
in April we upgraded our Hadoop from version 0.21.0 to version 1.0.1.
We stuck to the exact instructions in the upgrade documentation as
always. After a few months we discovered on the Namenode that here
again "upgrade for version -32 has been completed. Upgrade is not
finalize
Guys, have been scratching my head for the past couple of days. Why are my
tags duplicated while the content they wrap around, i.e. my StringBuilder sb, is
not?
My Reduce code is:
while (values.hasNext()) {
    sb.append(values.next().toString());
}
output.collect(key, new Text("\n\n" + sb.to
On 02/10/2012 2:12, Alexander Hristov wrote:
Hello
I'm trying to test the Hadoop archive functionality under 0.23 and I
can't get it working.
I have in HDFS a /test folder with several text files. I created a
hadoop archive using
hadoop archive -archiveName test.har -p /test *.txt /s
Hi again,
I executed a slightly different script again, that included some more
operations. The logs look similar, but this time I have 2 attempt files
for the same job-package:
(1) _temporary/_attempt_201210021204_0001_r_01_0/part-r-1
(2) _temporary/_attempt_201210021204_0001_r_01
Funny that the OP asks about 'real time'...
This comes up quite often and it's always misunderstood.
First, when we say 'real time' many take it to mean subjective real time. Real
'real time' would require some sort of RTOS underneath.
Second, Hadoop is a parallelized framework. You have sever
Hi,
There are too many issues to discuss, I guess. I would recommend
reading "Hadoop: The Definitive Guide" by Tom White. There are some
chapters with the answers.
Also, what did you mean by "real time"? Hadoop is not designed for
giving real time results of queries. It is rather for offline data
analys
Hello!
Please add a new meetup:
http://www.meetup.com/Hadoop-Moscow/
to page:
http://wiki.apache.org/hadoop/RussiaHadoopUserGroup
Thanks in advance!
For Hadoopers - please visit the upcoming meetup.
For Apache/Hortonworks/Cloudera/MapR, etc - please contact the meetup
organizer (me) for any type
On 2012/10/2 15:15, Tatsuo Kawasaki wrote:
Hi,
Could you please hit jps command on your Jobtracker node?
And please check your firewall settings.
(If you are using RHEL/CentOS, run iptables -L)
Cheers,
--
Tatsuo
-- Replied Message --
From: Romedius Weiss
To: user@h
Hi,
Could you please hit jps command on your Jobtracker node?
And please check your firewall settings.
(If you are using RHEL/CentOS, run iptables -L)
Cheers,
--
Tatsuo
-- Replied Message --
From: Romedius Weiss
To: user@hadoop.apache.org
Date: Tue, 02 Oct 2012 0