You may get more up-to-date information from the folks at Yahoo!, but here is a
mail on the hadoop-general mailing list that has some statistics:
http://www.mail-archive.com/general@hadoop.apache.org/msg05592.html
Please note it is a little dated, so things should be better now :-)
Thanks,
hemanth
On Tue,
Hi,
This issue, issues.apache.org/jira/browse/MAPREDUCE-2911, talks about
executing Hadoop and MPI on the same cluster. Even though the comments
suggest Ralph has finished writing the code, I am not able to find the patch.
Can someone guide me towards finding it?
--
Regards,
R.V.
Hi,
I read in the documents that in MapReduce, the reduce tasks only start
after a percentage (by default 90%) of the maps end. This means that the
slowest maps can delay the start of the reduce tasks, and that the input
data consumed by the reduce tasks arrives as a single batch. This means
Hi Pedro,
Yes, Hadoop Streaming has the same property. The reduce method is not
called until the mappers are done, and the reducers are not scheduled
before the threshold set by mapred.reduce.slowstart.completed.maps is
reached.
On Tue, Jan 15, 2013 at 3:06 PM, Pedro Sá da Costa
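For reference, a minimal sketch of setting this property through the old-API
JobConf (the wrapper class and the 0.80 value here are illustrative, not from
the thread):

import org.apache.hadoop.mapred.JobConf;

public class SlowstartExample {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Reducers are not scheduled until this fraction of the maps completes.
    // 0.80f is an illustrative value; the era's shipped default was much lower.
    conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.80f);
    System.out.println(conf.getFloat("mapred.reduce.slowstart.completed.maps", 0.05f));
  }
}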
I set mapred.reduce.tasks from -1 to AutoReduce,
and Hadoop created 450 map tasks, but only 1 reduce task. It seems
that this reduce only runs on 1 slave (I have two slaves).
But when it was at 66%, the error was reported again: Task
attempt_201301150318_0001_r_00_0 failed to report
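For reference, a minimal sketch of pinning the reduce count explicitly rather
than leaving it at -1 (the value 2 simply matches the two slaves mentioned
above and is illustrative only):

import org.apache.hadoop.mapred.JobConf;

public class ReduceCountExample {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // -1 leaves the decision to the framework; an explicit count spreads
    // the reduce work across slaves. 2 here = one reducer per slave.
    conf.setNumReduceTasks(2);
    System.out.println(conf.getNumReduceTasks());
  }
}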
Hi, I would like to report a bug.
I get a negative value from the function unix_timestamp.
I want to get a duration in the format HH:mm:ss:
unix_timestamp(duration,'HH:mm:ss')
When I test it backwards, it works; I get my duration well formatted
with correct values. This:
from_unixtime(
Hi all,
I'm only testing the new HA feature. I'm not in a production system,
Well, let's talk about the number of nodes and the ZKFC daemons.
In this url:
Hi,
I fail to see your confusion.
ZKFC != ZK
ZK is quorum software, like QJM. The ZK peers are to be run in odd
numbers, as the JNs are.
ZKFC is something the NN needs for its Automatic Failover capability. It is
a client to ZK and thereby demands ZK's presence; for which the odd #
Hi Harsh,
Now I'm totally confused :-
As you pointed out, ZKFC runs only on the NN. That looks right.
So, what are the ZK peers (the odd number I'm looking for), and where do
I have to run them? On another 3 nodes?
As I can read from the previous url:
In a typical deployment, ZooKeeper daemons are
Hi, yaotian
I think you should check the logs on that very TaskTracker; they'll tell you why.
And here are some tips on deploying an MR job:
http://blog.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/
Charlie
At 2013-01-15 16:34:27, yaotian yaot...@gmail.com wrote:
I set
No, ZooKeeper daemons == http://zookeeper.apache.org.
On Tue, Jan 15, 2013 at 3:38 PM, ESGLinux esggru...@gmail.com wrote:
Hi Harsh,
Now I'm totally confused :-
As you pointed out, ZKFC runs only on the NN. That looks right.
So, what are the ZK peers (the odd number I'm looking for) and
OK,
that's the origin of my confusion; I thought they were the same.
I'm going to read this doc to shed a bit of light on ZooKeeper for me...
Thank you very much for your help,
ESGLinux,
2013/1/15 Harsh J ha...@cloudera.com
No, ZooKeeper daemons == http://zookeeper.apache.org.
On Tue,
How many maps/reduces did your job have? Also, what release are you using?
I'd recommend at least 2.0.2-alpha, though we should be able to release
2.0.3-alpha very soon.
Arun
On Jan 14, 2013, at 4:35 AM, Krishna Kishore Bonagiri wrote:
Hi,
I am getting the following error in
I don't think this is the right list for your query. Moving hadoop list
to bcc and cc'ing hive list.
Also, I don't get how you can get a unix timestamp from a field with just
hour granularity; are you missing the date information in your date
format?
Viral
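One plausible cause, sketched in plain Java (Hive's unix_timestamp(str, pattern)
follows SimpleDateFormat semantics; the Asia/Kolkata timezone below is an
assumption for illustration): a time-only pattern defaults the date to
1970-01-01 local time, which falls before the UTC epoch in any timezone east
of UTC, hence a negative value.

import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class TimeOnlyParse {
  public static void main(String[] args) throws Exception {
    SimpleDateFormat fmt = new SimpleDateFormat("HH:mm:ss");
    fmt.setTimeZone(TimeZone.getTimeZone("Asia/Kolkata")); // UTC+5:30, illustrative
    long seconds = fmt.parse("01:02:03").getTime() / 1000L;
    // 1970-01-01 01:02:03 local = 3723 - 19800 seconds relative to the
    // UTC epoch, i.e. -16077: a negative "unix timestamp".
    System.out.println(seconds);
  }
}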
From: Carolina Vizuete Martinez
Sent:
YARN implies 2 pieces:
# An application-specific 'master' to coordinate your application (mainly to get
resources, i.e. containers, for its application from the ResourceManager, and use
them)
# Application-specific code which runs in the allocated Containers.
Given your use case, I'd recommend that
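To give piece #1 some shape, here is a bare-bones sketch using the AMRMClient
helper from later Hadoop 2.x releases - an assumption, since the 2.0.x-alpha
line under discussion exposed a rawer protocol; the memory/vcore numbers are
illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class MinimalAM {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
    rm.init(new Configuration());
    rm.start();
    rm.registerApplicationMaster("", 0, "");
    // Ask the ResourceManager for one 1 GB / 1 vcore container.
    Resource capability = Resource.newInstance(1024, 1);
    rm.addContainerRequest(new ContainerRequest(capability, null, null,
        Priority.newInstance(0)));
    AllocateResponse resp = rm.allocate(0.0f);
    // Piece #2 would launch application code in resp.getAllocatedContainers()
    // through an NMClient before unregistering.
    rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
  }
}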
Hello,
I was wondering if Hadoop performs the MapReduce operations on the data
while maintaining the order or sequence in which it received the data.
I have a Hadoop cluster that is receiving JSON files, which are processed
and then stored on base.
For correct calculation it is essential
As per MapReduce behavior, the mappers will process all the input file(s) in
parallel; i.e., no order is guaranteed among the input files.
If you want to process each file separately and maintain the order, then
you need to run each file through an independent MapReduce job,
so that your
I think it will help for Ouch to clarify what is meant by "in order". If one
JSON file must be completely processed before the next file starts, there is
not much point in using MapReduce at all, since your problem cannot be
partitioned. On the other hand, there may be ways around this, for
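One such way around it, if order only needs to be restored downstream: tag
every record in the mapper with its source file and byte offset, and sort on
those in the reduce phase. A sketch (the class name and key layout are
illustrative):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class TaggingMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // Record which file (and where in it) each line came from, so a
    // reducer or follow-up job can reconstruct per-file order.
    String file = ((FileSplit) context.getInputSplit()).getPath().getName();
    context.write(new Text(file), new Text(offset + "\t" + line));
  }
}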
Hello Shagun,
Make sure you have set the required config parameters correctly.
Also, modify the line containing 127.0.1.1 to 127.0.0.1 in your
/etc/hosts file, and add the hostname of your VM along with the IP in the
/etc/hosts file if you are using an FQDN.
If you still face any issue, have
On the dev mailing list, Harsh pointed out that there is another join
related package:
http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/join/
This seems to be available
Hi,
AFAIK, the mapred.local.dir property refers to a set of directories under
which different types of data related to MapReduce jobs are stored -
e.g. intermediate data, localized files for a job, etc. The working
directory for a MapReduce job is configured under a subdirectory within
one of
What do you mean by a working dir for a filesystem? I never thought that a
filesystem should need or have a special workingDir.
On Tue, Jan 15, 2013 at 12:43 PM, Hemanth Yamijala
yhema...@thoughtworks.com wrote:
Hi,
AFAIK, the mapred.local.dir property refers to a set of directories under
Jay,
For your FS question: http://en.wikipedia.org/wiki/Working_directory
On Wed, Jan 16, 2013 at 12:17 AM, Jay Vyas jayunit...@gmail.com wrote:
What do you mean by a working dir for a filesystem? I never thought that
a filesystem should need or have a special workingDir?
On Tue, Jan 15,
Ah okay. So - in default Hadoop DFS, the workingDir is (I believe)
/user/hadoop/, because, as I recall, when putting a file into HDFS, that
seems to be where the files naturally end up if there is no path
specified.
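A small sketch checking this through the FileSystem API (the printed values
depend on the current user and fs.defaultFS; /user/hadoop is what the mail
above suggests):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WorkingDirDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Typically /user/<username> on HDFS.
    System.out.println(fs.getWorkingDirectory());
    // Relative paths resolve against that working directory.
    System.out.println(fs.makeQualified(new Path("data/input.json")));
  }
}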
Hi,
I have created a table as below
--use case 2 to test basics
DROP TABLE USER_DATA4;
CREATE TABLE USER_DATA4
(
userid INT,
movieid INT,
rating INT,
unixtime INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
--loading data into user_table
LOAD DATA LOCAL INPATH
I have a mapper:
public class BuildGraph {
  public void config(JobConf job) { // == this block doesn't seem to be executing at all :(
    super.configure(job);
    this.currentId = job.getInt("currentId", 0);
    if (this.currentId != 0) {
      // I call a method from different
What happens to the NN and/or its performance if there's a problem with the
NFS server? Or with the network?
Thanks,
randy
On 01/14/2013 11:36 PM, Harsh J wrote:
It's very rare to observe an NN crash due to a software bug in
production. Most of the time it's a hardware fault you should worry about.
On
Hi Jamal,
I think you use the old MR API; you should implement the Mapper interface.
Maybe you can take a look at the WordCount.java example in
org.apache.hadoop.examples.
On Wed, Jan 16, 2013 at 10:03 AM, jamal sasha jamalsha...@gmail.com wrote:
Hi,
The relevant code snippet posted
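A hedged skeleton of the suggestion above in the old API (the class name,
key/value types, and map body are illustrative). Note that the framework only
invokes a method named configure(JobConf); a method named config(JobConf), as
in the posted snippet, is never called:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class BuildGraphMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  private int currentId;

  @Override
  public void configure(JobConf job) {
    // Called once per task by the framework before any map() calls.
    currentId = job.getInt("currentId", 0);
  }

  @Override
  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    output.collect(new Text(String.valueOf(currentId)), value);
  }
}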
The NFS mount is to be soft-mounted, so if the NFS goes down, the NN ejects
it and continues with the local disk. If auto-restore is configured, it
will re-add the NFS mount if it is detected to be good again later.
On Wed, Jan 16, 2013 at 7:04 AM, randy randy...@comcast.net wrote:
What happens to the
Hi,
One place where I could find capacity-scheduler.xml was in the source -
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/resources.
AFAIK, the masters file is only used for starting the secondary namenode -
which has in 2.x been replaced by a
The masters file was removed because we now look into the config files directly
and pull out the right hosts to start the various configured master servers
(NNs, SNN, etc.).
On Wed, Jan 16, 2013 at 9:56 AM, Hemanth Yamijala yhema...@thoughtworks.com
wrote:
Hi,
One place where I could find the
Hi,
I feel the most reliable approach is using the NN-HA feature with shared storage.
The idea here is to have two NameNodes. Both the Active and the Standby
(Secondary) NameNodes point to the shared device, and the edit logs are written
to it. When the Active crashes, the Standby will take over and
Is :: your delimiter, or just :?
In any case, you can try specifying '\072' as the column separator (that's
for one :).
On Wed, Jan 16, 2013 at 2:49 AM, Monkey2Code monkey2c...@gmail.com wrote:
Hi ,
I have created a table a below
--use case 2 to test basics
DROP TABLE USER_DATA4;
CREATE
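A tiny Java check of the octal escape mentioned above; '\072' and ':' denote
the same character, and nothing here is Hive-specific:

public class OctalDelimiter {
  public static void main(String[] args) {
    // Octal 072 = 7*8 + 2 = 58 decimal, the ASCII code for ':'.
    System.out.println('\072' == ':'); // true
    System.out.println((int) ':');     // 58
    // So FIELDS TERMINATED BY '\072' splits on a single colon; a literal
    // two-character "::" separator would need different handling.
  }
}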
<kidding>Wipe your OS out.</kidding>
Please read: http://search-hadoop.com/m/9Qwi9UgMOe
On Wed, Jan 16, 2013 at 1:16 PM, Vikas Jadhav vikascjadha...@gmail.comwrote:
how to remove non-DFS space from a Hadoop cluster
--
Thanx and Regards,
Vikas Jadhav
--
Harsh J
:)
Check what's consuming space other than the core OS files.
Check temp spaces and other areas.
On Wed, Jan 16, 2013 at 6:53 PM, Harsh J ha...@cloudera.com wrote:
<kidding>Wipe your OS out.</kidding>
Please read: http://search-hadoop.com/m/9Qwi9UgMOe
On Wed, Jan 16, 2013 at 1:16 PM, Vikas