I have used JMX to monitor the topics and queues hosted in a message server.
I am new to Hadoop, so I am very interested in this topic. I think that if
the map and reduce tasks were wrapped into MBeans, we could easily monitor
the tasks' status.
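For what it's worth, here is a rough sketch of what such an MBean could look like. The interface and class names below are purely illustrative (nothing like this exists in Hadoop today); the idea is only that whatever runs a task could register one of these with the platform MBean server, and jconsole or any other JMX client could then read the task status.

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    // Hypothetical management interface exposing a task's status over JMX
    // (illustrative names only; would live in TaskStatusMBean.java).
    public interface TaskStatusMBean {
        String getTaskId();
        String getState();      // e.g. RUNNING, SUCCEEDED, FAILED
        float getProgress();    // 0.0f - 1.0f
    }

    // Hypothetical implementation (in TaskStatus.java). Standard MBean naming
    // convention: the class name is the interface name minus "MBean".
    public class TaskStatus implements TaskStatusMBean {
        private final String taskId;
        private volatile String state = "RUNNING";
        private volatile float progress = 0.0f;

        public TaskStatus(String taskId) {
            this.taskId = taskId;
        }

        public String getTaskId()  { return taskId; }
        public String getState()   { return state; }
        public float getProgress() { return progress; }

        public void update(String newState, float newProgress) {
            this.state = newState;
            this.progress = newProgress;
        }

        // Register with the platform MBean server; a JMX client connected to
        // the task JVM can then read the attributes above.
        public void register() throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(this,
                    new ObjectName("hadoop.task:type=TaskStatus,id=" + taskId));
        }
    }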
2011/9/27 patrick sang silvianhad...@gmail.com
hi
Hi all,
I have a 32-bit binary that uses libhdfs for accessing HDFS (on the Cloudera
VM) and am trying to run it on a cluster with 64-bit machines. But
unfortunately it crashes with "error while loading shared libraries:
libjvm.so: wrong ELF class: ELFCLASS64" (libhdfs needs libjvm.so). I tried
Hi Brian
Thanks for the prompt response.
The machines on the cluster didn't have a libhdfs.so.0 file, so I copied my
libhdfs.so (which came with the Cloudera VM - libhdfs0 and libhdfs0-dev) onto the
cluster machine. So it should be 32-bit.
The wrong ELF class error pops up when I try to use the libjvm.so on
Here is the output of file libhdfs.so:
libhdfs.so.0: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV),
dynamically linked, stripped
Vivek
--
Hi Vivek,
That's a difficult question to answer due to the vagaries of Java on Linux
distros (I could probably give an answer valid on SL5.7, but nothing else).
You'll need to work that out with your sysadmin.
I think the answer should be yes, but depending on your distribution, that
yes may
The problem is step 4 in the breaking sequence. Currently the TaskTracker
never looks at the disk to know if a file is in the distributed cache or not.
It assumes that if it downloaded the file and did not delete that file itself
then the file is still there in its original form. It does
Thanks Ralf.
On Mon, Sep 26, 2011 at 2:01 PM, Ralf Heyde ralf.he...@gmx.de wrote:
Hi Bikash,
every map/reduce task is - as far as I know - a single JVM instance - you
can configure and/or run it with JVM options.
Maybe you can track these JVMs using some system tools.
Regards,
Ralf
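To make that concrete, here is a minimal sketch (using the 0.20.x-era property name) of setting the child JVM options on the job configuration; adding the JMX agent flag there is one way to inspect the task JVMs with standard Java tools.

    import org.apache.hadoop.mapred.JobConf;

    public class ChildJvmOptsExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf(ChildJvmOptsExample.class);
            // Each map/reduce task runs in its own child JVM launched by the
            // TaskTracker; these options are passed to that JVM.
            conf.set("mapred.child.java.opts",
                     "-Xmx512m -verbose:gc -Dcom.sun.management.jmxremote");
            // ... set mapper/reducer, input/output paths, then JobClient.runJob(conf)
        }
    }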
Hi -- Can we specify a different set of slaves for each MapReduce job run?
I tried using the --config option and specified a different set of slaves in
the slaves config file. However, it does not use the selected slaves set but
the one initially configured.
Any help?
Thanks,
Bikash
Who is in charge of getting the files there for the first time? The
addCacheFile call in the mapreduce job? Or a manual setup by the
user/operator?
On Tue, Sep 27, 2011 at 11:35 AM, Robert Evans ev...@yahoo-inc.com wrote:
The problem is step 4 in the breaking sequence. Currently the
addCacheFile sets a config value in your jobConf that indicates which files
your particular job depends on. When the TaskTracker is assigned to run part
of your job (map task or reduce task), it will download your jobConf, read it
in, and then download the files listed in the conf, if it has
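For anyone following along, the job-side half of that looks roughly like this with the 0.20.x API (a sketch; the HDFS path is made up):

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheFileSetupExample {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(CacheFileSetupExample.class);
            // The file must already be in HDFS. addCacheFile only records its URI
            // in the job configuration; the TaskTracker later reads that list and
            // localizes each file onto its own disk before running the tasks.
            DistributedCache.addCacheFile(new URI("/user/me/lookup/terms.txt"), conf);
            // ... mapper/reducer setup, then JobClient.runJob(conf)
        }
    }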
The slaves file is used only by control scripts like {start/stop}-dfs.sh and
{start/stop}-mapred.sh to start the datanodes and task trackers on a
specified set of slave machines. It cannot be used effectively to change
the size of the cluster for each M/R job (unless you want to restart the
task
Thanks Suhas. I will try using HOD. The use case for me is some research
experiments with a different set of slaves for each job run.
On Tue, Sep 27, 2011 at 1:03 PM, Vitthal Suhas Gogate
gog...@hortonworks.com wrote:
Slaves file is used only by control scripts like {start/stop}-dfs.sh,
If you are never ever going to use that file again for any map/reduce task in
the future, then yes, you can delete it, but I would not recommend it. If you
want to reduce the amount of space that is used by the distributed cache, there
is a config parameter for that:
local.cache.size. It is the
I'm not concerned about disk space usage -- the script we used that deleted
the taskTracker cache path has been fixed not to do so.
I'm curious about the exact behavior of jobs that use DistributedCache
files. Again, it seems safe from your description to delete files between
completed runs. How
Yes, all of the state for the task tracker is in memory. It never looks at the
disk to see what is there; it only maintains the state in memory.
--bobby Evans
On 9/27/11 1:00 PM, Meng Mao meng...@gmail.com wrote:
I'm not concerned about disk space usage -- the script we used that deleted
the
So the proper description of how DistributedCache normally works is:
1. Have files to be cached sitting around in HDFS.
2. Run Job A, which specifies those files to be put into DistributedCache
space. Each worker node copies the to-be-cached files from HDFS to local
disk, but more importantly, the
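As a concrete illustration of the task-side half of step 2, this is roughly how a map task picks up the localized copies with the 0.20.x API (a sketch):

    import java.io.IOException;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // The files the TaskTracker localized are handed to the task through
    // getLocalCacheFiles(); they are ordinary paths on the node's local disk.
    public class CacheReadingMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        private Path[] localFiles;

        @Override
        public void configure(JobConf conf) {
            try {
                // Local on-disk paths of the files named via addCacheFile earlier.
                localFiles = DistributedCache.getLocalCacheFiles(conf);
            } catch (IOException e) {
                throw new RuntimeException("Could not read distributed cache paths", e);
            }
        }

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // ... open and use the cached files (localFiles) from local disk here ...
        }
    }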
The Desktop edition was chosen just to run the namenode and to monitor cluster
statistics. Worker nodes were chosen to run on Ubuntu Server edition because
we found this configuration in several research papers. One such
configuration can be found in the paper for the LATE scheduler (is maybe some
source
That is correct. However, it is a bit more complicated than that. The
TaskTracker's in-memory index of the distributed cache is keyed off the path of
the file and the HDFS creation time of the file. So if you delete the original
file off of HDFS and then recreate it with a new time
The simplest route I can think of is to ingest the data directly into HDFS
using Sqoop if there is a driver currently made for your database. At that
point it would be relatively simple just to read directly from HDFS in your MR
code.
Matt
-Original Message-
From: lessonz
Hi
I am a scheduling/optimization algorithm expert with many years of
experience (in applying scheduling to the manufacturing and transportation
industries). I want to see if I can improve the scheduling algorithms in
Hadoop, like the Fair Scheduler. However, I am struggling to get a
basic Hadoop
Amal,
Welcome to Hadoop!
Currently we have two very different versions of MapReduce: MRv1 and MRv2.
Most of the active developers on MapReduce are working on MRv2. If you want to
take a look, please see the sources under trunk/branch-0.23:
Hi,
A development platform is the system(s) used mainly by the
developers to write and unit-test code for the project.
There are generally NO end users on the development system.
The production platform is where the end users actually work, and the
project is generally moved here only
Special thanks for your help, Arko.
You mean that in Hadoop, the NameNode, DataNodes, JobTracker, TaskTrackers, and
all the clusters should be deployed on Linux machines?
We have lots of data (on Windows OS) and code (written in C#) for data
mining; we want to use Hadoop and make a connection between
our
Bourne,
There is only one datanode? The "Verification succeeded" messages are from
a Datanode background housekeeping task, DataBlockScanner, which attempts to
discover any replicas that have become corrupt. If it finds one (which
should be rare), it tells the Namenode the replica has become
Currently Windows is not a supported production platform for Hadoop. You
should run all of your daemons on Linux machines. You can move your data to
HDFS on those nodes easily; for the C# piece you can use Hadoop Streaming (
http://hadoop.apache.org/common/docs/r0.20.2/streaming.html#Hadoop+Streaming)
So, I thought about that, and I'd considered writing to the HDFS and then
copying the file into the DistributedCache so each mapper/reducer doesn't
have to reach into the HDFS for these files. Is that the best way to
handle this?
On Tue, Sep 27, 2011 at 4:01 PM, GOEKE, MATTHEW (AG/1000)
hello,
Subscribe to List common-user-subscr...@hadoop.apache.org
thx
Most likely the easiest and fastest way, as you will be leveraging the
distributed ingestion of Sqoop, rather than a single-threaded import some
other way.
On Wed, Sep 28, 2011 at 12:27 AM, lessonz less...@q.com wrote:
So, I thought about that, and I'd considered writing to the HDFS and then
Hi,
You don't necessarily need to execute the C# code on Linux.
You can write a middleware application to bring the data from the Win
boxes to the Linux (Hadoop) boxes if you want to.
Cheers
Arko
On Tue, Sep 27, 2011 at 10:19 PM, Hamedani, Masoud
mas...@agape.hanyang.ac.kr wrote:
Special
Hadoop Streaming :)
On Wed, Sep 28, 2011 at 12:30 AM, Arko Provo Mukherjee
arkoprovomukher...@gmail.com wrote:
Hi,
You don't necessarily need to execute the C# code on Linux.
You can write a middleware application to bring the data from the Win
boxes to the Linux (Hadoop) boxes if you
Thanks for your nice help, Arko.
Maybe because I'm new to Hadoop, I can't get some of the points;
I'm studying the Hadoop manual more deeply to have better info.
B.S
Masoud.
2011/9/28 Arko Provo Mukherjee arkoprovomukher...@gmail.com
Hi,
You don't necessarily need to execute the C# code on Linux.
You
Hey folks,
I have my JobTracker GUI, which shows a lot of information about the
running/completed jobs.
I am interested in the field "Reduce shuffle bytes". I want to know how it
is computed... Is it just the sum of all the bytes received per reducer
during the shuffle?
Any help?
Thanks
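Not a full answer, but as far as I know that GUI field is backed by the
REDUCE_SHUFFLE_BYTES framework counter, which is incremented with the size of
each map output segment a reducer copies during the shuffle, so the job-level
number is the shuffled bytes summed over all reducers. If you want to pull it
out programmatically, here is a minimal sketch with the 0.20.x mapred API that
simply dumps every counter after the job finishes; the display names match
what the GUI shows.

    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class PrintJobCounters {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(PrintJobCounters.class);
            // ... job setup omitted ...
            RunningJob job = JobClient.runJob(conf);  // blocks until the job finishes

            // Walk every counter group; "Reduce shuffle bytes" shows up here as
            // REDUCE_SHUFFLE_BYTES in the map-reduce framework counter group.
            Counters counters = job.getCounters();
            for (Counters.Group group : counters) {
                for (Counters.Counter counter : group) {
                    System.out.printf("%s\t%s\t%d%n",
                            group.getDisplayName(),
                            counter.getDisplayName(),
                            counter.getCounter());
                }
            }
        }
    }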