Hi Manoj,
Reply inline.
On Mon, Aug 13, 2012 at 3:42 PM, Manoj Babu manoj...@gmail.com wrote:
Hi All,
The normal Hadoop job submission process involves the following (a minimal driver is sketched after this list):
Checking the input and output specifications of the job.
Computing the InputSplits for the job.
Setting up the requisite accounting information.
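A minimal driver showing where these steps happen, using the old mapred API (the class name and paths are placeholders, not from the thread; the checks and split computation occur inside runJob at submission time):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ExampleDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ExampleDriver.class);
        conf.setJobName("example");
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // runJob checks the input/output specs (e.g. output dir must not
        // exist), computes the InputSplits, sets up job bookkeeping, then
        // submits and waits for completion.
        JobClient.runJob(conf);
    }
}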
Hi,
I have an HDFS folder and an M/R job that periodically updates it by replacing
the data with newly generated data.
I have a different M/R job that periodically, or ad hoc, processes the data in
the folder.
The second job, naturally, sometimes fails when the data is replaced by newly
generated data.
How about introducing a distributed coordination and locking mechanism?
ZooKeeper would be a good candidate for that kind of thing.
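For instance, with a lock recipe such as Apache Curator's InterProcessMutex (Curator, the connect string, and the lock path are my own illustration, not something from the thread), both jobs could serialize access to the folder:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class FolderLock {
    public static void main(String[] args) throws Exception {
        // Hypothetical ZK quorum and lock path; both jobs must agree on them.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zkhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();
        InterProcessMutex lock = new InterProcessMutex(client, "/locks/hdfs-folder");
        lock.acquire(); // blocks until no other job holds the folder
        try {
            // replace or read the HDFS folder here
        } finally {
            lock.release();
            client.close();
        }
    }
}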
On Mon, Aug 13, 2012 at 12:52 PM, David Ginzburg ginz...@hotmail.com wrote:
Hi,
I have an HDFS folder and M/R job that periodically updates it by
replacing the
Hi Harsh,
Thanks for your reply.
Consider that in my main program I am doing so
many activities (reading/writing/updating, non-Hadoop activities) before
invoking JobClient.runJob(conf);
Is there any way to separate the process flow programmatically, instead of
going for a workflow engine?
Cheers!
Manoj.
Sure, you may separate the logic as you want it to be, but just ensure
the configuration object has a proper setJar or setJarByClass done on
it before you submit the job.
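A minimal sketch of that separation (the class name and the non-Hadoop work are placeholders):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class Launcher {
    public static void main(String[] args) throws Exception {
        doNonHadoopWork(); // reading/writing/updating, all outside Hadoop
        // Passing the class here has the same effect as setJarByClass:
        // the jar containing Launcher is shipped with the job.
        JobConf conf = new JobConf(Launcher.class);
        // ... set input/output paths, mapper/reducer, etc. ...
        JobClient.runJob(conf);
    }

    private static void doNonHadoopWork() {
        // placeholder for the pre-job activities
    }
}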
On Mon, Aug 13, 2012 at 4:43 PM, Manoj Babu manoj...@gmail.com wrote:
Hi Harsh,
Thanks for your reply.
Consider from my
David,
While ZK can solve this, locking may only make you slower. Let's try to
keep it simple?
Have you considered keeping two directories? One where the older data
is moved to (by the first job, instead of replacing files), for
consumption by the second job, which triggers by watching this directory.
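A sketch of such a rotation (the three paths are placeholders; rename is a single metadata operation within HDFS, so the consumer never sees a half-replaced folder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RotateDirs {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path current = new Path("/data/current");   // read by the second job
        Path incoming = new Path("/data/incoming"); // written by the first job
        Path previous = new Path("/data/previous"); // older generation parked here
        fs.delete(previous, true);    // drop the oldest generation
        fs.rename(current, previous); // park the data that was being read
        fs.rename(incoming, current); // publish the newly generated data
    }
}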
We have all documents moved to HDFS. I understand that with our 1st option we
need more I/O, as you say, but let's say that's not a problem for now.
Could you please point me to option 2)? How could we do that? Any tutorial
or example?
Thanks
2012/8/13 Bertrand Dechoux decho...@gmail.com
1) A
Hi Bertrand
The -libjars option works well with the 'hadoop jar' command. Instead of
executing your runnable with the plain java 'jar' command, use 'hadoop jar'.
When you use hadoop jar you can ship the dependent jars/files etc. as follows:
1) include them in the /lib folder in your jar
2) use the -libjars option (see the sketch below)
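For -libjars to take effect, the driver has to let Hadoop's GenericOptionsParser see the arguments, which is what ToolRunner does. A minimal sketch (the class and package names are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Rdg extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // getConf() already carries whatever -libjars registered.
        JobConf conf = new JobConf(getConf(), Rdg.class);
        // ... set input/output paths from args ...
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips generic options such as -libjars before run().
        System.exit(ToolRunner.run(new Configuration(), new Rdg(), args));
    }
}

Note that -libjars expects a comma-separated list of jar paths (Hadoop does not expand wildcards like Rdg_lib/*) and, as a generic option, it must precede the program's own arguments.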
You mean like this:
hadoop jar Rdg.jar my.hadoop.Rdg -libjars Rdg_lib/* tester rdg_output
where Rdg_lib is a folder containing all required classes/jars, stored on
HDFS.
We get this error though. Are we doing something wrong?
12/08/10 08:16:24 ERROR security.UserGroupInformation:
Hi Rishab,
Please provide the outputs of:
$ uname -a; lsb_release -a
$ file $HADOOP_HOME/bin/fuse_dfs
$ $HADOOP_HOME/bin/hadoop version
On Mon, Aug 13, 2012 at 1:25 PM, Rishabh Agrawal
rishabh.agra...@impetus.co.in wrote:
So do I have to download fuse libraries and install it before running
Subho,
Can you try to tweak the mapred.task.tracker.http.address in
mapred-site.xml, and set it to always bind to localhost? (i.e. set it
to localhost:50060 instead of the default 0.0.0.0:50060), and then see
if you still get this behavior?
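That is, in mapred-site.xml (a sketch of the suggested setting):

<property>
  <name>mapred.task.tracker.http.address</name>
  <value>localhost:50060</value>
</property>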
On Mon, Aug 13, 2012 at 12:37 PM, Subho Banerjee
Hi Sandeep,
You may try JackHare: http://sourceforge.net/projects/jackhare/.
Regards,
Jeff Hung
Thanks Harsh. I think I have resolved the issue. Now another problem has come
up: after I add
fuse-dfs#dfs://localhost:8020 <mount point> fuse allow_other,usetrash,rw 2 0
to fstab and execute 'mount <mount point>', I get: /bin/sh: fuse-dfs: not found
Any tip on that?
-Rishabh
-----Original Message-----
Hello Harsh,
I tried setting it, but it doesn't seem to help.
There is also something else that I found out.
The link
http://localhost:50060/tasklog?plaintext=true&attemptid=attempt_201208131655_0001_m_00_0&filter=stderr
works and actually returns me the error; however
Hi,
I am currently trying to run my Hadoop program on a cluster. Sadly,
my datanodes and tasktrackers seem to have difficulties with
their communication, as their logs say:
* Some datanodes and tasktrackers seem to have port problems of some kind,
as can be seen in the logs below. I
Hello there,
Could you please share your /etc/hosts file, if you don't mind.
Regards,
Mohammad Tariq
On Mon, Aug 13, 2012 at 6:01 PM, Björn-Elmar Macek
ma...@cs.uni-kassel.de wrote:
Hi,
i am currently trying to run my hadoop program on a cluster. Sadly though
my datanodes and
If the nodes can communicate and distribute data, then the odds are that the
issue isn't going to be in his /etc/hosts.
A more relevant question is whether he's running a firewall on each of these
machines.
A simple test... ssh to one node, ping other nodes and the control nodes at
random to see
I am not sure I understand, and I guess I am not the only one.
1) What's a worker in your context? Only the logic inside your Mapper, or
something else?
2) You should clarify your cases. You seem to have two cases, but both are
measured as overhead, so I am assuming there is a baseline? Hadoop vs sequential,
SIGIS Soluciones Integrales GIS C.A
Hi Michael,
I asked for the hosts file because it looks like some loopback problem
to me. The log shows that the call is going to 0.0.0.0. Apart from what you
have said, I think disabling IPv6 and making sure that there is no problem
with the DNS resolution is also necessary. Please correct me if I am wrong.
Hello Subho,
Please check what the permissions of mapred.local.dir are. This is the
place where map outputs are stored. The reduce phase is sending the read
requests, but this directory is not accessible; as a result, a 403 is thrown.
Regards,
Mohammad Tariq
On Mon, Aug 13, 2012 at 9:51 AM,
Hello Astie,
Please make sure your datanode is up. I think you have not included the
hadoop.tmp.dir, dfs.name.dir and dfs.data.dir properties. The values
of these properties default to the /tmp dir, which gets emptied on each restart.
As a result you lose all your data and meta information.
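For example (the local paths are placeholders echoing the /home/astie/hdfs layout that appears later in the thread; hadoop.tmp.dir goes in core-site.xml, the dfs.* properties in hdfs-site.xml):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/astie/hadoop/tmp</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/home/astie/hdfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/astie/hdfs/data</value>
</property>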
Regards,
OK, I'll try to clarify:
1) The worker is the logic inside my mapper, and it is the same for both cases.
2) I have two cases. In the first one I use Hadoop to execute my worker, and
in the second one I execute my worker without Hadoop (a simple read of the
file).
Now I measured, for both cases, the time the
Hi Matthias
When a MapReduce program is run there are some extra steps, like
checking the input and output dirs, calculating input splits, the JT assigning TTs
to execute the tasks, etc.
If your file is non-splittable, then one map task per file will be generated
irrespective of the
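For reference, one way to force one map task per file is to mark the input as non-splittable; a sketch with the old mapred API (the choice of TextInputFormat is illustrative):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Each file handled by this input format yields exactly one map task.
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false; // never split, regardless of block count or size
    }
}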
Aji,
The best place would be to ask on Apache Accumulo's own user lists,
subscribable at http://accumulo.apache.org/mailing_list.html
That said, if Accumulo bases itself on HDFS, then its data safety
should be the same or nearly the same as what HDFS itself can offer.
Note that with 2.1.0
0.0.0.0 means that the call is going to all interfaces on the machine.
(Shouldn't be an issue...)
IPv4 vs IPv6? Could be an issue; however, the OP says he can write data to DNs and
they seem to communicate, therefore if it's IPv6-related, wouldn't it impact all
traffic and not just a specific port?
Thank you so very much for the detailed response Michael. I'll keep the tip
in mind. Please pardon my ignorance, as I am still in the learning phase.
Regards,
Mohammad Tariq
On Mon, Aug 13, 2012 at 8:29 PM, Michael Segel michael_se...@hotmail.com wrote:
0.0.0.0 means that the call is
It was almost what I was getting at, but I was not sure about your problem.
Basically, Hadoop is only adding overhead due to the way your job is
constructed.
Now the question is: why do you need a single mapper? Is your need truly
not 'parallelisable'?
Bertrand
On Mon, Aug 13, 2012 at 4:49 PM,
On 13 August 2012 07:55, Harsh J ha...@cloudera.com wrote:
Note that with 2.1.0 (upcoming) and above releases of HDFS, we offer a
working hsync() API that allows you to write files with a guarantee that
the data has been written to disk (like the fsync() *nix call).
A guarantee that the OS
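A sketch of the call (the path and payload are placeholders; hsync() lives on FSDataOutputStream in the releases described above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HsyncDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(new Path("/tmp/hsync-demo"));
        out.write("important record\n".getBytes("UTF-8"));
        out.hsync(); // returns only after the datanodes have synced to disk
        out.close();
    }
}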
Hi Astie,
Live Nodes: 0
That the live nodes = 0 is the real issue here.
If you're running off of the default configs (i.e. haven't overridden
hadoop.tmp.dir, dfs.name.dir, nor dfs.data.dir), do this:
$ rm -rf /tmp/hadoop-$(whoami)/dfs/data
And then:
$ $HADOOP_HOME/bin/start-all.sh
And you should
@Bejoy KS: Thanks for your advice.
@Bertrand: It is parallelisable; this is just a test case. In later cases
there will be a lot of big files, each of which should be processed completely
in one map step. We want to minimize the overhead of network traffic. The
idea is to execute some worker (could be
It seems like you want to misuse Hadoop, but maybe I still don't understand
your context.
The standard way would be to split your files into multiple maps. Each map
could benefit from data locality. Do a part of the worker's work in the
mapper and then use a reducer to aggregate all the results (which
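A skeleton of that shape, with the old mapred API (the types and the per-record work are placeholders):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Mapper: run the per-record part of the "worker" close to the data.
class WorkerMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {
    public void map(LongWritable key, Text value,
            OutputCollector<Text, LongWritable> out, Reporter r) throws IOException {
        long partial = value.getLength(); // placeholder for real per-record work
        out.collect(new Text("result"), new LongWritable(partial));
    }
}

// Reducer: aggregate the partial results into one final value.
class AggregateReducer extends MapReduceBase
        implements Reducer<Text, LongWritable, Text, LongWritable> {
    public void reduce(Text key, Iterator<LongWritable> vals,
            OutputCollector<Text, LongWritable> out, Reporter r) throws IOException {
        long sum = 0;
        while (vals.hasNext()) sum += vals.next().get();
        out.collect(key, new LongWritable(sum));
    }
}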
On 13 August 2012 08:42, Harsh J ha...@cloudera.com wrote:
Hey Steve,
Interesting, thanks for pointing that out! I didn't know that it
disables this by default :)
It's always something to watch out for: someone implementing a disk FS, OS, or
VM environment discovering that they get great
Mohamed,
Currently Hadoop native code does not compile/run in any flavor of OS X.
Thanks.
Alejandro
On Mon, Aug 13, 2012 at 2:59 AM, J Mohamed Zahoor jmo...@gmail.com wrote:
Hi
I have problems compiling the natives on OS X 10.8 for trunk, especially in the
Yarn projects.
Anyone faced similar
I am wondering how Hadoop assigns groups when dirs/files are created
by a user; below are some tests I have done. In my cluster, the group hadoop
is configured as the supergroup.
hadoop fs -ls /tmp
drwxrwxrwx - abc hadoop 0 2012-08-10 23:02 /tmp/abc
drwxrwxrwx - def other_group
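For what it's worth, HDFS follows the BSD rule here: a new file or directory takes the group of its parent directory, not one of the creating user's Unix groups. A quick check (the path is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GroupCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dir = new Path("/tmp/abc/newdir");
        fs.mkdirs(dir);
        FileStatus st = fs.getFileStatus(dir);
        // Expect the group of /tmp/abc (here: hadoop), inherited from the parent.
        System.out.println(st.getGroup());
    }
}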
Astie,
Since you've overridden these, do:
$ rm -rf /home/astie/hdfs/data
And then re-run your start-all command. After this works, please never
re-issue a namenode -format unless you really want to wipe
everything away and start over.
On Mon, Aug 13, 2012 at 9:48 PM, Astie Darmayantie
The logs indicate an 'already in use' exception. Is that some sign? :)
On 13 Aug 2012 20:36, Mohammad Tariq donta...@gmail.com wrote:
Thank you so very much for the detailed response Michael. I'll keep the
tip in mind. Please pardon my ignorance, as I am still in the learning
phase.
Regards,
You may see a similar problem compiling the HDFS native code too, since it's not
supported on OS X yet.
Brandon
On Sun, Aug 12, 2012 at 10:49 PM, J Mohamed Zahoor jmo...@gmail.com wrote:
Hi
I have problems compiling the natives on OS X 10.8 for trunk, especially in the
Yarn projects.
Anyone faced
Where do I set this?
On Mon, Aug 13, 2012 at 7:52 PM, Mohammad Tariq donta...@gmail.com wrote:
Hello Subho,
Please check what the permissions of mapred.local.dir are. This is the
place where map outputs are stored. The reduce phase is sending the read
requests, but this directory is not
unsubscribe