Hi,
As I suspected, cache files are symlinked after a child JVM is
started: TaskRunner.setupWorkDir is being called from
org.apache.hadoop.mapred.Child.main.
This is unfortunate as it makes it impossible to leverage the distributed
cache for the purpose of deploying JVM agents. I could submit a jira
Hi all.
I'm writing to you to ask for advice or a hint to the right direction.
In our department, more and more researchers ask us (IT administrators)
to assemble (or to buy) GPGPU powered workstations to do parallel computing.
As I already manage a small CPU cluster (resources managed using
Thanks for replies.
Finally, after trying many ways to resolve the problem, e.g.:
- Number of open files for the mapreduce user
- xcievers and handler thread count options in the datanode
- And others
The problem was obvious from the Apache wiki page below - there was a lack of disk
space.
But it was about 100-200 GB.
Thanks a lot!
That was it. There was following line in our code:
jobConf.setKeepTaskFilesPattern(".*");
On Fri, Jan 11, 2013 at 2:20 PM, Hemanth Yamijala yhema...@thoughtworks.com
wrote:
Hmm. Unfortunately, there is another config variable that may be affecting
this: keep.task.files.pattern
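A minimal sketch (my own assumption, not from the thread) of a pre-submission check that would have caught this; it assumes the old-API JobConf from the quoted line above and the MR1 property name keep.task.files.pattern:

// Warn if task files are being kept - with ".*" every task's working files survive
// and quietly eat local disk on the tasktrackers.
String keepPattern = jobConf.get("keep.task.files.pattern");
if (keepPattern != null) {
  System.err.println("Warning: keep.task.files.pattern=" + keepPattern
      + " - matching task files will not be cleaned up after the job.");
}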
Hi,
My log files are generated and saved in a windows machine.
Now I have to move those remote files to the Hadoop cluster (HDFS)
either in a synchronous or an asynchronous way.
I have gone through Flume (various source types) but it was not helpful.
Please suggest whether there
I am not sure you actually test if the file exists in your code:
System.out.println("File exists : " + theXMLFile.length());
This is really only the file path.
You could try to load the file to make sure:
File file = new File(theXMLFile);
// Test if it exists
// ...
// Load it as a resource, I had
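A minimal, hedged sketch of what that check could look like (the path is hypothetical; the point is to call exists() on a File rather than printing the String's length):

import java.io.File;

String theXMLFile = "/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"; // hypothetical path
File file = new File(theXMLFile);
if (!file.exists()) {
  throw new IllegalArgumentException("Config file not found: " + theXMLFile);
}
System.out.println("File exists, size in bytes: " + file.length());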
ftp auto upload?
2013/1/17 Mahesh Balija balijamahesh@gmail.com:
the Hadoop cluster (HDFS) either in synchronous or asynchronou
Sunil Sharma
Please do not print this email unless it is absolutely necessary
MapR was the first vendor to remove the NN as a SPOF.
They did this with their 1.0 release when it first came out. The downside is that
their release is proprietary and very different in terms of the underlying
architecture from Apache-based releases.
Hortonworks relies on VMware as a key piece of
Yes. It is possible. I haven't tried the windows+flume+hadoop combo
personally, but it should work. You may find this
link useful: http://mapredit.blogspot.in/2012/07/run-flume-13x-on-windows.html
Alex
has explained beautifully how to run Flume on a Windows box. If I
get time I'll try to simulate your
That link talks about just installing Flume on a Windows machine (it does NOT even
have configs to push logs to the Hadoop cluster), but what if I have to
collect logs from various clients? Then I will end up installing it on all
clients.
I have installed Flume successfully on Linux but I have to configure it
One approach I used in my lab was the data-gateway,
which is a small Linux box that just mounts Windows shares,
and a single Flume node on the gateway forwards to the
HDFS cluster. With tail or periodic log rotation you have control
over all logfiles, depending on your use case. Either grab all
Hi everyone,
I am using Hadoop 1.0.3.
I write logs to a Hadoop sequence file in HDFS. I call syncFS() after
each bunch of logs, but I never close the file (except when I am performing
daily rolling).
What I want to guarantee is that the file is available to readers while the
file is still
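A minimal sketch of that write/sync pattern on the Hadoop 1.x API (the path and record types are made up; whether readers can see the synced-but-unclosed data is exactly the question here):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path logFile = new Path("/logs/current.seq"); // hypothetical path

SequenceFile.Writer writer =
    SequenceFile.createWriter(fs, conf, logFile, LongWritable.class, Text.class);

// Append a batch of log records, then sync so the data reaches the datanodes
writer.append(new LongWritable(System.currentTimeMillis()), new Text("log line"));
writer.syncFs(); // flush without closing; the file stays open until the daily roll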
Roberto Nunnari wrote:
Hi all.
I'm writing to you to ask for advice or a hint to the right direction.
In our department, more and more researchers ask us (IT administrators)
to assemble (or to buy) GPGPU powered workstations to do parallel
computing.
As I already manage a small CPU cluster
Randy,
A very slow NFS is certainly troublesome, and slows down the write
performance for every edit waiting to get logged to disk (there's a
logSync_avg_time metric that could be monitored for this), and therefore a
dedicated NFS mount is required if you are unwilling to use the proper
HA-HDFS
Just a tip: Better to share whole error messages and stacktraces when
seeking out help.
On Thu, Jan 17, 2013 at 1:52 PM, Stuti Awasthi stutiawas...@hcl.com wrote:
Hi,
Issue is fixed at my end. I didn't use the patched jar provided by Todd.
Configured EclipsePlugin with correct
Hi Vikas,
You might want to check your logs. MR can generate huge logs depending
on what you are logging, and they are not on the DFS. They are on
non-DFS storage. If it's coming from there, you can change the log level to
reduce the size of the output.
On my own cluster, I turned the logs to debug and
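For example (my assumption about where to change it, not something stated in the thread), raising the level for Hadoop's own loggers in conf/log4j.properties on each node shrinks the task and daemon logs considerably:

# conf/log4j.properties - quieter Hadoop internals (DEBUG/INFO -> WARN)
log4j.logger.org.apache.hadoop=WARN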
Hello Glen,
Pl find my comments embedded below :
1.) The Standalone Operation
(http://hadoop.apache.org/docs/r1.1.1/single_node_setup.html#Local),
just to confirm, can run without any DFS filesystem? (We're not being
What is the amount of data you are attempting to crunch in one MR job? Note
that Map intermediate outputs are written to disk before being sent to
reducers and this counts as non-DFS usage. So, roughly speaking, if your
input is 14 GB, you surely need more than 2 or 3 x 14 GB of free space overall
to do
Can someone help me how to unsubscribe from this group?
I dropped a mail to the unsubscribe address but am still receiving mails
Did you follow the instructions given on the same page you used to
subscribe? http://hadoop.apache.org/mailing_lists.html#User
On 01/17/2013 09:33 AM, Siva Gudavalli wrote:
Can someone help me how to unsubscribe from this group?
I dropped a mail to unsubscribe mailing list still receiving
Hi,
I have recently installed hadoop-1.0.4 on a linux machine. Whilst working
through the post-install instructions contained in the “Quick Start” guide, I
incurred the following catastrophic Java runtime error (See below). I have
attached the error report file “hs_err_pid24928.log”. I
Hey, thanks for that!:)
On Thu, Jan 17, 2013 at 2:01 AM, Harsh J ha...@cloudera.com wrote:
The patch has not been contributed yet. Upstream at open-mpi there does
seem to be a branch that makes some reference to Hadoop, but I think the
features are yet to be made available there too.
Try using Sun (now Oracle) Java 6.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Thu, Jan 17, 2013 at 8:22 PM, Jean-Marc Spaggiari
jean-m...@spaggiari.org wrote:
Hi Sean,
This is an issue with your JVM. Not related to hadoop.
Which JVM are you using, and can you
Hi,
My Java version is
java version 1.6.0_25
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Java HotSpot(TM) Client VM (build 20.0-b11, mixed mode, sharing)
Would you advise obtaining a later Java version?
Sean
-Original Message-
From: Jean-Marc Spaggiari
Sent: Thursday,
No, Java 6 is fine and preferable.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Thu, Jan 17, 2013 at 8:26 PM, Sean Hudson sean.hud...@ostiasolutions.com
wrote:
Hi,
My Java version is
java version 1.6.0_25
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Good catch with that string.length() - you're right, that was a silly
mistake. --- sorry - I'm not sure what I was thinking. It was a late night
:)
In any case, the same code with file.exists() fails... I've validated that
path many ways.
On a broader note: Shouldn't the Configuration class
On 01/17/2013 08:58 AM, Mohammad Tariq wrote:
Hello Glen,
Pl find my comments embedded below :
1.) The Standalone Operation
(http://hadoop.apache.org/docs/r1.1.1/single_node_setup.html#Local),
just to confirm, can run without any DFS filesystem? (We're not being
asked to run
You can control where HDFS data is stored by setting the following property in
core-site.xml.
You can also refer to the Yahoo tutorial:
http://developer.yahoo.com/hadoop/tutorial/module2.html
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://your.server.name.com:9000</value>
  </property>
Hello Glen,
That is the default behavior. It is advisable to include the hadoop.tmp.dir
property in your core-site.xml file and the dfs.name.dir and dfs.data.dir
properties in your hdfs-site.xml in order to avoid any problems when you
reboot the machine. Looks like you have already achieved the goal. But,
Hi!
We have the following problem.
There are three target hosts to send metrics to: 192.168.1.111:8649,
192.168.1.113:8649, 192.168.1.115:8649 (node01, node03, node05).
But, for example, the datanode (using
org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31) sends one metric to the
first target host and the
imo, as long as the javadoc is clear enough, any behavior is ok.
What's written in the current version is:
name - resource to be added, the classpath is examined for a file with that
name.
So nothing in this javadoc lets you believe that the file exists or will be
loaded. Also you could make sure
Hi!
I'm trying to set the dmax value for metrics Hadoop sends to Ganglia. Our HDFS
version uses the metrics2 context so I tried the approach from here:
https://svn.apache.org/repos/asf/hadoop/common/branches/HDFS-1073/common/conf/hadoop-metrics2.properties
But it didn't work for me. Also there are examples
I agree, but the fact that the Configuration is doing the loading means (I
think) that it should (at least) do some error handling for that loading,
correct?
On Thu, Jan 17, 2013 at 10:36 AM, Julien Muller julien.mul...@ezako.comwrote:
imo, as long as the javadoc is clear enough, any behavior
Hi!
There is a fixed issue in Hadoop saying "jvm metrics all use the same
namespace" - https://issues.apache.org/jira/browse/HADOOP-7507
I was able to apply this fix in our cluster using the following line in
hadoop-metrics2.properties:
datanode.sink.ganglia.tagsForPrefix.jvm=*
So now I could see
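For reference, the sample hadoop-metrics2.properties linked earlier in the thread also shows per-metric dmax settings for the Ganglia sink; a hedged sketch along those lines (the metric names are just the template's examples, so verify the syntax against your Hadoop version):

datanode.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
datanode.sink.ganglia.period=10
datanode.sink.ganglia.tagsForPrefix.jvm=*
datanode.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40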
unsubscribe
https://www.google.ca/search?q=unsubscribe+hadoop+mailing+list
Just follow the first link...
2013/1/17, Ignacio Aranguren iaran...@nd.edu:
unsubscribe
I don't think running hadoop on a GPU cluster is a common use case; the
types of workloads for a hadoop vs. gpu cluster are very different although
a quick google search did turn up some. So this is probably not the best
mailing list for your question.
J
On Thu, Jan 17, 2013 at 5:18 AM, Roberto
Since this is a Hadoop question, it should be sent
user@hadoop.apache.org (which I'm now sending this to and I put
user@hbase in BCC).
J-D
On Thu, Jan 17, 2013 at 9:54 AM, Brennon Church bren...@getjar.com wrote:
Hello,
Is there a way to throttle the speed at which under-replicated blocks are
Hi Dhanasekaran,
The issue is not with Hadoop streaming. You can try this yourself:
On your local disk, touch a bunch of files, like this:
mkdir stream
cd stream
touch 1 2 3 4 5 6 7 8 9 10
Then, put the files into HDFS:
hadoop fs -put stream stream
Now, put a unix sleep command into a
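A hedged guess at how the rest of that test might look on Hadoop 1.x (the streaming jar path varies by install; the sleep mapper just keeps tasks alive long enough to observe them, and -reducer NONE skips the reduce phase):

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
    -input stream -output stream-out \
    -mapper 'sleep 30' -reducer NONE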
Use Sun/Oracle 1.6.0_32+ Build should be 20.7-b02+
1.7 causes failure and AFAIK, not supported, but you are free to try the
latest version and report back.
-Original Message-
From: Sean Hudson [mailto:sean.hud...@ostiasolutions.com]
Sent: Thursday, January 17, 2013 6:57 AM
To:
You can limit the bandwidth in bytes/second values applied
via dfs.balance.bandwidthPerSec in each DN's hdfs-site.xml. Default is 1
MB/s (1048576).
Also, unsure if your version already has it, but it can be applied at
runtime too via the dfsadmin -setBalancerBandwidth command.
On Thu, Jan 17,
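For reference, a hedged sketch of both knobs (the 10 MB/s value is only an example; as noted above, the runtime command may not exist in older releases):

<!-- hdfs-site.xml on each datanode -->
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>10485760</value>
</property>

# or at runtime, without a restart
hadoop dfsadmin -setBalancerBandwidth 10485760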
hi,
it might be a stupid question but I really have no answer.
From FSEditLog.java, it only supports 14 operations. So, if a file's size
changes (e.g. new content is written to it), FSEditLog won't log these
changes?
If not, when the node goes down, how is this information restored in the namenode?
Do you know what causes 1.7 to fail? I am running 1.7 and so far have
not done whatever it takes to make it fail.
On 1/17/2013 1:46 PM, Leo Leung wrote:
Use Sun/Oracle 1.6.0_32+ Build should be 20.7-b02+
1.7 causes failure and AFAIK, not supported, but you are free to try the
latest
Oi, You might want to report your version(s) and the eco-system that you've
validated to the community.
see http://wiki.apache.org/hadoop/HadoopJavaVersions
JDK 1.6.0_32 to .38 seems safe
-Original Message-
From: Chris Mawata [mailto:chris.maw...@gmail.com]
Sent: Thursday, January
Hi,
I am back with my original problem. I am trying to bootstrap child
JVM via -javaagent. I am doing what Harsh and Arun suggested, which
also agrees with the documentation.
In theory this should work, but it doesn't. Any ideas before I start
digging into the code? Thanks.
Here is the
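For context, here is a minimal sketch of the setup being described (the agent jar name and paths are hypothetical). Per the findings quoted at the top of this digest, the cache symlink is only created after the child JVM has started, which is exactly why the -javaagent path may fail to resolve:

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

JobConf conf = new JobConf();
// Ship the agent via the distributed cache and symlink it into the task working dir
DistributedCache.addCacheFile(new URI("/libs/agent.jar#agent.jar"), conf);
DistributedCache.createSymlink(conf);
// Ask every child JVM to load it on startup
conf.set("mapred.child.java.opts", "-Xmx512m -javaagent:./agent.jar");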
Hi,
I've found MRUnit a very easy way to unit test jobs. Is it possible as well to
test mappers reading data from DistributedCache? If yes, can you share an
example of how the test's setup() should look?
Thanks.
I wouldn't call it validation since all I am running are examples. It is
more likely that I haven't yet run into the brick wall that
others have already found!
On 1/17/2013 2:30 PM, Leo Leung wrote:
Oi, You might want to report your version(s) and the eco-system that you've
validated to the
That doesn't seem to work for under-replicated blocks such as when
decommissioning (or losing) a node, just for the balancer. I've got
mine currently set to 10MB/s, but am seeing rates of 3-4 times that
after decommissioning a node while it works on bringing things back up
to the proper
Not true per the sources, it controls all DN-DN copy/move rates, although
the property name is misleading. Are you noticing a consistent rise in the
rate or is it spiky?
On Fri, Jan 18, 2013 at 2:20 AM, Brennon Church bren...@getjar.com wrote:
That doesn't seem to work for under-replicated
Pretty spiky. I'll throttle it back to 1MB/s and see if it reduces
things as expected.
Thanks!
--Brennon
On 1/17/13 1:41 PM, Harsh J wrote:
Not true per the sources, it controls all DN-DN copy/move rates,
although the property name is misleading. Are you noticing a
consistent rise in the
One reason (for spikes) may be that the throttler actually runs
periodically (instead of controlling the rate at source, we detect and
block work if we exceed limits, at regular intervals). However, this period
is pretty short so it generally does not cause any ill effects on the
cluster.
On
Some of the unit tests fail with 1.7. HDFS and MR mostly work OK, but
if you run into problems with 1.7 the first question will be does it
work in 1.6?.
-andy
On Thu, Jan 17, 2013 at 11:19 AM, Chris Mawata chris.maw...@gmail.com wrote:
Do you know what causes 1.7 to fail? I am running 1.7 and
And what happens if you give the complete path, say
hdfs://your_namenode:9000/user/hduser/data/input1.txt ??
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Fri, Jan 18, 2013 at 5:36 AM, jamal sasha jamalsha...@gmail.com wrote:
No.
Its not working. :( same error.
One very ugly way to confirm that it's a problem with your config is to add
the config properties in code.
The problem with the Configuration object is that it doesn't tell you if the path
to the file is bad :( I really believe this should be changed because it is
a major cause of frustration.
Worst
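A small sketch of that "confirm it in code" idea, using only calls Configuration already exposes (the property names are just examples):

import java.net.URL;
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
conf.addResource("core-site.xml"); // silently ignored if not on the classpath

// Check whether the resource is actually resolvable, and whether a known key came through
URL resolved = conf.getResource("core-site.xml");
if (resolved == null) {
  System.err.println("core-site.xml was not found on the classpath - nothing was loaded");
}
System.out.println("fs.default.name = " + conf.get("fs.default.name"));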
Thanks -- that would explain why I have not got into trouble yet as I am
only using
MR and HDFS.
On 1/17/2013 5:40 PM, Andy Isaacson wrote:
Some of the unit tests fail with 1.7. HDFS and MR mostly work OK, but
if you run into problems with 1.7 the first question will be does it
work in 1.6?.
Hadoop streaming can do this, and there's been some discussion in the past,
but it's not a core use case. Check the list archives.
Russell Jurney http://datasyndrome.com
On Jan 17, 2013, at 9:25 AM, Jeremy Lewi jer...@lewi.us wrote:
I don't think running hadoop on a GPU cluster is a common use
I'm thinking 'Downfall'
But I could be wrong.
On Jan 17, 2013, at 6:56 PM, Yongzhi Wang wang.yongzhi2...@gmail.com wrote:
Who can tell me what is the name of the original film? Thanks!
Yongzhi
On Thu, Jan 17, 2013 at 3:05 PM, Mohammad Tariq donta...@gmail.com wrote:
I am sure you will
Which Hadoop version are you using? That post is quite old.
Try the same thing using the new API. Also, modify the
above 2 lines to:
conf.addResource(new
File("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml").getAbsoluteFile().toURI().toURL());
conf.addResource(new
You are right Michael, as always :)
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Fri, Jan 18, 2013 at 6:33 AM, Michael Segel michael_se...@hotmail.comwrote:
I'm thinking 'Downfall'
But I could be wrong.
On Jan 17, 2013, at 6:56 PM, Yongzhi Wang
Hello!
It is common to see this sentence: "Hadoop scales linearly". But is there any
performance evaluation to confirm this?
In my evaluations, Hadoop processing capacity scales linearly, but not
proportionally to the number of nodes; the processing capacity achieved with 20
nodes is not double that of
I've seen some academic research in this direction, with good results.
Some computations can be expressed with GPGPU, but it is still a restricted
number of cases. If it is not easy to solve problems using MapReduce, solving
some problems with SIMD is harder.
--
Thiago Vieira
On Thu, Jan 17, 2013 at
Please add -server in front of your JVM options
Just a curiosity: why not use the 64-bit version?
From: Sean Hudson [sean.hud...@ostiasolutions.com]
Sent: 17 January 2013 22:56
To: user@hadoop.apache.org
Subject: Re: Problems
Hi,
My Java version is
java
Hi,
Just saw your email. I was so tired with this issue that the moment it
ran, I took a time off. I will get back to you soon :)
thanks
On Thu, Jan 17, 2013 at 5:04 PM, Mohammad Tariq donta...@gmail.com wrote:
Which Hadoop version are you using? That post is quite old.
Try the same same
Not an issue :)
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Fri, Jan 18, 2013 at 9:38 AM, jamal sasha jamalsha...@gmail.com wrote:
Hi,
Just saw your email. I was so tired with this issue that the moment it
ran, I took a time off. I will get back to you soon :)
I missed the key information: The servers are *Amazon EC2* *M1 Medium
Instance*
2013/1/18 yaotian yaot...@gmail.com
Hi,
*=My machine environment:*
1 master: 1 CPU core, 2 GHz, 1 GB memory
2 slaves (datanodes): 1 CPU core, 2 GHz, 4 GB memory
hadoop: hadoop-0.20.205.0
*= My data:*
User GPS
Hi,
Not sure how to do it using MRUnit, but should be possible to do this using
a mocking framework like Mockito or EasyMock. In a mapper (or reducer),
you'd use the Context classes to get the DistributedCache files. By mocking
these to return what you want, you could potentially run a true unit
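A hedged sketch of that mocking idea, assuming a hypothetical MyMapper whose setup() reads the cache file via its Context's Configuration (the property name mapred.cache.localFiles is what MR1's DistributedCache.getLocalCacheFiles() reads, so treat it as an assumption to verify against your version):

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Mapper;

@SuppressWarnings({"rawtypes", "unchecked"})
public void testSetupReadsCacheFile() throws Exception {
  Configuration conf = new Configuration();
  // Point the "local" cache file at an ordinary file on the test machine
  conf.set("mapred.cache.localFiles", "/tmp/lookup.txt");

  Mapper.Context ctx = mock(Mapper.Context.class);
  when(ctx.getConfiguration()).thenReturn(conf);
  when(ctx.nextKeyValue()).thenReturn(false); // no input records; only setup()/cleanup() run

  MyMapper mapper = new MyMapper(); // hypothetical mapper under test
  mapper.run(ctx);                  // run() calls setup(), which reads the cache file
}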
Hello Sameer,
Pl find my comments embedded below :
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Fri, Jan 18, 2013 at 11:21 AM, Sameer Jain
sameer.j...@evalueserve.comwrote:
Hi,
I am trying to understand the different data analysis algorithms available
in
Note: You are running/asking about a pseudo-distributed mode, not
'standalone' exactly. Standalone does not have a running HDFS and uses the
local filesystem for MR execution.
On Fri, Jan 18, 2013 at 11:42 AM, yiyu jia jia.y...@gmail.com wrote:
Hi,
I tried to run hadoop in standalone mode
What are your number of map and reduce slots configured to, per node? Also
noticed you seem to be requesting 4 GB memory from Reducers when your
slaves' maximum RAM itself nears that - the result may not be so good here
and can certainly cause slowdowns (due to swapping/etc.).
On Fri, Jan 18,
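For reference, a hedged sketch of the MR1 knobs being discussed, in mapred-site.xml on each slave (the values are only illustrative for a node with about 4 GB of RAM; tune them to your workload):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>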