Refer to https://issues.apache.org/jira/browse/HDFS-2827
When you run an operation like hadoop fs -mv /a/b / , this issue may reappear.
2012/5/10 Alex Levin ale...@gmail.com
Hi,
I have an issue with the secondary namenode crashing due to a simple move
operation.
Appreciate any ideas on the
Does the YARN ResourceManager require as much memory as the JobTracker did?
Hi,
LineRecordWriter.write(..) is synchronized. I did not find any other
RecordWriter implementation that defines write as synchronized.
Is there any specific reason for this?
regards,
sathwik
Because we may use multiple threads to write to a single file.
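A minimal sketch of the idea (plain Java, not Hadoop's actual LineRecordWriter; the class and field names are made up for illustration): if several threads share one writer for a single output file, the lock keeps each record's key, separator, value and newline together.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Sketch only -- not Hadoop's real LineRecordWriter. It shows the shape of
// the synchronized write: without the lock, two threads sharing this writer
// could interleave the key, tab, value and newline of different records.
public class SyncLineWriter {
    private final Writer out;

    public SyncLineWriter(Writer out) {
        this.out = out;
    }

    // One record = key TAB value NEWLINE, emitted atomically per record.
    public synchronized void write(String key, String value) throws IOException {
        out.write(key);
        out.write('\t');
        out.write(value);
        out.write('\n');
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        SyncLineWriter writer = new SyncLineWriter(sink);
        writer.write("hello", "1");
        writer.write("world", "2");
        System.out.print(sink); // each record stays intact: hello\t1\nworld\t2\n
    }
}
```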
On Aug 8, 2013 2:54 PM, Sathwik B P sath...@apache.org wrote:
Hi,
LineRecordWriter.write(..) is synchronized. I did not find any other
RecordWriter implementation that defines write as synchronized.
Is there any specific reason for this?
Hi,
Thanks for your reply.
May I know where Hadoop forks multiple threads that use a single
RecordWriter?
regards,
sathwik
On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu azury...@gmail.com wrote:
Because we may use multiple threads to write to a single file.
On Aug 8, 2013 2:54 PM, Sathwik B P
Hi,
In my company we sometimes use KVM to launch test or small demo clusters.
Every developer also has a KVM pseudo-distributed cluster on their computer.
Nonetheless, I would not recommend using KVM for production clusters.
Check this link about all the theory of Hadoop with
hi,all:
My company needs to build a 10-node Hadoop cluster (2 namenodes and
8 datanode/node-manager machines, for both data storage and data analysis). We
have HBase and Hive on the Hadoop cluster, with a 10 GB data increment per day.
We use CDH 4.3 (for dual-namenode HA); my plan is
It's not Hadoop that forks the threads; we may create a line record writer
ourselves and then call that writer concurrently.
On Aug 8, 2013 4:00 PM, Sathwik B P sathwik...@gmail.com wrote:
Hi,
Thanks for your reply.
May I know where Hadoop forks multiple threads that use a single
RecordWriter?
regards,
If you want HA, do you want to deploy the journal nodes on the DNs?
On Aug 8, 2013 5:09 PM, ch huang justlo...@gmail.com wrote:
hi,all:
My company needs to build a 10-node Hadoop cluster (2 namenodes and
8 datanode/node-manager machines, for both data storage and data analysis). We
have HBase
Hey Jitendra,
I ensured those two things you mentioned; still I'm facing the same issue.
Regards,
Kasa
On Wed, Aug 7, 2013 at 7:32 PM, Jitendra Yadav
jeetuyadav200...@gmail.comwrote:
Hi,
I hope below points might help you.
Approach #1:
You need to change the sshd_config file in the
logs:
2013-08-08 06:03:51,627 INFO
org.apache.oozie.command.wf.ActionStartXCommand: USER[root] GROUP[-]
TOKEN[] APP[clickstream-wf] JOB[044-130719141217337-oozie-oozi-W]
ACTION[044-130719141217337-oozie-oozi-W@:start:] Start action
[044-130719141217337-oozie-oozi-W@:start:]
While we don't fork by default, we do provide a MultithreadedMapper
implementation that would require such synchronization. But if you are
asking whether it is necessary, then perhaps the answer is no.
On Aug 8, 2013 3:43 PM, Azuryy Yu azury...@gmail.com wrote:
its not hadoop forked threads, we may
I may be nitpicking here, but if "perhaps the answer is no", then I conclude:
perhaps the other implementations of RecordWriter are a race condition / file
corruption waiting to happen.
On Thu, Aug 8, 2013 at 12:50 PM, Harsh J ha...@cloudera.com wrote:
While we don't fork by default, we do provide a
HI:
I configured the FairScheduler with default settings, and my job has 19
reduce tasks. I found that all the reduce tasks are scheduled to run on one
node.
With the default FIFO scheduler, the 19 reduce tasks are scheduled onto
different nodes.
How can I configure the FairScheduler to load more
Thanks,
In all the files I changed it to the master (cloud6), and I removed the
property <name>hadoop.tmp.dir</name>.
Felipe
On Wed, Aug 7, 2013 at 3:20 PM, Shekhar Sharma shekhar2...@gmail.comwrote:
Disable the firewall on the datanode and namenode machines.
Regards,
Som Shekhar Sharma
+91-8197243810
If you have removed this property from the slave machines, then your DN
information will be created under the /tmp folder, and once you reboot your
datanode machines, that information will be lost.
Sorry, I had not seen the logs, but you don't have to play around with the
properties...
...see, the datanode will not
Then is this a bug? Synchronization in the absence of any race condition is
normally considered bad.
In any case, I'd like to know why this writer is synchronized whereas the
other ones are not. That is, I think, the point at issue: either the other
writers should be synchronized, or else this one
Keep the configuration the same on the datanodes as well for the time being.
The only thing a datanode or slave machine should know is the masters file
(that is, who the master is),
and you need to tell the slave machine where your namenode is running, which
you specify in the property
Thanks for the hints Shekhar.
My cluster is running well.
Felipe
On Thu, Aug 8, 2013 at 8:56 AM, Shekhar Sharma shekhar2...@gmail.comwrote:
Keep the configuration the same on the datanodes as well for the time being.
The only thing a datanode or slave machine should know is the masters file
(that
Hi Harsh,
Do you want me to raise a Jira for this?
regards,
sathwik
On Thu, Aug 8, 2013 at 5:23 PM, Jay Vyas jayunit...@gmail.com wrote:
Then is this a bug? Synchronization in the absence of any race condition is
normally considered bad.
In any case, I'd like to know why this writer is
Hello Ch Huang,
Do you know this book?
Hadoop Operations http://shop.oreilly.com/product/0636920025085.do
I think it answers most of the questions in detail.
For a production cluster you should consider MRv1.
And I suggest you go with more hard drives per slave node to get higher
IO
I would say yes, make this a Jira.
The actual change can go (as proposed by Jay) in one of two directions: put
synchronization into all implementations, OR take it out of all
implementations.
I think the first thing to determine is why the synchronization was put
into the LineRecordWriter in the first
The SequenceFile writer is also synchronized; I don't think this is bad.
If you call the HDFS API to write concurrently, then it's necessary.
On Aug 8, 2013 7:53 PM, Jay Vyas jayunit...@gmail.com wrote:
Then is this a bug? Synchronization in the absence of any race condition is
normally considered bad.
In
Hi devdoer,
What version are you using?
-Sandy
On Thu, Aug 8, 2013 at 4:25 AM, devdoer bird devd...@gmail.com wrote:
HI:
I configured the FairScheduler with default settings, and my job has 19
reduce tasks. I found that all the reduce tasks are scheduled to run on one
node.
While with
We are exploring the possibility of hosting Hadoop outside of our data centers.
I am aware that Hadoop in general isn't exactly designed to run on virtual
hardware. So a few questions:
1. Are there any providers out there who would host Hadoop on dedicated
physical hardware?
2. Has anyone had
Well, it all depends, because many companies use cloud computing
platforms like Amazon EMR, VMware, and Rackspace Cloud for Hadoop
hosting:
http://aws.amazon.com/elasticmapreduce
http://www.vmware.com/company/news/releases/vmw-mapr-hadoop-062013.html
http://bitrefinery.com/services/hadoop-hosting
Thanks for the list, Marcos. I will go through the slides/links. I think
that's helpful.
Regards,
Dhaval
From: Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com
To: Dhaval Shah prince_mithi...@yahoo.co.in
Cc: user@hadoop.apache.org
Sent: Thursday, 8 August
Are there any standard or recommended scripts for the mapred.healthChecker
options in the mapred-site.xml configuration file on a Linux box?
-Jeff
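For what it's worth, a minimal sketch of such a health-check script (the /tmp mount and the 90% threshold are assumptions, tune them for your cluster): the TaskTracker marks the node unhealthy when the script prints a line beginning with "ERROR".

```shell
#!/bin/bash
# Hypothetical script for mapred.healthChecker.script.path.
# Any output line starting with "ERROR" marks this node unhealthy;
# no output means the node is considered healthy.
THRESHOLD=90  # percent full; assumed limit, adjust as needed
usage=$(df -P /tmp | awk 'NR==2 { gsub("%", ""); print $5 }')
if [ "$usage" -gt "$THRESHOLD" ]; then
  echo "ERROR: disk usage on /tmp is at ${usage}%"
fi
```

Point mapred.healthChecker.script.path at the script and make it executable; the checker runs it periodically on each TaskTracker.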
Is there a reliable way of converting an HDFS Path object into a String?
Invoking path.toUri().toString() does not work with special characters
(e.g., if there are spaces in the original path string). So, for instance,
in the following example
String address = ...; // Path string without the
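The encoding behavior can be illustrated with plain java.net.URI, no Hadoop needed (the hdfs://namenode:8020 authority and the file name below are made up): the multi-argument URI constructor percent-encodes special characters such as spaces, so toString() shows "%20", while getPath() returns the decoded form.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch of the space-encoding issue using only java.net.URI.
// The multi-argument constructor quotes illegal characters in the path,
// so toString() contains "%20"; getPath() decodes them back to spaces.
public class PathStrings {
    public static URI toUri(String rawPath) throws URISyntaxException {
        return new URI("hdfs", "namenode:8020", rawPath, null, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        URI uri = toUri("/user/data/my file.txt");
        System.out.println(uri.toString()); // hdfs://namenode:8020/user/data/my%20file.txt
        System.out.println(uri.getPath());  // /user/data/my file.txt
    }
}
```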
Remember that in YARN, the two main responsibilities of the JobTracker are
divided between two different components:
- Resource Management by ResourceManager (this is a global component)
- Job scheduling and monitoring by the NodeManager (this is a per-node
component)
- Resource negotiation and task
Hi,
I started learning about MapReduce with Hadoop via the wordcount example. I
am a bit confused about the boundary between the map and reduce programs. Is
there a standard format for the map output and the reduce input? Is there a
full explanation of the Java classes used somewhere? I would also appreciate
So, from a performance standpoint, I need to separate the RM from the NN,
because each of them is memory hungry.
On Fri, Aug 9, 2013 at 8:00 AM, Marcos Luis Ortiz Valmaseda
marcosluis2...@gmail.com wrote:
Remember that in YARN, the two main responsibilities of the JobTracker are
divided between two different
On 08/08/2013 09:23 PM, John Hancock wrote:
Where else might one put .jar files that a map/reduce job will need?
Why do you need an alternative location? Is there a constraint on being
able to place your library jars under $HADOOP_HOME/lib?
Given that your questions are very broad and high-level, I would suggest
that you pick up a book or similar to go through them. Hadoop: The
Definitive Guide by Tom White is a great book to start with.
Meanwhile some links to start with:
I suppose I should have been clearer. There's no problem out of the box if
people stick to the libraries we offer :)
Yes, the LRW was marked synchronized at some point over 8 years ago [1]
in support of multi-threaded maps, but the framework has changed much
since then. The MultithreadedMapper/etc.
You are partially incorrect - the NameNode is no longer an SPOF in
2.x releases. Please look at the docs that cover HA in the release you
use, such as
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html.
RM HA is not yet present, but is incoming.
This is appropriate. You are making use of both disk mounts and have
the directories for each service isolated as well.
On Fri, Aug 9, 2013 at 7:37 AM, ch huang justlo...@gmail.com wrote:
hi,all:
I plan to put the DN together with the NM. I want to use 2*1TB disks, one
disk mounted on /data/1 and
John,
I assume you do not wish to be using the DistributedCache (or an HDFS
location for the DistributedCache), which is the most ideal way to ship
jars.
You can place your jars onto the TT classpaths by placing them at an
arbitrary location such as /opt/jars, and editing the TT's
hadoop-env.sh to
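For example, the hadoop-env.sh edit might look like this (the /opt/jars location is just the arbitrary example from above):

```shell
# hadoop-env.sh fragment (sketch): prepend every jar under /opt/jars
# (a hypothetical location) to the daemon classpath.
# Restart the TaskTrackers afterwards so they pick up the change.
export HADOOP_CLASSPATH=/opt/jars/*:$HADOOP_CLASSPATH
```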
I use hadoop-2.0.5; there are MutableCounterLong and MutableCounterLong
classes in metrics v2.
I am studying the metrics v2 code.
What is the difference between the MutableCounterLong and MutableCounterLong
classes?
I found that MutableCounterLong is used to calculate throughput; is that
right? How does the metrics