Thanks Arun Murthy
On Tue, Apr 15, 2014 at 11:32 AM, Arun Murthy wrote:
> Lots of folks use Apache Ambari (http://ambari.apache.org/) to deploy and
> monitor their Hadoop cluster. Ambari uses Ganglia/Nagios as underlying
> technology and has much better UI etc.
>
> hth,
> Arun
>
>
> On Mon, Apr
Lots of folks use Apache Ambari (http://ambari.apache.org/) to deploy and
monitor their Hadoop clusters. Ambari uses Ganglia/Nagios as the underlying
technology and has a much better UI, etc.
hth,
Arun
On Mon, Apr 14, 2014 at 9:08 PM, Shashidhar Rao
wrote:
> Hi,
>
> Can somebody please help me in clari
If you want to use start-all.sh, you need to configure SSH keys; otherwise you
can log in to each target machine and start the services there.
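A minimal sketch of the passwordless-SSH setup (host and user names are placeholders; use ssh-copy-id or append the public key to authorized_keys manually):

# on the node where you run start-all.sh
ssh-keygen -t rsa -P ""
ssh-copy-id user@slave-node-1    # repeat for every node in the cluster
ssh user@slave-node-1            # should now log in without a password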
On Tue, Apr 15, 2014 at 11:56 AM, Shashidhar Rao wrote:
> Hi ,
>
> Can somebody please clarify in real production environment with multiple
> nodes in cluster does ssh is impleme
Hi,
Can somebody please explain how to decide whether or not to place nodes in racks.
Say I have 30 nodes; is there a rule that once the number of nodes reaches a
certain number it is better to put those nodes in racks?
How do I decide whether to use racks or not?
Regards
Shashi
Hi,
I am using Cloudera and ran a MapReduce job written in Pig Latin; I hit
the following exception in a map task:
2014-04-15 11:30:39,532 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.ClassCastException: java.lang.String cannot be cast to
org.apache.pig.data.DataBag
You can just follow any instructions for deploying a distributed cluster;
simply put several different services on the same host.
Regards,
Stanley Shi,
On Tue, Apr 15, 2014 at 12:02 PM, Mohan Radhakrishnan <
radhakrishnan.mo...@gmail.com> wrote:
> Hi,
> I have 2 nodes, one is OSX and the o
Thanks Ted Yu
On Tue, Apr 15, 2014 at 9:37 AM, Ted Yu wrote:
> When Region Servers are co-located with Datanodes, you can utilize short
> circuit read feature.
> See 12.11.2 of http://hbase.apache.org/book.html#perf.hdfs
>
> Factors to consider co-location include the allocation of memory of se
Hi,
Can somebody please help me in clarifying how a Hadoop cluster is monitored
and profiled in a real production environment.
What are the tools and links, if any? I have heard of Ganglia and HPROF.
For HPROF, can somebody share some experience of how to configure HPROF for
use with Hadoop?
Regards
S
When Region Servers are co-located with Datanodes, you can utilize short
circuit read feature.
See 12.11.2 of http://hbase.apache.org/book.html#perf.hdfs
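For reference, a minimal hdfs-site.xml sketch to enable short-circuit reads (property names as documented for Hadoop 2.x; the socket path is only an example):

<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>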
Factors to consider for co-location include the allocation of memory on the server
- so that the region server and DataNode can have ample memory to fulfil
Hi,
I have 2 nodes, one is OSX and the other is linux. How is a
distributed cluster installed in this case ? What other networking
equipment do I need ?
Can I ask for pointers to instructions ? I am new.
Thanks,
Mohan
Hi,
Please can somebody clarify how Hadoop and HBase are both used in a real
production environment.
Can the Region Servers of HBase be installed on Hadoop DataNodes, or are
Region Servers kept separate from the Hadoop DataNodes in separate clusters?
I know it's HBase centric but still, if someone has exper
Hi ,
Can somebody please clarify whether, in a real production environment with
multiple nodes in the cluster, SSH is implemented or not. I have seen
examples where keys are generated and those keys are copied into the
authorized_keys files on other nodes in order to log in to those nodes.
Is this the same way it is done
On Mac, there is no default program to open .msg file.
Can you send in text ?
Cheers
On Mon, Apr 14, 2014 at 8:48 PM, lei liu wrote:
>
>
Thanks Stanley Shi
On Tue, Apr 15, 2014 at 6:25 AM, Stanley Shi wrote:
> Rough estimation: since word count requires very little computation, it is
> io centric, we can do estimation based on disk speed.
>
> Assume 10 disk with each 100MBps for each node, that is about 1GBps per
> node; assume
Hi,
Is it correct to say that the offline image viewer does not account for
any edits that are not yet merged into the fsimage?
Thanks,
Hi,
In the SnapshotDiffReport class
public enum DiffType {
CREATE("+"),
MODIFY("M"),
DELETE("-"),
RENAME("R");
...
If I do a "mv" on a file, the snapshot diff shows it as a delete of the old
name and a creation of the new name. What constitutes a "RENAME"?
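For context, a diff like the one above is typically produced with a command along these lines (directory and snapshot names are just placeholders):

hdfs snapshotDiff /data/dir s_N s_N+1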
Thanks,
Manoj
It just combines several files into one file; no compression happens.
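For illustration, creating and inspecting an archive looks roughly like this (paths are placeholders):

hadoop archive -archiveName files.har -p /user/shashi input /user/shashi/archived
hadoop fs -ls har:///user/shashi/archived/files.har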
On Fri, Apr 11, 2014 at 9:10 PM, Peyman Mohajerian wrote:
> There is: http://hadoop.apache.org/docs/r1.2.1/hadoop_archives.html
> But not sure if it compresses the data or not.
>
>
> On Thu, Apr 10, 2014 at 9:57 PM, Stanley Shi wrote:
>
Rough estimation: since word count requires very little computation, it is
IO-centric, so we can estimate based on disk speed.
Assume 10 disks at 100 MBps each per node, that is about 1 GBps per
node; assume 70% utilization in the mapper, so we have 700 MBps per node. For
30 nodes, it is total
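Continuing that back-of-the-envelope calculation under the same assumptions (a rough sketch, not a measurement):

30 nodes x 700 MB/s  ~ 21 GB/s aggregate map-side read throughput
10 TB / 21 GB/s      ~ 480 s, i.e. roughly 8 minutes just to scan the input
(shuffle, reduce and job-startup overhead come on top of this)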
Add -Dhadoop.root.logger=DEBUG to something like HADOOP_RESOURCEMANAGER_OPTS
in yarn-env.sh
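A minimal sketch (the exact variable name depends on your Hadoop version; YARN_RESOURCEMANAGER_OPTS and the RFA appender are assumed here):

# yarn-env.sh on the ResourceManager host only
export YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -Dhadoop.root.logger=DEBUG,RFA"

Then restart only the ResourceManager daemon, so the other daemons keep their existing log level.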
On Tuesday, April 15, 2014, Ashwin Shankar
wrote:
> Hi,
> How do we set log level to debug for lets say only Resource manager
> and not the other hadoop daemons ?
>
> --
> Thanks,
> Ashwin
>
>
>
--
Rega
This quick? 2.4 was released only a few weeks ago.
On Monday, April 14, 2014, Azuryy wrote:
> Hadoop 2.5 would be released on mid May.
>
>
> Sent from my iPhone5s
>
> On April 14, 2014, at 17:47, lei liu
> >
> wrote:
>
> When is hadoop released?
>
>
>
>
> 2014-04-14 17:04 GMT+08:00 Stanley Shi
Hi,
How do we set log level to debug for lets say only Resource manager
and not the other hadoop daemons ?
--
Thanks,
Ashwin
Thanks Jing,
The JIRA has been open since Nov 12, but it seems a design doc was added just a
few days back ...
Would you have any ETA for this?
Thanks again !
Manoj
On Mon, Apr 14, 2014 at 2:47 PM, Jing Zhao wrote:
> Hi Manoj,
>
> You're right, right now we do not have a complete snapshot
> rollbac
Hi Manoj,
You're right, right now we do not have a complete snapshot
rollback/restore functionality in HDFS. Thus users have to manually
copy/delete files according to the snapshot diff report. There's an open
jira HDFS-4167 for it. We plan to provide this support soon.
Thanks,
-Jing
On Mon
Hi Biswa,
Are you sure that the replication factor of the files is three? Please run
'hadoop fsck / -blocks -files -locations' and check the replication factor for
each file. Also, post the configuration of
dfs.datanode.du.reserved and please check the real space reported
by a DataNode by
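For reference, dfs.datanode.du.reserved is set in hdfs-site.xml; the value below (10 GB reserved per volume, in bytes) is only an example:

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>
</property>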
Hi,
It seems the only way to restore from an HDFS snapshot using the hdfs command
line is to copy the snapshot files to a target path.
If the use cases are
0. stuff ...
1. Take snapshot s_N
2. Add some files, delete other files
3. Take snapshot s_N+1
then copying s_N+1 to target just copies the newly ad
thanks for the response, yep you are right!
sorry i didn't make it clear.
I need this feature through the Java API.
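For what it's worth, a sketch using the public FileSystem API - note it returns block locations (hosts, offsets, lengths), not the datanode's local on-disk path, which the public client API does not expose:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path(args[0]);              // e.g. /user/alex/somefile (placeholder)
    FileStatus status = fs.getFileStatus(file);
    // One BlockLocation per block: offset, length and the datanode hosts holding replicas
    for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
      System.out.println(loc);
    }
    fs.close();
  }
}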
On 04/15/2014 12:04 AM, Peyman Mohajerian wrote:
hadoop fsck -files -blocks -locations
On Mon, Apr 14, 2014 at 4:43 PM, Alexandros Papadopoulos
<alex.pap...@gmail.com> wr
hadoop fsck -files -blocks -locations
On Mon, Apr 14, 2014 at 4:43 PM, Alexandros Papadopoulos <
alex.pap...@gmail.com> wrote:
> hi all,
>
> in some cases as hdfs-client, i would like to know the file block path
> in a datanode.
> Is there a way to get a file block path for a datanode ??
>
hi all,
in some cases, as an hdfs client, I would like to know the file block
path on a datanode.
Is there a way to get a file's block path on a datanode?
Thanks in advance,
alexpap
Hi,
You can just use the put command to load files into HDFS:
https://hadoop.apache.org/docs/r0.18.3/hdfs_shell.html#put
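For example (paths are placeholders):

hadoop fs -mkdir /user/shashi/input
hadoop fs -put /local/data/file1.txt /user/shashi/input/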
Copying files into HDFS won't require a mapper or a map-reduce job;
whether you really need a single merged file depends on your
processing logic (map-reduce code).
Also, you ca
Any thoughts ?
On Wed, Apr 9, 2014 at 10:08 AM, Manoj Samel wrote:
> Hi,
>
> If I take HDFS snapshot and then restore is to some other directory using
>
> hdfs dfs -cp /xxx/.snapshot/nnn /aaa/bbb
>
> want to confirm that there is a copy of data from files under snapshot to
> the target director.
Hi,
Can somebody provide me a rough estimate of the time taken, in hours/minutes,
for a cluster of say 30 nodes to run a MapReduce job that performs a word
count on say 10 TB of data, assuming that the hardware and the MapReduce
program are tuned optimally.
Just a rough estimate; it could be 5 TB, 10 TB o
You need to differentiate slots from tasks.
Tasks are spawned by TT and assigned to a free slot in the cluster.
The number of map tasks for a Hadoop job is typically controlled by the
input data size and the split size.
The number of reduce tasks for a Hadoop job is controlled by the
*mapreduce.job
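For illustration, the reduce-task count can be set per job (property name assumed from the standard MapReduce configuration; jar and class names are placeholders, and the driver is assumed to use ToolRunner/GenericOptionsParser):

hadoop jar myjob.jar MyDriver -D mapreduce.job.reduces=10 /input /output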
Hi,
Can somebody clarify what map and reduce slots are and how Hadoop
calculates these slots. Are these slots calculated based on the number of
splits?
I am getting different answers, please help.
Regards
Shashidhar
Thank you Dave, I got it. Needed a few other .jars as well (commons-cli and
protobuf-java). But most importantly, the port was wrong. 50070 is for HTTP
access, but using 8020 is correct for direct HDFS access.
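In other words (host name is a placeholder):

hdfs://namenode.example.com:8020    # FileSystem/RPC URI used by client code
http://namenode.example.com:50070   # NameNode web UI over HTTP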
Thanks again,
~Roger
From: dlmarion
Sent: Friday,
-- Forwarded message --
From: Mahesh Khandewal
Date: Mon, 14 Apr 2014 08:42:16 +0530
Subject: Re: Changing default scheduler in hadoop
To: user@hadoop.apache.org
Cc: Ekta Agrawal ,
"common-u...@hadoop.apache.org" ,
"hdfs-u...@hadoop.apache.org"
Hi i have patch file of Resource Aw
Also, "Source Compatibility" also means ONLY a recompile is needed.
No code changes should be needed.
On Mon, Apr 14, 2014 at 10:37 AM, John Meagher wrote:
> Source Compatibility = you need to recompile and use the new version
> as part of the compilation
>
> Binary Compatibility = you can take s
Source Compatibility = you need to recompile and use the new version
as part of the compilation
Binary Compatibility = you can take something compiled against the old
version and run it on the new version
On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe
wrote:
> Hello People,
>
> As per the Apache s
unsubscribe
1. In real production environment do we copy these 10 files in hdfs under a
folder one by one. If this is the case then how many mappers do we specify
10 mappers. And do we use put command of hadoop to transfer this file.
Ans: This will depend on what you want to do with files. There is no rule
wh
Hi,
Please can somebody clarify my doubts. Say I have a cluster of 30 nodes
and I want to put files into HDFS. All the files combined are 10 TB in
size, but each file is roughly say 1 GB only, and the total number of
files is 10 files.
1. In real production environment do we copy these 1
Hello People,
As per the Apache site
http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html
Binary Compatibility: First, we ensure binary compatibility
to the applications that use old mapred APIs.
Sorry for the error. Did not have a proper look at the logs. This seems to
be a JT issue. Ignore the previous email.
Thanks
Divye Sheth
On Apr 14, 2014 6:06 PM, "divye sheth" wrote:
> This usually occurs when the task takes more memory and exceeds its heap
> space. You can increase the memory of
This usually occurs when the task takes more memory and exceeds its heap
space. You can increase the memory of the tasks by setting a property in
mapred-site.xml
In Mapred.child.opts, specify a higher -Xmx value.
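A minimal mapred-site.xml sketch (in Hadoop 1.x / MRv1 the property is usually mapred.child.java.opts; the heap value below is only an example):

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>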
P.S the property name might not be correct, you may look up the
mapred-default.xm
Hi,
Try setting this property in hdfs-site.xml on the destination cluster.
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
4096 should work; if needed you may increase this to a higher number. A
restart would be required in this case. Also make sure you have the ulimit
set to a high number as had
unsubscribe
2014-04-12 23:02 GMT+08:00 Viswanathan J :
> Hi,
>
> I'm using Hadoop v1.2.1 and it is running fine so long(3 months) without
> any issues.
>
> Suddenly I got the below error in Jobtracker and jobs are failed to run.
>
> Is this issue in JT or TT or Jetty issue?
>
> 2014-04-12 02:13:
If by resetting the list of dead datanodes you mean that the web console or the
report command should no longer show the datanode that you removed, then you
will have to do the following:
1. Remove the entry from the slaves file corresponding to the dead datanode.
2. Remove entry from exclude file
3. Run
Hi Viswanathan,
this looks like your job history is full, and is filling up your jobtracker
heap:
> 2014-04-12 02:25:47,963 ERROR org.apache.hadoop.mapred.JobHistory: Unable to
> move history file to DONE canonical subfolder.
> java.lang.OutOfMemoryError: Java heap space
Have you set any value
Seems like an ordinary Linux file permission thing.
Are you logged in as user 'software'?
Does your user have permission to create dirs in /home/software?
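A quick way to check (commands are a sketch; adjust the user/group to your setup):

ls -ld /home/software
# if needed, as root:
chown -R software:software /home/software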
/th
On Mon, 2014-04-14 at 16:12 +0800, EdwardKing wrote:
> I want to use hive in hadoop2.2.0, so I execute following steps:
>
> [hadoop@mast
Hadoop 2.5 should be released in mid-May.
Sent from my iPhone5s
> On April 14, 2014, at 17:47, lei liu wrote:
>
> When is hadoop released?
>
>
>
>
> 2014-04-14 17:04 GMT+08:00 Stanley Shi :
>> Please find it in this page: https://wiki.apache.org/hadoop/Roadmap
>>
>> hadoop 2.3.0 only include
When will hadoop 2.5.0 be released?
2014-04-14 17:04 GMT+08:00 Stanley Shi :
> Please find it in this page: https://wiki.apache.org/hadoop/Roadmap
>
> hadoop 2.3.0 only include "phase 1" of the heterogeneous storage; "phase
> 2" will be included in 2.5.0;
>
> Regards,
> *Stanley Shi,*
>
>
>
> On Mon, Ap
Please find it in this page: https://wiki.apache.org/hadoop/Roadmap
hadoop 2.3.0 only includes "phase 1" of the heterogeneous storage; "phase 2"
will be included in 2.5.0.
Regards,
Stanley Shi,
On Mon, Apr 14, 2014 at 4:38 PM, ascot.m...@gmail.com
wrote:
> hi,
>
> From 2.3.0
> 20 February,
hi,
From 2.3.0
20 February, 2014: Release 2.3.0 available
Apache Hadoop 2.3.0 contains a number of significant enhancements such as:
Support for Heterogeneous Storage hierarchy in HDFS.
Is it already there?
Ascot
On 14 Apr, 2014, at 4:34 pm, lei liu wrote:
> On April 11 hadoop-2.4 is rele
On April 11 hadoop-2.4 was released, but hadoop-2.4 does not include the
heterogeneous storage function. When will hadoop include this function?
Thanks,
LiuLei
I want to use hive in hadoop2.2.0, so I execute following steps:
[hadoop@master /]$ tar -xzf hive-0.11.0.tar.gz
[hadoop@master /]$ export HIVE_HOME=/home/software/hive
[hadoop@master /]$ export PATH=${HIVE_HOME}/bin:${PATH}
[hadoop@master /]$ hadoop fs -mkdir /tmp
[hadoop@master /]$ hadoop fs -m
You can find the reduce container on the RM's web page.
BTW: from the above log, you can check whether the application master crashed.
On Mon, Apr 14, 2014 at 3:12 PM, Rahul Singh wrote:
> how do i identify an reduce container? there are multiple container dirs
> in my application id folder in userlogs.
>
>
>
How do I identify a reduce container? There are multiple container dirs in
my application id folder under userlogs.
On Mon, Apr 14, 2014 at 12:29 PM, Gordon Wang wrote:
> Hi Rahul,
>
> What is the log of reduce container ? Please paste the log and we can see
> the reason.
>
>
> On Mon, Apr 14, 20
I cleaned up the log directory before running the job. Now there are no
nodemanager jobs. When I look in the userlogs directory I am getting some syslog
files with the following error:
2014-04-14 11:58:23,472 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: poc-hadoop06/127.0.1.1:40
You can add the hostname/IP to the exclude file and run this command:
hadoop dfsadmin -refreshNodes.
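A minimal sketch, assuming the exclude file is the one referenced by dfs.hosts.exclude in hdfs-site.xml (file path and host name are placeholders):

echo "dead-node.example.com" >> /etc/hadoop/conf/dfs.exclude
hadoop dfsadmin -refreshNodes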
On Mon, Apr 14, 2014 at 11:34 AM, Stanley Shi wrote:
> I believe there's some command to show list of datanodes from CLI, using
> parsing HTML is not a good idea. HTML page is intended to be read by
Please check your logs directory usage.
On Mon, Apr 14, 2014 at 12:08 PM, Biswajit Nayak
wrote:
> Whats the replication factor you have? I believe it should be 3. hadoop
> dus shows that disk usage without replication. While name node ui page
> gives with replication.
>
> 38gb * 3 =114gb ~ 1TB
Hi Rahul,
What is the log of reduce container ? Please paste the log and we can see
the reason.
On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh wrote:
> Hi,
> I am running a job(wordcount example) on 3 node cluster(1 master and 2
> slave), some times the job passes but some times it fails(as red