Can anyone explain the categories in the block scanner report in detail?
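A quick way to pull the report itself, as a sketch (each datanode serves it over HTTP; 50075 is the default datanode web port and "datanode1" is a placeholder host):

# Fetch the block scanner summary from one datanode.
curl "http://datanode1:50075/blockScannerReport"
# Append ?listblocks for per-block verification detail (can be very large).
curl "http://datanode1:50075/blockScannerReport?listblocks"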
On Tue, Oct 22, 2013 at 5:59 AM, Rita rmorgan...@gmail.com wrote:
anyone?
On Sun, Oct 20, 2013 at 9:56 AM, Rita rmorgan...@gmail.com wrote:
I have asked this question elsewhere and haven't gotten an answer. Perhaps I asked it the wrong way or in the wrong forum.
I have a 40+ node cluster and I would like to make sure the datanode scanning is done aggressively. I know 8192 KB/s is hard-coded, but I was wondering if there was a better way to keep
Look at dfs.heartbeat.interval and dfs.namenode.heartbeat.recheck-interval.
40 datanodes is not a large cluster IMHO and the Namenode is capable of
managing 100 times more datanodes.
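To see what a cluster is actually running with, a sketch using hdfs getconf (available on 2.x-era releases; stock defaults in the comments):

# Heartbeat from datanode to namenode, in seconds (default 3).
hdfs getconf -confKey dfs.heartbeat.interval
# Namenode recheck interval, in milliseconds (default 300000).
hdfs getconf -confKey dfs.namenode.heartbeat.recheck-interval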
From: Rita rmorgan...@gmail.com
To: common-user@hadoop.apache.org
at 10:50 AM, Ravi Prakash ravi...@ymail.com wrote:
Rita!
14-16 TB is perhaps a big node. Even then, the scalability limits of the Namenode in your case would depend on how many files (more accurately, how many blocks) there are on HDFS.
In any case, if you want the datanodes to be marked dead
I would like my 40 datanodes to aggressively report to the namenode whether they are alive or not, so I think I need to change these params:
dfs.block.access.token.lifetime: default is 600 minutes. Can I decrease this to 60?
dfs.block.access.key.update.interval: default is 600 minutes. Can I
any thoughts?
On Wed, Mar 13, 2013 at 7:17 PM, Rita rmorgan...@gmail.com wrote:
I am planning to build an HDFS cluster primarily for streaming large files (10 GB avg size). I was wondering if anyone can recommend a good hardware vendor.
, or as a benchmark to
measure the maximum?
The JMX page (nn:port/jmx) provides some interesting stats, but I'm not sure it has what you want. And I'm unaware of other tools that could.
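That said, the servlet is easy to scrape; a minimal sketch, assuming the era's default NameNode HTTP port of 50070 (bean names vary by version):

# Dump every JMX bean the NameNode exposes, as JSON.
curl "http://namenode:50070/jmx"
# Filter to a single bean with the qry parameter.
curl "http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem"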
From: Rita rmorgan...@gmail.com
To: common-user
Anyone?
On Sun, Oct 21, 2012 at 8:30 AM, Rita rmorgan...@gmail.com wrote:
Hi,
Was curious if there was a method to measure the total number of IOPS (I/O operations per second) on an HDFS cluster.
Is it possible to know how many reads and writes are occurring through the entire cluster in a consolidated manner -- that is, excluding replication traffic?
On Mon, Oct 22, 2012 at 10:28 AM, Ravi Prakash ravi...@ymail.com wrote:
Hi Rita,
SliveTest can help you measure the number of reads
Out of curiosity, what does running HDFS give you when running through an Isilon cluster?
On Wed, Oct 17, 2012 at 3:59 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
Look at the directory permissions?
On Wed, Oct 17, 2012 at 12:18 PM, Artem Ervits are9...@nyp.org wrote:
Anyone using Hadoop
Thanks for the advice.
Before I push or pull, are there any tests I can run before I do the distcp? I am not 100% sure I have my webhdfs set up properly.
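Two cheap smoke tests for the webhdfs side, sketched with a placeholder namenode host and the default 50070 HTTP port:

# A JSON directory listing back from this means webhdfs is answering.
curl -i "http://namenode:50070/webhdfs/v1/?op=LISTSTATUS"
# The same through the Hadoop client, from the cluster that will run distcp:
hadoop fs -ls webhdfs://namenode:50070/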
On Fri, Oct 12, 2012 at 1:01 PM, J. Rottinghuis jrottingh...@gmail.com wrote:
Rita,
Are you doing a push from the source cluster
Nevermind. Figured it out.
On Fri, Oct 12, 2012 at 3:20 PM, kojie.fu kojie...@gmail.com wrote:
kojie.fu
From: Rita
Date: 2012-10-13 03:19
To: common-user
Subject: Re: distcp question
Does Hadoop, HDFS in particular, do any sanity checks of the files before and after balancing/copying/reading them? We have 20 TB of data and I want to make sure that after these operations are completed the data is still in good shape. Where can I read about this?
tia
, to prevent bit rot, blocks are checked periodically (weekly by default, I believe; you can configure that period) in the background.
Kai
On 25.06.2012 at 13:29, Rita wrote:
Does Hadoop, HDFS in particular, do any sanity checks of the file before
and after balancing/copying/reading the files
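Two client-side checks that build on this, as a sketch (paths are placeholders, and hadoop fs -checksum is only available on newer releases):

# Report missing, corrupt, and under-replicated blocks without reading data.
hadoop fsck /user/rita/archive -files -blocks
# Grab a file's MD5-of-CRC checksum; compare before and after a big copy.
hadoop fs -checksum /user/rita/archive/big.tar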
P.S. A better solution would be to make your job not take as many days, somehow? :-)
On Fri, May 11, 2012 at 4:13 PM, Rita rmorgan...@gmail.com wrote:
I have a rather large map reduce job which takes a few days. I was wondering if it's possible for me to freeze the job or make the job less
Is it possible to get pretty URLs when doing HDFS file browsing via a web browser?
In the hdfs-site.xml file, what property do I need to set for client retries? Also, what is the default value?
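There isn't one single retry knob; these two properties are the usual suspects, sketched here with the defaults I believe apply (verify against your version's hdfs-default.xml):

# Retries while writing a block pipeline (default 3).
hdfs getconf -confKey dfs.client.block.write.retries
# Attempts to acquire a block's locations before a read fails (default 3).
hdfs getconf -confKey dfs.client.max.block.acquire.failures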
My replication factor is 3, and if I were reading data through libhdfs using C, is there a retry method? I am reading a 60 GB file; what will happen if a rack goes down and the next block isn't available? Will the API retry? Is there a way to configure this option?
on your replication factor.
A block can also be unavailable due to corruption; in that case it can be re-replicated to other live machines, and the error can be fixed with the fsck utility.
Regards
On 3/18/2012 9:46 AM, Rita wrote:
My replication factor is 3 and if I were reading data thru
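Before starting a long read, it is easy to check where every replica of the file lives, as a sketch (path is a placeholder):

# List each block of the file and the datanodes holding its replicas.
hadoop fsck /user/rita/data/big.file -files -blocks -locations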
Running CDH U1 and using HBase. So far I have been extremely happy, with the exception of Python support. Currently I am using Thrift, but I suspect there are some major features missing in it, such as RegExFilters. If this is the case, will there ever be a native Python client for HBase?
taken has been to keep data that is accessed
repeatedly and fits in memory in some other system
(hbase/cassandra/mysql/whatever).
Edward
On Mon, Jan 16, 2012 at 11:33 AM, Rita rmorgan...@gmail.com wrote:
Thanks. I believe this is a good feature to have for clients, especially if you
in coordination with Facebook. I don't believe it
has been published quite yet, but the title of the project is PACMan
-- I expect it will be published soon.
-Todd
On Sat, Jan 14, 2012 at 5:30 PM, Rita rmorgan...@gmail.com wrote:
After reading this article,
http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/, I was wondering if there was a filesystem cache for hdfs. For example, if a large file (10 gigabytes) keeps getting accessed on the cluster, instead of repeatedly fetching it over the network, why not store
yes, something different from that. To my knowledge, DistributedCache is
only for Mapreduce.
On Sat, Jan 14, 2012 at 8:33 PM, Prashant Kommireddi prash1...@gmail.com wrote:
You mean something different from the DistributedCache?
Sent from my iPhone
On Jan 14, 2012, at 5:30 PM, Rita rmorgan
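For later readers: HDFS did eventually grow this feature as centralized cache management (HDFS-4949, in releases well after this thread). A sketch with placeholder pool and path names:

# Create a cache pool and pin a hot file into datanode memory.
hdfs cacheadmin -addPool hotpool
hdfs cacheadmin -addDirective -path /user/rita/hot/big.file -pool hotpool
# Confirm the directive took.
hdfs cacheadmin -listDirectives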
Is there a tool or a method to measure the throughput of the cluster at a
given time? It would be a great feature to add
Yes, I think they can graph it for you. However, I am looking for raw data
because I would like to create something custom
On Thu, Dec 22, 2011 at 8:19 AM, alo alt wget.n...@googlemail.com wrote:
Rita,
Ganglia gives you throughput graphs, similar to Nagios. Could that help?
- Alex
On Thu, Dec 22
Hello,
I am working on writing a process (bash) which can attach to the namenode and listen to the RPCs. I am interested in what files are hot and who is reading the data.
Currently I am using the namenode logs to gather this data, but I was wondering if I can attach to the hadoop/hdfs port and
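The HDFS audit log is built for exactly this question (who did what to which path) and is cheaper than scraping the main namenode log. A sketch: raise the audit logger in log4j.properties, then mine the output:

# In conf/log4j.properties:
#   log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=INFO
# Audit lines carry ugi=, ip=, cmd=, src= fields. Hottest files by open count:
grep 'cmd=open' hdfs-audit.log | grep -o 'src=[^ ]*' | sort | uniq -c | sort -rn | head
# And who is doing the reading:
grep 'cmd=open' hdfs-audit.log | grep -o 'ugi=[^ ]*' | sort | uniq -c | sort -rn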
I second Vinod's idea. Get the latest stable from Cloudera. Their binaries
are near perfect!
On Tue, Dec 6, 2011 at 1:46 PM, T Vinod Gupta tvi...@readypulse.com wrote:
Saurabh,
It's best if you go through the HBase book - Lars George's HBase: The Definitive Guide.
Your best bet is to
Hello,
I am using HBase and I have a default replication factor of 2. Now, if I change the directory's replication factor, will all the new files created there automatically be replicated 3 times?
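For the archive: replication is a per-file attribute in HDFS, so a "directory replication factor" only shapes defaults. A sketch of the usual recipe (path is a placeholder):

# New files take the writing client's dfs.replication at create time.
# Existing files must be changed explicitly, recursively here:
hadoop fs -setrep -R 3 /hbase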
you can reconstruct and which you
don't need to read really fast, but not good enough for data whose loss
will get you fired.
On Mon, Nov 7, 2011 at 7:34 PM, Rita rmorgan...@gmail.com wrote:
I have been running with 2x replication on a 500tb cluster. No issues
whatsoever. 3x is for super
For a 1 PB installation you would need close to 170 servers with 12 TB disk packs installed on them (with a replication factor of 2). That's a conservative estimate.
CPUs: 4 cores with 16 GB of memory
Namenode: 4 cores with 32 GB of memory should be ok.
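The arithmetic behind that estimate, for anyone checking: 1 PB of user data at 2x replication is 2 PB raw, spread over 12 TB nodes:

# 2000 TB raw / 12 TB per node = ~167 nodes, rounded up to ~170 for headroom.
echo $(( 2000 / 12 ))   # prints 166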
On Fri, Oct 21, 2011 at 5:40 PM, Steve Ed
, 2011, Rita rmorgan...@gmail.com wrote:
For a 1 PB installation you would need close to 170 servers with 12 TB disk packs installed on them (with a replication factor of 2). That's a conservative estimate.
CPUs: 4 cores with 16 GB of memory
Namenode: 4 cores with 32 GB of memory should be ok
Why?
The beauty of hadoop is that it's OS-agnostic. What is your native operating system? I am sure you have a version of the JDK and JRE running there.
On Tue, Nov 1, 2011 at 4:53 AM, Masoud mas...@agape.hanyang.ac.kr wrote:
Hi
Has anybody run Hadoop on Cygwin for development purposes?
Did you have
I would like to know if and when HDFS RAID (http://wiki.apache.org/hadoop/HDFS-RAID) will ever get into mainline. This would be an extremely useful feature for many sites, especially larger ones. The savings on storage would be noticeable. I haven't really seen any progress in
What is the correct way to reserve space for hdfs?
I currently have 2 filesystems, /fs1 and /fs2, and I would like to reserve space for non-dfs operations. For example, for /fs1 I would like to reserve 30 GB of space for non-dfs, and 10 GB of space for /fs2.
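The relevant knob is dfs.datanode.du.reserved, in bytes, with one caveat: in this era it is a single value applied to every volume, so different amounts for /fs1 and /fs2 aren't expressible. A sketch:

# In hdfs-site.xml on each datanode (bytes, applied per volume):
#   <property>
#     <name>dfs.datanode.du.reserved</name>
#     <value>32212254720</value>   <!-- 30 GB -->
#   </property>
# Confirm the effective value:
hdfs getconf -confKey dfs.datanode.du.reserved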
I fear HADOOP-2991 is still haunting us?
Is there a way to configure the default timeout of a datanode? Currently it's set to 630 seconds and I want something a bit more realistic -- like 30 seconds.
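That 630 is derived, not set directly: the namenode declares a datanode dead after 2 x dfs.namenode.heartbeat.recheck-interval + 10 x dfs.heartbeat.interval. A sketch of hitting ~30 seconds (hypothetical, aggressive values; GC pauses can cause flapping):

# Default: 2 * 300000 ms + 10 * 3 s = 630 s.
# In hdfs-site.xml set, e.g.:
#   dfs.namenode.heartbeat.recheck-interval = 10000   (ms)
#   dfs.heartbeat.interval = 1                        (s)
echo $(( 2 * 10 + 10 * 1 ))   # prints 30 (seconds)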
Arun,
I second Joe's comment.
Thanks for giving us a heads up.
I will wait patiently until 0.23 is considered stable.
On Mon, Jul 18, 2011 at 11:19 PM, Joe Stein charmal...@allthingshadoop.com wrote:
Arun,
Thanks for the update.
Again, I hate to have to play the part of captain obvious.
be ready and then talk to our CHUG?
-Mike
To: common-user@hadoop.apache.org
Subject: Re: Which release to use?
From: tdeut...@us.ibm.com
Date: Sat, 16 Jul 2011 10:29:55 -0700
Hi Rita - I want to make sure we are honoring the purpose/approach of this list. So you
I am a dimwit.
On Mon, Jul 18, 2011 at 8:12 PM, Allen Wittenauer a...@apache.org wrote:
On Jul 18, 2011, at 5:01 PM, Rita wrote:
I made the big mistake of using the latest version, 0.21.0, and found a bunch of bugs, so I got pissed off at hdfs. Then, after reading this thread, it seems I
I am curious about the IBM product BigInsights. Where can we download it? It seems we have to register to download it?
On Fri, Jul 15, 2011 at 12:38 PM, Tom Deutsch tdeut...@us.ibm.com wrote:
One quick clarification - IBM GA'd a product called BigInsights in 2Q. It
faithfully uses the Hadoop
So, I use hdfs to store very large files and access them through various clients (100 clients) using FS utils. Are there any other tools or projects that solely use hdfs as their storage for fast access? I know hbase uses it but requires mapreduce. I want to know only about hdfs without mapreduce.
, MapReduce may still be required (atop HBase, e.g.). There's been work ongoing to assist
the same at the HBase side as well, but you're guaranteed better
responses on their mailing lists instead.
On Tue, Jul 12, 2011 at 3:31 PM, Rita rmorgan...@gmail.com wrote:
This is encouraging.
"Make the mapreduce daemons. These do not need to be started."
On Mon, Jul 11, 2011 at 1:40 PM, Bharath Mundlapudi bharathw...@yahoo.com wrote:
Another option to look at is Pig or Hive. These need MapReduce.
-Bharath
From: Rita rmorgan...@gmail.com
To: common-user
I have a dataset which is several terabytes in size. I would like to query this data using hbase (sql). Would I need to set up mapreduce to use hbase? Currently the data is stored in hdfs and I am using `hdfs -cat` to get the data and pipe it into stdin.
Thanks Steve. This is exactly what I was looking for. Unfortunately, I don't see any example code for the implementation.
On Wed, Jul 6, 2011 at 7:35 AM, Steve Loughran ste...@apache.org wrote:
On 06/07/11 11:08, Rita wrote:
I have many large files ranging from 2gb to 800gb and I use hadoop
Could someone please compile and provide the jar for this class? It would be much appreciated. I am running 0.21.0 (http://hadoop.apache.org/common/docs/r0.21.0/).
On Thu, Jul 7, 2011 at 3:56 AM, Rita rmorgan...@gmail.com wrote:
By looking at this, http://www.mail-archive.com/mapreduce-dev
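In case it unblocks anyone in the same spot, building such a jar locally is usually a two-liner, sketched here with a placeholder class name (hadoop classpath prints the dependency path on later releases; on older ones, point -classpath at the hadoop jars directly):

# Compile against the installed Hadoop jars, then package.
javac -classpath "$(hadoop classpath)" MyClass.java
jar cf myclass.jar MyClass*.class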
Thanks again Steve.
I will try to implement it with thrift.
On Thu, Jul 7, 2011 at 5:35 AM, Steve Loughran ste...@apache.org wrote:
On 07/07/11 08:22, Rita wrote:
Thanks Steve. This is exactly what I was looking for. Unfortunately, I don't see any example code for the implementation
I have many large files ranging from 2 GB to 800 GB and I use hadoop fs -cat a lot to pipe to various programs.
I was wondering if it's possible to prefetch the data for clients with more bandwidth. Most of my clients have 10G interfaces and the datanodes are 1G.
I was thinking, prefetch x blocks (even
We use hadoop/hdfs to archive data. I archive a lot of files by creating one large tar file and then placing it in hdfs. Is it better to use hadoop archive for this, or is it essentially the same thing?
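For comparison, the hadoop archive variant of the same workflow, sketched with placeholder paths; the practical difference from tar is that a .har stays listable and readable in place through the har:// filesystem:

# Pack a directory into a Hadoop archive (runs as a MapReduce job).
hadoop archive -archiveName logs.har -p /user/rita/logs /user/rita/archives
# Contents remain accessible without unpacking:
hadoop fs -ls har:///user/rita/archives/logs.har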
-0400, Rita wrote:
what filesystem are they using and what is the size of each filesystem?
It sounds nuts, but each disk has its own ext3 filesystem. Beyond switching to the deadline IO scheduler, we haven't done much tuning/tweaking. A script runs every ten minutes to test all of the data
what filesystem are they using and what is the size of each filesystem?
On Mon, May 9, 2011 at 9:22 PM, Will Maier wcma...@hep.wisc.edu wrote:
On Mon, May 09, 2011 at 05:07:29PM -0700, Jonathan Disher wrote:
Speak for yourself, I just built a bunch of 36 disk datanodes :)
And I just
Sheng,
How big is each of your XFS volumes? We noticed that if it's over 4 TB, hdfs won't pick it up.
2011/5/6 Ferdy Galema ferdy.gal...@kalooga.com
No unfortunately not, we couldn't because of our kernel versions.
On 05/06/2011 04:00 AM, ShengChang Gu wrote:
Many thanks.
We use xfs all the
I am trying to acquire statistics about my hdfs cluster in the lab. One stat I am really interested in is the total throughput (gigabytes served) of the cluster over 24 hours. I suppose I can look for 'cmd=open' in the log file of the name node, but how accurate is it? It seems there is no
Hello,
Have a look at conf/log4j.properties to configure all logging options.
On Thu, Apr 21, 2011 at 3:14 AM, Rita rmorgan...@gmail.com wrote:
I guess I should ask, how does one enable debug mode for the namenode and
datanode logs?
I would like to see if in the debug mode I am able to see close calls of a
file.
On Tue, Apr 19, 2011 at 8:48 PM, Rita rmorgan...@gmail.com wrote:
I know in the logs you can see 'cmd=open
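Two ways to get DEBUG out of the HDFS daemons, sketched from the standard tooling (double-check the class names against your release):

# Persistent: in conf/log4j.properties, e.g.
#   log4j.logger.org.apache.hadoop.hdfs.server.namenode=DEBUG
# At runtime, against the daemon's HTTP port, no restart needed:
hadoop daemonlog -setlevel namenode:50070 org.apache.hadoop.hdfs.server.namenode.NameNode DEBUG
hadoop daemonlog -setlevel datanode1:50075 org.apache.hadoop.hdfs.server.datanode.DataNode DEBUG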
, Rita rmorgan...@gmail.com wrote:
What is the best way to change the rack of a node?
I have tried the following: killed the datanode process, changed the rackmap file so the node and IP address entry reflect the new rack, and did a '-refreshNodes'. Restarted the datanode
Hello All,
Is there a parameter or procedure to check more aggressively for a live/dead node? Despite my killing the hadoop process, I see the node active for more than 10+ minutes in the Live Nodes page. Fortunately, the last contact increments.
Using branch-0.21, 0985326
For earlier releases: heartbeat.recheck.interval
For 0.22: dfs.namenode.heartbeat.recheck-interval and dfs.heartbeat.interval
Cheers,
Ravi
On 3/29/11 10:24 AM, Michael Segel michael_se...@hotmail.com wrote:
Rita,
When the NameNode doesn't see a heartbeat for 10 minutes, it then
recognizes that the node is down
Thanks. With ext4 I created 2 16 TB volumes and they are seen. I think it may be an issue with XFS.
On Mon, Mar 28, 2011 at 3:50 PM, Todd Lipcon t...@cloudera.com wrote:
On Fri, Mar 25, 2011 at 9:06 PM, Rita rmorgan...@gmail.com wrote:
Using 0.21
When I have a filesystem (XFS) with 1TB
Thank you. Switching to ext4.
On Tue, Mar 29, 2011 at 8:23 AM, Eric eric.x...@gmail.com wrote:
Rita, another issue I've seen is that when you have lots of XFS filesystems
that are heavily used, the Linux kernel will at some point crash. So the XFS
driver seems to have problems that only appear
Thanks Allen.
I really hope this gets addressed. Leaving it in cache can become
dangerous.
On Sat, Mar 26, 2011 at 7:49 PM, Allen Wittenauer awittena...@linkedin.com wrote:
On Mar 26, 2011, at 3:50 PM, Ted Dunning wrote:
I think that the namenode remembers the rack. Restarting the
Using 0.21.
When I have a 1 TB filesystem (XFS), the datanode detects it immediately. When I create 3 identical filesystems, all 3 TB are visible immediately.
If I create a 6 TB filesystem (XFS) and add it to dfs.data.dir and restart the datanode, hdfs dfsadmin -report does not
Thanks everyone for your replies.
I knew Cloudera had their release but never knew Y! had one too...
On Thu, Mar 24, 2011 at 5:04 PM, Eli Collins e...@cloudera.com wrote:
Hey Rita,
All software developed by Cloudera for CDH is Apache (v2) licensed and
freely available. See these docs
I have been wondering if I should use CDH (http://www.cloudera.com/hadoop/) instead of the standard Hadoop distribution.
What do most people use? Is CDH free? Do they provide tars, or do they provide source code that I simply compile? Can I have some data nodes on CDH and the rest as regular
before? I understand that Cloudera's version is heavily patched (similar to
Redhat Linux kernel versus standard Linux kernel).
On Wed, Mar 23, 2011 at 10:44 AM, Michael Segel michael_se...@hotmail.com wrote:
Rita,
Short answer...
Cloudera's release is free, and they do also offer a support
Any help?
On Wed, Mar 16, 2011 at 9:36 PM, Rita rmorgan...@gmail.com wrote:
Hello,
I have been struggling with decommissioning data nodes. I have a 50+ data
node cluster (no MR) with each server holding about 2TB of storage. I split
the nodes into 2 racks.
I edit the 'exclude' file and then do a -refreshNodes. I see the node immediately in 'Decommissioned nodes' and I also
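The usual checklist for this, as a sketch with placeholder paths: the file edited must be the one dfs.hosts.exclude points at, entries must match how the datanodes registered (hostname vs. IP), and progress shows up in the report:

# 1. Which exclude file does the namenode actually read?
hdfs getconf -confKey dfs.hosts.exclude
# 2. After editing it:
hadoop dfsadmin -refreshNodes
# 3. Nodes sit in "Decommission in progress" until their blocks are
#    re-replicated, then flip to "Decommissioned":
hadoop dfsadmin -report | grep -B1 'Decommission Status'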
No, just a datanode. Nothing else.
On Wed, Mar 9, 2011 at 11:30 AM, stu24m...@yahoo.com wrote:
Is anything else running on the datanodes? Datanodes themselves don't need
too much memory.
Take care,
-stu
From: Rita rmorgan...@gmail.com
Date: Wed, 9 Mar
I have a 2-rack cluster. All of my files have a replication factor of 2. How does hdfs determine which node to use when serving the data? Does it always use the first rack, or is there an algorithm for this?
I would like to build a fast data-query system. Basically I have several terabytes of time data I would like to analyze, and I was wondering if hbase is the right tool. Currently I have an hdfs cluster of 100+ nodes and everything is working fine. We are very happy with it. However, it would be nice
about what kind of time data you have and what
kind of analysis you want to do?
On Tue, Mar 8, 2011 at 5:16 AM, Rita rmorgan...@gmail.com wrote:
I would like to build a fast dataquery system. Basically I have several
terabytes of time data I would like to analyze and I was wondering if hbase
the NameNode.
Although I believe the reports cost a lot, so do not do it often (each report is an RPC to the NN).
On Tue, Feb 15, 2011 at 6:51 PM, Rita rmorgan...@gmail.com wrote:
Is there a programmatic way to determine if a datanode is down?
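One script-friendly sketch (the -dead flag exists on newer dfsadmin builds; on older ones, parse the full report):

# Newer releases list dead nodes directly:
hdfs dfsadmin -report -dead
# Older releases: the report's summary and per-node state carry it.
hadoop dfsadmin -report | grep -i dead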
Are there any tips to reduce latency in general for hdfs?
I noticed that when trying to copy a 30 GB file it takes a while. I know it's subjective, but I would like to know if anyone has any tricks to reduce latency.
On Sat, Jan 29, 2011 at 12:58 PM, Nathan Rutman nrut...@gmail.com wrote:
Comparing apples and oranges.
Lustre is a great filesystem but has no native fault tolerance. If you want a POSIX filesystem with high performance, then Lustre does it. However, if you want to access data in a heterogeneous environment and don't need POSIX compliance, then hdfs is the tool.
I've read an
How come some of the log messages go to those five logs (jobtracker, tasktracker, etc.) but some go to the console instead? I suppose it must have something to do with log4j.properties, but I don't see why.
Please let me know if possible? Thank you very much!
-Rita :))
On Tue, Sep 7, 2010 at 7:45 PM
-- may I have an example which teaches me how to use the hadoop-streaming feature?
Thanks a lot!
-Rita :)
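A minimal streaming run, sketched from the stock example (the jar path varies by release; mapper and reducer here are ordinary shell commands, and the HDFS paths are placeholders):

# Word count with shell tools as mapper and reducer.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
  -input /user/rita/input \
  -output /user/rita/output \
  -mapper /bin/cat \
  -reducer /usr/bin/wc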
Hi :) I did check stdout under userlogs, but it's empty. If I want to see
the log messages I add to mapper and reducer, should I check them only in
the runtime?
Thanks a lot!
On Sun, Sep 5, 2010 at 10:59 PM, Rita Liu crystaldol...@gmail.com wrote:
Thanks so much for the kind reply! :) I looked
with the same level as HeartbeatResponse.java, my log message does show in JobTracker.log.
Do you have any idea why? If you do, please do let me know? Thank you so
very much!
Best,
Rita :)
Actually, I did, but still couldn't find my log messages. I'll double check
and reply to this thread later tonight, but I am pretty sure that they are
not there either :S
Please help? Thanks a lot! -Rita :S
On Tue, Sep 7, 2010 at 10:46 AM, Owen O'Malley omal...@apache.org wrote:
On Sep 7
Thanks so much for the kind reply! :) I looked at the web ui of jobtracker
(50030) but still couldn't find my logger messages. Could you please explain
a little more? Thanks a lot!
-Rita :)
On Sun, Sep 5, 2010 at 10:47 PM, Hemanth Yamijala yhema...@gmail.com wrote:
Hi,
On Mon, Sep 6, 2010 at 9
, Datanode). How
may I see my logging messages from WordCount, or, any of the MapReduce
applications?
If possible, please help me out? Thank you very much!!
Best,
Rita :)
hadoop common. This problem can be solved if I export
HADOOP_COMMON_HOME to be hadoop-common trunk, but is there a way to
configure this environment variable so that I don't have to export HADOOP_COMMON_HOME every time I start the cluster?
Please help me if possible? Thank you very much!
-Rita :))
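One place that export can live so every daemon start picks it up, as a sketch (conf/hadoop-env.sh is sourced by the start scripts; the path is a placeholder):

# In conf/hadoop-env.sh:
export HADOOP_COMMON_HOME=/home/rita/src/hadoop-common-trunk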
those are very basic (and silly) questions, sorry :$ and
thank you very much! If possible, please help me out so that I can at
least start? Any suggestion and advice will be greatly appreciated.
Thanks again!
Best,
Rita :)