Re: data corruption checks / block scanning report

2013-10-25 Thread Rita
Can anyone explain the categories in the block scanner report in detail? On Tue, Oct 22, 2013 at 5:59 AM, Rita rmorgan...@gmail.com wrote: anyone? On Sun, Oct 20, 2013 at 9:56 AM, Rita rmorgan...@gmail.com wrote: I have asked this question elsewhere and haven't gotten any answers. Perhaps I

Re: data corruption checks / block scanning report

2013-10-22 Thread Rita
Anyone? On Sun, Oct 20, 2013 at 9:56 AM, Rita rmorgan...@gmail.com wrote: I have asked this question elsewhere and haven't gotten any answers. Perhaps I asked it the wrong way or in the wrong forum. I have a 40+ node cluster and I would like to make sure the datanode scanning is done aggressively. I

data corruption checks / block scanning report

2013-10-20 Thread Rita
I have asked this question elsewhere and haven't gotten any answers. Perhaps I asked it the wrong way or in the wrong forum. I have a 40+ node cluster and I would like to make sure the datanode scanning is done aggressively. I know 8192 KB/sec is hard-coded but was wondering if there was a better way to keep
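The verification rate mentioned above was hard-coded in this era, but the scan period is configurable. A minimal hdfs-site.xml sketch, assuming your release honors dfs.datanode.scan.period.hours (default is roughly three weeks); the value here is illustrative:

    <!-- hdfs-site.xml (sketch; availability and default depend on the release) -->
    <property>
      <name>dfs.datanode.scan.period.hours</name>
      <!-- run the background block scanner weekly instead of the ~3-week default -->
      <value>168</value>
    </property>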

Re: datanode tuning

2013-10-07 Thread Rita
at dfs.heartbeat.interval and dfs.namenode.heartbeat.recheck-interval. 40 datanodes is not a large cluster IMHO, and the NameNode is capable of managing 100 times more datanodes. From: Rita rmorgan...@gmail.com To: common-user@hadoop.apache.org common-user

Re: datanode tuning

2013-10-07 Thread Rita
at 10:50 AM, Ravi Prakash ravi...@ymail.com wrote: Rita! 14-16 TB is perhaps a big node. Even then the scalability limits of the NameNode in your case would depend on how many files (more accurately, how many blocks) there are on HDFS. In any case, if you want the datanodes to be marked dead

datanode tuning

2013-10-06 Thread Rita
I would like my 40 data nodes to report aggressively to the namenode whether they are alive or not, therefore I think I need to change these params: dfs.block.access.token.lifetime: default is 600 seconds. Can I decrease this to 60? dfs.block.access.key.update.interval: default is 600 seconds. Can I
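Note that the block access token settings govern security tokens rather than liveness; the heartbeat settings named in the replies above are the relevant knobs. A minimal hdfs-site.xml sketch, assuming the newer property names (older releases use heartbeat.recheck.interval) and illustrative values:

    <!-- hdfs-site.xml (sketch; names and defaults vary by release) -->
    <property>
      <name>dfs.heartbeat.interval</name>
      <value>3</value>            <!-- seconds between datanode heartbeats (default) -->
    </property>
    <property>
      <name>dfs.namenode.heartbeat.recheck-interval</name>
      <value>15000</value>        <!-- ms; lower than the 300000 default to detect dead nodes sooner -->
    </property>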

Re: hardware for hdfs

2013-03-17 Thread Rita
Any thoughts? On Wed, Mar 13, 2013 at 7:17 PM, Rita rmorgan...@gmail.com wrote: I am planning to build an HDFS cluster primarily for streaming large files (10 GB avg size). I was wondering if anyone can recommend a good hardware vendor. -- --- Get your facts first, then you can distort them

Re: measuring iops

2012-10-23 Thread Rita
, or as a benchmark to measure the maximum? The JMX page nn:port/jmx provides some interesting stats, but I'm not sure they have what you want. And I'm unaware of other tools which could. From: Rita rmorgan...@gmail.com To: common-user
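For pulling raw numbers off that servlet from a script, a minimal sketch; the hostname, port, and bean name are assumptions, so check what your NameNode's /jmx actually exposes:

    # dump all beans, or filter with the qry parameter if your version supports it
    curl -s 'http://namenode-host:50070/jmx'
    curl -s 'http://namenode-host:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeActivity'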

Re: measuring iops

2012-10-22 Thread Rita
Anyone? On Sun, Oct 21, 2012 at 8:30 AM, Rita rmorgan...@gmail.com wrote: Hi, I was curious if there was a method to measure the total number of IOPS (I/O operations per second) on an HDFS cluster. -- --- Get your facts first, then you can distort them as you please.-- -- --- Get

Re: measuring iops

2012-10-22 Thread Rita
Is it possible to know how many reads and writes are occurring through the entire cluster in a consolidated manner -- this does not include replication. On Mon, Oct 22, 2012 at 10:28 AM, Ravi Prakash ravi...@ymail.com wrote: Hi Rita, SliveTest can help you measure the number of reads

Re: Hadoop on Isilon problem

2012-10-17 Thread Rita
Out of curiosity, what does running HDFS give you when running through an Isilon cluster? On Wed, Oct 17, 2012 at 3:59 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Look at the directory permissions? On Wed, Oct 17, 2012 at 12:18 PM, Artem Ervits are9...@nyp.org wrote: Anyone using Hadoop

Re: distcp question

2012-10-12 Thread Rita
Thanks for the advice. Before I push or pull, are there any tests I can run before I do the distcp? I am not 100% sure if I have webhdfs set up properly. On Fri, Oct 12, 2012 at 1:01 PM, J. Rottinghuis jrottingh...@gmail.com wrote: Rita, Are you doing a push from the source cluster
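A quick smoke test before the copy, as a sketch; hostnames, ports, and paths are placeholders (the webhdfs port is normally the namenode's HTTP port):

    # does webhdfs answer at all on the source cluster?
    hadoop fs -ls webhdfs://src-namenode:50070/
    # then a pull run from the destination cluster, reading the source over webhdfs
    hadoop distcp webhdfs://src-namenode:50070/user/rita/data /user/rita/data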

Re: Re: distcp question

2012-10-12 Thread Rita
Never mind. Figured it out. On Fri, Oct 12, 2012 at 3:20 PM, kojie.fu kojie...@gmail.com wrote: kojie.fu From: Rita Date: 2012-10-13 03:19 To: common-user Subject: Re: distcp question Thanks for the advice. Before I push or pull, are there any tests I can run before I do the distcp

file checksum

2012-06-25 Thread Rita
Does Hadoop, HDFS in particular, do any sanity checks of files before and after balancing/copying/reading them? We have 20 TB of data and I want to make sure that after these operations are completed the data is still in good shape. Where can I read about this? TIA -- --- Get your facts first,

Re: file checksum

2012-06-25 Thread Rita
, to prevent bit rot, blocks are checked periodically (weekly by default, I believe; you can configure that period) in the background. Kai On 25.06.2012 at 13:29, Rita wrote: Does Hadoop, HDFS in particular, do any sanity checks of the file before and after balancing/copying/reading the files
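Integrity can also be checked on demand; a sketch using standard tools, with placeholder paths (the -checksum subcommand only exists in newer releases):

    # report missing or corrupt blocks under a path
    hadoop fsck /user/rita/data -files -blocks
    # compare whole-file checksums, e.g. before and after a copy (newer releases only)
    hadoop fs -checksum /user/rita/data/part-00000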

Re: freeze a mapreduce job

2012-05-11 Thread Rita
. P.S. A better solution would be to make your job not take as many days, somehow? :-) On Fri, May 11, 2012 at 4:13 PM, Rita rmorgan...@gmail.com wrote: I have a rather large MapReduce job which takes a few days. I was wondering if it's possible for me to freeze the job or make the job less

hdfs file browser

2012-04-17 Thread Rita
Is it possible to get pretty URLs when doing HDFS file browsing via a web browser? -- --- Get your facts first, then you can distort them as you please.--

setting client retry

2012-04-12 Thread Rita
In the hdfs-site.xml file, what parameter do I need to set for client retries? Also, what is its default value? -- --- Get your facts first, then you can distort them as you please.--

Retry question

2012-03-18 Thread Rita
My replication factor is 3; if I were reading data through libhdfs using C, is there a retry method? I am reading a 60 GB file; what would happen if a rack goes down and the next block isn't available? Will the API retry? Is there a way to configure this option? -- --- Get your facts

Re: Retry question

2012-03-18 Thread Rita
on your replication factor. A block can also be unavailable due to corruption; in this case, it can be re-replicated to other live machines and the error fixed with the fsck utility. Regards On 3/18/2012 9:46 AM, Rita wrote: My replication factor is 3 and if I were reading data through

python and hbase

2012-02-06 Thread Rita
Running CDH U1 and using HBase. So far I have been extremely happy, with the exception of Python support. Currently, I am using Thrift, but I suspect there are some major features missing in it, such as RegExFilters. If this is the case, will there ever be a native Python client for HBase? -- ---

Re: hadoop filesystem cache

2012-01-17 Thread Rita
taken has been to keep data that is accessed repeatedly and fits in memory in some other system (hbase/cassandra/mysql/whatever). Edward On Mon, Jan 16, 2012 at 11:33 AM, Rita rmorgan...@gmail.com wrote: Thanks. I believe this is a good feature to have for clients especially if you

Re: hadoop filesystem cache

2012-01-16 Thread Rita
in coordination with Facebook. I don't believe it has been published quite yet, but the title of the project is PACMan -- I expect it will be published soon. -Todd On Sat, Jan 14, 2012 at 5:30 PM, Rita rmorgan...@gmail.com wrote: After reading this article, http://www.cloudera.com/blog/2012/01

hadoop filesystem cache

2012-01-14 Thread Rita
After reading this article, http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/ , I was wondering if there was a filesystem cache for HDFS. For example, if a large file (10 gigabytes) keeps getting accessed on the cluster, instead of fetching it from the network each time why not storage

Re: hadoop filesystem cache

2012-01-14 Thread Rita
Yes, something different from that. To my knowledge, DistributedCache is only for MapReduce. On Sat, Jan 14, 2012 at 8:33 PM, Prashant Kommireddi prash1...@gmail.com wrote: You mean something different from the DistributedCache? Sent from my iPhone On Jan 14, 2012, at 5:30 PM, Rita rmorgan

measuring network throughput

2011-12-22 Thread Rita
Is there a tool or a method to measure the throughput of the cluster at a given time? It would be a great feature to add -- --- Get your facts first, then you can distort them as you please.--

Re: measuring network throughput

2011-12-22 Thread Rita
Yes, I think they can graph it for you. However, I am looking for raw data because I would like to create something custom. On Thu, Dec 22, 2011 at 8:19 AM, alo alt wget.n...@googlemail.com wrote: Rita, Ganglia gives you throughput like Nagios. Could that help? - Alex On Thu, Dec 22

Hadoop RPC question

2011-12-19 Thread Rita
Hello, I am working on writing a process (bash) which can attach to the namenode and listen to the RPCs. I am interested in what files are hot and who is reading the data. Currently, I am using the namenode logs to gather this data but was wondering if I can attach to the hadoop/hdfs port and
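Those per-request cmd=open lines come from the HDFS audit log rather than from tapping the RPC port; a minimal log4j.properties sketch to route it to its own file (the logger name is the usual one, but verify it and the file path against your deployment):

    # log4j.properties (sketch)
    log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=INFO,DRFAAUDIT
    log4j.appender.DRFAAUDIT=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.DRFAAUDIT.File=/var/log/hadoop/hdfs-audit.log
    log4j.appender.DRFAAUDIT.layout=org.apache.log4j.PatternLayout
    log4j.appender.DRFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n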

Re: Hadoop 0.21

2011-12-06 Thread Rita
I second Vinod's idea. Get the latest stable from Cloudera. Their binaries are near perfect! On Tue, Dec 6, 2011 at 1:46 PM, T Vinod Gupta tvi...@readypulse.com wrote: Saurabh, It's best if you go through the HBase book - Lars George's book HBase the Definitive Guide. Your best bet is to

replication question

2011-11-22 Thread Rita
Hello, I am using HBase and I have a default replication factor of 2. Now, if I change the directory's replication factor, will all new files created there automatically be replicated 3 times? -- --- Get your facts first, then you can distort them as you please.--
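For context: HDFS has no persistent per-directory replication setting; new files take the client's dfs.replication at create time, and existing files keep whatever they were written with until changed explicitly. A sketch, with a placeholder path:

    # force 3 replicas on existing files, recursively, and wait for re-replication
    hadoop fs -setrep -R -w 3 /path/to/dir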

Re: Sizing help

2011-11-08 Thread Rita
you can reconstruct and which you don't need to read really fast, but not good enough for data whose loss will get you fired. On Mon, Nov 7, 2011 at 7:34 PM, Rita rmorgan...@gmail.com wrote: I have been running with 2x replication on a 500 TB cluster. No issues whatsoever. 3x is for super

Re: Sizing help

2011-11-07 Thread Rita
For a 1 PB installation you would need close to 170 servers with a 12 TB disk pack installed on each (with a replication factor of 2). That's a conservative estimate. CPUs: 4 cores with 16 GB of memory. Namenode: 4 cores with 32 GB of memory should be OK. On Fri, Oct 21, 2011 at 5:40 PM, Steve Ed
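The arithmetic behind that estimate, roughly (illustrative; it leaves no headroom for non-DFS space, failed disks, or growth):

    1 PB usable x 2 replicas      = 2 PB raw  = ~2000 TB
    2000 TB / 12 TB per server    = ~167      -> round up to ~170 servers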

Re: Sizing help

2011-11-07 Thread Rita
, 2011, Rita rmorgan...@gmail.com wrote: For a 1 PB installation you would need close to 170 servers with a 12 TB disk pack installed on each (with a replication factor of 2). That's a conservative estimate. CPUs: 4 cores with 16 GB of memory. Namenode: 4 cores with 32 GB of memory should be OK

Re: Hadoop + cygwin

2011-11-01 Thread Rita
Why? The beauty of Hadoop is that it's OS agnostic. What is your native operating system? I am sure you have a version of the JDK and JRE running there. On Tue, Nov 1, 2011 at 4:53 AM, Masoud mas...@agape.hanyang.ac.kr wrote: Hi Has anybody run Hadoop on Cygwin for development purposes??? Did you have

HDFS-RAID

2011-10-29 Thread Rita
I would like to know if and when HDFS RAID (http://wiki.apache.org/hadoop/HDFS-RAID) will ever get into the mainline. This would be an extremely useful feature for many sites, especially larger ones. The savings on storage would be noticeable. I haven't really seen any progress in

correct way to reserve space

2011-10-26 Thread Rita
What is the correct way to reserve space for HDFS? I currently have 2 filesystems, /fs1 and /fs2, and I would like to reserve space for non-DFS operations. For example, for /fs1 I would like to reserve 30 GB of space for non-DFS use and 10 GB of space for /fs2. I fear HADOOP-2991 is still haunting us?
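The usual knob is dfs.datanode.du.reserved, which reserves a fixed number of bytes for non-DFS use; note it is a single value applied to every dfs.data.dir volume rather than a per-filesystem setting. A sketch:

    <!-- hdfs-site.xml (sketch): reserve 30 GB on each data volume for non-DFS use -->
    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>32212254720</value>  <!-- bytes: 30 x 1024^3 -->
    </property>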

default timeout of datanode

2011-10-06 Thread Rita
Is there a way to configure the default timeout of a datanode? Currently it's set to 630 seconds and I want something a bit more realistic -- like 30 seconds. -- --- Get your facts first, then you can distort them as you please.--
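That 630 seconds is not set directly; it is derived from two settings, so lowering either shortens it. The commonly cited formula, with illustrative numbers (verify the property names against your release):

    timeout = 2 x dfs.namenode.heartbeat.recheck-interval + 10 x dfs.heartbeat.interval
            = 2 x 300 s (default 300000 ms)               + 10 x 3 s  = 630 s
    e.g. recheck-interval 10000 ms, heartbeat 2 s  ->  2 x 10 + 10 x 2 = 40 s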

Re: Which release to use?

2011-07-19 Thread Rita
Arun, I second Joe's comment. Thanks for giving us a heads-up. I will wait patiently until 0.23 is considered stable. On Mon, Jul 18, 2011 at 11:19 PM, Joe Stein charmal...@allthingshadoop.com wrote: Arun, Thanks for the update. Again, I hate to have to play the part of captain obvious.

Re: Which release to use?

2011-07-18 Thread Rita
be ready and then talk to our CHUG? -Mike To: common-user@hadoop.apache.org Subject: Re: Which release to use? From: tdeut...@us.ibm.com Date: Sat, 16 Jul 2011 10:29:55 -0700 Hi Rita - I want to make sure we are honoring the purpose/approach of this list. So you

Re: Which release to use?

2011-07-18 Thread Rita
I am a dimwit. On Mon, Jul 18, 2011 at 8:12 PM, Allen Wittenauer a...@apache.org wrote: On Jul 18, 2011, at 5:01 PM, Rita wrote: I made the big mistake of using the latest version, 0.21.0, and found a bunch of bugs, so I got pissed off at HDFS. Then, after reading this thread, it seems I

Re: Which release to use?

2011-07-16 Thread Rita
I am curious about the IBM product BigInsights. Where can we download it? It seems we have to register to download it? On Fri, Jul 15, 2011 at 12:38 PM, Tom Deutsch tdeut...@us.ibm.com wrote: One quick clarification - IBM GA'd a product called BigInsights in 2Q. It faithfully uses the Hadoop

Any other uses of hdfs?

2011-07-16 Thread Rita
So, I use HDFS to store very large files and access them through various clients (100 clients) using FS utils. Are there any other tools or projects that solely use HDFS as their storage for fast access? I know HBase uses it but requires MapReduce. I want to know about HDFS only, without MapReduce. -- ---

Re: large data and hbase

2011-07-13 Thread Rita
, MapReduce may still be required (atop HBase, i.e.). There's been work ongoing to assist the same at the HBase side as well, but you're guaranteed better responses on their mailing lists instead. On Tue, Jul 12, 2011 at 3:31 PM, Rita rmorgan...@gmail.com wrote: This is encouraging. "Make

Re: large data and hbase

2011-07-12 Thread Rita
the mapreduce daemons. These do not need to be started." On Mon, Jul 11, 2011 at 1:40 PM, Bharath Mundlapudi bharathw...@yahoo.com wrote: Another option to look at is Pig or Hive. These need MapReduce. -Bharath From: Rita rmorgan...@gmail.com To: common-user

large data and hbase

2011-07-11 Thread Rita
I have a dataset which is several terabytes in size. I would like to query this data using HBase (SQL). Would I need to set up MapReduce to use HBase? Currently the data is stored in HDFS and I am using `hdfs -cat` to get the data and pipe it into stdin. -- --- Get your facts first, then you

Re: parallel cat

2011-07-07 Thread Rita
Thanks Steve. This is exactly what I was looking for. Unfortunately, I don't see any example code for the implementation. On Wed, Jul 6, 2011 at 7:35 AM, Steve Loughran ste...@apache.org wrote: On 06/07/11 11:08, Rita wrote: I have many large files ranging from 2 GB to 800 GB and I use hadoop

Re: thrift and python

2011-07-07 Thread Rita
Could someone please compile and provide the jar for this class? It would be much appreciated. I am running r0.21.0 (http://hadoop.apache.org/common/docs/r0.21.0/). On Thu, Jul 7, 2011 at 3:56 AM, Rita rmorgan...@gmail.com wrote: By looking at this, http://www.mail-archive.com/mapreduce-dev

Re: parallel cat

2011-07-07 Thread Rita
Thanks again Steve. I will try to implement it with Thrift. On Thu, Jul 7, 2011 at 5:35 AM, Steve Loughran ste...@apache.org wrote: On 07/07/11 08:22, Rita wrote: Thanks Steve. This is exactly what I was looking for. Unfortunately, I don't see any example code for the implementation

parallel cat

2011-07-06 Thread Rita
I have many large files ranging from 2 GB to 800 GB and I use hadoop fs -cat a lot to pipe to various programs. I was wondering if it's possible to prefetch the data for clients with more bandwidth. Most of my clients have 10G interfaces and the datanodes are 1G. I was thinking, prefetch x blocks (even

tar or hadoop archive

2011-06-27 Thread Rita
We use Hadoop/HDFS to archive data. I archive a lot of files by creating one large tar file and then placing it in HDFS. Is it better to use hadoop archive for this, or is it essentially the same thing? -- --- Get your facts first, then you can distort them as you please.--
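For comparison, a HAR keeps the archived files individually addressable through the har:// scheme, at the cost of running a MapReduce job to build it. A sketch with placeholder paths (check the hadoop archive usage for your release):

    # build the archive (runs a MapReduce job); 'logs' is relative to the -p parent
    hadoop archive -archiveName logs-2011.har -p /user/rita logs /user/rita/archives
    # files stay individually readable without extracting
    hadoop fs -ls har:///user/rita/archives/logs-2011.har/logs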

Re: our experiences with various filesystems and tuning options

2011-05-10 Thread Rita
-0400, Rita wrote: what filesystem are they using and what is the size of each filesystem? It sounds nuts, but each disk has its own ext3 filesystem. Beyond switching to the deadline IO scheduler, we haven't done much tuning/tweaking. A script runs every ten minutes to test all of the data

Re: our experiences with various filesystems and tuning options

2011-05-09 Thread Rita
What filesystem are they using and what is the size of each filesystem? On Mon, May 9, 2011 at 9:22 PM, Will Maier wcma...@hep.wisc.edu wrote: On Mon, May 09, 2011 at 05:07:29PM -0700, Jonathan Disher wrote: Speak for yourself, I just built a bunch of 36-disk datanodes :) And I just

Re: our experiences with various filesystems and tuning options

2011-05-06 Thread Rita
Sheng, How big is each of your XFS volumes? We noticed that if it's over 4 TB, HDFS won't pick it up. 2011/5/6 Ferdy Galema ferdy.gal...@kalooga.com No, unfortunately not; we couldn't because of our kernel versions. On 05/06/2011 04:00 AM, ShengChang Gu wrote: Many thanks. We use xfs all the

measure throughput of cluster

2011-05-03 Thread Rita
I am trying to acquire statistics about my HDFS cluster in the lab. One stat I am really interested in is the total throughput (gigabytes served) of the cluster over 24 hours. I suppose I can look for 'cmd=open' in the log file of the namenode, but how accurate is it? It seems there is no

Re: hdfs log question

2011-04-21 Thread Rita
: Hello, Have a look at conf/log4j.properties to configure all logging options. On Thu, Apr 21, 2011 at 3:14 AM, Rita rmorgan...@gmail.com wrote: I guess I should ask, how does one enable debug mode for the namenode and datanode logs? I would like to see if in debug mode I am able to see
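Beyond editing conf/log4j.properties (which requires a restart), levels can usually be changed at runtime; a sketch, assuming the daemonlog tool and this logger name match your release, with a placeholder hostname:

    # flip the namenode's HDFS logging to DEBUG without a restart
    hadoop daemonlog -setlevel namenode-host:50070 org.apache.hadoop.hdfs.server.namenode.NameNode DEBUG
    # or persistently, in conf/log4j.properties:
    #   log4j.logger.org.apache.hadoop.hdfs=DEBUG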

Re: hdfs log question

2011-04-20 Thread Rita
I guess I should ask, how does one enable debug mode for the namenode and datanode logs? I would like to see if in debug mode I am able to see the close calls of a file. On Tue, Apr 19, 2011 at 8:48 PM, Rita rmorgan...@gmail.com wrote: I know in the logs you can see 'cmd=open

Re: changing node's rack

2011-03-29 Thread Rita
, Rita rmorgan...@gmail.com wrote: What is the best way to change the rack of a node? I have tried the following: killed the datanode process, changed the rackmap file so the node and IP address entry reflect the new rack, and did a '-refreshNodes'. Restarted the datanode

live/dead node problem

2011-03-29 Thread Rita
Hello All, Is there a parameter or procedure to check more aggressively for a live/dead node? Despite my killing the hadoop process, I see the node listed as active for more than 10 minutes on the Live Nodes page. Fortunately, the last contact time increments. Using branch-0.21, 0985326 -- --- Get your

Re: live/dead node problem

2011-03-29 Thread Rita
: heartbeat.recheck.interval For 0.22 : dfs.namenode.heartbeat.recheck-interval dfs.heartbeat.interval Cheers, Ravi On 3/29/11 10:24 AM, Michael Segel michael_se...@hotmail.com wrote: Rita, When the NameNode doesn't see a heartbeat for 10 minutes, it then recognizes that the node is down

Re: directory scan issue

2011-03-29 Thread Rita
Thanks. With ext4 I created two 16 TB volumes and they are seen. I think it may be an issue with XFS. On Mon, Mar 28, 2011 at 3:50 PM, Todd Lipcon t...@cloudera.com wrote: On Fri, Mar 25, 2011 at 9:06 PM, Rita rmorgan...@gmail.com wrote: Using 0.21 When I have a filesystem (XFS) with 1TB

Re: directory scan issue

2011-03-29 Thread Rita
Thank you. Switching to ext4. On Tue, Mar 29, 2011 at 8:23 AM, Eric eric.x...@gmail.com wrote: Rita, another issue I've seen is that when you have lots of XFS filesystems that are heavily used, the Linux kernel will at some point crash. So the XFS driver seems to have problems that only appear

Re: changing node's rack

2011-03-26 Thread Rita
Thanks Allen. I really hope this gets addressed. Leaving it in the cache can become dangerous. On Sat, Mar 26, 2011 at 7:49 PM, Allen Wittenauer awittena...@linkedin.com wrote: On Mar 26, 2011, at 3:50 PM, Ted Dunning wrote: I think that the namenode remembers the rack. Restarting the

directory scan issue

2011-03-25 Thread Rita
Using 0.21. When I have a filesystem (XFS) with 1 TB, the datanode detects it immediately. When I create 3 identical filesystems, all 3 TB are visible immediately. If I create a 6 TB filesystem (XFS), add it to dfs.data.dir, and restart the datanode, hdfs dfsadmin -report does not

Re: CDH and Hadoop

2011-03-24 Thread Rita
Thanks everyone for your replies. I knew Cloudera had their release but never knew Y! had one too... On Thu, Mar 24, 2011 at 5:04 PM, Eli Collins e...@cloudera.com wrote: Hey Rita, All software developed by Cloudera for CDH is Apache (v2) licensed and freely available. See these docs

CDH and Hadoop

2011-03-23 Thread Rita
I have been wondering if I should use CDH (http://www.cloudera.com/hadoop/) instead of the standard Hadoop distribution. What do most people use? Is CDH free? Do they provide the tars, or do they provide source code that I simply compile? Can I have some data nodes as CDH and the rest as regular

Re: CDH and Hadoop

2011-03-23 Thread Rita
before? I understand that Cloudera's version is heavily patched (similar to the Red Hat Linux kernel versus the standard Linux kernel). On Wed, Mar 23, 2011 at 10:44 AM, Michael Segel michael_se...@hotmail.com wrote: Rita, Short answer... Cloudera's release is free, and they do also offer a support

Re: decommissioning node woes

2011-03-18 Thread Rita
Any help? On Wed, Mar 16, 2011 at 9:36 PM, Rita rmorgan...@gmail.com wrote: Hello, I have been struggling with decommissioning data nodes. I have a 50+ datanode cluster (no MR) with each server holding about 2 TB of storage. I split the nodes into 2 racks. I edit the 'exclude' file

decommissioning node woes

2011-03-16 Thread Rita
Hello, I have been struggling with decommissioning data nodes. I have a 50+ datanode cluster (no MR) with each server holding about 2 TB of storage. I split the nodes into 2 racks. I edit the 'exclude' file and then do a -refreshNodes. I see the node immediately in 'Decommissioned nodes' and I also
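For reference, the flow described here, as a sketch; the hostname and file path are placeholders, the node must appear in the exclude file under the same name or IP the namenode registered it with, and the exact status strings differ between releases:

    # hdfs-site.xml must point dfs.hosts.exclude at the exclude file, e.g. /etc/hadoop/conf/exclude
    echo dn42.example.com >> /etc/hadoop/conf/exclude
    hadoop dfsadmin -refreshNodes
    # the node stays in "Decommission In Progress" until its blocks are re-replicated
    hadoop dfsadmin -report | grep -A 3 dn42.example.com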

Re: datanode memory requirement

2011-03-09 Thread Rita
No, just a datanode. Nothing else. On Wed, Mar 9, 2011 at 11:30 AM, stu24m...@yahoo.com wrote: Is anything else running on the datanodes? Datanodes themselves don't need too much memory. Take care, -stu -- *From: * Rita rmorgan...@gmail.com *Date: *Wed, 9 Mar

how does hdfs determine what node to use?

2011-03-09 Thread Rita
I have a 2-rack cluster. All of my files have a replication factor of 2. How does HDFS determine which node to use when serving the data? Does it always use the first rack, or is there an algorithm for this? -- --- Get your facts first, then you can distort them as you please.--

hbase and hdfs

2011-03-08 Thread Rita
I would like to build a fast data query system. Basically I have several terabytes of time data I would like to analyze, and I was wondering if HBase is the right tool. Currently, I have an HDFS cluster of 100+ nodes and everything is working fine. We are very happy with it. However, it would be nice

Re: hbase and hdfs

2011-03-08 Thread Rita
about what kind of time data you have and what kind of analysis you want to do? On Tue, Mar 8, 2011 at 5:16 AM, Rita rmorgan...@gmail.com wrote: I would like to build a fast data query system. Basically I have several terabytes of time data I would like to analyze and I was wondering if HBase

Re: datanode down alert

2011-02-24 Thread Rita
the NameNode. Although I believe the reports cost a lot, so do not do it often (it RPCs the NN). On Tue, Feb 15, 2011 at 6:51 PM, Rita rmorgan...@gmail.com wrote: Is there a programmatic way to determine if a datanode is down? -- --- Get your facts first, then you can distort them
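A crude scriptable check, as a sketch; the report's exact wording and the example IP are assumptions, so adjust the grep patterns for your release:

    # dead-node count from the admin report summary
    hadoop dfsadmin -report | grep -i dead
    # or inspect one node's status block
    hadoop dfsadmin -report | grep -A 2 'Name: 10.0.0.42'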

Re: Latency and speed of HDFS

2011-01-29 Thread Rita
Are there any tips to reduce latency in general for HDFS? I noticed that when trying to copy a 30 GB file it takes a while. I know it's subjective, but I would like to know if anyone has any tricks to reduce latency. On Sat, Jan 29, 2011 at 12:58 PM, Nathan Rutman nrut...@gmail.com wrote:

Re: TestDFSIO on Lustre vs HDFS

2011-01-27 Thread Rita
Comparing apples and oranges. Lustre is a great filesystem but has no native fault tolerance. If you want a POSIX filesystem with high performance, then Lustre does it. However, if you want to access data in a heterogeneous environment and don't need POSIX compliance, then HDFS is the tool. I've read an

Re: Adding log messages to src files, but they don't appear in logs ...

2010-09-08 Thread Rita Liu
. How come some of the log messages go to those five logs (jobtracker, tasktracker, etc.) but some go to the console instead? I suppose it must have something to do with log4j.properties, but I don't see why. Please let me know if possible? Thank you very much! -Rita :)) On Tue, Sep 7, 2010 at 7:45 PM

Hadoop Streaming?

2010-09-08 Thread Rita Liu
-- may I have an example which teaches me how to use the hadoop-streaming feature? Thanks a lot! -Rita :)
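A minimal streaming run, as a sketch; the jar location follows the 0.20-era contrib layout and the paths are placeholders:

    # word count with shell commands as mapper and reducer
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
      -input /user/rita/input \
      -output /user/rita/wc-out \
      -mapper 'tr " " "\n"' \
      -reducer 'uniq -c'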

Re: Log4j Logger in MapReduce applications

2010-09-07 Thread Rita Liu
Hi :) I did check stdout under userlogs, but it's empty. If I want to see the log messages I add to the mapper and reducer, should I check them only at runtime? Thanks a lot! On Sun, Sep 5, 2010 at 10:59 PM, Rita Liu crystaldol...@gmail.com wrote: Thanks so much for the kind reply! :) I looked

Adding log messages to src files, but they don't appear in logs ...

2010-09-07 Thread Rita Liu
with the same level to HeartbeatResponse.java, my log message does show in JobTracker.log. Do you have any idea why? If you do, please do let me know? Thank you so very much! Best, Rita :)

Re: Adding log messages to src files, but they don't appear in logs ...

2010-09-07 Thread Rita Liu
Actually, I did, but still couldn't find my log messages. I'll double check and reply to this thread later tonight, but I am pretty sure that they are not there either :S Please help? Thanks a lot! -Rita :S On Tue, Sep 7, 2010 at 10:46 AM, Owen O'Malley omal...@apache.org wrote: On Sep 7

Re: Log4j Logger in MapReduce applications

2010-09-06 Thread Rita Liu
Thanks so much for the kind reply! :) I looked at the web UI of the jobtracker (50030) but still couldn't find my logger messages. Could you please explain a little more? Thanks a lot! -Rita :) On Sun, Sep 5, 2010 at 10:47 PM, Hemanth Yamijala yhema...@gmail.com wrote: Hi, On Mon, Sep 6, 2010 at 9

Log4j Logger in MapReduce applications

2010-09-05 Thread Rita Liu
, Datanode). How may I see my logging messages from WordCount, or any of the MapReduce applications? If possible, please help me out? Thank you very much!! Best, Rita :)

How to specify HADOOP_COMMON_HOME in hadoop-mapreduce trunk?

2010-09-05 Thread Rita Liu
hadoop common. This problem can be solved if I export HADOOP_COMMON_HOME to be the hadoop-common trunk, but is there a way to configure this environment variable so that I don't have to export HADOOP_COMMON_HOME every time I start the cluster? Please help me if possible? Thank you very much! -Rita :))
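One common workaround, as a sketch; whether the split-project trunk's conf/hadoop-env.sh picks these up may vary by checkout, and the paths are placeholders:

    # in the mapreduce trunk's conf/hadoop-env.sh (or your shell profile)
    export HADOOP_COMMON_HOME=/home/rita/src/hadoop-common
    export HADOOP_HDFS_HOME=/home/rita/src/hadoop-hdfs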

Hadoop Basics -- how to trace, starting from a mapreduce app?

2010-08-17 Thread Rita Liu
those are very basic (and silly) questions, sorry :$ and thank you very much! If possible, please help me out so that I can at least start? Any suggestions and advice will be greatly appreciated. Thanks again! Best, Rita :)