Determining input record directory using Streaming...

2009-06-22 Thread C G
Hi All: Is there any way using Hadoop Streaming to determine the directory from which an input record is being read? This is straightforward in Hadoop using InputFormats, but I am curious if the same concept can be applied to streaming. The goal here is to read in data from 2 directories, say
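
One possible approach, sketched here as an assumption and not as the answer given in the thread: Hadoop Streaming passes job configuration properties to the mapper process as environment variables, with dots replaced by underscores. The path of the file a map task is reading is therefore usually visible as `map_input_file` in older releases, or `mapreduce_map_input_file` in newer ones. A mapper can take the parent directory of that path and prefix each record with it. The tab-separated tagging scheme and the helper names `input_directory` and `tag_records` are illustrative choices.

```python
#!/usr/bin/env python
"""Streaming mapper sketch: tag each input record with its source directory.

Assumes Hadoop Streaming exports the current input path as the environment
variable map_input_file (older releases) or mapreduce_map_input_file (newer).
"""
import os
import posixpath
import sys


def input_directory(env=os.environ):
    """Return the parent directory of the file this map task is reading."""
    path = env.get("mapreduce_map_input_file") or env.get("map_input_file", "")
    # Paths look like hdfs://host:port/data/a/part-00000, so keep everything
    # up to the last slash.
    return posixpath.dirname(path)


def tag_records(lines, directory):
    """Yield 'directory<TAB>record' so downstream steps can tell sources apart."""
    for line in lines:
        yield "%s\t%s" % (directory, line.rstrip("\n"))


if __name__ == "__main__":
    # Hypothetical mapper entry point: read records on stdin, emit tagged lines.
    for out in tag_records(sys.stdin, input_directory()):
        print(out)
```

With both directories given as `-input` paths, records from each arrive keyed by their directory, so a reducer (or a later filter) can branch on the first field.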

What happens in HDFS DataNode recovery?

2009-01-24 Thread C G
through the old nodes, removing each one from service for several hours and then return it to service. Thoughts? Thanks in Advance, C G

Copy data between HDFS instances...

2008-12-17 Thread C G
found. I assume there is some way to do this and I just don't have the right command line magic. This is Hadoop 0.15.0. Any help appreciated. Thanks, C G

NameNode memory usage and 32 vs. 64 bit JVMs

2008-11-06 Thread C G
no issues, but I want to make sure it's OK to proceed. Speaking of NameNode, what does it keep in memory? Our memory usage ramped up rather suddenly recently. Also, does SecondaryNameNode require the same amount of memory as NameNode? Thanks for any help, C G

0.18.0 DataNode refuses to start...

2008-08-25 Thread C G
We can get the NameNode and SecondaryNameNode up and running, but DataNodes fail as shown below.  Hadoop Jira 4019 tracks this problem (https://issues.apache.org/jira/browse/HADOOP-4019), but I'm curious if anybody has solved it yet...   2008-08-25 23:21:53,743 INFO

Re: HDFS Vs KFS

2008-08-25 Thread C G
I've built and deployed KFS outside of Hadoop and it seems to work.  I'm planning to bring up a test environment shortly running Hadoop with KFS.  With all due respect to HDFS developers and committers, I am strongly hesitant to call HDFS stable.  We've had several major issues with HDFS in

Re: dfs.DataNode connection issues

2008-07-16 Thread C G
You should look at https://issues.apache.org/jira/browse/HADOOP-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610003#action_12610003 as well. This eliminates spurious "connection reset by peer" messages that clutter up the DataNode logs and can be

Re: Re: Hadoop 0.17.0 - lots of I/O problems and can't run small datasets?

2008-07-08 Thread C G
Yongqiang:   Thanks for this information.  I'll try your changes and see if the experiment runs better.   Thanks, C G --- On Mon, 7/7/08, heyongqiang [EMAIL PROTECTED] wrote: From: heyongqiang [EMAIL PROTECTED] Subject: Re: Re: Hadoop 0.17.0 - lots of I/O problems and can't run small datasets

Hadoop 0.17.0 - lots of I/O problems and can't run small datasets?

2008-07-06 Thread C G
shed some light on this, or if others are having similar problems.     Any thoughts, insights, etc. would be greatly appreciated.   Thanks, C G   Here's an ugly trace: 08/07/06 01:43:29 INFO mapred.JobClient:  map 100% reduce 93% 08/07/06 01:43:29 INFO mapred.JobClient: Task Id

Too many Task Manager children...

2008-06-19 Thread C G
not typically do). This is on Hadoop 0.15.0; upgrading is not an option at the moment. Any help appreciated... Thanks, C G

Re: 0.16.4 DataNode problem...

2008-05-28 Thread C G
be going on? It seems really strange to have code that works in an old version but won't run in the more modern releases. Thanks, C G C G [EMAIL PROTECTED] wrote: Hi All: I'm seeing an inability to run one of our applications over a reasonably small dataset (~200G input) while running

Re: 0.16.4 DFS dropping blocks, then won't restart...

2008-05-23 Thread C G
? thanks, Raghu. C G wrote: We've recently upgraded from 0.15.0 to 0.16.4. Two nights ago we had a problem where DFS nodes could not communicate. After not finding anything obviously wrong we decided to shut down DFS and restart. Following restart I was seeing a corrupted system with significant

Re: 0.16.4 DFS dropping blocks, then won't restart...

2008-05-23 Thread C G
Ugh, that solved the problem. Thanks Dhruba! Thanks, C G Dhruba Borthakur [EMAIL PROTECTED] wrote: If you look at the log message starting with STARTUP_MSG: build =... you will see that the namenode and good datanode were built by CG whereas the bad datanodes were compiled by hadoopqa

When is HDFS really corrupt...(and can I upgrade a corrupt FS?)

2008-05-15 Thread C G
C G

Re: When is HDFS really corrupt...(and can I upgrade a corrupt FS?)

2008-05-15 Thread C G
one found. Is it safe/sufficient to simply delete this file? There were MR jobs active when the master failed... it wasn't a clean shutdown by any means. I surmise this file is a remnant from an active job. Thanks, C G Lohit [EMAIL PROTECTED] wrote: Filesystem is considered

Re: HDFS corrupt...how to proceed?

2008-05-12 Thread C G
using DRBD. It performed very well in this first real-world test. If there is interest I can write up how we protect our master nodes in more detail and share it with the community. Thanks, C G Ted Dunning [EMAIL PROTECTED] wrote: You don't need to correct over-replicated files. The under

HDFS corrupt...how to proceed?

2008-05-11 Thread C G
Hi All: We had a primary node failure over the weekend. When we brought the node back up and I ran Hadoop fsck, I see the file system is corrupt. I'm unsure how best to proceed. Any advice is greatly appreciated. If I've missed a Wiki page or documentation somewhere please feel free

Re: HDFS corrupt...how to proceed?

2008-05-11 Thread C G
blocks: 0 (0.0 %) Target replication factor: 3 Real replication factor: 3.0 The filesystem under path '/' is CORRUPT So it seems like it's fixing some problems on its own? Thanks, C G Dhruba Borthakur [EMAIL PROTECTED] wrote: Did one datanode fail or did the namenode

Re: HDFS corrupt...how to proceed?

2008-05-11 Thread C G
at 8:55 PM, C G wrote: The system hosting the namenode experienced an OS panic and shut down; we subsequently rebooted it. Currently we don't believe there is/was a bad disk or other hardware problem. Something interesting: I've run fsck twice; the first time it gave the result I posted

Re: Quick jar deployment question...

2008-04-03 Thread C G
Yeah, everything is packaged into one jar...I've been copying those jars everywhere which didn't seem right, hence the question. Thanks, C G Ted Dunning [EMAIL PROTECTED] wrote: The easiest way is to package all of your code (classes and jars) into a single jar file which you

Most excellent Hadoop Summit

2008-03-26 Thread C G
Dear Yahoo, Amazon, and Hadoop Community: Thank you very much for a very well-done Hadoop Summit. It came as a complete surprise that a FREE conference would include breakfast, lunch, snacks, happy hour, and swag - very classy and very nice. All the presentations and discussions

Solving the hang problem in dfs -copyToLocal/-cat...

2008-02-27 Thread C G
-copyToLocal/-cat. Hope this helps... C G

Re: Add your project or company to the powered by page?

2008-02-27 Thread C G
to in-stream viewing behavior of Internet video audiences. Our current grid contains more than 128 CPU cores and in excess of 100 terabytes of storage, and we plan to grow that substantially during 2008. Thanks, C G --- Christopher Gillett Chief Software Architect Visible

RE: Solving the hang problem in dfs -copyToLocal/-cat...

2008-02-27 Thread C G
I haven't looked at the source code to see how -cat is implemented, but I was pretty surprised at the results as well. When I sat down to do this experiment I figured I was wasting my time... surprisingly, I was not. C G Joydeep Sen Sarma [EMAIL PROTECTED] wrote: This is amazing

RE: Solving the hang problem in dfs -copyToLocal/-cat...

2008-02-27 Thread C G
I think HTTP access is read-only...you'll need to continue to use copyFromLocalFile C G Phillip Wu [EMAIL PROTECTED] wrote: Very helpful information. Is there any ways to put files into DFS remotely, like http post? Or I have to keep using copyFromLocalFile? Thanks, Phil mobile

RE: Questions regarding configuration parameters...

2008-02-22 Thread C G
! Bottom line: Carefully setting configuration parameters, and paying attention to map/reduce task values relative to the size of the grid is VERY important in achieving good performance. Thanks, C G Joydeep Sen Sarma [EMAIL PROTECTED] wrote: The default value are 2 so you might only
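
To make "task values relative to the size of the grid" concrete: each node runs only a fixed number of map and reduce tasks at once (per-node slot limits such as `mapred.tasktracker.reduce.tasks.maximum` in later 0.x releases; property names vary by version), and the quoted reply mentions a default of 2. The Hadoop Map/Reduce tutorial suggests setting the number of reduces to 0.95 or 1.75 times (nodes × reduce slots per node). The sketch below only does that arithmetic; the function names and the 20-node grid in the usage line are hypothetical.

```python
"""Back-of-the-envelope reduce-task sizing (a sketch, not a tuning tool)."""
import math


def cluster_slots(nodes, slots_per_node=2):
    """Total concurrent task slots; 2 per node mirrors the default cited in the thread."""
    return nodes * slots_per_node


def suggested_reduces(nodes, reduce_slots_per_node=2, factor=0.95):
    """Reduce count per the tutorial heuristic: factor * (nodes * slots per node).

    factor=0.95 lets every reduce launch in a single wave; 1.75 adds a second
    wave so faster nodes pick up extra work. Always returns at least 1.
    """
    total = cluster_slots(nodes, reduce_slots_per_node)
    return max(1, int(math.floor(factor * total)))


if __name__ == "__main__":
    # Hypothetical 20-node grid with the default 2 reduce slots per node.
    for f in (0.95, 1.75):
        print("factor %.2f -> %d reduces" % (f, suggested_reduces(20, 2, f)))
```

Setting far more reduces than available slots (or leaving the count at a small default on a large grid) is one way to end up with the slow reduce phases described in this thread.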

RE: Questions regarding configuration parameters...

2008-02-21 Thread C G
have me puzzled. I'm sure that I'm doing something wrong/non-optimal w/r/t slow reduce phases, but the long pauses during dfs command-line operations seem like a bug to me. Unfortunately I've not seen anybody else report this. Any thoughts/ideas most welcome... Thanks, C G

Questions regarding configuration parameters...

2008-02-20 Thread C G
where it is taking a very long time to process small amounts of data. I am hoping that some amount of tuning will resolve the problems. Any thoughts and insights most appreciated. Thanks, C G