Hi All:
Is there any way, using Hadoop Streaming, to determine the directory from which
an input record is being read? This is straightforward in Hadoop using
InputFormats, but I am curious if the same concept can be applied to streaming.
The goal here is to read in data from 2 directories, say
through the old nodes,
removing each one from service for several hours and then return it to service.
Thoughts?
Thanks in Advance,
C G
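One way to answer the streaming question above: Hadoop Streaming passes JobConf properties to the mapper as environment variables, with dots replaced by underscores, so the map.input.file property is visible to the script as map_input_file. A minimal sketch of a mapper that tags each record with its source path (the function name and output format here are illustrative, not from the original thread):

```python
import os
import sys

def mapper(stdin=sys.stdin, stdout=sys.stdout):
    # Hadoop Streaming exports JobConf properties to the task's
    # environment with '.' replaced by '_', so the map.input.file
    # property appears as the map_input_file environment variable.
    input_file = os.environ.get("map_input_file", "unknown")
    for line in stdin:
        # Emit: <source file> TAB <original record>; a later stage
        # can branch on the directory component of the path.
        stdout.write("%s\t%s" % (input_file, line))

if __name__ == "__main__":
    mapper()
```

Deployed as a streaming mapper (e.g. `-mapper tag_mapper.py` with two `-input` directories; invocation details illustrative), each record arrives at the reducer already labeled with the directory it came from.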
found.
I assume there is some way to do this and I just don't have the right command
line magic. This is Hadoop 0.15.0.
Any help appreciated.
Thanks,
C G
no issues, but I want
to make sure it's OK to proceed.
Speaking of NameNode, what does it keep in memory? Our memory usage ramped up
rather suddenly recently. Also, does SecondaryNameNode require the same amount
of memory as NameNode?
Thanks for any help,
C G
We can get the NameNode and SecondaryNameNode up and running, but DataNodes
fail as shown below. Hadoop Jira 4019 tracks this problem
(https://issues.apache.org/jira/browse/HADOOP-4019), but I'm curious if anybody
has solved it yet...
2008-08-25 23:21:53,743 INFO
I've built and deployed KFS outside of Hadoop and it seems to work. I'm
planning to bring up a test environment shortly running Hadoop with KFS. With
all due respect to HDFS developers and committers, I am strongly hesitant to
call HDFS stable. We've had several major issues with HDFS in
You should look at
https://issues.apache.org/jira/browse/HADOOP-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12610003#action_12610003 as
well. This eliminates the spurious "connection reset by peer" messages that
clutter up the DataNode logs and can be
Yongqiang:
Thanks for this information. I'll try your changes and see if the experiment
runs better.
Thanks,
C G
--- On Mon, 7/7/08, heyongqiang [EMAIL PROTECTED] wrote:
From: heyongqiang [EMAIL PROTECTED]
Subject: Re: Re: Hadoop 0.17.0 - lots of I/O problems and can't run small
datasets
shed some light on this, or if others are having
similar problems.
Any thoughts, insights, etc. would be greatly appreciated.
Thanks,
C G
Here's an ugly trace:
08/07/06 01:43:29 INFO mapred.JobClient: map 100% reduce 93%
08/07/06 01:43:29 INFO mapred.JobClient: Task Id
not typically do).
This is on Hadoop 0.15.0, upgrading is not an option at the moment.
Any help appreciated...
Thanks,
C G
be going on? It seems really
strange to have code that works in an old version but won't run in the more
modern releases.
Thanks,
C G
C G [EMAIL PROTECTED] wrote:
Hi All:
I'm seeing an inability to run one of our applications over a reasonably small
dataset (~200G input) while running
?
thanks,
Raghu.
C G wrote:
We've recently upgraded from 0.15.0 to 0.16.4. Two nights ago we had a
problem where DFS nodes could not communicate. After not finding anything
obviously wrong we decided to shut down DFS and restart. Following restart I
was seeing a corrupted system with significant
Ugh, that solved the problem. Thanks Dhruba!
Thanks,
C G
Dhruba Borthakur [EMAIL PROTECTED] wrote:
If you look at the log message starting with STARTUP_MSG: build
=... you will see that the namenode and the good datanode were built by CG,
whereas the bad datanodes were compiled by hadoopqa
C G
one found. Is it safe/sufficient to simply delete this file?
There were MR jobs active when the master failed...it wasn't a clean shutdown
by any means. I surmise this file is remnant from an active job.
Thanks,
C G
Lohit [EMAIL PROTECTED] wrote:
Filesystem is considered
using DRBD.
It performed very well in this first real world test. If there is interest I
can write up how we protect our master nodes in more detail and share w/the
community.
Thanks,
C G
Ted Dunning [EMAIL PROTECTED] wrote:
You don't need to correct over-replicated files.
The under
Hi All:
We had a primary node failure over the weekend. When we brought the node
back up and I ran Hadoop fsck, I see the file system is corrupt. I'm unsure
how best to proceed. Any advice is greatly appreciated. If I've missed a
Wiki page or documentation somewhere please feel free
blocks: 0 (0.0 %)
Target replication factor: 3
Real replication factor: 3.0
The filesystem under path '/' is CORRUPT
So it seems like it's fixing some problems on its own?
Thanks,
C G
Dhruba Borthakur [EMAIL PROTECTED] wrote:
Did one datanode fail or did the namenode
at 8:55 PM, C G
wrote:
The system hosting the namenode experienced an OS panic and shut down, we
subsequently rebooted it. Currently we don't believe there is/was a bad disk
or other hardware problem.
Something interesting: I've run fsck twice; the first time it gave the result
I posted
Yeah, everything is packaged into one jar... I've been copying those jars
everywhere, which didn't seem right, hence the question.
Thanks,
C G
Ted Dunning [EMAIL PROTECTED] wrote:
The easiest way is to package all of your code (classes and jars) into a
single jar file which you
Dear Yahoo, Amazon, and Hadoop Community:
Thank you very much for a very well-done Hadoop Summit. It came as a
complete surprise that a FREE conference would include breakfast, lunch,
snacks, happy hour, and swag - very classy and very nice.
All the presentations and discussions
-copyToLocal/-cat.
Hope this helps...
C G
to in-stream viewing behavior of Internet video audiences. Our
current grid contains more than 128 CPU cores and in excess of 100 terabytes of
storage, and we plan to grow that substantially during 2008.
Thanks,
C G
---
Christopher Gillett
Chief Software Architect
Visible
I haven't looked at the source code to see how -cat is implemented, but I was
pretty surprised at the results as well. When I sat down to do this experiment
I figured I was wasting my time... surprisingly, I was not.
C G
Joydeep Sen Sarma [EMAIL PROTECTED] wrote:
This is amazing
I think HTTP access is read-only...you'll need to continue to use
copyFromLocalFile
C G
Phillip Wu [EMAIL PROTECTED] wrote:
Very helpful information.
Is there any way to put files into DFS remotely, like an HTTP POST?
Or I have to keep using copyFromLocalFile?
Thanks,
Phil
!
Bottom line: Carefully setting configuration parameters, and paying
attention to map/reduce task values relative to the size of the grid is VERY
important in achieving good performance.
Thanks,
C G
Joydeep Sen Sarma [EMAIL PROTECTED] wrote:
The default values are 2 so you might only
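For reference, the kind of tuning described above was done in hadoop-site.xml in the 0.1x era. A sketch, where the property names (mapred.map.tasks, mapred.reduce.tasks) are the real ones but the values are purely illustrative and should be sized against the number of task slots in the grid:

```xml
<!-- Illustrative fragment for hadoop-site.xml; the values below are
     hypothetical and would be chosen relative to the total map/reduce
     slots available across all TaskTrackers in the grid. -->
<property>
  <name>mapred.map.tasks</name>
  <value>53</value>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>7</value>
</property>
```

Leaving these at their defaults on a large grid tends to underuse the cluster, which matches the "bottom line" observation above.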
have me puzzled. I'm
sure that I'm doing something wrong/non-optimal w/r/t slow reduce phases, but
the long pauses during a dfs command line operation seems like a bug to me.
Unfortunately I've not seen anybody else report this.
Any thoughts/ideas most welcome...
Thanks,
C G
where it is taking a very long time to process small amounts of data. I
am hoping that some amount of tuning will resolve the problems.
Any thoughts and insights most appreciated.
Thanks,
C G