Thanh Do created HDFS-7870:
--
Summary: remove libuuid dependency
Key: HDFS-7870
URL: https://issues.apache.org/jira/browse/HDFS-7870
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Thanh Do
Thanh Do created HDFS-7862:
--
Summary: Revisit the use of long data type
Key: HDFS-7862
URL: https://issues.apache.org/jira/browse/HDFS-7862
Project: Hadoop HDFS
Issue Type: Sub-task
Thanh Do created HDFS-7861:
--
Summary: Revisit Windows socket API compatibility
Key: HDFS-7861
URL: https://issues.apache.org/jira/browse/HDFS-7861
Project: Hadoop HDFS
Issue Type: Sub-task
Thanh Do created HDFS-7860:
--
Summary: Get HA NameNode information from config file
Key: HDFS-7860
URL: https://issues.apache.org/jira/browse/HDFS-7860
Project: Hadoop HDFS
Issue Type: Sub-task
[
https://issues.apache.org/jira/browse/HDFS-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thanh Do resolved HDFS-7768.
Resolution: Invalid
Overlapped with HDFS-7188
> Separate Platform specific functions
Thanh Do created HDFS-7768:
--
Summary: Separate Platform specific functions
Key: HDFS-7768
URL: https://issues.apache.org/jira/browse/HDFS-7768
Project: Hadoop HDFS
Issue Type: Sub-task
Thanh Do created HDFS-7577:
--
Summary: Add additional headers/includes needed by Windows
Key: HDFS-7577
URL: https://issues.apache.org/jira/browse/HDFS-7577
Project: Hadoop HDFS
Issue Type: Sub-task
Thanh Do created HDFS-7574:
--
Summary: Make cmake work in Windows Visual Studio 2010
Key: HDFS-7574
URL: https://issues.apache.org/jira/browse/HDFS-7574
Project: Hadoop HDFS
Issue Type: Sub-task
hi all,
is there any work on snapshotting HDFS going on?
Can anybody give me a hint on what the
current state of the art in HDFS snapshotting is?
thanks a lot.
I am using cloudera's distribution version:
hadoop-0.20.2+738.
On Thu, Apr 14, 2011 at 6:23 PM, Ted Dunning wrote:
> What version are you using?
>
>
> On Thu, Apr 14, 2011 at 3:55 PM, Thanh Do wrote:
>
>> Hi all,
>>
>> I have recently seen silent data loss in our system.
Hi all,
I have recently seen silent data loss in our system.
Here is the case:
1. The client appends to some block.
2. For some reason, commitBlockSynchronization
returns successfully with synclist = [] (i.e., empty).
3. In the client code, NO exception is thrown, and
the client's append appears to succeed.
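A minimal sketch of the kind of guard this suggests on the client side; the names here (checkSyncList, the plain String list) are illustrative assumptions, not the actual DFSClient code:

    import java.io.IOException;
    import java.util.List;

    // Illustrative helper only (hypothetical names, not the DFSClient code):
    // fail loudly when block recovery reports that no replica was synchronized,
    // instead of letting the append look successful.
    final class SyncListCheck {
        static void checkSyncList(List<String> syncList) throws IOException {
            if (syncList == null || syncList.isEmpty()) {
                throw new IOException("block synchronization returned an empty sync list; "
                        + "no replica holds the appended data");
            }
        }
    }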
Hi all,
Backup Node is there in 0.21.0,
but I am curious about the Standby Node progress.
Is anybody working on that?
Thanks
Thanh
We have a corrupted file which has only one block.
It turns out that all checksum files of the replicas
are corrupted... but the data files are OK...
How can we recover this file?
I can think of trying to fetch the file with the shell get -ignorecrc
option, then put it into HDFS again...
But can the system aut
On Oct 13, 2010 at 7:37 PM, Brian Bockelman wrote:
>
> On Oct 13, 2010, at 7:29 PM, Thanh Do wrote:
>
> > Hi Brian,
> >
> > If this is the case, then is there any chance that,
> > somehow the DataBlockScanner cannot finish
> > the verification for all the block
Hi all,
Are there any benchmarks available for HDFS
(measuring read/write throughput, latency, and such)?
It would be great if somebody could point me to any source.
Thanks much
Thanh
got it here
https://issues.apache.org/jira/browse/HADOOP-3164
On Sun, Nov 14, 2010 at 11:31 AM, Thanh Do wrote:
> Hi all,
>
> Can somebody let me know what this
> parameter is used for:
>
> dfs.datanode.transferTo.allowed
>
> It is not in default config,
> and the maxChunksPerPacket depends on it.
Hi all,
Can somebody let me know what this
parameter is used for:
dfs.datanode.transferTo.allowed
It is not in default config,
and the maxChunksPerPacket depends on it.
Thanks so much.
Thanh
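As far as I can tell, this flag decides whether the datanode may use the zero-copy FileChannel.transferTo path when sending block data; below is a rough sketch of the two paths in plain JDK terms, not the BlockSender code:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.channels.WritableByteChannel;

    // Rough illustration: transferTo moves bytes from the block file to the
    // socket inside the kernel, while the fallback path copies them through a
    // user-space buffer.
    final class SendBlock {
        static void send(FileChannel blockFile, long offset, long length,
                         WritableByteChannel socket, boolean transferToAllowed)
                throws IOException {
            if (transferToAllowed) {
                long sent = 0;
                while (sent < length) {
                    // zero-copy: the kernel pushes the file region to the socket
                    sent += blockFile.transferTo(offset + sent, length - sent, socket);
                }
            } else {
                // fallback: read into a user-space buffer, then write it out
                ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
                blockFile.position(offset);
                long remaining = length;
                while (remaining > 0) {
                    buf.clear();
                    buf.limit((int) Math.min(buf.capacity(), remaining));
                    int n = blockFile.read(buf);
                    if (n < 0) break;
                    buf.flip();
                    while (buf.hasRemaining()) socket.write(buf);
                    remaining -= n;
                }
            }
        }
    }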
Hi all,
Does the current implementation (0.21.0) fsync()
once a datanode finalizes a replica?
I looked into the source and it seems to me
that no fsync() is called once the replica is finalized.
Am I wrong here?
Thanks.
Thanh
On Thu, Nov 11, 2010 at 12:43 PM, Thanh Do wrote:
>
> > Thank you all for clarification guys.
> > I also looked at 0.20-append trunk and see that the order is totally
> > different.
> >
> > One more thing, do you guys plan to implement hsync(), i.e., API3
> >
write is more for the purpose of
> > implementation simplification. Currently readers do not read from
> DataNode
> > buffer. They only read from system buffer. A flush makes the data visible
> > to
> > readers sooner.
> >
> > Hairong
> >
> > On 11/11/
the API that will eventually go all the way to disk, but it
> has not yet been implemented.
>
> -Todd
>
> On Wednesday, November 10, 2010, Thanh Do wrote:
> > Or another way to rephrase my question:
> > do data.flush and checksumOut.flush guarantee
> > da
Or another way to rephrase my question:
do data.flush and checksumOut.flush guarantee
that the data is synchronized with the underlying disk,
just like fsync()?
Thanks
Thanh
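My understanding, hedged: flush() only empties user-space buffers into the OS page cache, while FileDescriptor.sync() (or FileChannel.force()) is what asks the OS to push the data down to the storage device, like fsync(). A tiny self-contained illustration:

    import java.io.FileOutputStream;
    import java.io.IOException;

    // flush() alone does not give fsync()-like durability; sync() does.
    final class FlushVsSync {
        public static void main(String[] args) throws IOException {
            FileOutputStream out = new FileOutputStream("replica.dat");
            try {
                out.write(new byte[]{1, 2, 3});
                out.flush();         // data may still sit in the OS page cache
                out.getFD().sync();  // roughly equivalent to fsync(fd)
            } finally {
                out.close();
            }
        }
    }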
On Wed, Nov 10, 2010 at 10:26 PM, Thanh Do wrote:
> Hi all,
>
> After reading the appenddesign3.pdf in HDFS-256,
> an
Hi all,
After reading the appenddesign3.pdf in HDFS-256,
and looking at the BlockReceiver.java code in 0.21.0,
I am confused by the following.
The document says that:
*For each packet, a DataNode in the pipeline has to do 3 things.
1. Stream data
a. Receive data from the upstream DataNode o
Thanks Eli,
I got it now.
On Fri, Nov 5, 2010 at 10:36 PM, Eli Collins wrote:
> Hey Thanh,
>
> Data gets written in 64KB packets so there doesn't seem to be a need
> to buffer it.
>
> Thanks,
> Eli
>
> On Thu, Nov 4, 2010 at 2:58 PM, Thanh Do wrote:
> > H
Hi all,
When a datanode receives a block, the datanode
writes the block into 2 streams on disk:
- the data stream (dataOut)
- the checksum stream (checksumOut)
While the checksumOut is created with the following code:
this.checksumOut = new DataOutputStream(new BufferedOutputStream(
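For illustration only, using plain JDK streams rather than the BlockReceiver code: wrapping the checksum file stream in a BufferedOutputStream batches the many tiny checksum writes (roughly 4 bytes per 512-byte chunk) into larger file writes, while the data stream already receives data in large packets.

    import java.io.BufferedOutputStream;
    import java.io.DataOutputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;

    // Generic sketch: a buffered checksum stream turns many 4-byte writes
    // into a few larger writes to the .meta file.
    final class ChecksumStreamExample {
        public static void main(String[] args) throws IOException {
            DataOutputStream checksumOut = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream("blk_1234.meta"), 512));
            checksumOut.writeInt(12345);  // stand-in for one CRC32 value
            checksumOut.close();
        }
    }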
Hi all,
Can somebody tell me what the block-deletion
policies/mechanism in HDFS is, or point me to the
source files where I can look it up?
Is this somewhat similar to the garbage collection
technique described in The Google File System paper?
Thanks
Thanh
On Oct 13, 2010 at 7:37 PM, Brian Bockelman wrote:
>
> On Oct 13, 2010, at 7:29 PM, Thanh Do wrote:
>
> > Hi Brian,
> >
> > If this is the case, then is there any chance that,
> > somehow the DataBlockScanner cannot finish
> > the verification for all the block in three
> That is correct. Last time I read the code, Hadoop scheduled the block
> verifications randomly throughout the period in order to avoid periodic
> effects (i.e., high load every N minutes).
>
> Brian
>
> On Oct 13, 2010, at 7:14 PM, Thanh Do wrote:
>
> > Brian,
> >
available to the
> scanning thread, you can specify impossibly small periods.
>
> Brian
>
> On Oct 13, 2010, at 7:01 PM, Thanh Do wrote:
>
> > Hi again,
> >
> > Could anybody explain to me the scanning period
> > policy of DataBlockScanner? That is, how
Hi again,
Could anybody explain to me the scanning period
policy of DataBlockScanner? That is, how often does it wake up
and scan a block file?
When looking at the code, I found
static final long DEFAULT_SCAN_PERIOD_HOURS = 21*24L; // three weeks
but it definitely does not wake up and pick a
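A toy sketch of the scheduling idea Brian describes above, not the actual DataBlockScanner code: each block gets a random verification time spread across the scan period, so the verification I/O load stays roughly even instead of spiking every N hours.

    import java.util.Random;

    // Randomize each block's next verification time inside the period.
    final class ScanScheduleSketch {
        static final long SCAN_PERIOD_MS = 21L * 24 * 60 * 60 * 1000; // three weeks

        static long nextVerificationTime(long lastVerifiedMs, Random rand) {
            // pick a point uniformly inside the next period
            return lastVerifiedMs + (long) (rand.nextDouble() * SCAN_PERIOD_MS);
        }

        public static void main(String[] args) {
            long now = System.currentTimeMillis();
            System.out.println("next scan at " + nextVerificationTime(now, new Random()));
        }
    }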
the same directory exceeds a certain value.
>
> -dhruba
>
>
> On Mon, Oct 11, 2010 at 1:15 PM, Thanh Do wrote:
>
> > Hi all,
> >
> > can anyone explain to me why HDFS has the policy
> > to store 64 block files in a single subdirectory?
> > and
Hi all,
can anyone explain to me why HDFS has the policy
to store 64 block files in a single subdirectory?
And if the number of block files increases,
it just simply creates another subdir and puts the block files there.
Thanks
Thanh
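A toy sketch of the layout policy as I understand it from the reply above: once a directory holds a fixed number of block files, new blocks go into a freshly created subdirectory, keeping any single directory small and cheap to list. The names are illustrative, not the DataNode code.

    import java.io.File;

    // Overflow into a new "subdirN" directory once the current one is full.
    final class BlockDirSketch {
        static final int MAX_BLOCKS_PER_DIR = 64;   // the value asked about above

        static File pickDir(File current, int blocksInCurrent, int nextSubdirId) {
            if (blocksInCurrent < MAX_BLOCKS_PER_DIR) {
                return current;                      // still room here
            }
            File subdir = new File(current, "subdir" + nextSubdirId);
            subdir.mkdirs();                         // open a new subdirectory
            return subdir;
        }
    }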
[
https://issues.apache.org/jira/browse/HDFS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thanh Do reopened HDFS-1384:
Dhruba,
I think I gave a bad description of the bug.
The excludedList does the job.
But in this case, the
: HDFS-1384
URL: https://issues.apache.org/jira/browse/HDFS-1384
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Thanh Do
We saw a case where the NN keeps giving the client nodes from the same rack, hence an
exception
from the client
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Reporter: Thanh Do
We experienced a data loss situation due to double failures.
One is a transient disk failure with the edits log and the other is a corrupted fstime.
Here is the detail:
1. NameNode
client
Affects Versions: 0.20-append
Reporter: Thanh Do
1. Say we have 2 racks: rack-0 and rack-1.
Rack-0 has dn1, dn2, dn3. Rack-1 has dn4, dn5, dn6.
2. Suppose the client is in rack-0, and the write pipeline is:
client --> localnode --> other rack --> other rack
In this e
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Affects Versions: 0.20-append
Reporter: Thanh Do
- Component: data node
- Version: 0.20-append
- Setup:
1) # disks / datanode = 3
2) # failures = 2
3) failure type = crash
4) When/where failure
Components: data-node
Affects Versions: 0.20-append
Reporter: Thanh Do
- Component: data node
- Version: 0.20-append
- Summary: we found a case where, when a block is truncated during updateBlock,
the length in ongoingCreates is not updated, hence leading to a failed append
Hi,
I looked at FSConstants and see this:
LEASE_RECOVER_PERIOD = 10 * 1000; // i.e 10 seconds
and the only place this is used is in:
INodeFileUnderConstruction.setLastRecoveryTime()
Can anyone explain to me the intuition behind this?
Why is this value fixed at 10 seconds?
Thanks
--
thanh
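My hedged reading is that the constant acts as a throttle, so lease recovery for the same file is not re-initiated more often than once per period; a small illustrative sketch of that idea, not the NameNode code:

    // Allow a new recovery attempt only after the period has elapsed.
    final class RecoveryThrottleSketch {
        static final long LEASE_RECOVER_PERIOD_MS = 10 * 1000; // 10 seconds
        private long lastRecoveryTimeMs = 0;

        // returns true if a new recovery attempt may be started now
        synchronized boolean tryStartRecovery(long nowMs) {
            if (nowMs - lastRecoveryTimeMs < LEASE_RECOVER_PERIOD_MS) {
                return false;  // a recent attempt is still considered in progress
            }
            lastRecoveryTimeMs = nowMs;
            return true;
        }
    }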
Reporter: Thanh Do
- Setups:
number of datanodes = 2
replication factor = 2
Type of failure: transient fault (a Java I/O call throws an exception or returns
false)
Number of failures = 2
when/where failures happen = during the 2nd phase of the pipeline, each happens
at each datanode
: Bug
Components: hdfs client
Affects Versions: 0.20.1
Reporter: Thanh Do
- Setup:
+ # datanodes = 2
+ replication factor = 2
+ failure type = transient (i.e. a Java I/O call that throws an IOException or
returns false)
+ # failures = 2
+ When/where failures happen: (This
Components: hdfs client
Affects Versions: 0.20.1
Reporter: Thanh Do
- Setup:
number of datanodes = 4
replication factor = 2 (2 datanodes in the pipeline)
number of failure injected = 2
failure type: crash
Where/When failures happen: There are two scenarios. The first is when two
Reporter: Thanh Do
Summary:
Client uselessly retries recoverBlock 5 times
The same behavior is also seen in append protocol (HDFS-1229)
The setup:
# available datanodes = 4
Replication factor = 2 (hence there are 2 datanodes in the pipeline)
Failure type = Bad disk at datanode (not crashes
Issue Type: Bug
Components: name-node
Reporter: Thanh Do
This bug has been reported.
Basically, since a datanode's heartbeat messages are infrequent (~ every 10
minutes),
NameNode always gives the client the same datanode even if the datanode is dead.
We want to poin
Issue Type: Bug
Components: name-node
Affects Versions: 0.20.1
Reporter: Thanh Do
- Summary: Datanode is 'alive' but its disk has failed; Namenode still thinks
it's alive
- Setups:
+ Replication = 1
+ # available datanodes = 2
+ # disks / datanode
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Affects Versions: 0.20.1
Reporter: Thanh Do
- Summary: block is corrupted if a crash happens before writing to checksumOut
but
after writing to dataOut.
- Setup:
+ # available datanodes = 1
Reporter: Thanh Do
- Summary: failover bug, bad retry logic at DFSClient, cannot failover to the
2nd disk
- Setups:
+ # available datanodes = 1
+ # disks / datanode = 2
+ # failures = 1
+ failure type = bad disk
+ When/where failure happens = (see below)
- Details:
The setup is:
1 datanode, 1
client
Affects Versions: 0.20.1
Reporter: Thanh Do
- Summary: recoverBlock is not atomic, so a retry fails when
facing a failure.
- Setup:
+ # available datanodes = 3
+ # disks / datanode = 1
+ # failures = 2
+ failure type = crash
+ When/where failure happens = (see
: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.20.1
Reporter: Thanh Do
- Setup:
+ # available datanodes = 2
+ # disks / datanode = 1
+ # failures = 1
+ failure type = crash
+ When/where failure happens = during primary's recover
Components: data-node
Affects Versions: 0.20.1
Reporter: Thanh Do
- Summary: when appending to a partial block, it is possible that
a retry after an exception fails due to a checksum mismatch.
The append operation is not atomic (it does not either complete or fail completely).
- Setup
Affects Versions: 0.20.1
Reporter: Thanh Do
- Summary: client append is not atomic; hence, it is possible that
when retrying during append, there is an exception in updateBlock
indicating an unmatched file length, making the append fail.
- Setup:
+ # available datanodes = 3
+ # disks
Issue Type: Bug
Components: data-node
Affects Versions: 0.20.1
Reporter: Thanh Do
- Summary: the last block is unavailable to subsequent readers if the appender
crashes in the
middle of the append workload.
- Setup:
# available datanodes = 3
# disks / datanode = 1
Affects Versions: 0.20.1
Reporter: Thanh Do
- Summary: Block is lost if the primary datanode crashes in the middle of
tryUpdateBlock.
- Setup:
# available datanode = 2
# replica = 2
# disks / datanode = 1
# failures = 1
# failure type = crash
When/where failure happens = (see below
Stale connection makes node miss append
---
Key: HDFS-1224
URL: https://issues.apache.org/jira/browse/HDFS-1224
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Thanh Do
- Summary: if a
Components: data-node
Affects Versions: 0.20.1
Reporter: Thanh Do
A datanode can store block files in multiple volumes.
If a datanode sees a bad volume during startup (i.e., faces an exception
when accessing that volume), it simply fail-stops, making all block files
stored in other
Components: name-node
Affects Versions: 0.20.1
Reporter: Thanh Do
Despite the ability to configure multiple name directories
(to store fsimage) and edits directories, the NameNode will fail-stop
most of the time it faces an exception when accessing these directories.
NameNode
Versions: 0.20.1
Reporter: Thanh Do
- Summary:
If a crash happens during FSEditLog.createEditLogFile(), the
edits log file on disk may be stale. During the next reboot, the NameNode
will get an exception when parsing the edits file, because of the stale data,
leading to an unsuccessful reboot
Affects Versions: 0.20.1
Reporter: Thanh Do
- Summary: updating the fstime file on disk is not atomic, so it is possible that
if a crash happens in the middle, then the next time the NameNode reboots it will
read a stale fstime and hence be unable to start successfully.
- Details:
Below is the code for
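One general way to make such an update close to atomic, offered only as an illustration of the pattern and not as the HDFS code, is to write to a temporary file and rename it over the old one:

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    // Write-temp-then-rename: a crash in the middle leaves either the old
    // fstime or the new one on disk, never a half-written file, because the
    // final rename is atomic on POSIX local filesystems.
    final class AtomicTimestampWrite {
        static void writeTimestamp(File fstime, long timeMs) throws IOException {
            File tmp = new File(fstime.getParentFile(), fstime.getName() + ".tmp");
            FileOutputStream out = new FileOutputStream(tmp);
            try {
                out.write(Long.toString(timeMs).getBytes("UTF-8"));
                out.getFD().sync();              // make sure the bytes are on disk
            } finally {
                out.close();
            }
            if (!tmp.renameTo(fstime)) {         // atomic replace of the old file
                throw new IOException("rename of " + tmp + " to " + fstime + " failed");
            }
        }
    }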
: 0.20.2
Reporter: Thanh Do
We found this problem almost at the same time as the HDFS developers.
Basically, the edits log is truncated before fsimage.ckpt is renamed to fsimage.
Hence, any crash that happens after the truncation but before the renaming will lead
to data loss. Detailed description