[jira] [Created] (HDFS-7870) remove libuuid dependency

2015-03-02 Thread Thanh Do (JIRA)
Thanh Do created HDFS-7870: -- Summary: remove libuuid dependency Key: HDFS-7870 URL: https://issues.apache.org/jira/browse/HDFS-7870 Project: Hadoop HDFS Issue Type: Sub-task Reporter

[jira] [Created] (HDFS-7862) Revisit the use of long data type

2015-02-27 Thread Thanh Do (JIRA)
Thanh Do created HDFS-7862: -- Summary: Revisit the use of long data type Key: HDFS-7862 URL: https://issues.apache.org/jira/browse/HDFS-7862 Project: Hadoop HDFS Issue Type: Sub-task

[jira] [Created] (HDFS-7861) Revisit Windows socket API compatibility

2015-02-27 Thread Thanh Do (JIRA)
Thanh Do created HDFS-7861: -- Summary: Revisit Windows socket API compatibility Key: HDFS-7861 URL: https://issues.apache.org/jira/browse/HDFS-7861 Project: Hadoop HDFS Issue Type: Sub-task

[jira] [Created] (HDFS-7860) Get HA NameNode information from config file

2015-02-27 Thread Thanh Do (JIRA)
Thanh Do created HDFS-7860: -- Summary: Get HA NameNode information from config file Key: HDFS-7860 URL: https://issues.apache.org/jira/browse/HDFS-7860 Project: Hadoop HDFS Issue Type: Sub-task

[jira] [Resolved] (HDFS-7768) Separate Platform specific functions

2015-02-19 Thread Thanh Do (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thanh Do resolved HDFS-7768. Resolution: Invalid Overlapped with HDFS-7188 > Separate Platform specific funti

[jira] [Created] (HDFS-7768) Separate Platform specific functions

2015-02-10 Thread Thanh Do (JIRA)
Thanh Do created HDFS-7768: -- Summary: Separate Platform specific functions Key: HDFS-7768 URL: https://issues.apache.org/jira/browse/HDFS-7768 Project: Hadoop HDFS Issue Type: Sub-task

[jira] [Created] (HDFS-7577) Add additional header includes needed by Windows

2014-12-30 Thread Thanh Do (JIRA)
Thanh Do created HDFS-7577: -- Summary: Add additional header includes needed by Windows Key: HDFS-7577 URL: https://issues.apache.org/jira/browse/HDFS-7577 Project: Hadoop HDFS Issue Type: Sub

[jira] [Created] (HDFS-7574) Make cmake work in Windows Visual Studio 2010

2014-12-29 Thread Thanh Do (JIRA)
Thanh Do created HDFS-7574: -- Summary: Make cmake work in Windows Visual Studio 2010 Key: HDFS-7574 URL: https://issues.apache.org/jira/browse/HDFS-7574 Project: Hadoop HDFS Issue Type: Sub-task

snapshot

2011-05-31 Thread Thanh Do
Hi all, is there any work on snapshotting HDFS going on? Can anybody give me some hint on what the current state of the art in HDFS snapshotting is? Thanks a lot.

Re: silent data loss during append

2011-04-15 Thread Thanh Do
I am using Cloudera's distribution version: hadoop-0.20.2+738. On Thu, Apr 14, 2011 at 6:23 PM, Ted Dunning wrote: > What version are you using? > > > On Thu, Apr 14, 2011 at 3:55 PM, Thanh Do wrote: > >> Hi all, >> >> I have recently seen silent data l

silent data loss during append

2011-04-14 Thread Thanh Do
Hi all, I have recently seen silent data loss in our system. Here is the case: 1. the client appends to some block 2. for some reason, commitBlockSynchronization returns successfully with synclist = [] (i.e. empty) 3. in the client code, NO exception is thrown, and the client appends successfully.
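
For illustration, a minimal, hypothetical guard for this failure mode: if the recovery result carries no synchronized replicas, raise an error instead of letting the client treat the append as successful. The class and method names are made up for the sketch; this is not the 0.20 DFSClient code.

    import java.io.IOException;
    import java.util.List;

    class PipelineRecoveryCheck {
      // Illustrative: reject a "successful" recovery that synchronized nothing.
      static void checkSyncedReplicas(List<String> syncList) throws IOException {
        if (syncList == null || syncList.isEmpty()) {
          // without at least one synchronized replica the appended data is not durable
          throw new IOException("Block recovery returned no synchronized replicas");
        }
      }
    }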

Standby Node

2011-03-14 Thread Thanh Do
Hi all, Backup Node is there in 0.21.0, but I am curious about the Standby Node progress. Is anybody working on that? Thanks Thanh

Automatic recovery of corrupted files

2010-11-24 Thread Thanh Do
We have a corrupted file which has only one block. It turns out that all checksum files of the replicas are corrupted... but the data files are OK... How can we recover this file? I can think of trying to use the shell to get the file with the -ignoreCrc option, then put it into HDFS again... But can the system aut
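
A minimal sketch of the same recovery done programmatically, assuming the stock FileSystem API (setVerifyChecksum, IOUtils.copyBytes): read the file with checksum verification disabled, then write it back so fresh checksums are generated. Paths and behavior on a specific release are assumptions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import java.io.InputStream;
    import java.io.OutputStream;

    public class RecoverIgnoringCrc {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        fs.setVerifyChecksum(false);              // same idea as -ignoreCrc on the shell
        Path corrupted = new Path(args[0]);       // e.g. the file with bad checksum metadata
        Path recovered = new Path(args[0] + ".recovered");
        InputStream in = fs.open(corrupted);      // read succeeds because CRCs are ignored
        OutputStream out = fs.create(recovered);  // a fresh write regenerates checksums
        IOUtils.copyBytes(in, out, conf, true);   // copies and closes both streams
      }
    }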

Re: DataBlockScanner scan period

2010-11-23 Thread Thanh Do
3, 2010 at 7:37 PM, Brian Bockelman wrote: > > On Oct 13, 2010, at 7:29 PM, Thanh Do wrote: > > > Hi Brian, > > > > If this is the case, then is there any chance that, > > somehow the DataBlockScanner cannot finish > > the verification for all the block

HDFS benchmarks

2010-11-22 Thread Thanh Do
Hi all, Are there any benchmarks for HDFS available (measuring read/write throughput, latency, and such)? It would be great if somebody could point me to a source. Thanks much Thanh

Re: transferToAllowed

2010-11-14 Thread Thanh Do
got it here https://issues.apache.org/jira/browse/HADOOP-3164 On Sun, Nov 14, 2010 at 11:31 AM, Thanh Do wrote: > Hi all, > > Can somebody let me know what is this > parameter used for: > > dfs.datanode.transferTo.allowed > > It is not in default config, > and the ma

transferToAllowed

2010-11-14 Thread Thanh Do
Hi all, Can somebody let me know what this parameter is used for: dfs.datanode.transferTo.allowed? It is not in the default config, and maxChunksPerPacket depends on it. Thanks so much. Thanh
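
For illustration, a minimal sketch of how such a flag is read from the configuration. The parameter name comes from this thread; the fallback default of true is an assumption about the 0.20-era DataNode and should be checked against the source, which also explains why the key is absent from the default config files.

    import org.apache.hadoop.conf.Configuration;

    public class TransferToFlag {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // not listed in the default config; the code falls back to a compiled-in default
        boolean transferToAllowed =
            conf.getBoolean("dfs.datanode.transferTo.allowed", true);
        System.out.println("transferTo allowed: " + transferToAllowed);
      }
    }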

fsync() when finalize data?

2010-11-12 Thread Thanh Do
Hi all, Does the current implementation (0.21.0) fsync() once a datanode finalizes a replica? I looked into the source and it seems to me that no fsync() is called once the replica is finalized. Am I wrong here? Thanks. Thanh
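
For reference, a minimal plain-JDK sketch of the distinction being asked about: flush() only hands bytes to the OS buffer cache, while FileDescriptor.sync() is the fsync() that forces them to the device. This is an illustration, not a quote of the datanode code.

    import java.io.FileOutputStream;
    import java.io.IOException;

    public class SyncExample {
      static void writeAndSync(String path, byte[] data) throws IOException {
        FileOutputStream out = new FileOutputStream(path);
        try {
          out.write(data);
          out.flush();          // pushes data to the OS page cache only
          out.getFD().sync();   // fsync(): forces the page cache to the storage device
        } finally {
          out.close();
        }
      }
    }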

Re: Why datanode does a flush to disk after receiving a packet

2010-11-11 Thread Thanh Do
On Thu, Nov 11, 2010 at 12:43 PM, Thanh Do wrote: > > > Thank you all for clarification guys. > > I also looked at 0.20-append trunk and see that the order is totally > > different. > > > > One more thing, do you guys plan to implement hsync(), i.e API3 > >

Re: Why datanode does a flush to disk after receiving a packet

2010-11-11 Thread Thanh Do
write is more for the purpose of > > implementation simplification. Currently readers do not read from > DataNode > > buffer. They only read from system buffer. A flush makes the data visible > > to > > readers sooner. > > > > Hairong > > > > On 11/11/

Re: Why datanode does a flush to disk after receiving a packet

2010-11-11 Thread Thanh Do
he API that will eventually go all the way to disk, but it > has not yet been implemented. > > -Todd > > On Wednesday, November 10, 2010, Thanh Do wrote: > > Or another way to rephase my question: > > does data.flush and checksumOut.flush guarantee > > da

Re: Why datanode does a flush to disk after receiving a packet

2010-11-10 Thread Thanh Do
Or another way to rephrase my question: do data.flush and checksumOut.flush guarantee that data is synchronized with the underlying disk, just like fsync()? Thanks Thanh On Wed, Nov 10, 2010 at 10:26 PM, Thanh Do wrote: > Hi all, > > After reading the appenddesign3.pdf in HDFS-256, > an

Why datanode does a flush to disk after receiving a packet

2010-11-10 Thread Thanh Do
Hi all, After reading the appenddesign3.pdf in HDFS-256, and looking at the BlockReceiver.java code in 0.21.0, I am confused by the following. The document says that: *For each packet, a DataNode in the pipeline has to do 3 things. 1. Stream data a. Receive data from the upstream DataNode o
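
A minimal sketch of the per-packet steps quoted above, assuming plain Java streams: forward the packet downstream, write data and checksums locally, then flush to the OS (flush, not fsync). The ordering and names are illustrative; BlockReceiver itself is considerably more involved.

    import java.io.IOException;
    import java.io.OutputStream;

    class PacketHandler {
      static void handlePacket(byte[] data, byte[] checksums,
                               OutputStream mirrorOut,
                               OutputStream dataOut,
                               OutputStream checksumOut) throws IOException {
        mirrorOut.write(data);          // 1. stream the packet to the downstream datanode
        dataOut.write(data);            // 2. write block data to the local block file
        checksumOut.write(checksums);   //    write the per-chunk CRCs to the meta file
        dataOut.flush();                // 3. flush: data reaches the OS, not necessarily the disk
        checksumOut.flush();
      }
    }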

Re: Why dataOut is FileOutputStream?

2010-11-06 Thread Thanh Do
Thanks Eli, I got it now. On Fri, Nov 5, 2010 at 10:36 PM, Eli Collins wrote: > Hey Thanh, > > Data gets written in 64KB packets so there doesn't seem to be a need > to buffer it. > > Thanks, > Eli > > On Thu, Nov 4, 2010 at 2:58 PM, Thanh Do wrote: > > H

Why dataOut is FileOutputStream?

2010-11-04 Thread Thanh Do
Hi all, When a datanode receives a block, the datanode writes the block into 2 streams on disk: - the data stream (dataOut) - the checksum stream (checksumOut) While the checksumOut is created with the following code: this.checksumOut = new DataOutputStream(new BufferedOutputStream(
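
A minimal sketch of the two-stream setup, reflecting the explanation in Eli's reply: checksum words are tiny, so buffering them avoids many small writes, while packet data already arrives in large (~64KB) chunks and gains little from an extra buffer. The class is illustrative, not the BlockReceiver source.

    import java.io.BufferedOutputStream;
    import java.io.DataOutputStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;

    class ReplicaStreams {
      final OutputStream dataOut;            // block data: written packet-by-packet (~64KB)
      final DataOutputStream checksumOut;    // per-chunk CRCs: many small 4-byte writes

      ReplicaStreams(File blockFile, File metaFile) throws IOException {
        dataOut = new FileOutputStream(blockFile);               // unbuffered is fine here
        checksumOut = new DataOutputStream(
            new BufferedOutputStream(new FileOutputStream(metaFile)));
      }
    }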

HDFS block deleting policy

2010-10-28 Thread Thanh Do
Hi all, Can somebody tell me what the block deletion policy/mechanism in HDFS is, or point me to the source files where I can look it up? Is this somewhat similar to the garbage collection technique described in the Google File System paper? Thanks Thanh

Re: DataBlockScanner scan period

2010-10-13 Thread Thanh Do
, 2010 at 7:37 PM, Brian Bockelman wrote: > > On Oct 13, 2010, at 7:29 PM, Thanh Do wrote: > > > Hi Brian, > > > > If this is the case, then is there any chance that, > > somehow the DataBlockScanner cannot finish > > the verification for all the block in three

Re: DataBlockScanner scan period

2010-10-13 Thread Thanh Do
; That is correct. Last time I read the code, Hadoop scheduled the block > verifications randomly throughout the period in order to avoid periodic > effects (i.e., high load every N minutes). > > Brian > > On Oct 13, 2010, at 7:14 PM, Thanh Do wrote: > > > Brian, > >

Re: DataBlockScanner scan period

2010-10-13 Thread Thanh Do
vailable to the > scanning thread, you can specify impossibly small periods. > > Brian > > On Oct 13, 2010, at 7:01 PM, Thanh Do wrote: > > > Hi again, > > > > Could any body explain to me about the scanning period > > policy of DataBlockScanner? That is who

DataBlockScanner scan period

2010-10-13 Thread Thanh Do
Hi again, Could anybody explain to me the scanning period policy of DataBlockScanner? That is, how often does it wake up and scan a block file? When looking at the code, I found static final long DEFAULT_SCAN_PERIOD_HOURS = 21*24L; // three weeks, but it definitely does not wake up and pick a
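
A minimal sketch of the randomized scheduling described in Brian's replies in this thread: each block's next verification time is spread uniformly over the scan period so the scanner avoids a load spike every N hours. Illustrative only, not the DataBlockScanner source.

    import java.util.Random;

    class ScanScheduler {
      static final long SCAN_PERIOD_MS = 21L * 24 * 60 * 60 * 1000;  // three weeks
      private final Random rand = new Random();

      // pick a point uniformly inside the next period instead of exactly one period later
      long nextVerificationTime(long lastScanTimeMs) {
        return lastScanTimeMs + (long) (rand.nextDouble() * SCAN_PERIOD_MS);
      }
    }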

Re: Reason to store 64 block files in a sub directory?

2010-10-11 Thread Thanh Do
the same directory exceed a certain value. > > -dhruba > > > On Mon, Oct 11, 2010 at 1:15 PM, Thanh Do wrote: > > > Hi all, > > > > can anyone explain to me while do HDFS has the policy > > to store 64 block files in a single sub directory? > > and

Reason to store 64 block files in a sub directory?

2010-10-11 Thread Thanh Do
Hi all, can anyone explain to me why HDFS has the policy of storing 64 block files in a single sub directory? And when the number of block files increases, it simply creates another subdir and puts the block files there. Thanks Thanh
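
A minimal, illustrative sketch of the placement policy described in this thread (and in Dhruba's reply): keep at most a fixed number of block files per subdirectory and spill into a new one once the cap is reached. Directory names and the cap handling are assumptions; the real FSDataset logic is more involved.

    import java.io.File;

    class BlockDirPlacer {
      static final int MAX_BLOCKS_PER_DIR = 64;
      private File current;
      private int subdirIndex = 0;

      BlockDirPlacer(File volumeRoot) {
        current = new File(volumeRoot, "subdir" + subdirIndex);
        current.mkdirs();
      }

      // returns the directory a new block file should be written into
      File dirForNewBlock() {
        String[] entries = current.list();
        if (entries != null && entries.length >= MAX_BLOCKS_PER_DIR) {
          // current subdir is full: spill into a fresh one
          current = new File(current.getParentFile(), "subdir" + (++subdirIndex));
          current.mkdirs();
        }
        return current;
      }
    }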

[jira] Reopened: (HDFS-1384) NameNode should give client the first node in the pipeline from different rack other than that of excludedNodes list in the same rack.

2010-09-11 Thread Thanh Do (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thanh Do reopened HDFS-1384: Dhruba, I think I made a bad description of the bug. The excludedList does the job. But in this case, the

[jira] Created: (HDFS-1384) NameNode should give client the first node in the pipeline from different rack other than that of excludedNodes list in the same rack.

2010-09-07 Thread Thanh Do (JIRA)
: HDFS-1384 URL: https://issues.apache.org/jira/browse/HDFS-1384 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.1 Reporter: Thanh Do We saw a case where the NN keeps giving the client nodes from the same rack, hence an exception from the client

[jira] Created: (HDFS-1382) A transient failure with edits log and a corrupted fstime together could lead to a data loss

2010-09-07 Thread Thanh Do (JIRA)
Project: Hadoop HDFS Issue Type: Bug Components: name-node Reporter: Thanh Do We experienced a data loss situation due to double failures. One is a transient disk failure with the edits log and the other is a corrupted fstime. Here is the detail: 1. NameNode

[jira] Created: (HDFS-1380) The append pipeline does not follow the TSP principle

2010-09-07 Thread Thanh Do (JIRA)
client Affects Versions: 0.20-append Reporter: Thanh Do 1. Say we have 2 racks: rack-0 and rack-1. Rack-0 has dn1, dn2, dn3. Rack-1 has dn4, dn5, dn6. 2. Suppose client is in rack-0, and the write pipeline is: client --> localnode --> other rack --> other rack In this e

[jira] Created: (HDFS-1337) Unmatched file length makes append fail. Should we retry if a startBlockRecovery() fails?

2010-08-09 Thread Thanh Do (JIRA)
Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20-append Reporter: Thanh Do - Component: data node - Version: 0.20-append - Setup: 1) # disks / datanode = 3 2) # failures = 2 3) failure type = crash 4) When/where failure

[jira] Created: (HDFS-1336) TruncateBlock does not update in-memory information correctly

2010-08-09 Thread Thanh Do (JIRA)
Components: data-node Affects Versions: 0.20-append Reporter: Thanh Do - Component: data node - Version: 0.20-append - Summary: we found a case that when a block is truncated during updateBlock, the length on the ongoingCreates is not updated, hence leading to failed append

meaning of LEASE_RECOVER_PERIOD

2010-07-24 Thread Thanh Do
Hi, I looked at FSConstants and saw this: LEASE_RECOVER_PERIOD = 10 * 1000; // i.e. 10 seconds. The only place it is used is INodeFileUnderConstruction.setLastRecoveryTime(). Can anyone explain to me the intuition behind this? Why is this value fixed at 10 seconds? Thanks -- thanh
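
A minimal, speculative sketch of the pattern a lastRecoveryTime guard implements: a new recovery attempt on the same file is only allowed once the previous attempt is older than LEASE_RECOVER_PERIOD, which throttles duplicate recoveries. The method shape is a reconstruction, not a quote of INodeFileUnderConstruction.

    class RecoveryThrottle {
      static final long LEASE_RECOVER_PERIOD = 10 * 1000;  // 10 seconds
      private long lastRecoveryTime = 0;

      // returns true if a recovery may start now, and records the attempt
      synchronized boolean setLastRecoveryTime(long now) {
        boolean expired = now - lastRecoveryTime > LEASE_RECOVER_PERIOD;
        if (expired) {
          lastRecoveryTime = now;       // remember this attempt to block near-duplicates
        }
        return expired;
      }
    }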

[jira] Created: (HDFS-1239) All datanodes are bad in 2nd phase

2010-06-16 Thread Thanh Do (JIRA)
Reporter: Thanh Do - Setups: number of datanodes = 2 replication factor = 2 Type of failure: transient fault (a Java I/O call throws an exception or returns false) Number of failures = 2 when/where failures happen = during the 2nd phase of the pipeline, each happens at each datanode

[jira] Created: (HDFS-1238) A block is stuck in ongoingRecovery due to exception not propagated

2010-06-16 Thread Thanh Do (JIRA)
: Bug Components: hdfs client Affects Versions: 0.20.1 Reporter: Thanh Do - Setup: + # datanodes = 2 + replication factor = 2 + failure type = transient (i.e. a java I/O call that throws I/O Exception or returns false) + # failures = 2 + When/where failures happen: (This

[jira] Created: (HDFS-1237) Client logic for 1st phase and 2nd phase failover are different

2010-06-16 Thread Thanh Do (JIRA)
Components: hdfs client Affects Versions: 0.20.1 Reporter: Thanh Do - Setup: number of datanodes = 4 replication factor = 2 (2 datanodes in the pipeline) number of failure injected = 2 failure type: crash Where/When failures happen: There are two scenarios: First, is when two

[jira] Created: (HDFS-1236) Client uselessly retries recoverBlock 5 times

2010-06-16 Thread Thanh Do (JIRA)
Reporter: Thanh Do Summary: Client uselessly retries recoverBlock 5 times The same behavior is also seen in append protocol (HDFS-1229) The setup: # available datanodes = 4 Replication factor = 2 (hence there are 2 datanodes in the pipeline) Failure type = Bad disk at datanode (not crashes

[jira] Created: (HDFS-1235) Namenode returning the same Datanode to client, due to infrequent heartbeat

2010-06-16 Thread Thanh Do (JIRA)
Issue Type: Bug Components: name-node Reporter: Thanh Do This bug has been reported. Basically since datanode's heartbeat messages are infrequent (~ every 10 minutes), NameNode always gives the client the same datanode even if the datanode is dead. We want to poin

[jira] Created: (HDFS-1234) Datanode 'alive' but with its disk failed, Namenode thinks it's alive

2010-06-16 Thread Thanh Do (JIRA)
Issue Type: Bug Components: name-node Affects Versions: 0.20.1 Reporter: Thanh Do - Summary: Datanode 'alive' but with its disk failed, Namenode still thinks it's alive - Setups: + Replication = 1 + # available datanodes = 2 + # disks / datanode

[jira] Created: (HDFS-1232) Corrupted block if a crash happens before writing to checksumOut but after writing to dataOut

2010-06-16 Thread Thanh Do (JIRA)
Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20.1 Reporter: Thanh Do - Summary: block is corrupted if a crash happens before writing to checksumOut but after writing to dataOut. - Setup: + # available datanodes = 1

[jira] Created: (HDFS-1233) Bad retry logic at DFSClient

2010-06-16 Thread Thanh Do (JIRA)
Reporter: Thanh Do - Summary: failover bug, bad retry logic at DFSClient, cannot failover to the 2nd disk - Setups: + # available datanodes = 1 + # disks / datanode = 2 + # failures = 1 + failure type = bad disk + When/where failure happens = (see below) - Details: The setup is: 1 datanode, 1

[jira] Created: (HDFS-1231) Generation Stamp mismatches, leading to failed append

2010-06-16 Thread Thanh Do (JIRA)
client Affects Versions: 0.20.1 Reporter: Thanh Do - Summary: recoverBlock is not atomic, so a retry fails when it faces a failure. - Setup: + # available datanodes = 3 + # disks / datanode = 1 + # failures = 2 + failure type = crash + When/where failure happens = (see

[jira] Created: (HDFS-1229) DFSClient incorrectly asks for new block if primary crashes during first recoverBlock

2010-06-16 Thread Thanh Do (JIRA)
: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.20.1 Reporter: Thanh Do - Setup: + # available datanodes = 2 + # disks / datanode = 1 + # failures = 1 + failure type = crash + When/where failure happens = during primary's recover

[jira] Created: (HDFS-1228) CRC does not match when retrying appending a partial block

2010-06-16 Thread Thanh Do (JIRA)
Components: data-node Affects Versions: 0.20.1 Reporter: Thanh Do - Summary: when appending to a partial block, it is possible that a retry after an exception fails due to a checksum mismatch. The append operation is not atomic (it does not either complete or fail completely). - Setup

[jira] Created: (HDFS-1227) UpdateBlock fails due to unmatched file length

2010-06-16 Thread Thanh Do (JIRA)
Affects Versions: 0.20.1 Reporter: Thanh Do - Summary: client append is not atomic; hence, it is possible that when retrying during append, there is an exception in updateBlock indicating an unmatched file length, making the append fail. - Setup: + # available datanodes = 3 + # disks

[jira] Created: (HDFS-1226) Last block is temporary unavailable for readers because of crashed appender

2010-06-16 Thread Thanh Do (JIRA)
Issue Type: Bug Components: data-node Affects Versions: 0.20.1 Reporter: Thanh Do - Summary: the last block is unavailable to subsequent readers if appender crashes in the middle of appending workload. - Setup: # available datanodes = 3 # disks / datanode = 1

[jira] Created: (HDFS-1225) Block lost when primary crashes in recoverBlock

2010-06-16 Thread Thanh Do (JIRA)
Affects Versions: 0.20.1 Reporter: Thanh Do - Summary: Block is lost if the primary datanode crashes in the middle of tryUpdateBlock. - Setup: # available datanode = 2 # replica = 2 # disks / datanode = 1 # failures = 1 # failure type = crash When/where failure happens = (see below

[jira] Created: (HDFS-1224) Stale connection makes node miss append

2010-06-16 Thread Thanh Do (JIRA)
Stale connection makes node miss append --- Key: HDFS-1224 URL: https://issues.apache.org/jira/browse/HDFS-1224 Project: Hadoop HDFS Issue Type: Bug Reporter: Thanh Do - Summary: if a

[jira] Created: (HDFS-1223) DataNode fail-stops due to a bad disk (or storage directory)

2010-06-16 Thread Thanh Do (JIRA)
Components: data-node Affects Versions: 0.20.1 Reporter: Thanh Do A datanode can store block files in multiple volumes. If a datanode sees a bad volume during start up (i.e., faces an exception when accessing that volume), it simply fail-stops, making all block files stored in other
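
A minimal, hypothetical sketch of the alternative behavior this report argues for: check each configured volume and continue as long as at least one is usable, instead of fail-stopping on the first bad one. (Later Hadoop releases added a configurable tolerance for failed volumes; this snippet is only an illustration of the idea.)

    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    class VolumeCheck {
      static List<File> usableVolumes(List<File> configured) throws IOException {
        List<File> good = new ArrayList<File>();
        for (File dir : configured) {
          if (dir.isDirectory() && dir.canRead() && dir.canWrite()) {
            good.add(dir);   // volume looks healthy: keep serving its blocks
          }
          // a bad volume is skipped instead of aborting the whole datanode
        }
        if (good.isEmpty()) {
          throw new IOException("No usable storage directories");
        }
        return good;
      }
    }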

[jira] Created: (HDFS-1222) NameNode fail-stops in spite of multiple metadata directories

2010-06-16 Thread Thanh Do (JIRA)
Components: name-node Affects Versions: 0.20.1 Reporter: Thanh Do Despite the ability to configure multiple name directories (to store fsimage) and edits directories, the NameNode will fail-stop most of the time it faces an exception when accessing these directories. NameNode

[jira] Created: (HDFS-1221) NameNode unable to start due to stale edits log after a crash

2010-06-16 Thread Thanh Do (JIRA)
Versions: 0.20.1 Reporter: Thanh Do - Summary: If a crash happens during FSEditLog.createEditLogFile(), the edits log file on disk may be stale. During the next reboot, the NameNode will get an exception when parsing the edits file because of the stale data, leading to an unsuccessful reboot

[jira] Created: (HDFS-1220) Namenode unable to start due to truncated fstime

2010-06-16 Thread Thanh Do (JIRA)
Affects Versions: 0.20.1 Reporter: Thanh Do - Summary: updating the fstime file on disk is not atomic, so it is possible that if a crash happens in the middle, the next time the NameNode reboots it will read a stale fstime and be unable to start successfully. - Details: Below is the code for
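
For context, a minimal plain-JDK sketch of the standard way to make a small-file update like fstime atomic: write a temporary file, force it to disk, then rename it over the old one, so a crash leaves either the old or the new file intact. Illustrative only, not the FSImage code.

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    class AtomicWrite {
      static void writeAtomically(File target, byte[] contents) throws IOException {
        File tmp = new File(target.getParentFile(), target.getName() + ".tmp");
        FileOutputStream out = new FileOutputStream(tmp);
        try {
          out.write(contents);
          out.getFD().sync();            // make sure the bytes reach the device
        } finally {
          out.close();
        }
        if (!tmp.renameTo(target)) {     // rename is atomic on POSIX filesystems
          throw new IOException("rename failed: " + tmp + " -> " + target);
        }
      }
    }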

[jira] Created: (HDFS-1219) Data Loss due to edits log truncation

2010-06-16 Thread Thanh Do (JIRA)
: 0.20.2 Reporter: Thanh Do We found this problem almost at the same time as the HDFS developers. Basically, the edits log is truncated before fsimage.ckpt is renamed to fsimage. Hence, any crash that happens after the truncation but before the renaming will lead to data loss. Detailed description
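
A minimal sketch of the ordering this report implies is safe: promote the checkpoint (rename fsimage.ckpt to fsimage) first, and only then discard the edits it covers, so a crash at any point leaves recoverable state. The file names come from the report; the code is illustrative, not the FSImage source.

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    class CheckpointFinalize {
      static void finalizeCheckpoint(File ckpt, File fsimage, File edits) throws IOException {
        // 1. publish the new checkpoint first (atomic rename)
        if (!ckpt.renameTo(fsimage)) {
          throw new IOException("could not promote " + ckpt + " to " + fsimage);
        }
        // 2. only now is it safe to discard the edits that the checkpoint already covers
        new FileOutputStream(edits).close();   // truncate the edits file to zero length
      }
    }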