Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-12 Thread Sean Busbey
On Wed, May 11, 2016 at 7:53 PM, 张铎 wrote: > I think at that time I will start a new project called AsyncDFSClient which > will implement the whole client side logic of HDFS without using reflection > :) > If we end up in this dystopian future, then please have that project live as a subproject o

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-12 Thread Nick Dimiduk
On Wed, May 11, 2016 at 10:28 PM, Andrew Purtell wrote: > All you have to do is stick around long enough. Hadoop 0.20-append v2 :-) > *palm-all-the-faces* > On May 11, 2016, at 9:46 PM, Stack wrote: > > > >> On Wed, May 11, 2016 at 7:53 PM, 张铎 wrote: > >> > >> I think at that time I will star

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-11 Thread Andrew Purtell
All you have to do is stick around long enough. Hadoop 0.20-append v2 :-) > On May 11, 2016, at 9:46 PM, Stack wrote: > >> On Wed, May 11, 2016 at 7:53 PM, 张铎 wrote: >> >> I think at that time I will start a new project called AsyncDFSClient which >> will implement the whole client side logic

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-11 Thread Stack
On Wed, May 11, 2016 at 7:53 PM, 张铎 wrote: > I think at that time I will start a new project called AsyncDFSClient which > will implement the whole client side logic of HDFS without using reflection > :) > > Haven't I seen this movie before? (smile) St.Ack > 2016-05-12 10:27 GMT+08:00 Andrew P

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-11 Thread 张铎
I think at that time I will start a new project called AsyncDFSClient which will implement the whole client side logic of HDFS without using reflection :) 2016-05-12 10:27 GMT+08:00 Andrew Purtell : > If Hadoop refuses the changes before we release, we can change the default > back. > > > On May

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-11 Thread Andrew Purtell
If Hadoop refuses the changes before we release, we can change the default back. On May 11, 2016, at 6:50 PM, Gary Helmling wrote: >> >> >> I was trying to avoid the below oft-repeated pattern at least for the case >> of critical developments: >> >> + New feature arrives after much work by

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-11 Thread Gary Helmling
> > > I was trying to avoid the below oft-repeated pattern at least for the case > of critical developments: > > + New feature arrives after much work by developer, reviewers and testers > accompanied by fanfare (blog, talks). > + Developers and reviewers move on after getting it committed or it ge

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-10 Thread 张铎
See HDFS-223 and HDFS-916. There are plenty of related issues. The most important thing is that we need a suitable API, and there is an asynchronous file system proposal in HADOOP-12910 which does not fit our requirements, so I need to stop it being committed first... And a default choice in a later

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-10 Thread Stack
On Tue, May 10, 2016 at 10:39 AM, Gary Helmling wrote: > > > > The suggestion is that we make this new client the default now in master > > branch so we have plenty of time to find any issues with the > > implementation. We'd also enable it as the default because the > improvement > > is dramatic

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-10 Thread Andrew Purtell
I'm not sure this should be default for 2.0 but I'd definitely like to see it an option we're comfortable supporting through the duration we are negotiating with HDFS. Would be one major reason why trying out a 2.0.0 release would be compelling. On May 10, 2016, at 10:51 AM, Gary Helmling wr

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-10 Thread Gary Helmling
> > Yeah the 'push to upstream' work has been started already. See here > > https://issues.apache.org/jira/browse/HADOOP-12910 > > But it is much harder to push code into HDFS than HBase. It is the core of > all hadoop systems and I do not have many contacts in the hdfs community... > > Yes, I'm fa

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-10 Thread Gary Helmling
> > The suggestion is that we make this new client the default now in master > branch so we have plenty of time to find any issues with the > implementation. We'd also enable it as the default because the improvement > is dramatic (performance, less moving parts, comprehensible, etc.) and we > thin

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-10 Thread Stack
On Mon, May 9, 2016 at 11:59 PM, Gary Helmling wrote: ... > To me, it seems much safer to actively try to push this upstream into HDFS > right now, and still pointing to its optional, non-default use in HBase as > a compelling story. I don't understand why making it the default in 2.0 is > nece

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-10 Thread 张铎
Some methods are moved from classes in hadoop-hdfs to classes in hadoop-hdfs-client. The ClientProtocol.addBlock method adds an extra parameter. DFSClient.Conf is moved to a separate file and renamed to DFSClientConf. Not very hard. I promise that I can give a patch within 3 days after the release of

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-10 Thread 张铎
Yeah the 'push to upstream' work has been started already. See here https://issues.apache.org/jira/browse/HADOOP-12910 But it is much harder to push code into HDFS than HBase. It is the core of all hadoop systems and I do not have many contacts in the hdfs community... And it is more convincing

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-09 Thread Gary Helmling
Thanks for adding the tests and fixing up AES support. My only real concern is the maintainability of this code as our own private DFS client. The SASL support, for example, is largely based on reflection and reaches into private fields of @InterfaceAudience.Private Hadoop classes. This seems b

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-09 Thread Stack
Any other suggestions/objections here? If not, will make the cut over in next day or so. Thanks, St.Ack On Thu, May 5, 2016 at 10:02 PM, Stack wrote: > On Thu, May 5, 2016 at 7:39 PM, Yu Li wrote: > >> Almost miss the party... >> >> bq. Do you think it worth to backport this feature to branch-1

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-05 Thread Stack
On Thu, May 5, 2016 at 7:39 PM, Yu Li wrote: > Almost miss the party... > > bq. Do you think it worth to backport this feature to branch-1 and release > it in the next 1.x release? This may introduce a compatibility issue as > said > in HBASE-14949 that we need HBASE-14949 to make sure that the r

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-05 Thread Yu Li
Almost missed the party... bq. Do you think it worth to backport this feature to branch-1 and release it in the next 1.x release? This may introduce a compatibility issue as said in HBASE-14949 that we need HBASE-14949 to make sure that the rolling upgrade does not lose data... From current perf da

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-05 Thread Ted Yu
Thanks for your effort, Duo. I am in favor of making AsyncWAL the default in the master branch. Cheers On Thu, May 5, 2016 at 6:03 PM, 张铎 wrote: > Some progress. > > I have filed HBASE-15743 for the transparent encryption support, > and HBASE-15754 for the AES encryption UT. Now both of them are r

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-05 Thread 张铎
Some progress. I have filed HBASE-15743 for the transparent encryption support, and HBASE-15754 for the AES encryption UT. Now both of them are resolved. Let's resume the discussion here. Thanks. 2016-05-03 10:09 GMT+08:00 张铎 : > Fine, will add the testcase. > > And for the RPC, we only impleme

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-02 Thread 张铎
Fine, will add the testcase. And for the RPC, we only implement a new client side DTP here and still use the original RPC. Thanks. 2016-05-03 3:20 GMT+08:00 Gary Helmling : > On Fri, Apr 29, 2016 at 6:24 PM 张铎 wrote: > > > Yes, it does. There is testcase that enumerates all the possible > prot

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-02 Thread Gary Helmling
On Fri, Apr 29, 2016 at 6:24 PM 张铎 wrote: > Yes, it does. There is testcase that enumerates all the possible protection > level(authentication, integrity and privacy) and encryption algorithm(none, > 3des, rc4). > > > https://github.com/apache/hbase/blob/master/hbase-server/src/test/java/org/apac

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-01 Thread Stack
On Sat, Apr 30, 2016 at 10:06 PM, Sean Busbey wrote: > On Sat, Apr 30, 2016 at 1:34 PM, Stack wrote: > > On Sat, Apr 30, 2016 at 6:33 AM, Ted Yu wrote: > > > >> What about support for Transparent Data Encryption feature which was > >> introduced in Apache Hadoop 2.6.0 ? > >> > >> > > Transparen

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-05-01 Thread 张铎
I checked the code. The HdfsFileStatus returned when creating a file inside an encryption zone will contain a FileEncryptionInfo. DFSClient will create a CryptoOutputStream which wraps a DFSOutputStream based on the FileEncryptionInfo. Let me file an issue to implement it. Just another piece of refle

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-30 Thread Sean Busbey
On Sat, Apr 30, 2016 at 1:34 PM, Stack wrote: > On Sat, Apr 30, 2016 at 6:33 AM, Ted Yu wrote: > >> What about support for Transparent Data Encryption feature which was >> introduced in Apache Hadoop 2.6.0 ? >> >> > Transparent: "...(of a process or interface) functioning without the user > being

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-30 Thread Stack
On Sat, Apr 30, 2016 at 6:33 AM, Ted Yu wrote: > What about support for Transparent Data Encryption feature which was > introduced in Apache Hadoop 2.6.0 ? > > Transparent: "...(of a process or interface) functioning without the user being aware of its presence." St.Ack > On Fri, Apr 29, 2016

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-30 Thread Ted Yu
What about support for Transparent Data Encryption feature which was introduced in Apache Hadoop 2.6.0 ? On Fri, Apr 29, 2016 at 6:24 PM, 张铎 wrote: > Yes, it does. There is testcase that enumerates all the possible protection > level(authentication, integrity and privacy) and encryption algorith

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-29 Thread 张铎
Yes, it does. There is a testcase that enumerates all the possible protection levels (authentication, integrity and privacy) and encryption algorithms (none, 3des, rc4). https://github.com/apache/hbase/blob/master/hbase-server/src/test/java/org/apache/hadoop/hbase/io/asyncfs/TestSaslFanOutOneBlockAsyncD

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-29 Thread Gary Helmling
How well has this been tested on secure clusters? I know SASL support was lacking initially, but I believe it had been added? Does AsyncFSWAL support all the HDFS transport encryption options? On Fri, Apr 29, 2016 at 12:05 AM Stack wrote: > I'm +1 on enabling asyncfswal as default in 2.0: > >

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-29 Thread Stack
I'm +1 on enabling asyncfswal as default in 2.0: + We'll have plenty of time to figure out issues, if any, if we get it in now, early. + The improvement in throughput is substantial + There are now fewer moving parts + A critical piece of our write path is much less opaque in its workings and no longer (

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-28 Thread 张铎
I've dug into the HDFS and HADOOP projects and found that there is an active issue, HADOOP-12910, related to an asynchronous FileSystem implementation. I have left some comments on it; maybe we could start from there. Thanks. 2016-04-29 14:42 GMT+08:00 Stack : > On Thu, Apr 28, 2016 at 8:47 PM,

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-28 Thread Stack
On Thu, Apr 28, 2016 at 8:47 PM, Ted Yu wrote: > Last comment on HDFS-916 was from 2010. > > Suggest making a new issue or reviving discussion on HDFS-916 (currently > assigned to Todd). > > Duo is on it. Some mileage and confidence in the new code would be good to have before going to HDFS (Gett

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-28 Thread Stack
On Thu, Apr 28, 2016 at 8:34 PM, Heng Chen wrote: > The performance is quite great, but i think maybe we should collect some > experience on real production cluster before we make it as default. > > Yeah. Would be nice to have a production deploy before we made the switch but in the absence of that,

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-28 Thread 张铎
2016-04-29 11:47 GMT+08:00 Ted Yu : > Last comment on HDFS-916 was from 2010. > > Suggest making a new issue or reviving discussion on HDFS-916 (currently > assigned to Todd). > > bq. The fallback implementation is not aim to get a good performance > > For more than two weeks, I have been working

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-28 Thread Ted Yu
Last comment on HDFS-916 was from 2010. Suggest making a new issue or reviving discussion on HDFS-916 (currently assigned to Todd). bq. The fallback implementation is not aim to get a good performance For more than two weeks, I have been working with Azure Data Lake developers so that all hbase

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-28 Thread 张铎
But 2.0 is not released yet... Do you think it is worth backporting this feature to branch-1 and releasing it in the next 1.x release? This may introduce a compatibility issue, as noted in HBASE-14949: we need HBASE-14949 to make sure that the rolling upgrade does not lose data... 2016-04-29 11:34 G

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-28 Thread 张铎
2016-04-29 11:35 GMT+08:00 Ted Yu : > bq. AsyncFSOutput will be in HDFS-3.0 > > Is there HDFS JIRA for the above ? Can you share the number ? > I have not filed a new one but there are bunch of related issues already, such as this one https://issues.apache.org/jira/browse/HDFS-916 > > bq. Just wr

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-28 Thread Ted Yu
+1 to what Heng said. On Thu, Apr 28, 2016 at 8:34 PM, Heng Chen wrote: > The performance is quite great, but i think maybe we should collect some > experience on real production cluster before we make it as default. > > 2016-04-29 11:30 GMT+08:00 张铎 : > > > Inline comments. > > Thanks, > > > >

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-28 Thread Ted Yu
bq. AsyncFSOutput will be in HDFS-3.0 Is there HDFS JIRA for the above ? Can you share the number ? bq. Just wrap FSDataOutputStream to make it act like an asynchronous output Can you be a bit more specific ? HBase currently works with WASB and Azure Data Lake. Does the above mean their performa

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-28 Thread Heng Chen
The performance is quite great, but I think maybe we should collect some experience on a real production cluster before we make it the default. 2016-04-29 11:30 GMT+08:00 张铎 : > Inline comments. > Thanks, > > 2016-04-29 10:57 GMT+08:00 Sean Busbey : > > > I am nervous about having default out-of-th

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-28 Thread 张铎
Inline comments. Thanks, 2016-04-29 10:57 GMT+08:00 Sean Busbey : > I am nervous about having default out-of-the-box new HBase users reliant on > a bespoke HDFS client, especially given Hadoop's compatibility > promises and history. Answers for these questions would make me more > confident: > >

Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-28 Thread Sean Busbey
I am nervous about having default out-of-the-box new HBase users reliant on a bespoke HDFS client, especially given Hadoop's compatibility promises and history. Answers for these questions would make me more confident: 1) Where are we on getting the client-side changes to HDFS pushed back upstream

[DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-28 Thread Duo Zhang
Six months after I filed HBASE-14790... Now the AsyncFSWAL is ready. The WALPE result shows that it is *1.4x~3.7x* faster than FSHLog. The ITBLL result shows that it is *no worse* than FSHLog (the master branch is not that stable itself...). More details can be found on HBASE-15536. So here we