Re: HBase 2 ,bulk import question

2019-07-18 Thread OpenInx
> To add to that, the split will be done on the master, It's done locally, not on the master. That is, the LoadIncrementalHFiles tool will split an HFile locally if it finds that the file spans two or more regions. On Fri, Jul 19, 2019 at 1:27 AM Jean-Marc Spaggiari wrote: > +1 to that last statement. (I think the
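
The local split described above can be pictured with a small sketch. This is not HBase code: `split_by_regions`, the sample keys, and the region layout are hypothetical, and the only assumption is that a region owns every row key from its start key (inclusive) up to the next region's start key.

```python
from bisect import bisect_right

def split_by_regions(sorted_keys, region_start_keys):
    """Group the row keys of one file by the region that owns them.

    region_start_keys is the sorted list of region start keys; the first
    region starts at the empty key (hypothetical layout for illustration).
    """
    pieces = {}
    for key in sorted_keys:
        # The owning region is the last one whose start key is <= key.
        idx = bisect_right(region_start_keys, key) - 1
        pieces.setdefault(idx, []).append(key)
    return pieces

# One file whose keys cross three regions, so it ends up in three pieces.
hfile_keys = [b"apple", b"kiwi", b"mango", b"zebra"]
region_starts = [b"", b"h", b"p"]
pieces = split_by_regions(hfile_keys, region_starts)
```

All of this work happens in the single process that runs the sketch, which is the reason a single giant file crossing many regions is costly to load.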

Re: [Announce] 张铎 (Duo Zhang) is Apache HBase PMC chair

2019-07-18 Thread OpenInx
Congratulations Duo, and thanks Misty. On Fri, Jul 19, 2019 at 9:34 AM Guanghao Zhang wrote: > Congratulations! > > On Fri, Jul 19, 2019 at 9:33 AM, Duo Zhang wrote: > > > Thanks Misty for the great job you have done these years. > > > > And thanks all for trusting me. Will try my best. > > > > Jan Hentschel

Re: [Announce] 张铎 (Duo Zhang) is Apache HBase PMC chair

2019-07-18 Thread Guanghao Zhang
Congratulations! On Fri, Jul 19, 2019 at 9:33 AM, Duo Zhang wrote: > Thanks Misty for the great job you have done these years. > > And thanks all for trusting me. Will try my best. > > On Fri, Jul 19, 2019 at 3:55 AM, Jan Hentschel wrote: > > > Congrats Duo! > > > > From: Andrew Purtell > > Reply-To: "user@hbase.apache.org"

Re: [Announce] 张铎 (Duo Zhang) is Apache HBase PMC chair

2019-07-18 Thread Duo Zhang
Thanks Misty for the great job you have done these years. And thanks all for trusting me. Will try my best. On Fri, Jul 19, 2019 at 3:55 AM, Jan Hentschel wrote: > Congrats Duo! > > From: Andrew Purtell > Reply-To: "user@hbase.apache.org" > Date: Thursday, July 18, 2019 at 7:52 PM > To: Hbase-User > Cc: HBa

Re: Thank you Misty

2019-07-18 Thread Zach York
Thank you for all you have done as our Chair, Misty! Best of luck! On Thu, Jul 18, 2019 at 10:54 AM Andrew Purtell wrote: > Thank you for serving as our Chair, Misty. Your reports were some of the > best I've ever seen in my ten or so years at the ASF. Best of luck to you > and yours in your fut

Re: [Announce] 张铎 (Duo Zhang) is Apache HBase PMC chair

2019-07-18 Thread Jan Hentschel
Congrats Duo! From: Andrew Purtell Reply-To: "user@hbase.apache.org" Date: Thursday, July 18, 2019 at 7:52 PM To: Hbase-User Cc: HBase Dev List, Duo Zhang, "priv...@hbase.apache.org" Subject: Re: [Announce] 张铎 (Duo Zhang) is Apache HBase PMC chair Congratulations Duo! Thank you for taking

Thank you Misty

2019-07-18 Thread Andrew Purtell
Thank you for serving as our Chair, Misty. Your reports were some of the best I've ever seen in my ten or so years at the ASF. Best of luck to you and yours in your future endeavors. On Thu, Jul 18, 2019 at 10:46 AM Misty Linville wrote: > It's been my honor to serve as your PMC chair since 2017

Re: [Announce] 张铎 (Duo Zhang) is Apache HBase PMC chair

2019-07-18 Thread Andrew Purtell
Congratulations Duo! Thank you for taking on the role of Chair. On Thu, Jul 18, 2019 at 10:46 AM Misty Linville wrote: > Each Apache project has a project management committee (PMC) that oversees > governance of the project, votes on new committers and PMC members, and > ensures that the softwar

[Announce] 张铎 (Duo Zhang) is Apache HBase PMC chair

2019-07-18 Thread Misty Linville
Each Apache project has a project management committee (PMC) that oversees governance of the project, votes on new committers and PMC members, and ensures that the software we produce adheres to the standards of the Foundation. One of the roles on the PMC is the PMC chair. The PMC chair represents

Re: HBase 2 ,bulk import question

2019-07-18 Thread Jean-Marc Spaggiari
+1 to that last statement. (I think the split is done locally where you run the command, not sure if it's in the master, but I can be wrong). This means that if you have a single giant file and 200 regions, it will require a lot of non-distributed work... Le jeu. 18 juil. 2019 à 13:03, Austin Heyne a éc

Re: HBase 2 ,bulk import question

2019-07-18 Thread Austin Heyne
To add to that, the split will be done on the master, so if you anticipate a lot of splits it can be an issue. -Austin On 7/18/19 12:32 PM, Jean-Marc Spaggiari wrote: One thing to add: when you bulk load your files, they will be split according to the region boundaries if needed. Becaus

Re: HBase 2 ,bulk import question

2019-07-18 Thread Jean-Marc Spaggiari
One thing to add: when you bulk load your files, they will be split according to the region boundaries if needed. Because there might have been some "natural" splits on the table side between when you start your job and when you push your files, the bulkloader has to be able to re-split you

Re: HBase 2 ,bulk import question

2019-07-18 Thread OpenInx
Austin is right. Pre-splitting mainly matters for generating and loading HFiles: during a bulk load, each generated HFile is loaded into the region whose rowkey interval contains it. Without pre-splitting, all HFiles will go to one region, and the bulk load will be time-consuming a
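
As a rough illustration of the point above (again not HBase code; `target_region` and the key ranges are hypothetical), a file can be handed to a region as-is only when its first and last row keys fall in the same region. With a single-region table, every file lands in that one region.

```python
from bisect import bisect_right

def region_index(region_start_keys, row_key):
    """Index of the region whose key interval contains row_key."""
    return bisect_right(region_start_keys, row_key) - 1

def target_region(region_start_keys, first_key, last_key):
    """Return the region a file fits into, or None if it spans several."""
    lo = region_index(region_start_keys, first_key)
    hi = region_index(region_start_keys, last_key)
    return lo if lo == hi else None

# (first_key, last_key) of three generated files: illustrative values only.
files = [(b"a", b"g"), (b"h", b"o"), (b"p", b"z")]
presplit = [b"", b"h", b"p"]   # table created with three regions
unsplit = [b""]                # table with a single region

with_presplit = [target_region(presplit, f, l) for f, l in files]
without_presplit = [target_region(unsplit, f, l) for f, l in files]
```

Here `with_presplit` spreads the files over three regions, while `without_presplit` sends all of them to region 0.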

Re: HBase 2 ,bulk import question

2019-07-18 Thread Austin Heyne
Bulk importing requires the table the data is being bulk imported into to already exist. This is because the mapreduce job needs to extract the region start/end keys in order to drive the reducers. This means that you need to create your table beforehand, providing the appropriate pre-splitti
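
A minimal sketch of why the region start keys matter to the job, assuming one reducer per region and a partitioner that routes each row to the reducer for its region. The function name, keys, and rows below are hypothetical, and this simulates the idea rather than the actual mapreduce classes.

```python
from bisect import bisect_right
from collections import Counter

def partition(row_key, region_start_keys):
    """Pick the reducer (one per region, by assumption) for a row key."""
    return bisect_right(region_start_keys, row_key) - 1

rows = [b"alpha", b"golf", b"hotel", b"oscar", b"tango", b"zulu"]

# Pre-split table: four regions, so the rows spread over four reducers.
region_starts = [b"", b"g", b"n", b"t"]
load = Counter(partition(r, region_starts) for r in rows)

# Table created without pre-splitting: one region, so one reducer does it all.
single_load = Counter(partition(r, [b""]) for r in rows)
```

The table has to exist before the job starts because the list passed as `region_start_keys` comes from the table's current regions.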

HBase 2 ,bulk import question

2019-07-18 Thread Michael
Hi, I looked at the possibility of bulk importing into hbase, but somehow I don't get it. I am not able to perform a presplitting of the data, so does bulk importing work without presplitting? As I understand it, instead of putting the data, I create the hbase region files, but all tutorials I rea