Hi,

I am attaching a screenshot: [image: Inline image 2]

Can anyone help me figure this out? First, I can see that my first region is empty; it has no start row key, and the same goes for the region with no end row key. Second, my data is actually distributed on only 2 nodes, although I have 5 nodes.

Thanks
Manjeet

On Mon, Sep 12, 2016 at 10:38 AM, Manjeet Singh <manjeet.chand...@gmail.com> wrote:

> Thanks Ted for your inputs.
> I have written an algorithm that converts my string to a single character
> such as # $ ! etc., and that character is my salt, so I always know what
> my salt is.
> My input data is very random and I need to know my rowkey in advance.
> (A hash like MD5 generates a long string, which causes some performance
> impact because my rowkey was getting long.)
>
> In my lab testing I found that a number of regions were created, but the
> region with start row ! was empty.
> As far as I can observe, I created my table pre-split on these characters,
> and no data arrived whose key starts with !.
>
> Is there any way to distribute data equally across all regions, given that
> I know what my salt is and that it is fixed, like !@#$%?
>
> Thanks
> Manjeet
>
> On Sat, Sep 10, 2016 at 7:04 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Given an ingestion rate of 7 GB/hour, you would have ~5000 GB of data per
>> month.
>> That's about 500 regions.
>> You would run out of ASCII characters in the position of #.
>>
>> Since a mobile number is personally identifiable information, it is not
>> prudent to use it directly in the row key.
>> You can search for commonly accepted practice on the internet.
>> If you use a hash of the mobile number, you would avoid hot spotting.
>>
>> More detail about your use case would be helpful in providing a better
>> answer.
>>
>> Cheers
>>
>> On Fri, Sep 9, 2016 at 6:09 PM, Manjeet Singh <manjeet.chand...@gmail.com>
>> wrote:
>>
>> > Thanks Ted for the links. They will help to determine how regions split,
>> > what the size should be, etc., which will be really helpful.
>> > But can you tell me whether my understanding, as asked in the trailing
>> > mail, is correct?
>> > I know what the salt will be based on the mobile number coming in my data.
>> > So assume the salt for mobile number 9999999999 is #,
>> > so my rowkey is #_9999999999.
>> > As I know my exact rowkey in advance, I can distribute my data on the
>> > cluster to avoid hotspotting, and I want to distribute my data equally
>> > on the cluster.
>> > So is it mandatory to create the table according to my splits?
>> >
>> > Thanks
>> > Manjeet
>> >
>> > On Sat, Sep 10, 2016 at 6:26 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>> >
>> > > Please take a look at:
>> > >
>> > > http://hbase.apache.org/book.html#table_schema_rules_of_thumb
>> > > http://hbase.apache.org/book.html#arch.regions.size
>> > > http://hbase.apache.org/book.html#ops.capacity.regions
>> > > http://hbase.apache.org/book.html#ops.capacity.regions.total
>> > >
>> > > On Fri, Sep 9, 2016 at 5:35 PM, Manjeet Singh <manjeet.chand...@gmail.com>
>> > > wrote:
>> > >
>> > > > Yes, it is on weekdays.
>> > > > Yes, the default is 10 GB, so what is the way/formula to know what
>> > > > the size of the RS should be?
>> > > >
>> > > > On 9 Sep 2016 19:03, "Ted Yu" <yuzhih...@gmail.com> wrote:
>> > > >
>> > > > > Can you clarify whether the incoming data rate is for weekdays?
>> > > > >
>> > > > > At 6-7 GB/hour, you need to set a larger region size.
>> > > > > The default is 10 GB.
>> > > > >
>> > > > > If you know roughly how the key space would be filled, presplit
>> > > > > your table accordingly.
>> > > > >
>> > > > > On Thu, Sep 8, 2016 at 11:24 PM, Manjeet Singh <manjeet.chand...@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi All
>> > > > > >
>> > > > > > I have some basic questions; can anyone help me out?
>> > > > > >
>> > > > > > Q1.
>> > > > > > This is my understanding: to perform splitting I need to
>> > > > > > create the table like below
>> > > > > >
>> > > > > > create 'test_table', 'c1', SPLITS => ['#', '!', '$']
>> > > > > >
>> > > > > > and I have to design the row key in this way
>> > > > > >
>> > > > > > #_123456789
>> > > > > > !_123456789
>> > > > > > $_123456789
>> > > > > >
>> > > > > > so that my data is distributed on the cluster.
>> > > > > >
>> > > > > > My requirement is very simple: I want my data to be distributed
>> > > > > > equally across regions based on my rowkey only.
>> > > > > >
>> > > > > > So please correct me if I am missing anything.
>> > > > > >
>> > > > > > Q2. Suppose I have 5 regions on each region server and I give
>> > > > > > each region 100 MB of space by using the
>> > > > > > hbase.hregion.max.filesize property.
>> > > > > >
>> > > > > > What will happen when all my regions fill up with 100 MB of data?
>> > > > > > Please note I have a cron job scheduled for every weekend and my
>> > > > > > incoming data rate is 6-7 GB/hour, so my regions get filled very
>> > > > > > fast.
>> > > > > >
>> > > > > > Thanks
>> > > > > > Manjeet
>> > > > > >
>> > > > > > --
>> > > > > > luv all
>> >
>> > --
>> > luv all
>
> --
> luv all

--
luv all
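
A note on the empty first region discussed in this thread, as a minimal Python sketch rather than anything from HBase itself. It models how a pre-split table assigns a row key to a region by byte-wise comparison against the sorted split points: N split points give N + 1 regions, and the first region holds only keys that sort below the smallest split point. If the smallest salt character is itself a split point, nothing can land in that first region. The function names (`region_index`, `split_points_for`, `salted_key`) are hypothetical, and deriving the salt from an MD5 of the mobile number is an assumption swapped in for the poster's own (unshown) salting algorithm; the salt set `!@#$%` is the one mentioned in the thread.

```python
import bisect
import hashlib
from collections import Counter


def region_index(row_key: bytes, split_points: list) -> int:
    """Index of the region holding row_key, for sorted split_points.

    Region 0 is [no start key, split_points[0]); region i starts at
    split_points[i - 1] (inclusive). N split points give N + 1 regions.
    """
    return bisect.bisect_right(split_points, row_key)


def split_points_for(salts: bytes) -> list:
    """One region per salt: split on every salt except the smallest one."""
    ordered = sorted(set(salts))  # byte values in ascending order
    return [bytes([s]) for s in ordered[1:]]


def salted_key(mobile: str, salts: bytes) -> bytes:
    """Build '<salt>_<mobile>'; the salt is picked from an MD5 of the number
    (an assumption for this sketch, not the poster's algorithm)."""
    digest = hashlib.md5(mobile.encode("ascii")).digest()
    salt = salts[int.from_bytes(digest[:4], "big") % len(salts)]
    return bytes([salt]) + b"_" + mobile.encode("ascii")


if __name__ == "__main__":
    salts = b"!@#$%"
    # Split points from the thread's example: '!' is a split point, so
    # region 0 (keys below '!') never receives a salted key.
    thread_splits = sorted([b"#", b"!", b"$"])
    print(region_index(b"!_123456789", thread_splits))
    # Split points derived from the salts: five regions, one per salt.
    splits = split_points_for(salts)
    keys = [salted_key(str(9000000000 + i), salts) for i in range(5000)]
    print(sorted(Counter(region_index(k, splits) for k in keys).items()))
```

Because the salt is computed from the mobile number, a reader can rebuild the full row key for a lookup without scanning every salt bucket. Even distribution across regions does not by itself decide which region servers host those regions; that placement is handled by the cluster's balancer.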