Re: [ceph-users] poor data distribution

2014-03-24 Thread Dominik Mostowiec
Hi, > FWIW the tunable that fixes this was just merged today but won't > appear in a release for another 3 weeks or so. Is this the "vary_r" tunable? Can I use it in production? -- Regards Dominik 2014-02-12 3:24 GMT+01:00 Sage Weil : > On Wed, 12 Feb 2014, Dominik Mostowiec wrote: >> Hi, >> Does…
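For the record, once a release carries it, a crush tunable like this is set by round-tripping the map (a sketch; the decompiled-map name chooseleaf_vary_r is an assumption based on the merged branch, so verify it against your release notes):

  ceph osd getcrushmap -o crush.bin    # export the in-use crush map
  crushtool -d crush.bin -o crush.txt  # decompile it to text
  # edit crush.txt: add 'tunable chooseleaf_vary_r 1' among the tunables at the top (assumed name)
  crushtool -c crush.txt -o crush.new  # recompile
  ceph osd setcrushmap -i crush.new    # inject; expect some data movement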

Re: [ceph-users] poor data distribution

2014-02-11 Thread Sage Weil
On Wed, 12 Feb 2014, Dominik Mostowiec wrote: > Hi, > Does this problem (with stuck active+remapped PGs after > reweight-by-utilization) affect all Ceph configurations or only > specific ones? > If specific: what is the reason in my case? Is it caused by the crush > configuration (cluster architecture…

Re: [ceph-users] poor data distribution

2014-02-11 Thread Dominik Mostowiec
Hi, Does this problem (with stuck active+remapped PGs after reweight-by-utilization) affect all Ceph configurations or only specific ones? If specific: what is the reason in my case? Is it caused by the crush configuration (cluster architecture, crush tunables, ...), cluster size, architecture design…

Re: [ceph-users] poor data distribution

2014-02-06 Thread Dominik Mostowiec
Great! Thanks for your help. -- Regards Dominik 2014-02-06 21:10 GMT+01:00 Sage Weil : > On Thu, 6 Feb 2014, Dominik Mostowiec wrote: >> Hi, >> Thanks!! >> Can you suggest any workaround for now? > > You can adjust the crush weights on the overfull nodes slightly. You'd > need to do it by hand,…

Re: [ceph-users] poor data distribution

2014-02-06 Thread Sage Weil
On Thu, 6 Feb 2014, Dominik Mostowiec wrote: > Hi, > Thanks!! > Can you suggest any workaround for now? You can adjust the crush weights on the overfull nodes slightly. You'd need to do it by hand, but that will do the trick. For example, ceph osd crush reweight osd.123 .96 (if the current…
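Expanded into runnable steps, the by-hand workaround might look like this (a sketch; osd.123 and 0.96 are placeholders, picked per overfull OSD from your own utilization numbers):

  ceph osd tree | grep -w 'osd.123'     # current crush weight of the overfull OSD
  ceph osd crush reweight osd.123 0.96  # nudge it down slightly
  ceph -w                               # watch the resulting rebalance settle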

Re: [ceph-users] poor data distribution

2014-02-06 Thread Dominik Mostowiec
Hi, Thanks!! Can you suggest any workaround for now? -- Regards Dominik 2014-02-06 18:39 GMT+01:00 Sage Weil : > Hi, > > Just an update here. Another user saw this and after playing with it I > identified a problem with CRUSH. There is a branch outstanding > (wip-crush) that is pending review…

Re: [ceph-users] poor data distribution

2014-02-06 Thread Sage Weil
Hi, Just an update here. Another user saw this and after playing with it I identified a problem with CRUSH. There is a branch outstanding (wip-crush) that is pending review, but it's not a quick fix because of compatibility issues. sage On Thu, 6 Feb 2014, Dominik Mostowiec wrote: > Hi, >…

Re: [ceph-users] poor data distribution

2014-02-06 Thread Dominik Mostowiec
Hi, Maybe this info can help find what is wrong. For one PG (3.1e4a) which is active+remapped: { "state": "active+remapped", "epoch": 96050, "up": [ 119, 69], "acting": [ 119, 69, 7], Logs: On osd.7: 2014-02-04 09:45:54.966913 7fa618afe700 1 osd.7 p…
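The JSON above is the output of pg query; to inspect one PG or list everything stuck in a similar state (a sketch using commands from that era's CLI):

  ceph pg 3.1e4a query > pg.3.1e4a.json   # full state of one PG, including up and acting sets
  ceph pg dump_stuck unclean              # every PG that is not active+clean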

Re: [ceph-users] poor data distribution

2014-02-04 Thread Dominik Mostowiec
Hi, Thanks for your help!! We ran 'ceph osd reweight-by-utilization 105' again. The cluster got stuck at 10387 active+clean, 237 active+remapped; more info in the attachments. -- Regards Dominik 2014-02-04 Sage Weil : > Hi, > > I spent a couple hours looking at your map because it did look like there…

Re: [ceph-users] poor data distribution

2014-02-03 Thread Sage Weil
Hi, I spent a couple hours looking at your map because it did look like there was something wrong. After some experimentation and adding a bunch of improvements to osdmaptool to test the distribution, though, I think everything is working as expected. For pool 3, your map has a standard deviation…

Re: [ceph-users] poor data distribution

2014-02-03 Thread Sage Weil
On Mon, 3 Feb 2014, Dominik Mostowiec wrote: > Sorry, I forgot to tell you. > It can be important. > We ran: > ceph osd reweight-by-utilization 105 (as I wrote in my second mail), > and after the cluster got stuck on 'active+remapped' PGs we had to reweight them > back to 1.0 (all reweighted OSDs). > This osdmap…

Re: [ceph-users] poor data distribution

2014-02-03 Thread Dominik Mostowiec
Sorry, I forgot to tell you. It can be important. We ran: ceph osd reweight-by-utilization 105 (as I wrote in my second mail), and after the cluster got stuck on 'active+remapped' PGs we had to reweight them back to 1.0 (all reweighted OSDs). This osdmap is not from an active+clean cluster; rebalancing is in progress…
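Worth noting for anyone retracing this: two different weights are in play (a sketch; 123 is a placeholder id). reweight-by-utilization adjusts the temporary override weight, not the crush weight, and resetting it is per-OSD:

  ceph osd reweight 123 1.0   # reset the override weight that reweight-by-utilization changed
  ceph osd tree               # the REWEIGHT column shows overrides; WEIGHT is the crush weight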

Re: [ceph-users] poor data distribution

2014-02-03 Thread Sage Weil
Hi Dominik, Can you send a copy of your osdmap? ceph osd getmap -o /tmp/osdmap (Can send it off list if the IP addresses are sensitive.) I'm tweaking osdmaptool to have a --test-map-pgs option to look at this offline. Thanks! sage On Mon, 3 Feb 2014, Dominik Mostowiec wrote: > In other words…
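Once that osdmaptool change lands, the offline check would look something like this (a sketch based on the option name Sage mentions; --pool narrows the simulation to one pool):

  ceph osd getmap -o /tmp/osdmap                  # grab the live osdmap
  osdmaptool /tmp/osdmap --test-map-pgs --pool 3  # simulate PG->OSD mappings offline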

Re: [ceph-users] poor data distribution

2014-02-03 Thread Dominik Mostowiec
In other words:
1. we've got 3 racks (1 replica per rack)
2. in every rack we have 3 hosts
3. every host has 22 OSDs
4. all pg_nums are 2^n for every pool
5. we enabled "crush tunables optimal"
6. on every machine we disabled 4 unused disks (osd out, osd reweight 0, and osd rm)
Pool ".rgw.buckets"…

Re: [ceph-users] poor data distribution

2014-02-01 Thread Dominik Mostowiec
Hi, For more info: crush: http://dysk.onet.pl/link/r4wGK osd_dump: http://dysk.onet.pl/link/I3YMZ pg_dump: http://dysk.onet.pl/link/4jkqM -- Regards Dominik 2014-02-02 Dominik Mostowiec : > Hi, > Hmm, > you're thinking about summarizing PGs from different pools on one OSD, I think. > But for the one pool…

Re: [ceph-users] poor data distribution

2014-02-01 Thread Dominik Mostowiec
Hi, Hmm, you're thinking about summarizing PGs from different pools on one OSD, I think. But for the one pool (.rgw.buckets) where I have almost all my data, the PG count on OSDs is also different: for example, 105 vs 144 PGs from pool .rgw.buckets. In the first case disk usage is 52%, in the second 74%. -- Regards Dominik…

Re: [ceph-users] poor data distribution

2014-02-01 Thread Sage Weil
It occurs to me that this (and other unexplained variance reports) could easily be the 'hashpspool' flag not being set. The old behavior had the misfeature where consecutive pools' PGs would 'line up' on the same OSDs, so that 1.7 == 2.6 == 3.5 == 4.4 etc. would map to the same nodes. This tends…
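A sketch of how the flag could be checked and set (assuming a release where hashpspool is exposed as a per-pool setting; flipping it remaps the pool's PGs, so it also moves data):

  ceph osd dump | grep '^pool'                     # pool lines list their flags; look for hashpspool
  ceph osd pool set .rgw.buckets hashpspool true   # enable the decorrelated hashing for one pool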

Re: [ceph-users] poor data distribution

2014-02-01 Thread Dominik Mostowiec
Hi, After scrubbing, almost all PGs have a roughly equal number of objects. I found something else. On one host, the PG count per OSD:
OSD with small (52%) disk usage:
  count  pool
    105      3
     18      4
      3      5
OSD with larger (74%) disk usage:
    144      3
     31      4
      2      5
Pool 3 is .rgw.buckets (where almost all of…

Re: [ceph-users] poor data distribution

2014-02-01 Thread Dominik Mostowiec
Hi, > Did you bump pgp_num as well? Yes. See: http://dysk.onet.pl/link/BZ968 > About 25% of the PGs are two times smaller than the others. This changes after scrubbing. -- Regards Dominik 2014-02-01 Kyle Bader : > >> Changing pg_num for .rgw.buckets to a power of 2, and 'crush tunables >> optimal' didn't help :(…

Re: [ceph-users] poor data distribution

2014-02-01 Thread Kyle Bader
> Changing pg_num for .rgw.buckets to a power of 2, and 'crush tunables > optimal' didn't help :( Did you bump pgp_num as well? The split PGs will stay in place until pgp_num is bumped as well; if you do this, be prepared for (potentially a lot of) data movement.
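The reason the bump matters: pg_num creates the new (split) PGs, while pgp_num controls how many PGs are used for placement, so until they match the split halves stay on their parents' OSDs. A sketch of the check and the fix (8192 stands in for whatever pg_num was raised to):

  ceph osd pool get .rgw.buckets pg_num         # number of PGs
  ceph osd pool get .rgw.buckets pgp_num        # number used for placement
  ceph osd pool set .rgw.buckets pgp_num 8192   # match pg_num; expect heavy data movement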

Re: [ceph-users] poor data distribution

2014-02-01 Thread Dominik Mostowiec
Hi, Stats for pool 3 (.rgw.buckets), object distribution:
  cat pg_pool_obj_size_up.txt | awk '{if ($1=="3") print $2}' | sed -e 's/...$//' | sort | uniq -c
    183 12
   6166 13
   1843 6
About 25% of the PGs are two times smaller than the others. I think this can be the reason for the strange data distribution on…
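The same histogram can come straight from a live dump, without the intermediate file (a sketch; it assumes dumpling-era 'ceph pg dump' output where column 1 is the pgid and column 2 the object count — check the header line on your release):

  ceph pg dump 2>/dev/null | awk '$1 ~ /^3\./ {print int($2/1000)}' | sort -n | uniq -c   # PGs per thousands-of-objects bucket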

Re: [ceph-users] poor data distribution

2014-01-31 Thread Dominik Mostowiec
Hi, Changing pg_num for .rgw.buckets to a power of 2, and 'crush tunables optimal', didn't help :( Graph: http://dysk.onet.pl/link/BZ968 What can I do with this? Something is broken, because before the pg_num increase the cluster reported 10T of data; now it is 18751 GB data, 34612 GB used, 20497 GB / 55110 G…

Re: [ceph-users] poor data distribution

2014-01-30 Thread Dominik Mostowiec
Hi, For this cluster 198 x 100 / 3 = 6600. If I bump pool .rgw.buckets up to 8192 (now it is 4800), the total will be 9896. Isn't that too much? Maybe a better way is to destroy e.g. the '.log' pool and recreate it with a lower PG count (is that safe?)? -- Regards Dominik 2014-01-30 Sage Weil : > On Thu, 30 Jan 2014, Dominik…
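If the '.log' route is taken, the recreate itself is two commands (a sketch; whether dropping '.log' is safe depends on the radosgw setup, so treat the pool name as illustrative and stop radosgw first — pool deletion is irreversible):

  ceph osd pool delete .log .log --yes-i-really-really-mean-it   # destroys the pool and all its objects
  ceph osd pool create .log 128 128                              # recreate with pg_num = pgp_num = 128 (an arbitrary example value)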

Re: [ceph-users] poor data distribution

2014-01-30 Thread Sage Weil
On Thu, 30 Jan 2014, Dominik Mostowiec wrote: > Hi, > Thanks for your response. > > > - with ~6.5k objects, size ~1.4G > > - with ~13k objects, size ~2.8G > These are on the biggest pool, 5 '.rgw.buckets'. > > > This is because pg_num is not a power of 2 > Is this for all PGs (the sum of all pools) or for…

Re: [ceph-users] poor data distribution

2014-01-30 Thread Dominik Mostowiec
Hi, Thanks for your response. > - with ~6.5k objects, size ~1.4G > - with ~13k objects, size ~2.8G These are on the biggest pool, 5 '.rgw.buckets'. > This is because pg_num is not a power of 2 Is this for all PGs (the sum of all pools) or for pool 5 '.rgw.buckets', where I have almost all my data? > Did you…

Re: [ceph-users] poor data distribution

2014-01-30 Thread Sage Weil
On Thu, 30 Jan 2014, Dominik Mostowiec wrote: > Hi, > I found something else. > 'ceph pg dump' shows PGs: > - with zero or near-zero object counts These are probably from a different pool than the big ones, right? The PG id is basically $pool.$shard. > - with ~6.5k objects, size ~1.4G > - with…
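The arithmetic behind the power-of-two remark, worked through for this cluster (an illustration assuming ceph's stable-mod folding, which the thread itself doesn't spell out): with pg_num = 4800, an object's hash is first taken mod 8192, and any result of 4800 or more is folded again mod 4096. PGs 704-4095 (3392 of them) therefore collect two hash slices each, while PGs 0-703 and 4096-4799 (1408 of them) collect one, i.e. half as many objects. That gives 1408/4800 ≈ 29% half-size PGs, matching the ~6.5k vs ~13k object counts and the "about 25% smaller" observation elsewhere in the thread. At a power of two (e.g. 8192) every PG owns exactly one slice.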

Re: [ceph-users] poor data distribution

2014-01-30 Thread Dominik Mostowiec
Hi, I found something else that I think can help. The PG distribution, it seems, isn't OK. Graph: http://dysk.onet.pl/link/AVzTe All PGs range from 70 to 140 per OSD; primaries from 15 to 58 per OSD. Is there some way to fix it? -- Regards Dominik 2014-01-30 Dominik Mostowiec : > Hi, > I found something else.…
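A per-OSD PG count like the one behind that graph can be pulled from pg dump (a sketch; the field holding the up set varies by release, so check the header line — $15 here is an assumption):

  ceph pg dump 2>/dev/null | grep '^[0-9]' \
    | awk '{gsub(/[][]/, "", $15); n = split($15, a, ","); for (i = 1; i <= n; i++) c[a[i]]++}
           END {for (o in c) print c[o], "osd." o}' \
    | sort -rn   # PG count per OSD, highest first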

Re: [ceph-users] poor data distribution

2014-01-30 Thread Dominik Mostowiec
Hi, I found something else. 'ceph pg dump' shows PGs: - with zero or near-zero object counts - with ~6.5k objects, size ~1.4G - with ~13k objects, size ~2.8G Could this be a reason for the wrong data distribution on the OSDs? -- Regards Dominik 2014-01-30 Dominik Mostowiec : > Hi, > I have a problem with…

[ceph-users] poor data distribution

2014-01-30 Thread Dominik Mostowiec
Hi, I have a problem with data distribution: smallest disk usage 40% vs. highest 82%. Total PGs: 6504. Almost all data is in the '.rgw.buckets' pool, with pg_num 4800. Is the best way to get better data distribution to increase pg_num in this pool? Is there another way (e.g. crush tunables, or something like that)?…
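On the tunables idea raised here: the switch itself is a single command, but it can move a lot of data, and older kernel clients may not support the newer profile (a sketch):

  ceph osd crush show-tunables     # what the cluster currently uses
  ceph osd crush tunables optimal  # switch profiles; expect significant data movement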