On 19.01.2012, at 20:15, Narendra Sharma wrote:

> I believe you need to move the nodes on the ring. What was the load on the
> nodes before you added 5 new nodes? It's just that you are getting more data
> in certain token ranges than in others.

With three nodes, the cluster was also imbalanced.
What I don't understand is why the MD5 sums would generate such massive hot
spots. Most of our keys look like this:

    00013270494972450001234567

with the first 16 digits being a timestamp of one of our application servers'
startup times, and the last 10 digits being generated sequentially per user.
There may be a lot of keys that start with e.g. "0001327049497245" (or some
other timestamp). But I was under the impression that MD5 doesn't care about
that and generates a uniform distribution? Then again, I know next to nothing
about MD5. Maybe someone else has better insight into the algorithm?

However, we also use CFs with a date ("yyyymmdd") as key, as well as CFs with
UUIDs as keys. And those CFs in themselves are not balanced either: e.g. node
5 has 12 GB live space used in the CF with the UUID as key, while node 8 has
only 428 MB.

Cheers,
Marcel

> On Thu, Jan 19, 2012 at 3:22 AM, Marcel Steinbach
> <marcel.steinb...@chors.de> wrote:
>
> On 18.01.2012, at 02:19, Maki Watanabe wrote:
>> Are there any significant differences in the number of sstables on each
>> node?
> No, no significant difference there. Actually, node 8 is among those with
> more sstables but with the least load (20 GB).
>
> On 17.01.2012, at 20:14, Jeremiah Jordan wrote:
>> Are you deleting data or using TTLs? Expired/deleted data won't go away
>> until the sstable holding it is compacted. So if compaction has happened
>> on some nodes, but not on others, you will see this. The disparity is
>> pretty big, 400 GB to 20 GB, so this probably isn't the issue, but with
>> our data using TTLs, if I run major compactions a couple of times on that
>> column family it can shrink ~30-40%.
> Yes, we do delete data. But I agree, the disparity is too big to blame only
> the deletions.
>
> Also, we initially started out with 3 nodes and upgraded to 8 a few weeks
> ago. After adding the nodes, we ran compactions and cleanups and still
> didn't have a balanced cluster. So that should have removed outdated data,
> right?
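[Not part of the original thread: a quick sketch one could run to test the
uniformity question above. It hashes 100,000 keys that all share the 16-digit
"timestamp" prefix from the example and buckets the resulting tokens into 8
equal ranges, approximating how the RandomPartitioner derives tokens from
MD5; the prefix and key format are taken from the thread, everything else is
illustrative.]

```python
import hashlib

RING = 2 ** 127   # RandomPartitioner token space is roughly [0, 2**127)
NODES = 8

counts = [0] * NODES
prefix = "0001327049497245"            # shared "timestamp" prefix from the thread
for seq in range(100_000):
    key = prefix + str(seq).zfill(10)  # sequential 10-digit per-user suffix
    # Sketch of token derivation: interpret the MD5 digest as an integer,
    # folded into the ring (the real partitioner differs in minor details).
    token = int.from_bytes(hashlib.md5(key.encode()).digest(), "big") % RING
    counts[token * NODES // RING] += 1  # which of 8 equal ranges owns this key

print(counts)  # each bucket should hold roughly 12,500 keys
```

If MD5 hot-spotted on shared prefixes, some buckets would dominate; in
practice the counts come out close to uniform, supporting the intuition that
the key format itself is not the cause of the imbalance.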
>> 2012/1/18 Marcel Steinbach <marcel.steinb...@chors.de>:
>>> We are running regular repairs, so I don't think that's the problem.
>>> And the data dir sizes match approx. the load from nodetool.
>>> Thanks for the advice, though.
>>>
>>> Our keys are digits only, and all contain a few zeros at the same
>>> offsets. I'm not that familiar with the MD5 algorithm, but I doubt that
>>> it would generate 'hotspots' for those kinds of keys, right?
>>>
>>> On 17.01.2012, at 17:34, Mohit Anchlia wrote:
>>>
>>> Have you tried running repair first on each node? Also, verify using
>>> df -h on the data dirs.
>>>
>>> On Tue, Jan 17, 2012 at 7:34 AM, Marcel Steinbach
>>> <marcel.steinb...@chors.de> wrote:
>>>
>>> Hi,
>>>
>>> we're using the RandomPartitioner and have assigned each node the same
>>> amount of the token space. The cluster looks like this:
>>>
>>> Address  Status  State   Load       Owns    Token
>>>                                             205648943402372032879374446248852460236
>>> 1        Up      Normal  310.83 GB  12.50%  56775407874461455114148055497453867724
>>> 2        Up      Normal  470.24 GB  12.50%  78043055807020109080608968461939380940
>>> 3        Up      Normal  271.57 GB  12.50%  99310703739578763047069881426424894156
>>> 4        Up      Normal  282.61 GB  12.50%  120578351672137417013530794390910407372
>>> 5        Up      Normal  248.76 GB  12.50%  141845999604696070979991707355395920588
>>> 6        Up      Normal  164.12 GB  12.50%  163113647537254724946452620319881433804
>>> 7        Up      Normal  76.23 GB   12.50%  184381295469813378912913533284366947020
>>> 8        Up      Normal  19.79 GB   12.50%  205648943402372032879374446248852460236
>>>
>>> I was under the impression that the RandomPartitioner would distribute
>>> the load more evenly. Our row sizes are 0.5-1 KB, so we don't store huge
>>> rows on a single node. Should we just move the nodes so that the load is
>>> distributed more evenly, or is there something off that needs to be
>>> fixed first?
>>>
>>> Thanks
>>>
>>> Marcel
>>>
>>> chors GmbH
>>> specialists in digital and direct marketing solutions
>>> Haid-und-Neu-Straße 7
>>> 76131 Karlsruhe, Germany
>>> www.chors.com
>>> Managing Directors: Dr. Volker Hatz, Markus Plattner
>>> Amtsgericht Montabaur, HRB 15029
>>
>> --
>> w3m

> --
> Narendra Sharma
> Software Engineer
> http://www.aeris.com
> http://narendrasharma.blogspot.com/
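[Not part of the original thread: the token assignment quoted in the ring
listing can be sanity-checked directly. With the RandomPartitioner, a
perfectly balanced N-node ring has adjacent tokens exactly 2**127 // N
apart; the tokens below are copied verbatim from the listing.]

```python
# Tokens from the nodetool ring output, nodes 1 through 8.
tokens = [
    56775407874461455114148055497453867724,
    78043055807020109080608968461939380940,
    99310703739578763047069881426424894156,
    120578351672137417013530794390910407372,
    141845999604696070979991707355395920588,
    163113647537254724946452620319881433804,
    184381295469813378912913533284366947020,
    205648943402372032879374446248852460236,
]
step = 2 ** 127 // 8                # ideal spacing for 8 equal ranges
gaps = [b - a for a, b in zip(tokens, tokens[1:])]
print(all(g == step for g in gaps))  # True: the ring itself is evenly spaced
```

Since every gap equals the ideal step, the token assignment is not the
problem; the 19 GB vs. 470 GB disparity must come from how the keys map into
those equal ranges, not from the ranges themselves.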