This is probably a better topic for manta-discuss. Some thoughts below:

On Wed, Nov 18, 2015 at 7:13 PM, Benjamin Bergia <[email protected]> wrote:
> I am currently looking into different storage solutions for the
> infrastructure I am working on. The main usage would be slow storage of
> images, archives and maybe later logs. Map/Reduce jobs are a bonus and
> not a requirement. Since our current infrastructure is running on SDC7,
> I naturally thought about Manta, other alternatives being Riak S2 or
> Swift. The fact that Manta is not oriented toward any specific type/size
> of files is a good selling point for me.
>
> The current infrastructure is limited to one rack of around 15 CNs in
> one datacenter. We have plans to extend to multiple racks and at least
> two other datacenters. I currently have 4 CNs and 3~4 SNs available.
>
> One of the 4 SNs is the current SDC headnode, which means that I could
> do something like:
> 1) using 2 CNs to set up an HA headnode, with the 2 CNs left being used
> for Manta with 2 metadata shards. This would free one SN and leave me
> with 4 SNs for Manta.
> 2) keeping my headnode where it is and using the 4 CNs + 3 SNs for
> Manta, with a number of shards to define.
> 3) buying more CNs/SNs.
>
> As explained before, I am primarily looking for storage. My main concern
> is keeping as much flexibility as possible for future growth/expansion,
> and raw performance doesn't really matter.

I don't have concise answers to your questions about hardware allocation and zone layout, but here's what I've got. The three primary considerations are:

(1) How much capacity do you need (in terms of object count, request
    throughput, bytes stored, and so on)?
(2) How much capacity can each CN/SN provide? (This is hard to know ahead
    of time.)
(3) How much do you care about guaranteeing availability and consistent
    performance in the face of transient CN failure?

If you misjudge (1) or (2), you can scale Manta up as described in [1].
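To make consideration (1) concrete, here's a minimal back-of-the-envelope sketch. The node count, disk sizes, and fill target below are hypothetical, not measurements from any deployment; the durability factor of two reflects Manta's default of storing two copies of each object.

```python
# Hypothetical capacity estimate for a small Manta deployment.
# All numbers are illustrative assumptions, not measurements.

def usable_capacity_tb(num_storage_nodes, raw_tb_per_node, durability=2,
                       fill_target=0.8):
    """Rough usable capacity: raw space divided by the number of object
    copies, scaled by how full you're willing to run the pool."""
    raw = num_storage_nodes * raw_tb_per_node
    return raw * fill_target / durability

# Example: 4 SNs with 30 TB of raw disk each, two copies per object,
# and an 80% fill target.
print(usable_capacity_tb(4, 30))  # 48.0 (TB usable)
```

This only bounds bytes stored; object count and request throughput are separate limits that depend on the metadata tier, which is what the shard discussion below is about.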
Generally, you can add storage capacity, as well as capacity for everything _other_ than the metadata databases, by adding new hardware and just provisioning new instances on it. For the database itself, you may need to reshard to increase capacity. The resharding procedure is mostly documented [2], but I don't believe anybody has run it since it was documented and initially tested, so there's potentially some risk there. If you misjudge nearly any of this, you can always address it with some combination of adding or removing hardware and moving instances around the cloud.

I apologize that the docs gave you the impression that things were fixed -- I see how that's confusing the way they're presented now, and I'll update them to reflect this.

> My questions are:
> How many shards should I start with?

Remember that each shard consists of three Manatee peers (and three PostgreSQL databases), and the point is for them to live on separate compute nodes. That's not strictly a requirement, but if you put multiple instances from the same shard on the same CN, then the system cannot survive failure of that CN. Similarly, if two databases from different shards wind up on the same CN, then you're not really extending performance beyond what a single system can do anyway. Given that you've got at most 4 CNs for metadata, you won't get much benefit out of having more than two shards. The only reason to go larger, I think, would be to allow you to expand capacity with new hardware in the future without having to reshard (by moving the existing shards onto the new hardware).

> How many CNs / SNs would you recommend (cf. solution 1, 2, 3 or others)?

The headnode does not typically use a lot of storage, so I would save the storage nodes exclusively for Manta. But again, it really depends on what you want out of your deployment: if you want to provision very-high-storage VMs, maybe you want to keep one SN for non-Manta usage instead.
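The shard-placement constraint above (no two Manatee peers from the same shard on the same CN) can be sketched as a quick check. The CN names and layout here are hypothetical, and this is just an illustration of the rule, not any real Manta tooling:

```python
# Hypothetical layout: two shards, three Manatee peers each, spread
# across three CNs. If two peers of one shard share a CN, that shard
# loses its majority when that CN fails.

layout = {
    "cn0": ["shard1-peer", "shard2-peer"],
    "cn1": ["shard1-peer", "shard2-peer"],
    "cn2": ["shard1-peer", "shard2-peer"],
}

def survives_single_cn_failure(layout):
    """True if every shard keeps a majority (2 of its 3 peers) after
    any single CN fails."""
    shards = {}
    for cn, peers in layout.items():
        for peer in peers:
            shard = peer.split("-")[0]
            shards.setdefault(shard, []).append(cn)
    for cns in shards.values():
        for failed in set(cns):
            if len(cns) - cns.count(failed) < 2:
                return False
    return True

print(survives_single_cn_failure(layout))  # True
```

With this spread, losing any one CN still leaves each shard with two of its three peers, which is why three metadata CNs is the comfortable minimum.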
See my answer to your next question.

> From there, which zone layout would you recommend?

If you care at all about surviving CN failure, then I would start with at least 3 CNs and 3 SNs. I would put only storage and marlin zones on the SNs. I would put one Manatee peer from each shard onto each of the 3 CNs. Then I would deploy two of every other service: one instance onto each of two different CNs.

If you don't care about surviving CN failure, and the CNs are reasonably beefy, I might start by putting only one of each of the non-storage services onto a single node, and put all of the Manatee instances onto that node as well. I would watch resource utilization, and if resources became saturated, I would start moving the affected services onto a second CN.

I hope this helps.

-- Dave

[1] https://github.com/joyent/manta/blob/master/docs/manta-ops.md#manta-scalability
[2] https://github.com/joyent/manatee/blob/master/docs/resharding.md

-------------------------------------------
smartos-discuss Archives: https://www.listbox.com/member/archive/184463/=now
