This is probably a better topic for manta-discuss.  Some thoughts below:

On Wed, Nov 18, 2015 at 7:13 PM, Benjamin Bergia <[email protected]>
wrote:

> I am currently looking into different storage solutions for the
> infrastructure I am working on. The main usage would be slow storage of
> images, archives and maybe later logs. Map/Reduce jobs are a bonus and not
> a requirement. Since our current infrastructure is running on SDC7 I
> naturally thought about Manta. Other alternatives being Riak S2 or Swift.
> The fact that Manta is not oriented toward any specific type/size of files
> is a good selling point for me.
>
> The current infrastructure is limited to 1 Rack of around 15 CNs in one
> Datacenter. We have plans to extend to multiple racks and at least two
> other Datacenters. I currently have 4 CNs and 3~4 SNs available.
>
> One of the 4 SNs is the current SDC Headnode which means that I could do
> something like:
> 1) using 2 CNs to set up an HA headnode, with the 2 remaining CNs used for
> Manta with 2 metadata shards. This would free one SN and leave me with 4
> SNs for Manta.
> 2) keeping my Headnode where it is and using the 4 CNs + 3 SNs for Manta
> with a number of shards to define.
> 3) buying more CNs/SNs.
>
> As explained before I am primarily looking for storage. My main concern is
> keeping as much flexibility as possible for future growth/expansion and raw
> performance doesn't really matter.
>
>
I don't have concise answers to your questions about hardware allocation
and zone layout, but here's what I've got.  The three primary
considerations are:

(1) how much capacity do you need (in terms of object count, request
throughput, bytes stored, and so on)?
(2) how much capacity can each CN/SN provide (which is hard to know ahead
of time)?
(3) how much do you care about guaranteeing availability and consistent
performance in the face of transient CN failure?
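To make (1) and (2) concrete, here's a rough sizing sketch. The helper and the numbers are made up for illustration; the only real constant in it is that Manta stores two copies of each object by default:

```python
# Back-of-envelope estimate of usable object storage.  copies=2 reflects
# Manta's default durability of two copies per object; the fill target
# leaves headroom so storage nodes never run completely full.
def usable_storage_tb(n_storage_nodes, raw_tb_per_node, copies=2, fill_target=0.8):
    return n_storage_nodes * raw_tb_per_node * fill_target / copies

# e.g. 3 SNs with 40 TB of raw disk each (placeholder figures):
print(usable_storage_tb(3, 40))  # -> 48.0 TB usable
```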

If you misjudge (1) or (2), you can scale Manta up as described in [1].
Generally, you can add storage capacity as well as capacity for everything
_other_ than the metadata databases by adding new hardware and just
provisioning new instances on that hardware.  For the metadata databases
themselves, you may need to reshard to increase capacity.  The resharding
procedure is mostly documented [2], but I don't believe anyone has run it
since it was documented and initially tested, so there's potentially some
risk there.

If you misjudge nearly any of this, you can always address it with some
combination of adding or removing hardware and moving instances around the
cloud.  I apologize that the docs gave you the impression that the
deployment layout was fixed -- I see how the way they're presented now is
confusing, and I'll update them to reflect this.



> My questions are:
> How many shards should I start with ?
>


Remember that each shard consists of three Manatee peers (and three
PostgreSQL databases), and the point is for them to live on separate compute
nodes.  That's not strictly a requirement, but if you put multiple
instances from the same shard on the same CN, then the system cannot
survive failure of that CN.  Similarly, if two databases from different
shards wind up on the same CN, then you're not really extending performance
beyond what a single system can do anyway.  Given that you've got at most 4
CNs for metadata, you won't get much benefit out of having more than two
shards.  The only reason to go larger, I think, would be to allow you to
expand capacity with new hardware in the future without having to reshard
(by moving the existing shards onto the new hardware).
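The peers-on-separate-CNs rule above can be sketched as a small placement function (hypothetical code, not part of Manta's tooling; the shard and CN names are placeholders):

```python
# Each shard's three Manatee peers must land on distinct CNs so that the
# shard can survive the failure of any single CN.
def place_peers(shards, cns, peers_per_shard=3):
    if len(cns) < peers_per_shard:
        raise ValueError("need at least one distinct CN per peer in a shard")
    layout = {}
    for i, shard in enumerate(shards):
        # Rotate the starting CN so peers spread evenly across shards.
        layout[shard] = [cns[(i + j) % len(cns)] for j in range(peers_per_shard)]
    return layout

layout = place_peers(["1.moray", "2.moray"], ["cn0", "cn1", "cn2"])
for peers in layout.values():
    assert len(set(peers)) == 3  # no two peers of a shard share a CN
```

In a real deployment the layout is written out by the operator in the manta-adm configuration rather than computed like this; the sketch just encodes the constraint.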


> How many CNs / SNs would you recommend (cf. solutions 1, 2, 3, or others)?
>


The headnode does not typically use a lot of storage, so I would save the
storage nodes exclusively for Manta.  But again, it really depends on what
you want out of your deployment: if you want to provision very high storage
VMs, maybe you want to keep one SN for non-Manta usage instead.

See my answer to your next question.


> From there, which zone layout would you recommend?
>

If you care at all about surviving CN failure, then I would start with at
least 3 CNs and 3 SNs.  I would only put storage and marlin zones on the
SNs.  I would put one Manatee peer from each shard onto each of the 3 CNs.
Then I would deploy two of every other service: one instance on each of two
different CNs.
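In manta-adm configuration terms, that layout would look roughly like the fragment below for one of the three CNs (an illustrative sketch of the shape of `manta-adm show -sj` output, from memory -- the server and image UUIDs are placeholders, and the other two CNs would carry analogous entries):

```json
{
    "SERVER_UUID_OF_CN0": {
        "nameservice": { "IMAGE_UUID": 1 },
        "postgres": {
            "1": { "IMAGE_UUID": 1 },
            "2": { "IMAGE_UUID": 1 }
        },
        "moray": {
            "1": { "IMAGE_UUID": 1 },
            "2": { "IMAGE_UUID": 1 }
        },
        "electric-moray": { "IMAGE_UUID": 1 },
        "webapi": { "IMAGE_UUID": 1 },
        "loadbalancer": { "IMAGE_UUID": 1 }
    }
}
```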

If you don't care about surviving CN failure, and the CNs are reasonably
beefy, I might start by putting one of each of the non-storage services
onto a single node, along with all of the Manatee instances, and then watch
resource utilization.  If resources become saturated, I would start moving
the affected instances onto a second CN.

I hope this helps.

-- Dave

[1]
https://github.com/joyent/manta/blob/master/docs/manta-ops.md#manta-scalability
[2] https://github.com/joyent/manatee/blob/master/docs/resharding.md



-------------------------------------------
smartos-discuss
Archives: https://www.listbox.com/member/archive/184463/=now
RSS Feed: https://www.listbox.com/member/archive/rss/184463/25769125-55cfbc00