On Jun 2, 9:08 pm, Nate Wiger <[email protected]> wrote:
> Thanks for the explanation Jeremy.  I think the issue of confusion for
> me is this:
>
>    User.server(:shard_a).first(:name=>'Bob')
>
> This makes sharding more like a scope, if you will.  That is, in code
> I would have to do something like this:
>
>    User.server(decide_shard_for(username)).first(:name => username)

Yes, that's by design.  It's a typical dataset method that returns a
cloned dataset, so specifying a shard is like adding a filter.
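
As a rough illustration of what "returns a cloned dataset" means, here is a toy sketch in plain Ruby. ToyDataset and its opts hash are made up for this example and are not Sequel's actual implementation; the only point is that both the shard selection and the filter produce a new object and leave the receiver untouched.

```ruby
# Toy stand-in for a dataset (hypothetical class, NOT Sequel's code).
class ToyDataset
  attr_reader :opts

  def initialize(opts = {})
    @opts = opts.freeze
  end

  # Return a copy with the shard option set; the receiver is unchanged.
  def server(shard)
    self.class.new(opts.merge(:server => shard))
  end

  # Return a copy with an extra filter condition merged in.
  def where(conds)
    self.class.new(opts.merge(:where => (opts[:where] || {}).merge(conds)))
  end
end

base = ToyDataset.new
on_a = base.server(:shard_a).where(:name => 'Bob')
```

Because each call returns a copy, the shard choice composes with other dataset methods the same way a filter does.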

> Since you know what you're hashing on (in this example username) it
> seems a bit repetitive/out of place to have this at the top level.
>
> Instead what I'd expect would be something like this:
>
>    User.find_sharded(username)
>
> And then find_sharded would do something like this:
>
>    def self.find_sharded(username)
>      decide_shard_for(username).find_by_username(username)
>    end

You could already do that yourself (as you note below), and I think
that's best, since a method like that is going to be application
specific.  What the sharding plugin does is make sure that when you
save a user retrieved by find_sharded, it saves it back to the shard
it was retrieved from.

> Which, yes, you could write yourself as the application programmer.
> But along the lines of a sharding plugin, it seems to me what I would
> be looking to do is something like this:
>
>    class User < Sequel::Model
>      plugin :sharding
>      shard_on :username, :hash => :sha1
>    end

I purposely chose not to do that as sharding in that manner is going
to be very application specific, while the sharding plugin is designed
to be fairly generic.  How would the :hash=>:sha1 option work
generically?  At the very least, I think you'd have to provide a proc
(or multiple procs) to get anything approaching generic behavior, and
all that would do is give a slightly nicer interface for defining a
custom shard mapping.
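
To make the proc idea concrete, here is a minimal sketch of an application-level mapping, assuming a fixed list of shard names and a simple SHA1-modulo scheme. SHARDS, DECIDE_SHARD and the shard names are hypothetical and not part of the sharding plugin; decide_shard_for is the helper name used earlier in this thread.

```ruby
require 'digest'

# Hypothetical shard names; a real application would use the names it
# configured for its own database servers.
SHARDS = [:shard_a, :shard_b, :shard_c, :shard_d].freeze

# Application-specific mapping from a sharding key to a shard name:
# hash the key with SHA1, then take the digest modulo the shard count.
DECIDE_SHARD = lambda do |key|
  SHARDS[Digest::SHA1.hexdigest(key.to_s).to_i(16) % SHARDS.length]
end

# Thin wrapper so callers can write decide_shard_for(username).
def decide_shard_for(username)
  DECIDE_SHARD.call(username)
end
```

With something like this in place, the lookup from earlier in the thread reads User.server(decide_shard_for(username)).first(:name => username).  Note that a plain modulo mapping sends most keys to a different shard if the number of shards ever changes, which is one more reason the choice of scheme is application specific.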

What I think may be better is if you have a sharding scheme that works
for you and you think it would be appropriate for other cases, make it
a plugin that other people can use, and publish it as a gem (e.g.
sequel_hash_sharding).  That way, there could be multiple plugins for
different kinds of sharding, each tailored to that specific type.

> Or something like that.  Because then I could do that in sub-models as
> well, to ensure that records ended up on the same shard, regardless of
> whether they were created via associations or by themselves.
>
> Does any of that make sense or am I completely talking crazy.

It makes sense, but again, we are talking about application specific
behavior.  Unlike some other libraries, Sequel doesn't assume you are
building the database from scratch and can modify it to match Sequel's
expectations.  Sequel assumes you are using it with an existing
database, and tries to be flexible enough to accommodate different
ways of doing things.  Saving an object back to the shard it was
retrieved from is something pretty much any sharding implementation is
going to want to do.  The association handling is not as generic, but
it's the best default I could think of, and the required refactoring
of the association code made things significantly more flexible.

If you look at the other current thread about sharding
(http://groups.google.com/group/sequel-talk/t/418a989e58157221), I show
example code about how to deal with such a horizontal partitioning
scheme by overriding Model#this and Model#_insert_dataset.  For
horizontal partitioning, that may be a better route to take.

The sharding plugin will work best with database-per-customer,
silo-style partitioning (where all related data is on a single
server/database), as opposed to horizontal partitioning (where a single
table is split among multiple servers).  It still helps in the horizontal
partitioning case in terms of saving an object back to the shard it
was retrieved from (assuming you don't want to go the route of
overriding #this and #_insert_dataset), and easily allowing you to
create objects on specific shards, but the association integration is
something you may have to override in some cases.

Jeremy
