Hey Martin, thanks for sharing your code! I will have a read through
it today. I initially wanted to try a very naive and simple approach.
Something like (I am using Java btw):

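   import javax.jdo.annotations.IdGeneratorStrategy;
   import javax.jdo.annotations.PersistenceCapable;
   import javax.jdo.annotations.Persistent;
   import javax.jdo.annotations.PrimaryKey;
   import com.google.appengine.api.datastore.Key;

   @PersistenceCapable
   class User {
       // The unique username doubles as the primary key.
       @PrimaryKey
       @Persistent
       private String mUsername;
   }

   @PersistenceCapable
   class Follow {
       // Datastore-generated key, one per follow relationship.
       @PrimaryKey
       @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
       private Key mKey;

       // Username of the user being followed.
       @Persistent
       private String mUsernameFollowed;

       // Username of the user doing the following.
       @Persistent
       private String mUsernameFollower;
   }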

So every time a user follows another user, a Follow instance is
created in the datastore. This should be pretty quick.
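
To make the write path concrete, here is a minimal in-memory sketch. Plain Java collections stand in for the datastore, and `FollowLog` with its methods is a hypothetical name, not part of JDO or the App Engine API. It only illustrates that each follow appends exactly one record, however many followers the target already has:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the Follow kind; not the datastore API.
class FollowLog {
    // One entry per follow: {follower, followed}.
    private final List<String[]> records = new ArrayList<>();
    private int writes = 0;

    // Following someone costs a single write, independent of how many
    // followers the target already has.
    void follow(String follower, String followed) {
        records.add(new String[] {follower, followed});
        writes++;
    }

    int writeCount() { return writes; }

    int size() { return records.size(); }
}
```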

When I query a user, it will involve a few trips to the datastore,
which is probably an awful idea, since reads are going to be frequent
(e.g. viewing a user's profile page and seeing their first 20
followers, just like Twitter does). This would involve:

  1) Ask the datastore for the User object given the unique username.
  2) Ask the datastore for the first 20 Follow records where
Follow.mUsernameFollowed = username.
  3) Do a batch-get of those 20 User objects.
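
The three steps above can be sketched with an in-memory stand-in that counts round trips. This is only an illustration under stated assumptions: `ProfileReadSketch`, its maps and its method names are hypothetical, and none of it is the JDO or App Engine API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the datastore; counts round trips.
class ProfileReadSketch {
    private final Map<String, String> users = new HashMap<>();  // username -> profile data
    private final List<String[]> follows = new ArrayList<>();   // {follower, followed}
    int reads = 0;

    void addUser(String username) { users.put(username, "profile:" + username); }

    void follow(String follower, String followed) {
        follows.add(new String[] {follower, followed});
    }

    // Step 1: get the User by its unique username.
    String getUser(String username) {
        reads++;
        return users.get(username);
    }

    // Step 2: query the first `limit` Follow records where followed == username.
    List<String> queryFollowerNames(String username, int limit) {
        reads++;
        List<String> names = new ArrayList<>();
        for (String[] f : follows) {
            if (names.size() == limit) break;
            if (f[1].equals(username)) names.add(f[0]);
        }
        return names;
    }

    // Step 3: one batch get for all of the follower User objects.
    List<String> batchGetUsers(List<String> usernames) {
        reads++;
        List<String> result = new ArrayList<>();
        for (String name : usernames) result.add(users.get(name));
        return result;
    }

    // A full profile view: three round trips in total.
    List<String> viewProfile(String username) {
        getUser(username);  // the profile owner (result unused in this sketch)
        return batchGetUsers(queryFollowerNames(username, 20));
    }
}
```

Calling `viewProfile` once bumps `reads` by three, which is the per-page-view cost being discussed.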

So, three reads here. This is probably going to be sub-optimal: as
users view one another's profiles, my datastore usage will climb
quickly, and the multiple reads will be slow.

So, I'll read through yours. This was just the naive first thing that
came to mind, though!

Thanks


On Apr 19, 8:13 pm, Martin Webb <spydre...@yahoo.co.uk> wrote:
> Mark - I'm working on a similar system myself. I agree that using a string in 
> the user class is potentially going to cause issues in a large-scale app, as 
> it would need sharding. Running into tens of thousands of followers could 
> also get heavy. I have made an implementation using a simple relationship 
> model that can be used for anything: friendships, followers, related images, 
> anything you like. I am not 100% sure it is the best model or even built 
> correctly, but I was considering posting it for comments. It's still in 
> progress, so the code is not 100% tested etc., but it may give you some ideas.
> Any comments on this are much appreciated. As I have said, I have looked at 
> list properties, but I can only imagine they would need sharding too.
> I have not added any memcache support yet, but this will be added to the 
> finished class. I will post my code if anyone is interested.
>
> #!/usr/bin/env python
> #
> # Copyright 2007 Google Inc.
> #
> # Licensed under the Apache License, Version 2.0 (the "License");
> # you may not use this file except in compliance with the License.
> # You may obtain a copy of the License at
> #
> #    http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing, software
> # distributed under the License is distributed on an "AS IS" BASIS,
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
> #
>
> from google.appengine.ext import db
> import google.appengine.ext.db
> from google.appengine.api import memcache
> import shards
>
> class collection(db.Model):
>     """
>     Stores relationships - This could be achieved using a 
> ListProperty(db.ReferenceProperty) but that may
>     run into issues as it is possible that many "sessions" could be appending 
> keys at the same time. Also a list
>     property may have a performance issue if there are millions of siblings 
> following one entity.
>
>     Furthermore, we can memcache our keys for even faster performance when we 
> check if keys are siblings of parent urls.
>
>     I feel that this approach, simple as
>     it may be, may be a better approach for long-term performance.
>
>     The collection object stores a domain, url and a key
>
>     domain, url, key
>
>     Where the domain and url describe the object and the key is the related 
> sibling.
>
>     domain='friend'
>     url='martin'
>     key=some instance key
>
>     long_url = 'friend/martin'
>     stored_key ='friend/martin/[some key]'
>
>     We can store friendships by storing one side of the friendship as the url 
> and the siblings (friends) as keys.
>
>     The query to test if a sibling is a friend of a parent is VERY simple:
>     stored_key=make_key(domain, url, key)
>     exists = collection.get_by_key_name(stored_key)
>
>     This works because when we store a key, we store the key name as the 
> domain+url+key; this is done using the make_key function
>
>     We can memcache this, again using the stored_key as the cache key, for 
> super quick requests. When we are using the collection for, say, friendships,
>     user pages will need to know if the current user is a friend of the page 
> owner - memcache will respond super fast in this scenario.
>
>     Counters.
>
>     We use our shard class for keeping count of how many instances are 
> related to the url (ie how many friends to an entity)
>
>     counter_key=long_url+'/cnt'
>     shards.increment(counter_key)
>
>     (note the format of our key )
>
>     Getting a list of all the keys that are a sibling: (so all the friends of 
> an entity)
>
>     long_url=make_url('friends','martin')  # 'friends/martin'
>
>     all_siblings = db.GqlQuery('SELECT * FROM collection WHERE url = :1', 
> long_url).fetch(limit) #paging removed
>
>     We use a simple query that filters the url and returns the keys. As the 
> keys are strings we don't need to worry about
>     the data-store loading 'related instances'. Once we get our keys - we can 
> create a short list of keys [] that can then make one
>     call to the datastore and load the instances by key.
>     This in theory should be super-quick, as only the instances, say 10 at a 
> time, are loaded for queries where potentially thousands of
>     siblings may be present
>
>     #Make a list of keys so that we can load our instances in a flash - in 
> real world this might be 10 at a time (see limit in above query)
>     li=[]
>     for instance in all_siblings:
>             li.append(instance.sibling)
>
>     #get a list of instances for the keys
>     return db.get(li)  
>
>     The next question we could ask the datastore is:
>
>     who am i a sibling of (who am i friends to)
>
>     This can be returned using another simple query;
>
>     all_instances = db.GqlQuery('SELECT * FROM collection WHERE sibling = :1 AND 
> domain = :2', sibling_key, 'friends').fetch(limit)
>     note we use 'friends' as our domain as we want to only return keys where the 
> relationship is a friendship
>
>     Another example of use
>
>     Let's say I want to create a group of people. This can be modelled by
>     creating a model for groups:
>
>     class Group(db.Model):
>         Name = db.StringProperty()
>
>     Now we can do:
>
>     martin=user("martin")
>     leo=user("leo")
>
>     group=Group(key_name="Club1", Name="Club1")
>
>     links.collection.add_sibling('groups',group.key(),martin)
>     links.collection.add_sibling('groups',group.key(),leo)
>
>     Note in this example we identify that the relationship is a group by 
> using 'groups' as our domain.
>
>     now we can use the queries above to see "who" is in the group, whether a 
> person is a member of a group, and what groups say martin is a member of:
>
>     all_groups = db.GqlQuery('SELECT * FROM collection WHERE sibling = :1 AND 
> domain = :2', str(martin.key()), 'groups').fetch(limit)
>
>     the above lists all the 'groups' martin is a member of.
>
>     Friendships:
>
>      martin=user("martin") #is the initiator
>     leo=user("leo") #is the acceptor
>
>     #adds the relationship    
>     links.collection.add_sibling('friends',leo.key(),martin.key())
>     #if the relationship is two-sided we simply reverse it
>     links.collection.add_sibling('friends',martin.key(),leo.key()) #this may 
> not be the best solution - we could simply create a query to find reversed 
> relationships?
>
>     the above queries can be used to get the relationships
>
>     This class is useful in social apps where many relationships can be made.
>
>     TO DO:
>
>     1. test
>     2. add memcache
>     3. add other in built queries as detailed in header outline (list 
> siblings, who am i a sibling of, am i a sibling)
>     4. add remove relationship
>
>     """
>
>     """
>
>     THE MODEL OBJECT
>     domain, url, key
>
>     the domain of the instance, for example 'friends'
>     """
>     domain=db.StringProperty()
>     """
>     a url to describe our parent like friends/martin
>     we make the url by using;
>
>     url=make_url(domain,url)
>
>     """
>     url   = db.StringProperty()
>     """the key of our sibling instance"""
>     sibling = db.StringProperty()
>
>     @staticmethod
>     def add_sibling(domain='',url=None, key=None):
>         """
>         Add a sibling key to a url's collection.
>         the key should be like numberfile.key()
>         the url should be  a long url path like martin/friends but not 
> include the key
>         """
>
>         """
>         make the unique stored key which is the complete path long_url/key
>         """
>         stored_key=make_key(domain,url,key)
>         exists = collection.get_by_key_name(stored_key)
>         if exists is not None:
>             # KEY is already added as a LINK
>             return
>
>         #add our new stored key
>         long_url=make_url(domain,url)
>
>         link = collection(key_name=stored_key)
>         link.domain=domain
>         link.url=long_url
>         link.sibling=key
>         link.put()
>         #update our counter for the long_url ie friends/martin
>         shards.increment(long_url +'_cnt')
>
>     @staticmethod
>     def get_siblings(domain,url, limit = 10, offset = None):
>         """
>         get all the sibling keys for a given domain/url identifier
>         """
>
>         li=[]
>         """
>         make the long url example friends/martin
>         """
>         long_url=make_url(domain,url)
>
>         if offset is None:
>             q = collection.gql('WHERE url = :1', long_url)
>         else:
>             # accept an encoded key string as well as a db.Key
>             if isinstance(offset, basestring):
>                 offset = db.Key(offset)
>             q = collection.gql('WHERE url = :1 AND __key__ > :2',
>                                long_url, offset)
>
>         #build our list, fetching at most `limit` links
>         for link in q.fetch(limit):
>             li.append(link.sibling)
>
>         """
>         return all the instances for the keys in our list
>         """
>
>         return db.get(li)
>
> def make_url(domain='',url=None):
>     """
>     makes a long url path using the passed url and domain used for our 
> key_names
>     """
>
>     if url is None:
>         return None
>     return domain+'/'+url
>
> def make_key(domain='',url=None,sibling=None):
>     """
>     makes a url using the passed url and key used for our key_names
>
>     keys are make like this
>
>     domain+url+key
>
>     which for an example could be
>
>     friends/martin/[key]
>
>     which defines the key as a sibling of our long url - domain+url
>
>     in our real world app using the key of the instance would be safer as our 
> human readable word may change
>     i.e. a persons name
>     """
>
>     if sibling is None:
>         return None
>
>     #build our long url path
>     long_url=make_url(domain,url)
>     #should we exit if long url is None? This
> ...
