Hey Martin, thanks for sharing your code! I will have a read through it today. I initially wanted to try a very naive and simple approach. Something like (I am using Java btw):

    import javax.jdo.annotations.IdGeneratorStrategy;
    import javax.jdo.annotations.PersistenceCapable;
    import javax.jdo.annotations.Persistent;
    import javax.jdo.annotations.PrimaryKey;
    import com.google.appengine.api.datastore.Key;

    @PersistenceCapable
    class User {
        // The unique username doubles as the entity's key name.
        @PrimaryKey
        private String mUsername;
    }

    @PersistenceCapable
    class Follow {
        // Datastore-generated key for each follow record.
        @PrimaryKey
        @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
        private Key mKey;

        @Persistent
        private String mUsernameFollowed;

        @Persistent
        private String mUsernameFollower;
    }

So every time a user follows another user, a Follow instance is created in the datastore. This should be pretty quick. When I query a user, it will involve a few trips to the datastore, which is probably an awful idea since reads are going to be frequent (i.e. viewing a user's profile page and seeing their first 20 followers, just like Twitter does). This would involve:

1) Ask the datastore for the User object given the unique username.
2) Ask the datastore for the first 20 Follow records where Follow.mUsernameFollowed = username.
3) Do a batch-get of those 20 User objects.

So, three reads here. This is probably going to be sub-optimal. As users view one another's profiles, my datastore usage will climb quickly, and the multiple reads will be slow.

So, I'll read through yours; this was just the naive first thing that came to mind though!

Thanks

On Apr 19, 8:13 pm, Martin Webb <spydre...@yahoo.co.uk> wrote:
> Mark - I'm working on a similar system myself. I agree that using a string in
> the user class is potentially going to throw issues in a large-scale app, as
> it would need sharding? Also, running into 10,000s of followers could get
> heavy. I have made an implementation using a simple relationship model, and
> it can be used for anything: friendships, followers, related images, anything
> you like. I am not 100% sure it is the best model or even built correctly but
> I was considering posting it for comments. It's still in progress so the code
> is not 100% tested etc. - but it may give you some ideas.
> Any comments on this are much appreciated - as I have said, I have looked at
> list properties but I can only imagine they would need sharding etc.
> I have not added any memcache support but this will be added to the finished
> class - I will post my code if anyone is interested.
>
> #!/usr/bin/env python
> #
> # Copyright 2007 Google Inc.
> #
> # Licensed under the Apache License, Version 2.0 (the "License");
> # you may not use this file except in compliance with the License.
> # You may obtain a copy of the License at
> #
> #     http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing, software
> # distributed under the License is distributed on an "AS IS" BASIS,
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
> #
>
> from google.appengine.ext import db
> import google.appengine.ext.db
> from google.appengine.api import memcache
> import shards
>
> class collection(db.Model):
>     """
>     Stores relationships - This could be achieved using a
>     ListProperty(db.ReferenceProperty) but that may run into issues, as it
>     is possible that many "sessions" could be appending keys at the same
>     time. Also, a list property may have a performance issue if there are
>     millions of siblings following one entity.
>
>     Furthermore we can memcache our keys for even faster performance when we
>     check if keys are siblings of parent urls.
>
>     I feel that this approach, simple as it may be, may be a better approach
>     for long term performance?
>
>     The collection object stores a domain, url and a key
>
>         domain, url, key
>
>     where the domain and url describe the object and the key is the related
>     sibling.
>
>         domain = 'friends'
>         url = 'martin'
>         key = some instance key
>
>         long_url = 'friends/martin'
>         stored_key = 'friends/martin/[some key]'
>
>     We can store friendships by storing one side of the friendship as the url
>     and the siblings (friends) as keys.
>
>     The query to test if a sibling is a friend of a parent is VERY simple:
>
>         stored_key = make_key(domain, url, key)
>         exists = collection.get_by_key_name(stored_key)
>
>     This works because when we store a key we set the key name to
>     domain+url+key; this is done using the make_key function.
>
>     We can memcache this - again using the stored_key as the cache key for
>     super quick requests. When we are using the collection for, say,
>     friendships, user pages will need to know if the current user is a
>     friend of the page owner - memcache will respond super fast in this
>     scenario.
>
>     Counters.
>
>     We use our shard class for keeping count of how many instances are
>     related to the url (i.e. how many friends an entity has):
>
>         counter_key = long_url + '_cnt'
>         shards.increment(counter_key)
>
>     (note the format of our key)
>
>     Getting a list of all the keys that are a sibling (so all the friends of
>     an entity):
>
>         long_url = make_url('friends', 'martin')  # 'friends/martin'
>
>         all_siblings = db.GqlQuery('SELECT * FROM collection WHERE url = :1',
>                                    long_url).fetch(limit)  # paging removed
>
>     We use a simple query that filters on the url and returns the keys. As
>     the keys are strings we don't need to worry about the datastore loading
>     'related instances'. Once we get our keys we can create a short list of
>     keys [] that can then make one call to the datastore and load the
>     instances by key.
>     This in theory should be super-quick, as only the instances (say 10 at a
>     time) are loaded for queries where potentially thousands of siblings
>     may be present.
>
>         # Make a list of keys so that we can load our instances in a flash -
>         # in the real world this might be 10 at a time (see limit in the
>         # above query)
>         li = []
>         for instance in all_siblings:
>             li.append(db.Key(instance.sibling))
>
>         # get a list of instances for the keys
>         return db.get(li)
>
>     The next question we could ask the datastore is:
>
>         who am I a sibling of (who am I friends with)
>
>     This can be returned using another simple query:
>
>         all_instances = db.GqlQuery('SELECT * FROM collection WHERE '
>                                     'sibling = :1 AND domain = :2',
>                                     sibling_key, 'friends').fetch(limit)
>
>     Note we use 'friends' as our domain as we want to return only keys where
>     the relationship is a friendship.
>
>     Another example of use
>
>     Let's say I want to create a group of people. This can be modelled by
>     creating a model for groups:
>
>         class Group(db.Model):
>             Name = db.StringProperty()
>
>     Now we can do:
>
>         martin = user("martin")
>         leo = user("leo")
>
>         group = Group(key_name="Club1", Name="Club1")
>         group.put()
>
>         links.collection.add_sibling('groups', group.key(), martin.key())
>         links.collection.add_sibling('groups', group.key(), leo.key())
>
>     Note in this example we identify that the relationship is a group by
>     using 'groups' as our domain.
>
>     Now we can use the queries above to see "who" is in the group, whether a
>     person is a member of a group, and what groups, say, martin is a member
>     of:
>
>         all_groups = db.GqlQuery('SELECT * FROM collection WHERE '
>                                  'sibling = :1 AND domain = :2',
>                                  martin.key(), 'groups').fetch(limit)
>
>     The above lists all the 'groups' martin is a member of.
>
>     Friendships:
>
>         martin = user("martin")  # is the initiator
>         leo = user("leo")        # is the acceptor
>
>         # adds the relationship
>         links.collection.add_sibling('friends', leo.key(), martin.key())
>         # if the relationship is two-sided we simply reverse it
>         links.collection.add_sibling('friends', martin.key(), leo.key())
>         # this may not be the best solution - we could simply create a
>         # query to find reversed relationships?
>
>     The above queries can be used to get the relationships.
>
>     This class is useful in social apps where many relationships can be made.
>
>     TO DO:
>
>     1. test
>     2. add memcache
>     3. add other built-in queries as detailed in the header outline (list
>        siblings, who am I a sibling of, am I a sibling)
>     4. add remove relationship
>
>     """
>
>     # THE MODEL OBJECT
>     # domain, url, key
>
>     # the domain of the instance, for example 'friends'
>     domain = db.StringProperty()
>
>     # a url to describe our parent, like friends/martin
>     # we make the url by using:
>     #
>     #     url = make_url(domain, url)
>     url = db.StringProperty()
>
>     # the key of our sibling instance
>     sibling = db.StringProperty()
>
>     @staticmethod
>     def add_sibling(domain='', url=None, key=None):
>         """
>         Add a sibling key to a url's collection.
>         the key should be like numberfile.key()
>         the url should be a long url path like martin/friends but not
>         include the key
>         """
>
>         # make the unique stored key, which is the complete path long_url/key
>         stored_key = make_key(domain, url, key)
>         exists = collection.get_by_key_name(stored_key)
>         if exists is not None:
>             # KEY is already added as a LINK
>             return
>
>         # add our new stored key
>         long_url = make_url(domain, url)
>
>         link = collection(key_name=stored_key)
>         link.domain = domain
>         link.url = long_url
>         # sibling is a StringProperty, so store the key in its string form
>         link.sibling = str(key)
>         link.put()
>         # update our counter for the long_url, i.e. friends/martin
>         shards.increment(long_url + '_cnt')
>
>     @staticmethod
>     def get_siblings(domain, url, limit=10, offset=None):
>         """
>         get all the sibling instances for a given domain/url identifier
>         """
>
>         li = []
>         # make the long url, for example friends/martin
>         long_url = make_url(domain, url)
>
>         if offset is None:
>             q = db.GqlQuery('SELECT * FROM collection WHERE url = :1',
>                             long_url)
>         else:
>             # accept an encoded key string as the paging offset
>             if isinstance(offset, basestring):
>                 offset = db.Key(offset)
>             q = db.GqlQuery('SELECT * FROM collection WHERE url = :1 '
>                             'AND __key__ > :2', long_url, offset)
>
>         # build our list of sibling keys, at most `limit` at a time
>         for link in q.fetch(limit):
>             li.append(db.Key(link.sibling))
>
>         # return all the instances for the keys in our list
>         return db.get(li)
>
> def make_url(domain='', url=None):
>     """
>     makes a long url path using the passed url and domain, used for our
>     key_names
>     """
>
>     if url is None:
>         return None
>     # url may be a db.Key, so convert it before concatenating
>     return domain + '/' + str(url)
>
> def make_key(domain='', url=None, key=None):
>     """
>     makes a url using the passed url and key, used for our key_names
>
>     keys are made like this
>
>         domain+url+key
>
>     which for an example could be
>
>         friends/martin/[key]
>
>     which defines the key as a sibling of our long url - domain+url
>
>     in our real world app using the key of the instance would be safer as our
>     human readable word may change
>     i.e.
>     a person's name
>     """
>
>     if key is None:
>         return None
>
>     #build our long url path
>     long_url=make_url(domain,url)
>     #should we exit if long url is None? This
> ...
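
For comparison with the approach quoted above, here is a rough sketch of the three-read profile lookup described at the top of this message, written in Python to match the quoted code. Everything in it is an assumption for illustration: FakeDatastore and load_profile are hypothetical in-memory stand-ins, not App Engine APIs, and the only purpose is to make the number of round trips per profile view explicit.

```python
# Hypothetical in-memory stand-in for the datastore (not an App Engine API).
class FakeDatastore(object):
    def __init__(self):
        self.users = {}    # username -> user record
        self.follows = []  # (followed, follower) pairs, in insertion order
        self.reads = 0     # number of round trips made so far

    def get_user(self, username):
        # One round trip: fetch a single user by its unique username.
        self.reads += 1
        return self.users.get(username)

    def query_followers(self, followed, limit):
        # One round trip: first `limit` follow records for this user.
        self.reads += 1
        matches = [follower for f, follower in self.follows if f == followed]
        return matches[:limit]

    def batch_get_users(self, usernames):
        # One round trip: batch-get all the follower records at once.
        self.reads += 1
        return [self.users[name] for name in usernames if name in self.users]


def load_profile(store, username, limit=20):
    """Return (user, followers) using the three reads listed in the message."""
    user = store.get_user(username)                 # read 1
    if user is None:
        return None, []
    names = store.query_followers(username, limit)  # read 2
    followers = store.batch_get_users(names)        # read 3
    return user, followers


store = FakeDatastore()
for name in ("mark", "martin", "leo"):
    store.users[name] = {"username": name}
store.follows.append(("mark", "martin"))
store.follows.append(("mark", "leo"))

user, followers = load_profile(store, "mark")
print(user["username"], [f["username"] for f in followers], store.reads)
```

Under these assumptions each profile view costs three round trips however many followers are shown, so caching the result of load_profile per username would be the obvious first thing to try.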