Last Wednesday, the App Engine team hosted the latest session of its bimonthly IRC office hours. A transcript of the session and a summary of the topics covered are provided below. The next session will take place next Wednesday, March 3rd from 7:00-8:00 p.m. PST in the #appengine channel on irc.freenode.net.
- Jason

--SUMMARY-----------------------------------------------------------

- Q: What are my options to move my data from my App Engine datastore in production to my PC? A: For now, you have to use the bulkloader utility that ships with the Python SDK, but you can use this with Java-based apps as well -- see http://blog.notdot.net/2009/9/Advanced-Bulk-Loading-Part-5-Bulk-Loading-for-Java for details on setting up the appropriate handlers for a Java environment. [9:12-9:16]

- Q: With the index row limit raised to 5,000, does this mean that I can build queries involving multiple list properties? A: Yes. You could do this before, technically, but you had to be more careful to avoid exploding indexes, which could be caused by querying large list properties. You still have to be concerned about this, but the larger limit does give you more flexibility. [9:19-9:22]

- Discussion on using memcache to "lock" an entity for writing -- feasible but not recommended, since memcache is designed as a cache first and foremost. There is no guarantee that memcache won't be cleared periodically, and you wouldn't want this locking metadata purged without warning. [9:32-9:40]

- Discussion on mitigation strategies for contention -- in general, you only need to think about sharding or other mechanisms if you expect heavy contention (generally more than one write per second consistently) on a single entity or entity group. Otherwise, even if the occasional concurrent write request comes in, the system will automatically retry the write after any failures, so you don't need to alter your models to account for this. But for heavy contention, you can look at sharding, using the task queue API, or even dropping requests. [9:40-9:41, 9:43, 9:46-9:50]

- Q: When MapReduce functionality is available, will it support global transactions? A: In general, global transactions are difficult to support in a distributed system.
You can achieve a weak form of global transactions today with transactional tasks, which guarantee eventual consistency. [9:55-10:01]

--FULL TRANSCRIPT-----------------------------------------------------------

[09:08am] nickjohnson: Welcome to the bi-weekly App Engine chat time! With us today are apijason_google and myself, with others to come.
[09:09am] nickjohnson: If you're having trouble sending messages to the channel, please be patient, we're working on resolving that right now.
[09:10am] apijason_google: Hi Everyone.
[09:10am] lurkdev: hello
[09:10am] cyonyx: hello
[09:11am] cresloyd: hello
[09:11am] webus: hello
[09:12am] webus: how i can backup my data from GAE (BigTable) to my local PC ?
[09:12am] nickjohnson: webus: Java, or Python?
[09:13am] webus: java
[09:15am] nickjohnson: For now, you need to use the Python bulkloader to load and dump data - see http://blog.notdot.net/2009/9/Advanced-Bulk-Loading-Part-5-Bulk-Loading-for-Java for details on setting up a handler, and http://code.google.com/appengine/docs/python/tools/uploadingdata.html#Downloading_and_Uploading_All_Data for details on dump and restore
[09:15am] apijason_google: webus: I think the only way right now is to use the bulkloader tool that ships with the Python SDK. It should work with Java apps too, and we're working on a better Java solution.
[09:15am] nickjohnson: ** If you're still having trouble sending to the channel, please send me a PM **
[09:15am] webus: thnx!
[09:16am] cresloyd: there are third-party solutions to sync data between the datastore and external databases; I don't know if Googlers would want to recommend any of them, but you're welcome to try them of course
[09:18am] ryan_google: hi!
[09:19am] ryan_google: aha i can talk now
[09:19am] nickjohnson: Hooray!
[09:19am] nickjohnson: Anyone who was trying to talk, you can try again now.
[09:19am] cyonyx: now that the 5000 index limit has been set for an entity, does that mean one can use more than one list property per model as long as they stay within the 5000?
[09:20am] ryan_google: cyonyx: yes
[09:20am] ryan_google: i assume you've read http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Big_Entities_and_Exploding_Indexes ?
[09:22am] ryan_google: *crickets*
[09:22am] cyonyx: i have read that thank you. the docs are great by the way. i was playing with some code and found that the 1000 limit was removed for a list, so I tried adding more than 5000 to that list and got an error
[09:23am] ryan_google: hmm
[09:23am] nickjohnson: cyonyx: The limit we removed was the 1000 result limit, and is unrelated to the max entries in a list.
[09:24am] ryan_google: specifically, i don't think we've ever had a limit on the number of property values in a list property
[09:24am] ryan_google: well, indirectly 5k "indexed properties," true, but that's for the entity as a whole
[09:26am] cyonyx: nickjohnson: ok. not sure where I got the idea of 1000 per list, but now I know 5k is the magic number for an entity. thanks for the clarifications
[09:28am] ryan_google: while i'm here, are there any other questions about the datastore?
[09:30am] ryan_google: *louder crickets*
[09:30am] ryan_google: hmm guess we can go home early
[09:30am] nickjohnson: Certainly quiet here today.
[09:30am] ikai_google: I rushed in for this ...
[09:30am] ikai_google: so I'm guessing 1.3.1 was a crowd pleaser
[09:31am] cresloyd: should someone remind people about the outage that is scheduled for later today?
[09:31am] ryan_google: hard to say, at least based on the quiet
[09:31am] ryan_google: i'd like to remind everyone that there's an outage scheduled for later today.
[09:32am] cresloyd: really? tell me more
[09:32am] cyonyx: do ae developers use memcache as a way to write lock an entity, to prevent contention?
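
To make the arithmetic behind the list-property exchange above (09:19-09:26) concrete: a custom index that references several list properties needs one index row per combination of values, so the row count is roughly the product of the list lengths. The sketch below is plain Python, not App Engine API code. `composite_index_rows`, `fits_limit`, and `INDEX_ROW_LIMIT` are hypothetical names, the 5,000 figure is the per-entity limit mentioned in the session, and the rows used by the built-in single-property indexes are ignored for simplicity.

```python
from functools import reduce
from operator import mul

# Per-entity limit discussed in this session (assumed here to be 5,000).
INDEX_ROW_LIMIT = 5000


def composite_index_rows(list_lengths):
    """Rows needed for one custom index over several list properties.

    Each combination of values (one from each list) gets its own row,
    so the count is the product of the list lengths.
    """
    return reduce(mul, list_lengths, 1)


def fits_limit(list_lengths, limit=INDEX_ROW_LIMIT):
    """True if a single such index stays within the per-entity limit."""
    return composite_index_rows(list_lengths) <= limit


if __name__ == "__main__":
    print(composite_index_rows([50, 80]))  # 4000
    print(fits_limit([50, 80]))            # True
    print(fits_limit([100, 100]))          # False (10,000 rows)
```

Two lists of 100 values each already exceed the limit, which is why the larger limit gives more room but does not remove the need to watch list sizes.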
[09:32am] ikai_google: cresloyd: Yes, there's a downtime notification list
[09:32am] ryan_google: kidding...but yes, there will be a read only period at roughly 5pm
[09:32am] ikai_google: cyonyx: You could also transact on it
[09:33am] morais78: oh... I'd just logged it a few minutes ago and thought "ohhh wrong time/day"
[09:33am] nickjohnson: cyonyx: You can do that, sort of, but it's not recommended.
[09:33am] nickjohnson: Locking should be a last resort.
[09:34am] ikai_google: +1 on nick johnson's comment
[09:34am] ikai_google: if you need to lock, remember that every save of that kind you do pays that price
[09:35am] lent: anything happening on the application loading area?
[09:35am] ikai_google: lent: lots
[09:35am] nickjohnson: If you've got a specific use-case in mind, feel free to tell us about it and we may be able to suggest some alternatives to locking.
[09:35am] ryan_google: more importantly, you shouldn't really use memcache for any kind of correctness, like locking, since it's just a cache
[09:35am] ikai_google: lent: There's a lot going on with regards to things like trying to figure out ways to cut resource consumption (resulting in less cycling)
[09:35am] ryan_google: your lock could be flushed, and another request could come in and acquire it, and then you'd have two requests both thinking they hold the lock
[09:35am] ikai_google: lent: As well as ways to optimize loading requests in general
[09:38am] cyonyx: thanks for the feedback. i was just thinking of all the possibilities of ways to reduce contention, and considered memcache write lock to be one of them, but from consensus, it should probably be avoided.
[09:40am] nickjohnson: cyonyx: If you just need to reduce contention, you can use that. I've met with some success using memcache.add with a timeout, and caching results in memcache if it returns False
[09:40am] ryan_google: out of curiosity, are you actually seeing contention? or are you just thinking ahead? or...?
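
The `memcache.add` approach nickjohnson mentions can be sketched roughly as follows. To keep the example runnable outside App Engine, it uses `FakeMemcache`, a hypothetical in-process stand-in that mimics only the relevant behaviour of `add` (return True when the key was absent, False when it already exists). `throttled_update` and the `throttle:` key prefix are also assumptions made for illustration. As ryan_google points out, this is a best-effort throttle and not a lock.

```python
import time


class FakeMemcache:
    """In-process stand-in for the memcache client (hypothetical, demo only).

    Mirrors only the semantics of add(): store the key if it is absent or
    expired and return True, otherwise return False.
    """

    def __init__(self, clock=time.time):
        self._clock = clock
        self._expiry = {}  # key -> absolute expiry timestamp

    def add(self, key, value, time=0):
        # The value itself is not needed for throttling, so it is not kept.
        now = self._clock()
        expires = self._expiry.get(key)
        if expires is not None and expires > now:
            return False  # another request added the key recently
        self._expiry[key] = (now + time) if time else float("inf")
        return True

    def flush_all(self):
        """Simulates eviction: a cache may drop keys at any moment."""
        self._expiry.clear()


def throttled_update(cache, key, do_update, interval=10):
    """Run do_update() at most about once per `interval` seconds per key.

    Best effort only: if the marker key is evicted early, an extra update
    runs, so correctness must still come from a datastore transaction.
    """
    if cache.add("throttle:" + key, 1, time=interval):
        do_update()
        return True
    return False  # skipped; the caller may drop or buffer this update
```

If the marker key is flushed, the only consequence is one extra update, which is why this use is tolerable where a memcache-based lock is not.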
[09:40am] ryan_google: and by contention, what do you mean specifically?
[09:41am] cyonyx: just thinking ahead. contention meaning where two simultaneous ds requests, try to update the same entity at the same time
[09:41am] nickjohnson: http://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/ext/ereporter/ereporter.py#218 - that snippet will run a transaction but generally be limited to one per entity group per log_interval seconds
[09:41am] nickjohnson: (Though that's not guaranteed, of course, since values can be evicted)
[09:43am] lent: In the blog entries about simple joins, it mentioned: "Fourth, if you're using owned relationships, in order to use simple joins you're going to need to upgrade your storage version and migrate existing data." What does this mean exactly?
[09:43am] ryan_google: cyonyx, we already handle that kind of contention for you. the datastore backend automatically retries writes outside of transactions, and the api code retries transactions themselves
[09:43am] ryan_google: generally you only need to worry if you expect *heavy* contention on a single entity group. occasional concurrent writes won't really cause a problem.
[09:44am] nickjohnson: +1 to what Ryan said, of course.
[09:46am] cyonyx: what if a feature was created that was on a first come first serve basis, and there were 500 simultaneous requests for transacting on a given entity? how would you recommend handling this?
[09:46am] ryan_google: honestly, you'd want to redesign that feature
[09:46am] ikai_google: lent: It just means that you need to update your persistence classes
[09:46am] nickjohnson: cyonyx: that depends on the nature of the updates. There are a number of techniques, including sharding, the task queue, and simply dropping updates.
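
Of the techniques nickjohnson lists, sharding is the easiest to illustrate in isolation. The sketch below is a toy in-memory model and not datastore code: `ShardedCounter` and its methods are hypothetical names, and each list slot stands in for a separate shard entity in its own entity group.

```python
import random


class ShardedCounter:
    """Toy in-memory model of a sharded counter (not datastore code).

    Each shard stands in for a separate entity group, so concurrent
    increments usually land on different shards instead of contending
    for a single entity.
    """

    def __init__(self, num_shards=20, rng=None):
        self._shards = [0] * num_shards
        self._rng = rng or random.Random()

    def increment(self, amount=1):
        # Pick a shard at random; in the datastore this would be a small
        # transaction touching only that one shard entity.
        index = self._rng.randrange(len(self._shards))
        self._shards[index] += amount
        return index

    def total(self):
        # Reads pay the cost: the total is the sum over all shards.
        return sum(self._shards)


if __name__ == "__main__":
    counter = ShardedCounter(num_shards=5, rng=random.Random(0))
    for _ in range(500):
        counter.increment()
    print(counter.total())  # 500
```

The trade-off is extra complexity and a more expensive read in exchange for spreading writes, so it is only worth it when heavy contention on one entity group is actually expected.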
[09:46am] ryan_google: our general rule of thumb is that a single entity group can handle at most 1-10qps of writes
[09:46am] ryan_google: (to be safe, read that as at most 1qps of writes)
[09:47am] ikai_google: lent: My guess is storage version = either optimistic locking or your serialization version
[09:47am] ryan_google: good points. as usual, my colleagues are much more helpful than me.
[09:47am] ikai_google: cyonyx: That feature wouldn't scale anywhere. Relational datastore, filesystem, etc
[09:48am] ikai_google: cynonyx: For some ideas, check out Nick Johnson's blog about sharded counters
[09:48am] cyonyx: nickjohnson, when you say dropping updates, how would this be done?
[09:49am] cyonyx: randomly?
[09:49am] ikai_google: cyonyx: You'd either have to sacrifice some simplicity (sharding) or possible volatility/loss of data/ inconsistency (using memcache/flushing)
[09:49am] nickjohnson: cyonyx: See the snippet I linked to as an example
[09:49am] nickjohnson: That one uses a memcache entity to reduce the update frequency to one every n seconds (with more if the memcache key gets evicted prematurely)
[09:50am] nickjohnson: Alternately, you can enqueue your updates on the task queue with a key name; subsequent attempts in the same time interval will return a NameERror.
[09:52am] cyonyx: thanks for all your suggestions everyone. nickjohnson, i will take a look at the link you shared
[09:53am] cyonyx: on another note, any word on the much anticipated map reduce?
[09:53am] ikai_google: cyonyx: being worked on
[09:54am] nickjohnson: cyonyx: Out of curiosity, are you using Java, or Python?
[09:54am] cyonyx: python
[09:55am] nickjohnson: What's your use-case for map/reduce functionality?
[09:55am] cyonyx: will map reduce be transactional? such as global txns?
[09:57am] nickjohnson: Unlikely; Global transactions in a distributed system are difficult, and very high overhead.
[10:00am] ryan_google: you can use transactional tasks to get a weak form of global txes though
[10:00am] nickjohnson: Good point!
[10:01am] ryan_google: basically, in a tx, write to entity group A and enqueue a transactional task to write to entity group B
[10:01am] ryan_google: it's not a true global tx - it's more of an eventually consistent tx, ie the B write is guaranteed to happen eventually - but it's something
[10:01am] ryan_google: ...and on that note, looks like our hour is up
[10:01am] cyonyx: ryan_google: i wasn't aware of that. thanks for the suggestions
[10:02am] lurkdev: http://blog.notdot.net/2009/9/Distributed-Transactions-on-App-Engine
[10:02am] nickjohnson: That ends this App Engine developer chat. Feel free to stick around; some of us hang around here semi-permanently, and are always happy to answer questions (if we're awake!)
[10:03am] cyonyx: thanks for all the tips everyone
[10:04am] nickjohnson: Our pleasure
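
The transactional-task pattern ryan_google describes (write to entity group A and enqueue the write to group B in the same transaction) can be modelled in plain Python. This is a toy simulation under stated assumptions, not App Engine code: `ToyStore`, `Rollback`, and `transfer` are hypothetical names, and on App Engine the equivalent would involve `db.run_in_transaction` together with a task enqueued transactionally.

```python
class Rollback(Exception):
    """Raised inside a transaction body to abort it (hypothetical)."""


class ToyStore:
    """In-memory stand-in for two entity groups plus a task queue."""

    def __init__(self):
        self.groups = {"A": {}, "B": {}}
        self.queue = []  # committed tasks waiting to run

    def run_in_transaction(self, group, body):
        """Commit body's writes to `group` and its tasks together.

        body(writes, tasks) fills both; nothing becomes visible unless
        it returns without raising Rollback.
        """
        writes, tasks = {}, []
        try:
            body(writes, tasks)
        except Rollback:
            return False  # neither the write nor the task is committed
        self.groups[group].update(writes)
        self.queue.extend(tasks)
        return True

    def run_tasks(self):
        """Worker: each task is its own later transaction on its group."""
        while self.queue:
            group, key, value = self.queue.pop(0)
            self.run_in_transaction(group, lambda w, t: w.update({key: value}))


def transfer(store, amount):
    """Debit group A now; the credit to group B happens eventually."""
    def body(writes, tasks):
        writes["balance"] = store.groups["A"].get("balance", 0) - amount
        tasks.append(("B", "credit", amount))  # enqueued only on commit
    return store.run_in_transaction("A", body)
```

Between the commit on A and the worker run, group B has not changed yet, which is the "eventually consistent" part: the B write is guaranteed to happen, but not at the same instant as the A write.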