[appengine-java] Chat Time transcript for February 17, 2010

Jason (Google) Wed, 24 Feb 2010 01:54:15 -0800

Last Wednesday, the App Engine team hosted the latest session of its
bimonthly IRC office hours. A transcript of the session and a summary
of the topics covered are provided below. The next session will take
place next Wednesday, March 3rd from 7:00-8:00 p.m. PST in the
#appengine channel on irc.freenode.net.


- Jason


--SUMMARY-----------------------------------------------------------
- Q: What are my options to move my data from my App Engine datastore
in production to my PC? A: For now, you have to use the bulkloader
utility that ships with the Python SDK, but you can use this with
Java-based apps as well -- see
http://blog.notdot.net/2009/9/Advanced-Bulk-Loading-Part-5-Bulk-Loading-for-Java
for details on setting up the appropriate handlers for a Java
environment. [9:12-9:16]

- Q: With the index row limit raised to 5,000, does this mean that I
can build queries involving multiple list properties? A: Yes. You
could do this before, technically, but you had to be more careful to
avoid exploding indexes, which could be caused by querying large list
properties. You still have to be concerned about this, but the larger
limit does give you more flexibility. [9:19-9:22]

- Discussion on using memcache to "lock" an entity for writing -- not
recommended (memcache is designed as a cache first and foremost) but
feasible. All the same, this should be avoided, especially since there
are no guarantees made that memcache won't be cleared periodically,
and you wouldn't want this locking metadata purged without warning.
[9:32-9:40]

- Discussion on mitigation strategies for contention -- in general,
you only need to think about sharding or other mechanisms if you
expect heavy contention (generally more than one write per second
consistently) on a single entity or entity group. Otherwise, even if
the occasional concurrent write request comes in, the system will
automatically retry the write after any failures, so you don't need to
alter your models to account for this. But for heavy contention, you
can look at sharding, using the task queue API, or even dropping
requests. [9:40-9:41, 9:43, 9:46-9:50]

- Q: When MapReduce functionality is available, will it support global
transactions? A: In general, global transactions are difficult to
support in a distributed system. You can achieve a weak form of global
transactions today with transactional tasks, which guarantee eventual
consistency. [9:55-10:01]
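
The sharding option mentioned in the contention item above can be
illustrated with a minimal in-memory sketch. This is not datastore code:
the ShardedCounter class and its num_shards parameter are hypothetical
names, and each list slot merely stands in for a shard entity that would
live in its own entity group. The idea is that writes are spread across
shards so no single entity group sees all of them, while reads sum every
shard.

```python
import random


class ShardedCounter:
    """In-memory illustration of a sharded counter (not datastore code)."""

    def __init__(self, num_shards=20):
        # Each slot stands in for one shard entity in its own entity group.
        self.shards = [0] * num_shards

    def increment(self):
        # Pick a shard at random so concurrent writers rarely collide
        # on the same shard.
        index = random.randrange(len(self.shards))
        self.shards[index] += 1

    def total(self):
        # Reads pay the cost instead: sum every shard to get the count.
        return sum(self.shards)


counter = ShardedCounter(num_shards=5)
for _ in range(1000):
    counter.increment()
print(counter.total())  # prints 1000
```

More shards allow a higher sustained write rate at the price of more
work per read, so the shard count is a tuning choice rather than a fixed
value.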


--FULL TRANSCRIPT---------------------------------------------------
[09:08am] nickjohnson: Welcome to the bi-weekly App Engine chat time!
With us today are apijason_google and myself, with others to come.
[09:09am] nickjohnson: If you're having trouble sending messages to
the channel, please be patient, we're working on resolving that right
now.
[09:10am] apijason_google: Hi Everyone.
[09:10am] lurkdev: hello
[09:10am] cyonyx: hello
[09:11am] cresloyd: hello
[09:11am] webus: hello
[09:11am] apijason_google:
[09:12am] webus: how i can backup my data from GAE (BigTable) to my
local PC ?
[09:12am] nickjohnson: webus: Java, or Python?
[09:13am] webus: java
[09:15am] nickjohnson: For now, you need to use the Python bulkloader
to load and dump data - see 
http://blog.notdot.net/2009/9/Advanced-Bulk-Loading-Part-5-Bulk-Loading-for-Java
for details on setting up a handler, and
http://code.google.com/appengine/docs/python/tools/uploadingdata.html#Downloading_and_Uploading_All_Data
for details on dump and restore
[09:15am] apijason_google: webus: I think the only way right now is to
use the bulkloader tool that ships with the Python SDK. It should work
with Java apps too, and we're working on a better Java solution.
[09:15am] nickjohnson: ** If you're still having trouble sending to
the channel, please send me a PM **
[09:15am] webus: thnx!
[09:16am] cresloyd: there are third-party solutions to sync data
between the datastore and external databases; I don't know if Googlers
would want to recommend any of them, but you're welcome to try them of
course
[09:18am] ryan_google: hi!
[09:19am] ryan_google: aha i can talk now
[09:19am] nickjohnson: Hooray!
[09:19am] nickjohnson: Anyone who was trying to talk, you can try
again now.
[09:19am] cyonyx: now that the 5000 index limit has been set for an
entity, does that mean one can use more than one list property per
model as long as they stay within the 5000?
[09:20am] ryan_google: cyonyx: yes
[09:20am] ryan_google: i assume you've read
http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Big_Entities_and_Exploding_Indexes
?
[09:22am] ryan_google: *crickets*
[09:22am] cyonyx: i have read that thank you.  the docs are great by
the way.  i was playing with some code and found that the 1000 limit
was removed for a list, so I tried adding more than 5000 to that list
and got an error
[09:23am] ryan_google: hmm
[09:23am] nickjohnson: cyonyx: The limit we removed was the 1000
result limit, and is unrelated to the max entries in a list.
[09:24am] ryan_google: specifically, i don't think we've ever had a
limit on the number of property values in a list property
[09:24am] ryan_google: well, indirectly 5k "indexed properties," true,
but that's for the entity as a whole
[09:26am] cyonyx: nickjohnson: ok.  not sure where I got the idea of
1000 per list, but now I know 5k is the magic number for an entity.
thanks for the clarifications
[09:28am] ryan_google: while i'm here, are there any other questions
about the datastore?
[09:30am] ryan_google: *louder crickets*
[09:30am] ryan_google: hmm guess we can go home early
[09:30am] nickjohnson: Certainly quiet here today.
[09:30am] ikai_google: I rushed in for this ...
[09:30am] ikai_google:
[09:30am] ikai_google: so I'm guessing 1.3.1 was a crowd pleaser
[09:31am] cresloyd: should someone remind people about the outage that
is scheduled for later today?
[09:31am] ryan_google: hard to say, at least based on the quiet
[09:31am] ryan_google: i'd like to remind everyone that there's an
outage scheduled for later today.
[09:32am] cresloyd: really?  tell me more
[09:32am] cyonyx: do ae developers use memcache as a way to write lock
an entity, to prevent contention?
[09:32am] ikai_google: cresloyd: Yes, there's a downtime notification
list
[09:32am] ryan_google: kidding...but yes, there will be a read only
period at roughly 5pm
[09:32am] ikai_google: cyonyx: You could also transact on it
[09:33am] morais78: oh... I'd just logged it a few minutes ago and
thought "ohhh wrong time/day"
[09:33am] nickjohnson: cyonyx: You can do that, sort of, but it's not
recommended.
[09:33am] nickjohnson: Locking should be a last resort.
[09:34am] ikai_google: +1 on nick johnson's comment
[09:34am] ikai_google: if you need to lock, remember that every save
of that kind you do pays that price
[09:35am] lent: anything happening on the application loading area?
[09:35am] ikai_google: lent: lots
[09:35am] nickjohnson: If you've got a specific use-case in mind, feel
free to tell us about it and we may be able to suggest some
alternatives to locking.
[09:35am] ryan_google: more importantly, you shouldn't really use
memcache for any kind of correctness, like locking, since it's just a
cache
[09:35am] ikai_google: lent: There's a lot going on with regards to
things like trying to figure out ways to cut resource consumption
(resulting in less cycling)
[09:35am] ryan_google: your lock could be flushed, and another request
could come in and acquire it, and then you'd have two requests both
thinking they hold the lock
[09:35am] ikai_google: lent: As well as ways to optimize loading
requests in general
[09:38am] cyonyx: thanks for the feedback.  i was just thinking of all
the possibilities of ways to reduce contention, and considered
memcache write lock to be one of them, but from consensus, it should
probably be avoided.
[09:40am] nickjohnson: cyonyx: If you just need to reduce contention,
you can use that. I've met with some success using memcache.add with a
timeout, and caching results in memcache if it returns False
[09:40am] ryan_google: out of curiosity, are you actually seeing
contention? or are you just thinking ahead? or...?
[09:40am] ryan_google: and by contention, what do you mean
specifically?
[09:41am] cyonyx: just thinking ahead.  contention meaning where two
simultaneous ds requests, try to update the same entity at the same
time
[09:41am] nickjohnson:
http://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/ext/ereporter/ereporter.py#218
- that snippet will run a transaction but generally be limited to one
per entity group per log_interval seconds
[09:41am] nickjohnson: (Though that's not guaranteed, of course, since
values can be evicted)
[09:43am] lent: In the blog entries about simple joins, it mentioned:
"Fourth, if you're using owned relationships, in order to use simple
joins you're going to need to upgrade your storage version and migrate
existing data."  What does this mean exactly?
[09:43am] ryan_google: cyonyx, we already handle that kind of
contention for you. the datastore backend automatically retries writes
outside of transactions, and the api code retries transactions
themselves
[09:43am] ryan_google: generally you only need to worry if you expect
*heavy* contention on a single entity group. occasional concurrent
writes won't really cause a problem.
[09:44am] nickjohnson: +1 to what Ryan said, of course.
[09:46am] cyonyx: what if a feature was created that was on a first
come first serve basis, and there were 500 simultaneous requests for
transacting on a given entity?  how would you recommend handling this?
[09:46am] ryan_google: honestly, you'd want to redesign that feature
[09:46am] ikai_google: lent: It just means that you need to update
your persistence classes
[09:46am] nickjohnson: cyonyx: that depends on the nature of the
updates. There are a number of techniques, including sharding, the
task queue, and simply dropping updates.
[09:46am] ryan_google: our general rule of thumb is that a single
entity group can handle at most 1-10qps of writes
[09:46am] ryan_google: (to be safe, read that as at most 1qps of
writes)
[09:47am] ikai_google: lent: My guess is storage version = either
optimistic locking or your serialization version
[09:47am] ryan_google: good points. as usual, my colleagues are much
more helpful than me.
[09:47am] ikai_google: cyonyx: That feature wouldn't scale anywhere.
Relational datastore, filesystem, etc
[09:48am] ikai_google: cynonyx: For some ideas, check out Nick
Johnson's blog about sharded counters
[09:48am] cyonyx: nickjohnson, when you say dropping updates, how
would this be done?
[09:49am] cyonyx: randomly?
[09:49am] ikai_google: cyonyx: You'd either have to sacrifice some
simplicity (sharding) or possible volatility/loss of data/
inconsistency (using memcache/flushing)
[09:49am] nickjohnson: cyonyx: See the snippet I linked to as an
example
[09:49am] nickjohnson: That one uses a memcache entity to reduce the
update frequency to one every n seconds (with more if the memcache key
gets evicted prematurely)
[09:50am] nickjohnson: Alternately, you can enqueue your updates on
the task queue with a key name; subsequent attempts in the same time
interval will return a NameERror.
[09:52am] cyonyx: thanks for all your suggestions everyone.
nickjohnson, i will take a look at the link you shared
[09:53am] cyonyx: on another note, any word on the much anticipated
map reduce?
[09:53am] ikai_google: cyonyx: being worked on
[09:54am] nickjohnson: cyonyx: Out of curiosity, are you using Java,
or Python?
[09:54am] cyonyx: python
[09:55am] nickjohnson: What's your use-case for map/reduce
functionality?
[09:55am] cyonyx: will map reduce be transactional?  such as global
txns?
[09:57am] nickjohnson: Unlikely; Global transactions in a distributed
system are difficult, and very high overhead.
[10:00am] ryan_google: you can use transactional tasks to get a weak
form of global txes though
[10:00am] nickjohnson: Good point!
[10:01am] ryan_google: basically, in a tx, write to entity group A and
enqueue a transactional task to write to entity group B
[10:01am] ryan_google: it's not a true global tx - it's more of an
eventually consistent tx, ie the B write is guaranteed to happen
eventually - but it's something
[10:01am] ryan_google: ...and on that note, looks like our hour is up
[10:01am] cyonyx: ryan_google:  i wasn't aware of that. thanks for the
suggestions
[10:02am] lurkdev: 
http://blog.notdot.net/2009/9/Distributed-Transactions-on-App-Engine
[10:02am] nickjohnson: That ends this App Engine developer chat. Feel
free to stick around; some of us hang around here semi-permanently,
and are always happy to answer questions (if we're awake!)
[10:03am] cyonyx: thanks for all the tips everyone
[10:04am] nickjohnson: Our pleasure
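
The memcache.add throttle that nickjohnson describes at 09:40 and 09:49
can be sketched roughly as follows. This is an illustration under stated
assumptions, not the ereporter code he linked to: FakeCache is a
hypothetical in-memory stand-in for a cache with add-if-absent semantics
and expiry (it is not the App Engine memcache API), and throttled_write
and its interval parameter are made-up names. A write goes through only
when the add succeeds; otherwise it is dropped.

```python
import time


class FakeCache:
    """Hypothetical stand-in for a cache with add-if-absent and expiry."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._expiry = {}  # key -> absolute expiry timestamp

    def add(self, key, ttl):
        # Succeed only if the key is absent or has already expired.
        now = self._clock()
        if self._expiry.get(key, 0) > now:
            return False
        self._expiry[key] = now + ttl
        return True

    def flush(self):
        # A cache may evict entries at any time; this models that.
        self._expiry.clear()


def throttled_write(cache, key, write, interval=10):
    """Run write() roughly once per interval seconds for this key."""
    if cache.add("throttle:" + key, interval):
        write()
        return True
    return False  # update dropped


now = [0.0]  # fake clock so the example does not need to sleep
cache = FakeCache(clock=lambda: now[0])
writes = []
print(throttled_write(cache, "stats", lambda: writes.append(1)))  # True
print(throttled_write(cache, "stats", lambda: writes.append(1)))  # False
cache.flush()  # simulate eviction: the throttle is forgotten early
print(throttled_write(cache, "stats", lambda: writes.append(1)))  # True
```

As the last call shows, an eviction lets an extra write through before
the interval is up, which is why this pattern only reduces contention and
cannot serve as a lock.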

