On 7 July 2011 16:55, Jim Fulton <j...@zope.com> wrote:
> On Thu, Jul 7, 2011 at 10:49 AM, Laurence Rowe <l...@lrowe.co.uk> wrote:
> ...
>> One thing I found with my (rather naive) experiments building
>> s3storage a few years ago is that you need to ensure requests to S3
>> are made in parallel to get reasonable performance. This would be a
>> lesser problem with blobs, but even then you might have multiple file
>> uploads in the same request. The boto library is really useful, but
>> doesn't support async requests.
>
> Right, it occurred to me that commit performance with s3 might be an issue.
>
>> I guess the simplest implementation would only upload a blob to S3 in
>> tpc_begin as that is where the tid is set (and presumably the tid will
>> form part of the blob's S3 url.) With large files that might make
>> tpc_begin take a long time to complete as it waits for the blob data
>> to be loaded into S3. It might be better to upload large blobs to a
>> temporary s3 url first and then only make an S3 copy in tpc_begin,
>> you'd need to do some benchmarks to see if this was worthwhile for all
>> files or only files over a certain size.
>
> I think I get where you're going, although I'd quibble with the details.
> There is certainly some opportunity for doing things in parallel
> up until you get to tpc_vote. I wonder if renames in S3 take much
> time. I can imagine that they do.

Thinking about this again, perhaps it would be better to store a URL or
UUID in the blob's record. This would allow a blob's S3 URL to be
assigned much earlier, as it need not contain the tid. The blob could
then be uploaded before the commit starts, and the commit itself would
not need to involve any requests to S3 at all.

While I don't suppose an S3 copy request should be any slower than a
zero-byte PUT (S3 only promises eventual consistency), you would still
need to pay the latency of that request during the commit.

Laurence
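
P.S. A minimal sketch of the UUID idea, in case it helps. Everything named
here is an assumption made up for illustration: FakeObjectStore is an
in-memory stand-in for a bucket (it is not boto's API), and new_blob_key,
upload_blobs, the "blobs/" prefix and the record layout are hypothetical.
It only shows that a key which does not contain the tid can be chosen up
front, that uploads can run in parallel before the commit, and that the
record then only needs to hold the key.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class FakeObjectStore:
    """In-memory stand-in for an S3 bucket (hypothetical, not boto's API)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


def new_blob_key():
    # The key is a random UUID, so it does not depend on the tid and can
    # be chosen long before the two-phase commit starts.
    return "blobs/" + uuid.uuid4().hex


def upload_blobs(store, blobs, max_workers=4):
    """Upload each blob under a fresh key, in parallel; return the keys."""
    keys = [new_blob_key() for _ in blobs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every upload and re-raises any upload error.
        list(pool.map(store.put, keys, blobs))
    return keys


store = FakeObjectStore()
payloads = [b"first upload", b"second upload"]
keys = upload_blobs(store, payloads)

# What the database record would hold: only the key, never the blob data.
records = [{"blob_key": key} for key in keys]
```

With real S3 the put calls would be network requests, so whether a thread
pool is the right way to get parallelism there would need benchmarking.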