Evernote uses Thrift for all client-server communications, including third-party API integrations (http://www.evernote.com/about/developer/api/). We serialize messages up to 55MB via Thrift. This is very efficient on the wire, but marshalling and unmarshalling objects can take a fair amount of RAM due to various temporary buffers built into the networking and IO runtime libraries.


On 6/11/10 8:26 AM, Abhay M wrote:
Hi,

Are there any known concerns with serializing large data sets with Thrift? I
am looking to serialize messages with 10-150K records, sometimes resulting
in ~30MB per message. These messages are serialized for storage.

I have been experimenting with Google protobuf and saw this in the
documentation (
http://code.google.com/apis/protocolbuffers/docs/techniques.html) -
"Protocol Buffers are not designed to handle large messages. As a general
rule of thumb, if you are dealing in messages larger than a megabyte each,
it may be time to consider an alternate strategy."
FWIW, I did switch to the delimited write/parse API (Java only) as recommended
in the doc and it works well. But the Python protobuf implementation lacks this
API and is slow.
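
For what it's worth, the delimited framing that Java's writeDelimitedTo/
parseDelimitedFrom provide is just a base-128 varint length prefix before each
serialized message, so it can be reproduced by hand in Python. Below is a
hedged sketch of that framing on raw byte payloads; the function names are my
own, and in practice the payload would come from message.SerializeToString():

```python
import io

def write_varint(stream, value):
    # Encode an unsigned int as a base-128 varint (protobuf wire format):
    # 7 bits per byte, high bit set on all but the last byte.
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            stream.write(bytes([byte | 0x80]))
        else:
            stream.write(bytes([byte]))
            return

def read_varint(stream):
    # Decode a base-128 varint from the stream.
    shift = 0
    result = 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7

def write_delimited(stream, payload):
    # Length-prefix one serialized message, mirroring writeDelimitedTo.
    write_varint(stream, len(payload))
    stream.write(payload)

def read_delimited(stream):
    # Read one length-prefixed message, mirroring parseDelimitedFrom.
    length = read_varint(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("truncated message")
    return data
```

With this, many small messages can be appended to one file and read back one
at a time, instead of materializing a single 30MB message in memory:

```python
buf = io.BytesIO()
write_delimited(buf, b"record one")
write_delimited(buf, b"record two")
buf.seek(0)
first = read_delimited(buf)   # b"record one"
```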

Thanks
Abhay
