I gathered the following data using the thrift-protobuf-compare benchmarks 
(multiple runs with each change). It's probably not very representative of any 
real world workload, but it's what I have to work with at the moment.

Original (some pre-0.1 version of Thrift)
                        ,   Object create,       Serialize,  /w Same Object,    
 Deserialize, and Check Media,   and Check All,      Total Time, Serialized Size
thrift                  ,       318.93000,      4675.50000,      6708.00000,    
  5040.00000,             NaN,             NaN,      9715.50000,        353
thrift                  ,       352.25500,      4556.00000,      6658.50000,    
  5086.00000,             NaN,             NaN,      9642.00000,        353
thrift                  ,       350.59000,      4572.00000,      6748.50000,    
  5067.00000,             NaN,             NaN,      9639.00000,        353

Thrift 0.2
                        ,   Object create,       Serialize,  /w Same Object,    
 Deserialize, and Check Media,   and Check All,      Total Time, Serialized Size
thrift                  ,       352.36500,      4506.00000,      6686.00000,    
  5783.00000,             NaN,             NaN,     10641.36500,        353
thrift                  ,       329.13000,      4462.50000,      6757.00000,    
  5770.50000,             NaN,             NaN,     10562.13000,        353
thrift                  ,       331.42000,      4503.00000,      6803.00000,    
  5768.50000,             NaN,             NaN,     10602.92000,        353

Thrift 0.2 w/TCompactProtocol
                        ,   Object create,       Serialize,  /w Same Object,    
 Deserialize, and Check Media,   and Check All,      Total Time, Serialized Size
thrift                  ,       324.50500,      4467.00000,      6426.00000,    
  5703.00000,             NaN,             NaN,     10494.50500,        234
thrift                  ,       323.00500,      4418.00000,      6423.00000,    
  5703.00000,             NaN,             NaN,     10444.00500,        234
thrift                  ,       330.57000,      4388.50000,      5244.50000,    
  5704.00000,             NaN,             NaN,     10423.07000,        234

Thrift 0.2 w/TCompactProtocol + read() fix 
                        ,   Object create,       Serialize,  /w Same Object,    
 Deserialize, and Check Media,   and Check All,      Total Time, Serialized Size
thrift                  ,       327.55000,      4201.00000,      6163.50000,    
  4988.50000,             NaN,             NaN,      9517.05000,        234
thrift                  ,       326.82500,      4194.00000,      5223.00000,    
  4975.50000,             NaN,             NaN,      9496.32500,        234
thrift                  ,       324.36000,      4191.00000,      6217.00000,    
  4978.50000,             NaN,             NaN,      9493.86000,        234

Thrift 0.2 w/TCompactProtocol + read() fix + enum fix
                        ,   Object create,       Serialize,  /w Same Object,    
 Deserialize, and Check Media,   and Check All,      Total Time, Serialized Size
thrift                  ,       346.90500,      4130.00000,      5311.00000,    
  4637.50000,             NaN,             NaN,      9114.40500,        234
thrift                  ,       310.77500,      4168.50000,      5448.00000,    
  4718.00000,             NaN,             NaN,      9197.27500,        234
thrift                  ,       314.56000,      4147.50000,      5540.50000,    
  4645.50000,             NaN,             NaN,      9107.56000,        234


The "read() fix" reverts read() to use a switch on the field.id int rather than 
mapping to the enum and switching on that. The "enum fix" gets rid of the map 
in the enum classes and implements findByValue() as switch. I just applied 
these fixes by hand in the generated files.

I'll go ahead and open issues on both of these items.

Chad

-----Original Message-----
From: Bryan Duxbury [mailto:[email protected]] 
Sent: Tuesday, January 12, 2010 11:05 PM
To: [email protected]
Subject: Re: Perf regression in Java bindings?

Chad -

This is an interesting observation. Do you have a sense of how much of a
difference it makes?

I think you should open a ticket for this regression. Your proposed
solutions sound viable and easy, so I can probably whip something up prett
quickly.

-Bryan

On Tue, Jan 12, 2010 at 10:54 PM, Chad Walters
<[email protected]>wrote:

> I was playing around with upgrading some benchmarks to Thrift 0.2 and I
> think that THRIFT-623 <https://issues.apache.org/jira/browse/THRIFT-623>
> may have introduced a performance regression.
>
> Struct deserialization now does a map lookup to go from an integer to an
> enumeration, checks for null in case the integer was not known, and then
> does a switch on the value of the enumeration.
>
> But it appears to me that this mapping is entirely fixed at compile time.
> Furthermore the enumeration does not appear to be used for anything other
> than performing the switch (in the struct reading routine at least).
>
> This leads me to ask:
> 1. Why not just switch on the integer and have a default case for unknown
> values?
> 2. Why have a dynamic map for a mapping that is known at compile time?
> Wouldn't it be faster to just provide the mapping via a switch (when it is
> needed at all)?
>
> Perhaps I am missing some subtlety but I thought I'd ask...
>
> Chad
>
>
>

Reply via email to