ruby transport class read buffer very slow due to usage of slice!
-----------------------------------------------------------------

                 Key: THRIFT-401
                 URL: https://issues.apache.org/jira/browse/THRIFT-401
             Project: Thrift
          Issue Type: Improvement
          Components: Library (Ruby)
         Environment: # uname -a
Linux zvm.local 2.6.9-78.0.1.ELsmp #1 SMP Tue Aug 5 11:02:47 EDT 2008 i686 i686 
i386 GNU/Linux

            Reporter: Tyler Kovacs
            Priority: Minor
         Attachments: before.png

We use Thrift as a cross-language transport for Hypertable, an open-source 
distributed database.  While profiling queries with large responses using the 
Ruby Thrift libraries, we discovered that the majority of the time was spent in 
thrift/transport.rb.  Specifically, the slice! method, which is used to manage 
the read buffer (@rbuf), was responsible for almost all of the latency.

We tried an alternative implementation that showed a 300x speedup in our tests.  
Instead of repeatedly calling slice! to alter @rbuf (which apparently is 
extremely expensive), we maintain an offset counter (@rpos) that starts at 
zero and is incremented by sz each time we read from @rbuf.  Before and after 
screenshots from kcachegrind are attached.  
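
As a rough illustration of the two strategies (a sketch only, not the code 
that was profiled), the snippet below drains the same buffer once with slice! 
and once with an offset.  The helper names, the 64 KiB buffer and the 4-byte 
read size are assumptions chosen for the example; actual timings will vary 
with the Ruby version and the sizes involved.

```ruby
require 'benchmark'

# Drain a buffer by destructively removing the first `sz` bytes on every read.
# (Hypothetical helper mirroring the slice!-based approach.)
def drain_with_slice(data, sz)
  rbuf = data.dup
  chunks = []
  chunks << rbuf.slice!(0, sz) until rbuf.empty?
  chunks
end

# Drain the same buffer by advancing an offset; the buffer is never modified.
# (Hypothetical helper mirroring the @rpos-based approach.)
def drain_with_offset(data, sz)
  rpos = 0
  chunks = []
  while rpos < data.length
    chunks << data[rpos, sz]
    rpos += sz
  end
  chunks
end

data = 'x' * (64 * 1024) # assumed frame size, for illustration only
sz   = 4                 # assumed read size, for illustration only

Benchmark.bm(8) do |bm|
  bm.report('slice!') { drain_with_slice(data, sz) }
  bm.report('offset') { drain_with_offset(data, sz) }
end
```

Both helpers return the same chunks; only the bookkeeping differs, which is 
why the offset version can be substituted without changing what callers see.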

The monkey patch that we use is copied into the description below, and I'll 
try to assemble a proper patch later today.  

module Thrift
  class FramedTransport < Transport
    def initialize(transport, read=true, write=true)
      @transport = transport
      @rbuf      = ''
      @wbuf      = ''
      @read      = read
      @write     = write
      @rpos      = 0
    end

    def read(sz)
      return @transport.read(sz) unless @read

      return '' if sz <= 0

      read_frame if @rpos >= @rbuf.length

      @rpos += sz
      @rbuf[@rpos - sz, sz] || ''
    end

    def borrow(requested_length = 0)
      read_frame if @rpos >= @rbuf.length

      # there isn't any more coming, so if it's not enough, it's an error.
      raise EOFError if requested_length > (@rbuf.length - @rpos)

      @rbuf[@rpos, requested_length]
    end

    def consume!(size)
      @rpos += size
      @rbuf[@rpos - size, size]
    end

    private

    def read_frame
      sz = @transport.read_all(4).unpack('N').first

      @rpos = 0
      @rbuf = @transport.read_all(sz)
    end
  end
end
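
To see the offset bookkeeping in isolation, here is a minimal standalone 
sketch that reads length-prefixed frames from a StringIO.  OffsetFrameReader 
and the frame helper are hypothetical names made up for this example; they 
are not part of the Thrift library and only mimic the read path of the patch 
above.

```ruby
require 'stringio'

# Hypothetical stand-in for the read path of the patch; not the Thrift class.
class OffsetFrameReader
  def initialize(io)
    @io   = io
    @rbuf = ''
    @rpos = 0
  end

  # Return up to `sz` bytes from the current frame, loading the next frame
  # once the offset has reached the end of the buffer.
  def read(sz)
    return '' if sz <= 0

    read_frame if @rpos >= @rbuf.length

    chunk = @rbuf[@rpos, sz] || ''
    @rpos += sz
    chunk
  end

  private

  # Read a 4-byte big-endian length header, then the frame body.
  def read_frame
    header = @io.read(4)
    raise EOFError if header.nil? || header.length < 4

    sz = header.unpack('N').first
    @rpos = 0
    @rbuf = @io.read(sz) || ''
  end
end

# Build one frame: 4-byte big-endian length followed by the payload.
def frame(payload)
  [payload.bytesize].pack('N') + payload
end

reader = OffsetFrameReader.new(StringIO.new(frame('hello world') + frame('abc')))
p reader.read(5) # first five bytes of the first frame
p reader.read(6) # remainder of the first frame
p reader.read(3) # triggers a read of the second frame
```

The buffer is replaced only when a new frame is loaded; individual reads just 
move @rpos forward.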

