Issue Type: Bug Bug
Affects Versions: JRuby 1.7.0.pre1
Assignee: Unassigned
Components: Core Classes/Modules
Created: 31/Jul/12 2:22 AM
Description:

Reading a larger number of files (27K in our case) is about 10 times slower with JRuby trunk compared to 1.6.x. Both interpreters were in 1.9 mode. Running the same benchmark in 1.8 mode reduces the difference significantly, or even removes it.

script

require 'json'
start = Time.now
entries = Dir.glob("../var/jobs/*/*/*/*/*/request")
puts "Number of entries: #{entries.length}"
File.open("urls", "w") do |fh|
  entries.each do |file|
    content = File.read(file)
    # json = JSON.load(content)
    # url = ""
    # raise if url =~ /\n/
    # fh.puts url
  end
end

puts Time.now - start

benchmark

vjruby -v --1.9 extract.rb; jruby -v --1.9 extract.rb
jruby 1.7.0.preview2.dev (ruby-1.9.3-p203) (2012-07-26 3ae8885) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_05) [darwin-x86_64-java]
Number of entries: 27077
29.691
jruby 1.6.4 (ruby-1.9.2-p136) (2011-08-23 17ea768) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_05) [darwin-x86_64-java]
Number of entries: 27077
3.417

sample

vjruby --sample -v --1.9 extract.rb;
jruby 1.7.0.preview2.dev (ruby-1.9.3-p203) (2012-07-26 3ae8885) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_05) [darwin-x86_64-java]
Number of entries: 27077
28.908

Flat profile of 30,25 secs (2711 total ticks): main

  Interpreted + native Method
 13,3% 361 + 0 sun.nio.cs.ext.ExtendedCharsets.<init>
  0,6% 0 + 17 sun.nio.ch.FileDispatcherImpl.preClose0
  0,6% 16 + 0 org.jruby.Ruby.getClassLoader
  0,4% 0 + 12 java.io.RandomAccessFile.open
  0,4% 0 + 12 java.lang.Throwable.fillInStackTrace
  0,4% 0 + 10 java.lang.Class.forName0
  0,4% 10 + 0 sun.net.www.MessageHeader.grow
  0,3% 0 + 9 java.io.UnixFileSystem.getBooleanAttributes0
  0,3% 8 + 1 com.kenai.jffi.Foreign.invokeN2O2
  0,3% 7 + 0 java.util.Arrays.copyOfRange
  0,3% 7 + 0 java.util.regex.Matcher.<init>
  0,2% 6 + 0 java.nio.HeapCharBuffer.<init>
  0,2% 6 + 0 jnr.ffi.provider.AbstractArrayMemoryIO.<init>
  0,2% 0 + 6 sun.misc.Unsafe.getObject
  0,2% 6 + 0 org.jruby.internal.runtime.methods.DynamicMethod.call
  0,2% 1 + 4 java.security.AccessController.doPrivileged
  0,2% 5 + 0 org.jruby.javasupport.util.RuntimeHelpers.restructureBlockArgs19
  0,2% 5 + 0 org.jruby.util.Dir.glob_helper
  0,2% 2 + 3 java.io.UnixFileSystem.list
  0,2% 5 + 0 java.nio.Bits.reserveMemory
  0,1% 0 + 4 sun.nio.ch.FileDispatcherImpl.size0
  0,1% 4 + 0 org.jruby.RubyHash.alloc
  0,1% 0 + 4 sun.nio.ch.FileDispatcherImpl.read0
  0,1% 4 + 0 org.jruby.RubyIO.read19
  0,1% 4 + 0 org.jruby.util.ByteList.<init>
 25,0% 552 + 125 Total interpreted (including elided)

     Compiled + native Method
 24,0% 645 + 6 java.util.TreeMap.put
 23,4% 635 + 0 sun.nio.cs.AbstractCharsetProvider.charset
  0,5% 0 + 13 sun.nio.cs.StreamDecoder.implRead
  0,3% 0 + 8 org.jruby.util.io.ChannelStream.readall
  0,2% 0 + 5 java.lang.ClassLoader.getSystemClassLoader
  0,2% 0 + 5 java.lang.Object.<init>
  0,1% 0 + 4 org.jruby.util.io.ChannelStream.fclose
  0,1% 4 + 0 sun.nio.cs.ext.ExtendedCharsets.<init>
  0,1% 0 + 3 java.util.zip.ZipFile$ZipFileInputStream.close
  0,1% 0 + 2 org.jruby.RubyFile$INVOKER$i$0$2$initialize19.call
  0,0% 0 + 1 java.net.URL.openConnection
  0,0% 0 + 1 java.util.jar.JarFile.getEntry
  0,0% 1 + 0 java.util.HashMap.put
 49,2% 1285 + 48 Total compiled

         Stub + native Method
  9,4% 20 + 235 java.security.AccessController.doPrivileged
  5,7% 13 + 142 java.io.UnixFileSystem.getBooleanAttributes0
  3,9% 0 + 105 java.util.zip.ZipFile.getEntry
  0,7% 0 + 20 java.lang.Throwable.fillInStackTrace
  0,7% 0 + 19 java.lang.Class.forName0
  0,6% 10 + 5 java.security.AccessController.doPrivileged
  0,6% 0 + 15 java.lang.System.arraycopy
  0,5% 0 + 14 java.io.RandomAccessFile.open
  0,5% 0 + 13 java.util.zip.ZipFile.read
  0,4% 0 + 10 java.lang.Thread.currentThread
  0,4% 0 + 10 com.kenai.jffi.Foreign.invokeN2O2
  0,3% 0 + 9 sun.nio.ch.FileDispatcherImpl.preClose0
  0,2% 0 + 6 java.lang.Class.getClassLoader0
  0,2% 0 + 6 sun.misc.Unsafe.getObjectVolatile
  0,2% 0 + 6 sun.misc.Unsafe.setMemory
  0,2% 0 + 5 java.lang.Object.hashCode
  0,1% 0 + 4 sun.nio.ch.FileDispatcherImpl.read0
  0,1% 0 + 3 java.io.RandomAccessFile.close0
  0,1% 0 + 2 java.util.zip.ZipFile.freeEntry
  0,1% 0 + 2 sun.nio.ch.FileDispatcherImpl.size0
  0,1% 0 + 2 java.lang.String.intern
  0,1% 0 + 2 java.util.zip.Inflater.inflateBytes
  0,1% 0 + 2 java.util.zip.ZipFile.getEntryCSize
  0,0% 0 + 1 java.util.zip.ZipFile.getEntryTime
  0,0% 0 + 1 java.util.zip.ZipFile.getEntryCrc
 25,3% 43 + 644 Total stub (including elided)

  Thread-local ticks:
  0,5% 14 Class loader


Flat profile of 0,00 secs (1 total ticks): DestroyJavaVM

  Thread-local ticks:
100,0% 1 Blocked (of total)


Global summary of 30,25 seconds:
100,0% 2733 Received ticks
  0,7% 19 Received GC ticks
 14,5% 395 Compilation
  0,0% 1 Other VM operations
  0,5% 14 Class loader

adhoc irc analysis

18:11 < enebo> reto_, headius: This is likely because we have no caching CharsetTranscoder atm
18:11 < reto_> okay, forget 1.8 mode completly, thats just startup
18:12 < headius> so we could cache based on to/from in the future, I get that
18:12 < headius> but I'm confused why we can't trust jcodings Encoding.getCharset first…are they reporting wrong Charset?
18:12 < enebo> Many common IO cases we can make a fully caching CharsetTranscoder but I did not go there yet since I suspect transcoding fixes are not complete yet
18:13 < headius> I see your comment about aliases, but if jcodings Encoding.getCharset is giving back a wrong charset, that's a bug
18:13 < headius> it *should* be the proper Charset at least
18:13 < enebo> headius: I don't remember exact details (that fix was from a long time ago) but it was definitely a problem with what jcodings had versus what we could get with forName
18:13 < headius> hmm
18:13 < enebo> Oh I remember
18:14 < headius> I suspect that Charset.forName is a lot more expensive than we realize, based on profiles
18:14 < enebo> ascii-8bit ends up as an alias for lots of ascii subsets
18:14 < headius> I remember at one point in the past it was actually a synchronization bottleneck
18:14 < enebo> It only has ONE charset associated with it
18:14 < enebo> All aliases actually have their own charset
18:14 < headius> ahhh sure
18:14 < enebo> So the solution would be to have no aliases in jcodings
18:14 < enebo> Or a map to charsets
18:14 < enebo> for aliases
18:15 < headius> hmmm yeah
18:15 < enebo> big change to jcodings
18:15 < enebo> but I think this problem is somewhat mitigated by specialized charsettranscoder
18:15 < headius> yeah, I think it would be
18:15 < enebo> Assuming this is not a tight loop doing n million new io and one op on it
18:15 < headius> one that has Charset fields for to and from, and then we just save the CharsetTranscoder instances
18:15 < enebo> Even then I suspect the charset lookup won't be the big cost
18:16 < headius> it doesn't mutate
18:16 < enebo> headius: yeah we just need one subclass when we know both to and from
18:16 < enebo> In that case it will only happen once at construction of transcoder
18:16 < headius> I don't think that would be a difficult addition
18:16 < enebo> yeah it is simple
18:16 < headius> subclass?
18:16 < enebo> subclas of CharsetTranscoder
18:16 < headius> I figured we'd have a cache in Ruby somewhere
18:17 < headius> if not there, create a transcoder and save it
18:17 < headius> it just seems simplest to me
18:17 < enebo> This is just because we don't want it for n io ops on once io object?
18:17 < headius> right
18:18 < enebo> saving it from all io creation seems like way more complexity
18:18 < enebo> Since you now have to manage the cache
18:18 < headius> manage schmanage
18:18 < enebo> and you need to key off to,from,codingactions
18:18 < enebo> I guess charset itself could be cached simply too
18:18 < headius> we could also just try a static forName cache first
18:18 < enebo> but heh
18:18 < headius> it could be static because we'd cache string => charset…all bootclasses
Environment: Java 1.7.0_05-b05
Project: JRuby
Priority: Major Major
Reporter: Reto Schüttel
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
--------------------------------------------------------------------- To unsubscribe from this list, please visit: http://xircles.codehaus.org/manage_email

Reply via email to