[ https://issues.apache.org/jira/browse/SOLR-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468498 ]
Coda Hale commented on SOLR-122:
--------------------------------
Yonik -- The results switch back once the text gets more complicated than
"woot." Your escape function is really fast as long as the block passed to
String#gsub never gets called -- if there's nothing there to escape. Blocks are
pretty slow compared with other means of branching. Good catch on the regexp
compiling -- I didn't think that String#gsub compiled the first parameter to a
Regexp every time.
Here's how it looks with 1000 random characters of [A-Za-z0-9<>&], 100,000
times each:
                                          user     system      total        real
string concatenation:                 9.320000   0.070000   9.390000 (  9.921551)
string substitution:                  9.210000   0.050000   9.260000 (  9.660138)
string concatenation2:                7.610000   0.050000   7.660000 (  7.919240)
string substitution2:                 7.550000   0.040000   7.590000 (  7.817162)
catenation w/ single pass escape:    12.640000   0.070000  12.710000 ( 13.121503)
substitution w/ single pass escape:  12.420000   0.070000  12.490000 ( 12.845156)
libxml:                               2.050000   0.010000   2.060000 (  2.108470)
libxml back in the lead. ;-)
Also, if you're on Mac or Linux, you can install libxml-ruby as follows:
    sudo gem install libxml-ruby
Be sure you've installed libxml2 first (sudo port install libxml2, sudo apt-get
install libxml2, sudo rpm something-or-other).
If you're on Windows, you'll just have to take my word for it.
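For anyone who wants code like this to keep working where libxml-ruby isn't installed, here is a minimal sketch of optional loading with a REXML fallback. It assumes the old-style `require "xml/libxml"` entry point used in the script below; the helper name `blah_element` is hypothetical and only there for illustration, not part of the patch.

```ruby
require "rexml/document"

# Try the libxml binding first. If the gem is missing, fall through to REXML.
begin
  require "xml/libxml"
rescue LoadError
  # libxml-ruby is not installed; REXML is used instead.
end

# Hypothetical helper: wrap text in <blah> and let the XML library escape it.
def blah_element(text)
  if defined?(XML::Node)
    e = XML::Node.new("blah")
    e << text
    e.to_s
  else
    e = REXML::Element.new("blah")
    e.add_text(text)
    e.to_s
  end
end

puts blah_element("a < b & c")
```

Either branch should hand back the element with the text escaped by the library, so callers don't need to know which one was loaded.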
====
require "benchmark"
require "rexml/document"
require "rubygems"
require "xml/libxml"
TESTS = 100_000
CHARS = ('A'..'Z').to_a + ('a'..'z').to_a + ('0'..'9').to_a + ['<', '>', '&']
TEXT = ""
1000.times do
  TEXT << CHARS[rand(CHARS.size)]
end
# Escape the three XML special characters in a single pass.
def escape(text)
  text.gsub(/([&<>])/) { |ch|
    case ch
    when '&' then '&amp;'
    when '<' then '&lt;'
    when '>' then '&gt;'
    end
  }
end
Benchmark.bmbm do |results|
  # String arguments to gsub, result appended with <<
  results.report("string concatenation:") do
    TESTS.times do
      x = "<blah>"
      x << TEXT.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
      x << "</blah>"
    end
  end
  # String arguments to gsub, result interpolated
  results.report("string substitution:") do
    TESTS.times do
      x = "<blah>#{TEXT.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")}</blah>"
    end
  end
  # Regexp literals instead of strings, result appended with <<
  results.report("string concatenation2:") do
    TESTS.times do
      x = "<blah>"
      x << TEXT.gsub(/&/, '&amp;').gsub(/</, '&lt;').gsub(/>/, '&gt;')
      x << "</blah>"
    end
  end
  # Regexp literals instead of strings, result interpolated
  results.report("string substitution2:") do
    TESTS.times do
      x = "<blah>#{TEXT.gsub(/&/, '&amp;').gsub(/</, '&lt;').gsub(/>/, '&gt;')}</blah>"
    end
  end
  # Single gsub with a block (the escape method above)
  results.report("catenation w/ single pass escape:") do
    TESTS.times do
      x = "<blah>"
      x << escape(TEXT)
      x << "</blah>"
    end
  end
  results.report("substitution w/ single pass escape:") do
    TESTS.times do
      x = "<blah>#{escape(TEXT)}</blah>"
    end
  end
  # libxml escapes the text itself when the node is serialized
  results.report("libxml:") do
    TESTS.times do
      e = XML::Node.new("blah")
      e << TEXT
      e.to_s
    end
  end
end
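
Before comparing timings, it's worth checking that the pure-Ruby variants actually produce the same output. A minimal sketch, assuming the replacements are the usual &amp;, &lt; and &gt; entities; the method names `escape_chained` and `escape_single_pass` are hypothetical and not part of the benchmark script:

```ruby
# Chained form: "&" has to be escaped first, otherwise the "&" introduced
# by "&lt;" and "&gt;" would be escaped a second time.
def escape_chained(text)
  text.gsub(/&/, '&amp;').gsub(/</, '&lt;').gsub(/>/, '&gt;')
end

# Single-pass form: one scan of the string, with the block called per match.
def escape_single_pass(text)
  text.gsub(/[&<>]/) do |ch|
    case ch
    when '&' then '&amp;'
    when '<' then '&lt;'
    when '>' then '&gt;'
    end
  end
end

sample = 'a < b && c > d'
raise "variants disagree" unless escape_chained(sample) == escape_single_pass(sample)
puts escape_chained(sample)  # prints: a &lt; b &amp;&amp; c &gt; d
```

The single-pass form doesn't care about replacement order, which is its main appeal; the chained form only stays correct as long as the "&" substitution runs first.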
> Add optional support for Ruby-libxml2 (vs. REXML)
> -------------------------------------------------
>
> Key: SOLR-122
> URL: https://issues.apache.org/jira/browse/SOLR-122
> Project: Solr
> Issue Type: Improvement
> Components: clients - ruby - flare
> Reporter: Coda Hale
> Attachments: libxml.rb, libxml.rb
>
>
> This file adds drop-in support for the ruby-libxml2, which is a wrapper for
> the libxml2 library, which is an order of magnitude or so faster than REXML.
> This depends on my SOLR-121 patch for multi-document adds, since the behavior
> of Solr::Request::AddDocument#to_s is different.
> Requiring this makes some tests fail, but for trivial reasons: some tests are
> directly tied to REXML, others fail due to interelement whitespace added by
> libxml2 (which you can't disable via the Ruby interface). Functionally, it's
> identical, and passes all functional tests.