Setting aside the excellent responses that have already been made in this 
thread, there are fundamental discrepancies in what you are comparing in 
your respective timing tests.

First off: a micro-benchmark like this is virtually useless -- unless you 
really plan on only ever executing a single query in a single run of a 
Java application that then terminates, trying to time a single query is 
silly -- you should do lots and lots of iterations using a large set of 
sample inputs.
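A minimal sketch of that kind of loop (the `runQuery` method and the inputs here are hypothetical stand-ins, not real SolrJ or JDBC calls): warm up first so the JIT settles, then time many iterations and report the average.

```java
// Toy micro-benchmark harness: warm up, then average over many iterations.
// runQuery() is a hypothetical stand-in for whatever you actually want to time.
public class QueryBench {
    static String runQuery(int id) {
        // stand-in for a real query; here just some cheap busy work
        return Integer.toBinaryString(id * 31);
    }

    public static void main(String[] args) {
        int warmup = 10_000, iterations = 100_000;
        for (int i = 0; i < warmup; i++) {       // warmup: let the JIT compile hot paths
            runQuery(i % 1000);
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {   // timed run over a large set of inputs
            runQuery(i % 1000);
        }
        long elapsedNs = System.nanoTime() - start;
        System.out.printf("%d queries in %.1f ms (%.4f ms/query)%n",
            iterations, elapsedNs / 1e6, elapsedNs / 1e6 / iterations);
    }
}
```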

Second: what you are timing is vastly different between the two cases.

In your Solr timing, no communication happens over the wire to the Solr 
server until the call to server.query() inside your timestamps -- if you 
were doing multiple requests using the same SolrServer object, the HTTP 
connection would get re-used, but as things stand your timing includes all 
of the network overhead of connecting to the server, sending the request, 
and reading the response.

In your Oracle method, however, the timestamps you record are only around 
the call to executeQuery(), rs.next(), and rs.getString() ... you are 
ignoring the time necessary for the getConnection() and 
prepareStatement() methods, which may be significant as they both involve 
over-the-wire communication with the remote server.  (And it's not like 
these are one-time, execute-and-forget methods ... in a real 
long-lived application you'd need to manage your connections, re-open them 
if they get closed, recreate the prepared statement if your connection has 
to be re-opened, etc.)
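To make the scope problem concrete, here is a toy illustration (no real JDBC or SolrJ calls; the one-time setup cost is simulated with a sleep): whether connection/statement setup falls inside or outside your timestamps changes the number you report dramatically.

```java
// Toy illustration of timing scope: the same "query" looks wildly slower
// when one-time setup cost lands inside the timed region.
public class TimingScope {
    // Simulated one-time setup cost (think getConnection() + prepareStatement(),
    // or opening a fresh HTTP connection to Solr).
    static void setup() throws InterruptedException { Thread.sleep(50); }

    // Simulated cheap per-query work (think executeQuery() + rs.next()).
    static double query() { return Math.sqrt(42.0); }

    // "Oracle-style" measurement: setup happens, but outside the timer.
    static long timeQueryOnly() throws InterruptedException {
        setup();
        long t0 = System.nanoTime();
        query();
        return System.nanoTime() - t0;
    }

    // "Solr-style" measurement: setup cost is included in the timer.
    static long timeSetupPlusQuery() throws InterruptedException {
        long t0 = System.nanoTime();
        setup();
        query();
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.printf("query only: %.3f ms, setup+query: %.3f ms%n",
            timeQueryOnly() / 1e6, timeSetupPlusQuery() / 1e6);
    }
}
```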

Your comparison is definitely apples and oranges.


Lastly, as others have mentioned: 150-200ms to request a single document 
by uniqueKey from an index containing 800K docs seems ridiculously slow, 
and suggests that something is poorly configured about your Solr instance 
(another apples-to-oranges comparison: you've got an ad-hoc Solr 
installation set up on your laptop and you're benchmarking it against a 
remote Oracle server running on dedicated remote hardware that has 
probably been heavily tuned/optimized for queries).

You haven't provided us any details, however, about how your index is set 
up, how you have configured Solr, what JVM options you are using to run 
Solr, or what physical resources are available to your Solr process (disk, 
JVM heap RAM, OS filesystem cache RAM), so there isn't much we can offer 
in the way of advice on how to speed things up.


FWIW:  On my laptop, using Solr 4.4 w/ the example configs and built-in 
Jetty (ie: "java -jar start.jar") I got a 3.4 GB max heap, and a 1.5 GB 
default heap, with plenty of physical RAM left over for the OS filesystem 
cache of an index I created containing 1,000,000 documents with 6 small 
fields containing small amounts of random terms.  I then used curl to 
execute ~4150 requests for documents by id (using a simple search, not the 
/get RTG handler) and return the results as JSON.

This completed in under 4.5 seconds, or ~1.0ms/request.

Using the more verbose XML response format (after restarting Solr to 
ensure nothing was in the query result caches) only took 0.3 seconds 
longer on the total time (~1.1ms/request).

$ time curl -sS 'http://localhost:8983/solr/collection1/select?q=id%3A[1-1000000:241]&wt=json&indent=true' > /dev/null

real    0m4.471s
user    0m0.412s
sys     0m0.116s
$ time curl -sS 'http://localhost:8983/solr/collection1/select?q=id%3A[1-1000000:241]&wt=xml&indent=true' > /dev/null

real    0m4.868s
user    0m0.376s
sys     0m0.136s
$ java -version
java version "1.7.0_25"
OpenJDK Runtime Environment (IcedTea 2.3.10) (7u25-2.3.10-1ubuntu0.12.04.2)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
$ uname -a
Linux frisbee 3.2.0-52-generic #78-Ubuntu SMP Fri Jul 26 16:21:44 UTC 2013 
x86_64 x86_64 x86_64 GNU/Linux

-Hoss
