Re: CouchDB / NoSQL Benchmarking

Dave Amies Wed, 04 Nov 2015 06:14:46 -0800

Hi Craig,

Thanks for your comments. Would you mind sharing your hardware details and
your benchmark results?



Here are my initial thoughts:

* Convert StoreKV to use bulk POST - it is taking 10 mins to write the
initial 200k docs on my system, bulk should knock that down to a few seconds

I had initially thought about doing this, but decided against it because I
am trying to keep the code as generic as possible, so that with minimal
modifications it can be used with other NoSQL databases. I didn't want to
create the perception of giving one NoSQL database an unfair advantage. I
guess what I need to know is: do the other NoSQL databases (or at least
the key-value and document stores) all offer this bulk load functionality
via their REST APIs?

Note this is also the reason I used the python requests (generic http) library
and not the python couchdb library, even though the python couchdb library
would have been easier.
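
For reference, a rough sketch of what a batched write could look like over plain HTTP. CouchDB's bulk endpoint is POST /{db}/_bulk_docs with the documents wrapped in a "docs" array; everything else here is an assumption for illustration: the helper names (chunked, http_post_json, store_bulk) and the batch size are hypothetical and are not taken from kvbench's StoreKV. It uses only the standard library (urllib) so that it stays independent of any database-specific client.

```python
import json
import urllib.request


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def http_post_json(url, payload):
    """POST `payload` as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def store_bulk(db_url, docs, batch_size=1000, post=http_post_json):
    """Send `docs` in batches and return how many documents were sent."""
    sent = 0
    for batch in chunked(docs, batch_size):
        # CouchDB expects the documents wrapped in a "docs" array.
        post(db_url + "/_bulk_docs", {"docs": batch})
        sent += len(batch)
    return sent
```

Something like store_bulk("http://127.0.0.1:5984/kvbench", docs) would then replace one request per document with one request per batch. The sketch ignores the per-document results that _bulk_docs returns, so conflicts would go unnoticed. For another database only the URL suffix and the payload wrapper should need to change, provided that database has a bulk endpoint at all.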


* Set the content type in the put/post to 'application/json' - at the mo it
is blank
This is probably a good idea, I will do this. I put this benchmark tool
together in some spare time I had, so I didn't expect to have it perfect
first go. Besides, I'm not the world's best programmer either :) The whole
point of making it open source is so people can critique and help improve
the code, so thanks.
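
A minimal sketch of a PUT that carries an explicit JSON content type, using only the standard library. The function name build_put and the example key are hypothetical; kvbench itself uses the requests library, where the same effect should come from passing headers={'Content-Type': 'application/json'} to requests.put.

```python
import json
import urllib.request


def build_put(url, doc):
    """Build a PUT request carrying `doc` as a JSON body."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Without this header urllib falls back to a form-encoded type.
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Example (not sent here): pass the result to urllib.request.urlopen().
request = build_put("http://127.0.0.1:5984/kvbench/key1", {"value": 1})
```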


* Add a script to initialize/reset the kvbench db

Hmm, again I was trying to keep things generic, but as this is a setup step
rather than part of the benchmark itself, I guess we could do this. By the
same token, I didn't see the need at the moment: deleting a database in
Futon is a 2 click operation, and re-creating the database and loading the
design document in Futon is not much more, so a manual reset doesn't take
very long. Any script to do this would also need to know the admin user's
name and password (unless admin party is on), and would then need to deal
with the complexity of prompting for the password or having it stored in
the script (bad practice).

Actually this can be scripted with 3 curl commands:

curl -X DELETE http://127.0.0.1:5984/kvbench
curl -X PUT http://127.0.0.1:5984/kvbench
curl -X PUT http://127.0.0.1:5984/kvbench/_design/KVB -d '{
  "_id": "_design/KVB", "language": "javascript",
  "views": {
    "seconds": {
      "map": "function(doc) {\n  emit(doc.seconds.toFixed(0), doc.seconds);\n}",
      "reduce": "_count" },
    "summary": {
      "map": "function(doc) {\n  emit(doc.summary, 1);\n}",
      "reduce": "_count" }
  }
}'

Naturally you will need to use -u in these curl commands if admin party is
disabled.
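
The same three requests could also be sketched in Python, which makes it easier to prompt for the admin password at run time instead of storing it. This is only an illustration under stated assumptions: the names auth_header, reset_requests and reset_db are hypothetical, HTTP basic auth is assumed, and the design document is copied from the curl command above.

```python
import base64
import json
import urllib.error
import urllib.request

# Design document copied from the curl command above.
DESIGN_DOC = {
    "_id": "_design/KVB",
    "language": "javascript",
    "views": {
        "seconds": {
            "map": "function(doc) {\n  emit(doc.seconds.toFixed(0), doc.seconds);\n}",
            "reduce": "_count",
        },
        "summary": {
            "map": "function(doc) {\n  emit(doc.summary, 1);\n}",
            "reduce": "_count",
        },
    },
}


def auth_header(user, password):
    """Return a basic-auth header dict (assumes basic auth is enabled)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}


def reset_requests(db_url, design_doc, headers=None):
    """Build the DELETE, PUT db and PUT design-doc requests, in order."""
    headers = dict(headers or {})
    json_headers = dict(headers)
    json_headers["Content-Type"] = "application/json"
    body = json.dumps(design_doc).encode("utf-8")
    return [
        urllib.request.Request(db_url, headers=headers, method="DELETE"),
        urllib.request.Request(db_url, headers=headers, method="PUT"),
        urllib.request.Request(db_url + "/" + design_doc["_id"], data=body,
                               headers=json_headers, method="PUT"),
    ]


def reset_db(db_url, design_doc, headers=None):
    """Drop and re-create the database, then load the design document."""
    for request in reset_requests(db_url, design_doc, headers):
        try:
            urllib.request.urlopen(request).close()
        except urllib.error.HTTPError as err:
            # A 404 on DELETE just means the database did not exist yet.
            if not (request.get_method() == "DELETE" and err.code == 404):
                raise
```

With admin party on, reset_db("http://127.0.0.1:5984/kvbench", DESIGN_DOC) should be enough. Otherwise the headers argument could be built with auth_header(user, getpass.getpass()), so the password is typed at the prompt and never written into the script.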






On Wed, Nov 4, 2015 at 11:05 PM, Craig Minihan <[email protected]>
wrote:

> Dave, code looks very useful. I'm developing an in-memory CouchDB API
> compatible DB (https://github.com/RipcordSoftware/AvanceDB) so being able
> to x-ref performance with CouchDB would be very handy.
>
> I'd like to PR a few changes into the repo if you don't mind:
> * Convert StoreKV to use bulk POST - it is taking 10 mins to write the
> initial 200k docs on my system, bulk should knock that down to a few seconds
> * Set the content type in the put/post to 'application/json' - at the mo
> it is blank
> * Add a script to initialize/reset the kvbench db
>
> I'm not a Python dev - apols in advance.
>
> Nice work!
> Craig
>
> -----Original Message-----
> From: Dave Amies [mailto:[email protected]]
> Sent: 04 November 2015 12:50
> To: [email protected]
> Subject: Re: CouchDB / NoSQL Benchmarking
>
> Hi Garren,
>
> Thanks for your kind words.
>
> I will post each test result separately rather than one enormous email.
> Unfortunately I lost the logs from the CouchDB crash.
> I tried to reproduce it but instead the benchmark completed successfully,
> so good news in a way.
>
> Dave.
>
>
> On Tue, Oct 27, 2015 at 8:18 PM, Garren Smith <[email protected]> wrote:
>
> > Hi Dave,
> >
> > This is very cool. Do you have the results and the scripts you used to
> > benchmark CouchDB?
> >
> > Cheers
> > Garren
> >
> > On Thu, Oct 22, 2015 at 3:13 PM, Dave Amies <[email protected]> wrote:
> >
> > > Hi All,
> > >
> > > > I'm sure by now most of you will have read at least some parts of
> > > > this guide:
> > >
> > > http://guide.couchdb.org/draft/performance.html
> > >
> > > > I was reading it the other day and noticed the "Call to Arms"
> > > > section at the bottom of the page. I don't know if there are already
> > > > any benchmarking tools out there, but I decided to try writing one.
> > > > Hopefully the one I have written will be useful.
> > >
> > > > About my background: for my day job I am a performance tester,
> > > > usually specialising in LoadRunner, so this project was something to
> > > > keep my mind occupied while waiting for my test system to be
> > > > rebuilt. Given this, I have only spent a few hours on it, so there is
> > > > probably still room for improvement. This email is about finding out
> > > > whether there is interest and whether this will be useful to the
> > > > CouchDB community: should I continue developing this tool, or am I
> > > > wasting my time?
> > >
> > > > In designing this benchmarking utility I reflected on all the
> > > > systems I have tested and tried to come up with some common areas
> > > > where database systems suffer in performance. Then, bearing in mind
> > > > the fundamental differences between traditional databases and NoSQL
> > > > databases (particularly CouchDB), I tried to construct some common
> > > > database usage scenarios.
> > >
> > > The 3 scenarios I came up with are:
> > >
> > > >    1. Write heavy (each user performs 12 writes, 6 reads and 3
> > > >    searches / index queries)
> > > >    2. Index / Query / Search heavy (each user performs 1 write, 2
> > > >    reads and 6 searches / index queries)
> > > >    3. Read heavy (each user performs 1 write, 10 reads and 3
> > > >    searches / index queries)
> > >
> > > > I have tried out my benchmarking tool on a couple of machines so
> > > > far. In these tests I managed to cause CouchDB to encounter the
> > > > following situations:
> > > >
> > > >    1. Performance degradation due to being disk IO bound
> > > >    2. Performance degradation due to being memory bound
> > > >    3. Performance degradation due to being CPU bound
> > > >    4. CouchDB crashed
> > > >    5. Benchmarking completed successfully and produced a
> > > >    performance score
> > >
> > > > Based on these results I believe I have created an effective tool
> > > > for benchmarking, so I decided the best next step was to release the
> > > > tool as an open source project. I created a GitHub project, which
> > > > can be found here: https://github.com/damies13/kvbench. There you
> > > > will find the readme file, which describes the 3 scenarios in more
> > > > detail, the benchmark definition or design, and the pre-benchmark
> > > > data priming. You will also find the Python script that is the
> > > > benchmarking tool and some instructions for setting up a CouchDB
> > > > database for the benchmarking process.
> > >
> > > > As this is getting long I'll wrap up by noting that I deliberately
> > > > did not use the Python couchdb libraries; instead I used the
> > > > requests library (standard HTTP) and the json library, because I
> > > > wanted to keep the code as generic as possible. The intention is
> > > > that this tool can be used to benchmark any key / value store,
> > > > whether that be a document-based NoSQL database, a key-value-based
> > > > NoSQL database or some other REST API / engine (e.g. backed by a
> > > > traditional database).
> > >
> > > > I look forward to some feedback, hopefully I have created something
> > > > useful.
> > >
> > > Sincerely,
> > >
> > > Dave.
> > >
> >
>
