Re: [Vala] sqlheavy?

Evan Nemerson Mon, 02 Jul 2012 22:32:27 -0700

Jim,

Thanks, this is good information.

I'm going to reply in-line people who are interested, but here is the
TL;DR version (of this whole thread, really):  SQLHeavy currently has
issues with parallelism.  If you need parallel execution of queries,
SQLHeavy is currently as big of a pain as SQLite.  If you just want an
API for SQLite which easier to use, or if you just want to avoid
blocking the UI (the original use case for the async stuff), SQLHeavy
may be a good fit.

To anyone who is reading this at some point in the future:  if there is
an SQLHeavy >= 0.2 release, the issues with parallelism have been
resolved.

On Mon, 2012-07-02 at 20:07 -0700, Jim Nelson wrote:
> Let me explain the situation with Geary and SQLHeavy.  The ticket Eric
> referred to earlier does not fully describe our issues.  (Warning:
> this is *long*.)
> 
> First, I have to correct what Eric said earlier.  It's not that
> SQLHeavy isn't up-to-date, rather, the features Geary most heavily
> relies upon are not as well- or fully-implemented as we need.
> Specifically, the Geary engine is a fully asynchronous library
> designed to handle multiple requests at once.  Virtually all our
> interaction with SQLHeavy is through its asynchronous interfaces.
> That was the strongest motivation toward using it rather than coding
> directly to SQLite.  However, during development we discovered serious
> flaws in SQLHeavy's async implementation.

You're absolutely right about this, though I'd like to emphasize the
distinction between "async" and "parallel".  Geary's problems with
SQLHeavy come mostly from trying to use the async interfaces to achieve
parallelism, as opposed to simply not blocking the UI thread (which was
the original purpose of the async API).

Most of the issues below are actually related pretty closely under the
hood and the proper fix is really to separate the connection from the
database so you can transparently execute multiple queries, which
completely replaces the existing async support (as well as the
synchronous support).  Some of these issues are probably resolvable
within the existing code base, but it would take a lot of time and only
mostly work.  I would rather use that time to get the proper fix in
place.

For anything which is missing a response, just imagine I wrote something
like "The right solution for this issue relies on the parallel execution
stuff."  I think that actually covers most of this message.

> But Evan's right, as far as using it as a direct replacement for
> SQLite, it seems as full-featured as one would want it to be.  Perhaps
> the ORM stuff isn't up to snuff, as Evan said, so people wanting to
> adopt SQLHeavy should consider that.  I would also examine the bug
> database (found at http://code.google.com/p/sqlheavy/issues/list)
> before jumping in with both feet.

The ORM stuff is, AFAIK, fully functional.  It is a bit limited since I
don't actually have a lot of experience with ORM (and no experience that
I enjoyed)...  I had hoped people who actually liked ORM would have some
ideas about how to make it a bit more full-featured, but nobody has come
up with anything so far (and now even if they did I don't really have
time to implement them anymore).

What *is* a problem is the sqlheavy-gen-orm tool (which generates Vala
source code which uses the ORM stuff in the library).  Though it will
live on in the 0.1 branch for all eternity, I'm removing it in master
(which will become 0.2 when I'm done with the parallel execution stuff).

> What's motivating our decision to move away from SQLHeavy are the following:
> 
> * Each asynchronous operation creates and destroys a thread:
> http://code.google.com/p/sqlheavy/issues/detail?id=11
> 
> When I say operation, I don't mean transaction, I mean each
> *operation*.  Thus, if your QueryResult has 50 items in its set, 50
> threads are created and destroyed while iterating the result.  1000
> items, 1000 threads.  This is a major performance hit, especially at
> startup, as Geary does a lot of database work while connecting to the
> server.  It's painful to run Geary under gdb because of this problem.
> Every time I go into Geary to tune or optimize, this problem is in the
> mix affecting performance.
> 
> I reported this bug a year ago, almost to the day.
>
> * Async transactions aren't asynchronous:
> http://code.google.com/p/sqlheavy/issues/detail?id=12
> 
> This was almost a showstopper for us.  Without diving in too deep to
> the mechanics of this one, Geary uses transactions throughout its
> database layer to guarantee atomicity.  SQLHeavy's async transactions
> can't make that guarantee.  I didn't discover this until gobs of
> database code had already been written.  The only short-term solution
> I could find meant, in essence, there's a master lock on Geary's
> database that guarantees only one block of async code may perform
> operations at a time.  In other words, we're serializing database
> access.  It's an ugly hack, but I coded it thinking the problem would
> be solved shortly, at which time I could rip it out.

Actually SQLite is serializing access anyways, so from a performance
perspective it probably isn't a big deal.  That's how SQLite deals with
concurrency unless you create multiple Sqlite.Database objects.  See
http://www.sqlite.org/threadsafe.html

> This bug is also almost a year old.
> 
> * Cancelling an async operation doesn't cancel the scheduled thread:
> http://code.google.com/p/sqlheavy/issues/detail?id=17
> * Use IOError.CANCELLED: http://code.google.com/p/sqlheavy/issues/detail?id=18
> * QueryResult.finished doesn't respect Cancellable:
> http://code.google.com/p/sqlheavy/issues/detail?id=19
> * next_async() can segfault if cancelled:
> http://code.google.com/p/sqlheavy/issues/detail?id=20
> 
> These are related in an oblique way.  The first one hurts performance
> because there might be a lot of I/O against the database when the user
> switches Folders, causing all of it to be cancelled at once.  Because
> the scheduled threads are not cancelled, the fresh, relevant I/O the
> user has requested is queued up waiting for all those threads ahead of
> it to die.
> 
> The second bug is not as serious, but it means that callers to the
> Geary engine had to do special-case checking for two different types
> of exceptions rather than one (since CANCELLED is ok if the caller
> did, indeed, cancel the operation).  I had to work around this to
> convert SQLHeavy's INTERRUPTED error to CANCELLED.  This check has to
> happen throughout Geary's database code.

The reason I've been dragging my feet on this one is that using
CANCELLED would actually require an API break.  Since Vala has typed
exceptions, everything would have to be changed to "throws
SQLHeavy.Error, GLib.Error", and valac would start emitting warnings
everywhere for existing code about uncaught exceptions.

My current plan is to just get rid of typed exceptions and just throw
GLib.Error everywhere, like Bump is doing.  I just don't think it's an
API break is justified just for this, so I'm planning on combining it
with the parallel execution stuff.  This will actually be the biggest
part of the API break--almost everything else should just work.

> The third and fourth required checking Cancellable.is_cancelled()
> everywhere in our database code.  Again, a utility method that must be
> called from a lot of places.

I think these two are actually pretty easy to fix... I can take care of
them over the weekend if that would help you.

> These three problems required me to go into our database layer and add
> a lot of special checking to correct and work around them.  These were
> reported 6 to 8 months ago.
> 
> * VersionedDatabase upgrade scripts don't run:
> http://code.google.com/p/sqlheavy/issues/detail?id=21
> 
> This is one of those features I was really excited about, so I was
> pretty disappointed when it didn't work.  In Shotwell, we do all our
> database upgrades through manual code.  I thought it was a great idea
> to handle it all through SQL scripts.  But it looks like an internal
> SQLHeavy lock prevents the upgrade scripts from running, causing the
> app to hang at startup.
> 
> We worked around this by implementing our own upgrade path.
> 
> I want to make it clear: Evan is fulfilling a major need in the GNOME
> community with SQLHeavy.  The alternative (GNOME-DB) is unpalatable
> for an app that needs an in-proc database and knows it's going to use
> SQLite.  He's done a ton of work and I think it's a solid foundation
> for the future.

Thanks, but I disagree.  Once I have the new engine in place it will be
a solid foundation.  Right now it's just a thin wrapper on top of SQLite
(which was the original purpose of the library).

> I also don't think Evan's "responsible" to solve these bugs for us.
> He's doing this in his free time, he's not being paid, and he's
> provided quite a lot of usable code that's working for others.  I
> honestly wonder if anyone else out there is exercising the async code
> paths as hard as Geary.  The expectations on my end are somewhat low.
> I only mention the time we've waited on these bugs to emphasize that
> we're not being capricious.

The Yorba team has been very patient, and I know they don't expect me to
drop everything to work on this.  I certainly didn't mean to imply
anything else, and if I did I apologize.

I'm pretty sure nobody is exercising the async code paths as hard as
Geary (at least not for parallelism), and frankly you've pushed it well
past what it was originally intended to do, which explains the issues.
I would love to support these uses, and I have plans that will allow
SQLHeavy to do so, but the current code simply isn't capable of that.

> But we do have our requirements and deadlines and we need a proper
> solution for our issues.  The workarounds we've put in place make
> maintaining and expanding the database layer difficult and unnatural.
> Our local performance is severely under par.  That puts a drain on the
> application as a whole.
> 
> Evan suggested someone from Yorba help out with these problems.  I've
> looked at the async code quite a bit, and to be honest, I don't think
> it's one week worth of work on my part.  Evan's familiar with the
> code, perhaps he can do it in one week, I won't speak for him.  But it
> looks to me what's needed is a complete rewrite of the thread,
> locking, and maybe even connection model, from the ground up.  The
> problem is architectural and not in implementation.  My experience is
> that sort of task is best suited for the project lead, not an outside
> contributer coming in to scratch an itch.  It would be great if
> someone stepped up to help Evan, but I think that person needs to be
> dedicated to the task for quite a while.

Sorry, I didn't mean to suggest someone from Yorba should be helping out
with these issues.  They are clearly with SQLHeavy and not your
responsibility to fix, and I certainly don't expect you to be
responsible for the database/connection split (which is basically a
rewrite of the SQLHeavy innards).

I was merely suggesting that if the issue was about making releases and
you didn't feel I was responsive enough last time you asked for one then
I would be happy to allow one of the Yorba people to do that without my
intervention.

> I'll point out that I offered to patch the thread creation/destruction
> problem last year using a ThreadPool, but Evan himself said the whole
> thing needed a rewrite, so I backed away.  (Reviewing the code later,
> I realized the problem was more fundamental than simply adding a
> ThreadPool.)  Also, I did make code from Geary available to Evan for
> Bump.  I don't know how much of that made it in, but it's there for
> the taking.  In fact, the idea from Bump came from a private email
> discussion between me and Evan.

All true.  I didn't end up actually including any of that code in Bump
due to architectural differences but it really helped me flesh out the
API, as did those emails.  Furthermore, a lot of the things I'm working
on for SQLHeavy are a result of Geary pushing SQLHeavy to do more.  In
short, Yorba, and Jim in particular, have been tremendously helpful.

> I've already coded a thin wrapper around SQLite and built a simple
> thread/connection/transaction model for Geary's async needs.  That
> wasn't the hard part; I coded the wrapper in less than a day, the
> async stuff in about a day more.  The bulk of the work is moving the
> existing database layer to this new model.  This is in a private repo
> and hasn't been fully tested yet, but it should solve all the issues
> described above and make Geary's database layer easier to maintain.
>
> I know this is long-winded, I wanted to be complete.  There's been
> some discussion in the GNOME community lately about transparency in
> design decisions.  I realized I hadn't verbalized my thoughts about
> SQLHeavy on our Redmine ticket, so I'm doing it here.
>
> 
> -- Jim
> 
> On Mon, Jul 2, 2012 at 4:24 PM, Evan Nemerson <e...@coeus-group.com> wrote:
> > On Mon, 2012-07-02 at 11:22 -0700, Eric Gregory wrote:
> >> On Thu, Jun 21, 2012 at 9:44 AM, Jonas Kulla 
> >> <nyocu...@googlemail.com>wrote:
> >>
> >> > I assume it's pretty up to date.
> >> >
> >>
> >> Sadly that's not the case, which is why we're in the process of removing
> >> SQLHeavy from Geary.
> >
> > I don't really understand this.  First, based on
> > http://redmine.yorba.org/issues/5034 it sounds like that's not the
> > primary reason.  If it were the issue, it would be trivial to resolve.
> > There isn't really anything that needs to be done to keep SQLHeavy up to
> > date, since SQLite isn't exactly exploding with new features.  If you
> > just want new tarballs released more regularly that's really not a
> > problem--I released one last time you guys requested it and I can do it
> > again.  The other option is that I can just add someone from Yorba to
> > the list of maintainers and you're welcome to make releases whenever you
> > want.  Removing the sqlheavy-gen-orm utility would make this even more
> > simple (and likely unnecessary), since it would remove the dependency on
> > libvala.
> >
> > The text of that bug leads me to believe that the primary reason is that
> > "... it's not clear that it will ever support concurrent database access
> > from multiple threads...".  That's actually not completely true--you can
> > still create multiple Database objects pointed to the same file on disk
> > (just like in SQLite).  What is missing is a layer to have
> > SQLHeavy.Datbase automatically create multiple connections to a single
> > database in order to support concurrent queries transparently.  That's
> > not a matter of keeping things up to date, that's a *major* new feature
> > which has basically necessitated the creation of a whole new library
> > (http://code.google.com/p/bump) as well as a rewrite of the SQLHeavy
> > internals.
> >
> > I would be disappointed to see Geary move away from SQLHeavy, but if you
> > have an alternative which works I certainly can't fault you for
> > migrating to it.  I think I need about a week of full-time work in order
> > to finish the database/connection split, and since I'm very busy at work
> > these days (plus other open source stuff) I don't know when I'll have
> > that kind of time.  Until I do, SQLHeavy is in basically the same
> > situation as SQLite, which isn't a problem for the majority of users
> > (including the software I created SQLHeavy for in the first place).
> >
> >
> > -Evan
> >
> > _______________________________________________
> > vala-list mailing list
> > vala-list@gnome.org
> > https://mail.gnome.org/mailman/listinfo/vala-list

_______________________________________________
vala-list mailing list
vala-list@gnome.org
https://mail.gnome.org/mailman/listinfo/vala-list

Re: [Vala] sqlheavy?

Reply via email to