Nico Williams, on 11/26/2012 03:05 PM wrote:
Vlad,
You keep saying that programmers don't understand "barriers". You've
provided no evidence of this. Meanwhile memory barriers are generally
well understood, and every programmer I know understands that a
"barrier" is a synchronization primitive
On Mon, Nov 26, 2012 at 6:05 PM, Larry Brasfield
wrote:
> Nico Williams emitted:
>
>> You keep saying that programmers don't understand "barriers". You've
>> provided no evidence of this. Meanwhile memory barriers are generally
>> well understood, and every programmer I know understands that a
>
Nico Williams emitted:
You keep saying that programmers don't understand "barriers". You've
provided no evidence of this. Meanwhile memory barriers are generally
well understood, and every programmer I know understands that a
"barrier" is a synchronization primitive that says that all operation
Vlad,
You keep saying that programmers don't understand "barriers". You've
provided no evidence of this. Meanwhile memory barriers are generally
well understood, and every programmer I know understands that a
"barrier" is a synchronization primitive that says that all operations
of a certain typ
Vladislav Bolkhovitin, on 11/17/2012 12:02 AM wrote:
The easiest way to implement this fsync would involve three things:
1. Schedule writes for all dirty pages in the fs cache that belong to
the affected file, wait for the device to report success, issue a cache
flush to the device (or request or
> The more money you pay for your storage, the less likely this is to be an
> issue (high end SSD's, enterprise class arrays, etc don't have volatile write
> caches and most SAS drives perform reasonably well with the write cache
> disabled).
"Performance" without a write cache is a physical prop
Chris Friesen, on 11/15/2012 05:35 PM wrote:
The easiest way to implement this fsync would involve three things:
1. Schedule writes for all dirty pages in the fs cache that belong to
the affected file, wait for the device to report success, issue a cache
flush to the device (or request ordering
杨苏立 Yang Su Li, on 11/15/2012 11:14 AM wrote:
1. fsync actually does two things at the same time: ordering writes (in a
barrier-like manner), and forcing cached writes to disk. This makes it very
difficult to implement fsync efficiently.
Exactly!
However, logically they are two distinctive fu
David Lang, on 11/15/2012 07:07 AM wrote:
There's no such thing as "barrier". It is a fully artificial abstraction. After
all, at the bottom of your stack, you will have to translate it either to cache
flush, or commands order enforcement, or both.
When people talk about barriers, they are talkin
On 11/16/2012 10:54 AM, Howard Chu wrote:
Ric Wheeler wrote:
On 11/16/2012 10:06 AM, Howard Chu wrote:
David Lang wrote:
barriers keep getting mentioned because they are an easy concept to understand.
"do this set of stuff before doing any of this other set of stuff, but I don't
care when any o
On 11/16/2012 10:06 AM, Howard Chu wrote:
David Lang wrote:
barriers keep getting mentioned because they are an easy concept to understand.
"do this set of stuff before doing any of this other set of stuff, but I don't
care when any of this gets done" and they fit well with the requirements of th
On Fri, 16 Nov 2012, Howard Chu wrote:
David Lang wrote:
barriers keep getting mentioned because they are an easy concept to understand.
"do this set of stuff before doing any of this other set of stuff, but I don't
care when any of this gets done" and they fit well with the requirements of
th
Ric Wheeler wrote:
On 11/16/2012 10:06 AM, Howard Chu wrote:
David Lang wrote:
barriers keep getting mentioned because they are an easy concept to understand.
"do this set of stuff before doing any of this other set of stuff, but I don't
care when any of this gets done" and they fit well with th
David Lang wrote:
barriers keep getting mentioned because they are an easy concept to understand.
"do this set of stuff before doing any of this other set of stuff, but I don't
care when any of this gets done" and they fit well with the requirements of the
users.
Users readily accept that if the
On 11/15/2012 11:06 AM, Ryan Johnson wrote:
The easiest way to implement this fsync would involve three things:
1. Schedule writes for all dirty pages in the fs cache that belong to
the affected file, wait for the device to report success, issue a cache
flush to the device (or request ordering c
On 14/11/2012 8:17 PM, Vladislav Bolkhovitin wrote:
Nico Williams, on 11/13/2012 02:13 PM wrote:
declaring groups of internally-unordered writes where the groups are
ordered with respect to each other... is practically the same as
barriers.
Which barriers? Barriers meaning cache flush or barri
On Thu, Nov 15, 2012 at 10:29 AM, Simon Slavin wrote:
>
> On 15 Nov 2012, at 4:14pm, 杨苏立 Yang Su Li wrote:
>
> > 1. fsync actually does two things at the same time: ordering writes (in a
> > barrier-like manner), and forcing cached writes to disk. This makes it very
> > difficult to implement
On 15 Nov 2012, at 4:14pm, 杨苏立 Yang Su Li wrote:
> 1. fsync actually does two things at the same time: ordering writes (in a
> barrier-like manner), and forcing cached writes to disk. This makes it very
> difficult to implement fsync efficiently. However, logically they are two
> distinctive fun
On Thu, Nov 15, 2012 at 6:07 AM, David Lang wrote:
> On Wed, 14 Nov 2012, Vladislav Bolkhovitin wrote:
>
> Nico Williams, on 11/13/2012 02:13 PM wrote:
>>
>>> declaring groups of internally-unordered writes where the groups are
>>> ordered with respect to each other... is practically the same as
Nico Williams, on 11/13/2012 02:13 PM wrote:
declaring groups of internally-unordered writes where the groups are
ordered with respect to each other... is practically the same as
barriers.
Which barriers? Barriers meaning cache flush or barriers meaning commands order,
or barriers meaning bot
Alan Cox, on 11/13/2012 12:40 PM wrote:
Barriers are pretty much universal as you need them for power off !
I'm afraid, no storage (drives, if you like this term more) at the moment supports
barriers and, as far as I know the storage history, has never supported.
The ATA cache flush is a wr
On Wed, 14 Nov 2012, Vladislav Bolkhovitin wrote:
Nico Williams, on 11/13/2012 02:13 PM wrote:
declaring groups of internally-unordered writes where the groups are
ordered with respect to each other... is practically the same as
barriers.
Which barriers? Barriers meaning cache flush or barrie
On Tue, Nov 13, 2012 at 11:40 AM, Alan Cox wrote:
>> > Barriers are pretty much universal as you need them for power off !
>>
>> I'm afraid, no storage (drives, if you like this term more) at the moment supports
>> barriers and, as far as I know the storage history, has never supported.
>
> Th
> > Barriers are pretty much universal as you need them for power off !
>
> I'm afraid, no storage (drives, if you like this term more) at the moment supports
> barriers and, as far as I know the storage history, has never supported.
The ATA cache flush is a write barrier, and given you have
杨苏立 Yang Su Li, on 11/10/2012 11:25 PM wrote:
SATA's Native Command
Queuing (NCQ) is not equivalent; this allows the drive to reorder
requests (in particular read requests) so they can be serviced more
efficiently, but it does *not* allow the OS to specify a partial,
relative ordering of reques
Richard Hipp, on 11/02/2012 08:24 AM wrote:
SQLite cares. SQLite is an in-process, transactional, zero-configuration
database that is estimated to be used by over 1 million distinct
applications and to have over 2 billion deployments. SQLite uses
ordinary disk files in ordinary directories, of
Alan Cox, on 11/02/2012 08:33 AM wrote:
b) most drives will internally re-order requests anyway
They will but only as permitted by the commands queued, so you have some
control depending upon the interface capabilities.
c) cheap drives won't support barriers
Barriers are pretty muc
Howard Chu, on 11/01/2012 08:38 PM wrote:
Alan Cox wrote:
How about that recently preliminary infrastructure to send ORDERED commands
instead of queue draining was deleted from the kernel, because "there's no
difference where to drain the queue, on the kernel or the storage side"?
Send patche
On Fri, Oct 26, 2012 at 8:54 PM, Vladislav Bolkhovitin wrote:
>
> Theodore Ts'o, on 10/25/2012 01:14 AM wrote:
>
>> On Tue, Oct 23, 2012 at 03:53:11PM -0400, Vladislav Bolkhovitin wrote:
>>
>>> Yes, SCSI has full support for ordered/simple commands designed
>>> exactly for that task: to have stea
On Thu 2012-10-25 14:29:48, Theodore Ts'o wrote:
> On Thu, Oct 25, 2012 at 11:03:13AM -0700, da...@lang.hm wrote:
> > I agree, this is why I'm trying to figure out the recommended way to
> > do this without needing to do full commits.
> >
> > Since in most cases it's acceptable to lose the last f
On Mon, Nov 05, 2012 at 05:37:02PM -0500, Richard Hipp wrote:
>
> Per the docs: "Only the superuser or a process possessing the
> CAP_SYS_RESOURCE capability can set or clear this attribute." That
> prevents most applications that run SQLite from being able to take
> advantage of this, since mos
On Mon, Nov 5, 2012 at 5:04 PM, Theodore Ts'o wrote:
> On Mon, Nov 05, 2012 at 09:03:48PM +0100, Pavel Machek wrote:
> > > Well, using data journalling with ext3/4 may do what you want. If you
> > > don't do any fsync, the changes will get written every 5 seconds when
> > > the automatic journal
On Mon, Nov 05, 2012 at 09:03:48PM +0100, Pavel Machek wrote:
> > Well, using data journalling with ext3/4 may do what you want. If you
> > don't do any fsync, the changes will get written every 5 seconds when
> > the automatic journal sync happens (and sub-4k writes will also get
>
> Hmm. But th
> Isn't any type of kernel-side ordering an exercise in futility, since
>a) the kernel has no knowledge of the disk's actual geometry
>b) most drives will internally re-order requests anyway
They will but only as permitted by the commands queued, so you have some
control depending upon the
On Thu, Nov 1, 2012 at 8:38 PM, Howard Chu wrote:
> Alan Cox wrote:
>
>> How about that recently preliminary infrastructure to send ORDERED commands
>> instead of queue draining was deleted from the kernel, because "there's no
>> difference where to drain the queue, on the kernel or the
Alan Cox, on 10/31/2012 05:54 AM wrote:
I don't want to flame on this topic, but you are not right here. As far as I can
see, a big chunk of Linux storage and file system developers are/were employed by
the "gold-plated storage" manufacturers, starting from FusionIO, SGI and Oracle.
You know,
Alan Cox wrote:
How about that recently preliminary infrastructure to send ORDERED commands
instead of queue draining was deleted from the kernel, because "there's no
difference where to drain the queue, on the kernel or the storage side"?
Send patches.
Isn't any type of kernel-side ordering
> How about that recently preliminary infrastructure to send ORDERED commands
> instead of queue draining was deleted from the kernel, because "there's no
> difference where to drain the queue, on the kernel or the storage side"?
Send patches.
Alan
__
On 30 Oct 2012, at 10:22pm, Vladislav Bolkhovitin wrote:
> I fully understand your position. But "affordable" and "useful" are
> completely orthogonal things. The "high end" features are very useful, if you
> want to get high performance. Then the ones who can afford them will use them,
> which
Theodore Ts'o, on 10/27/2012 12:44 AM wrote:
On Fri, Oct 26, 2012 at 09:54:53PM -0400, Vladislav Bolkhovitin wrote:
What differs in our positions is that you are considering storage
as something you can connect to your desktop, while in my view
storage is something, which stores data and serv
> I don't want to flame on this topic, but you are not right here. As far as I can
> see, a big chunk of Linux storage and file system developers are/were employed by
> the "gold-plated storage" manufacturers, starting from FusionIO, SGI and Oracle.
>
> You know, RedHat from recent time
Hmm, so sorry I didn't notice the cc'ing of the linux-kernel list,
resulting in so much additional traffic to sqlite-users, which I'll
drop in my replies to the linux-kernel list.
Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlit
On Thu, Oct 25, 2012, Nico Williams wrote:
Incidentally, here's a single-file, bag of b-trees that uses a COW
format: MDB, which can be found in
git://git.openldap.org/openldap.git, in the mdb.master branch.
Complete docs, design notes, and benchmark results are available here:
http://highlan
Theodore Ts'o, on 10/25/2012 09:50 AM wrote:
Yeah I don't buy that. One, flash is still too expensive. Two,
the capital costs to build enough Silicon foundries to replace the
current production volume of HDD's is way too expensive for any
company to afford (the cloud providers are buying
Theodore Ts'o, on 10/25/2012 01:14 AM wrote:
On Tue, Oct 23, 2012 at 03:53:11PM -0400, Vladislav Bolkhovitin wrote:
Yes, SCSI has full support for ordered/simple commands designed
exactly for that task: to have steady flow of commands even in case
when some of them are ordered.
SCSI does,
Nico Williams, on 10/24/2012 05:17 PM wrote:
Yes, SCSI has full support for ordered/simple commands designed exactly for
that task: [...]
[...]
But historically for some reason Linux storage developers were stuck with
"barriers" concept, which is obviously not the same as ORDERED commands,
hen
On Fri, Oct 26, 2012 at 09:54:53PM -0400, Vladislav Bolkhovitin wrote:
> What differs in our positions is that you are considering storage
> as something you can connect to your desktop, while in my view
> storage is something, which stores data and serves them the best
> possible way with the be
On Thu, Oct 25, 2012 at 11:03:13AM -0700, da...@lang.hm wrote:
> I agree, this is why I'm trying to figure out the recommended way to
> do this without needing to do full commits.
>
> Since in most cases it's acceptable to lose the last few chunks
> written, if we had some way of specifying order
On Thu, 25 Oct 2012, Theodore Ts'o wrote:
Or does rsyslog *really* need to issue an fsync after each log
message? Or could it batch updates so that every N seconds, it
flushes writes to the disk?
In part this depends on how paranoid the admin is. By default rsyslog
doesn't do fsyncs, but adm
On 25 Oct 2012, at 2:04am, da...@lang.hm wrote:
> But unless you are a filesystem, how can you make sure that the message data
> is written to file1 before you write the metadata about the message to file2?
Wait for long enough for the disk subsystem to clear its backlog of write
commands. A
> > Hopefully, eventually the storage developers will realize the value
> > behind ordered commands and learn corresponding SCSI facilities to
> > deal with them.
>
> Eventually, drive manufacturers will realize that trying to price
> gouge people who want advanced features such as TCQ, DIF/DIX, i
On Wed, Oct 24, 2012 at 11:58:49PM -0700, da...@lang.hm wrote:
> The frustrating thing is that when people point out how things like
> sqlite are so horribly slow, the reply seems to be "well, that's
> what you get for doing so many fsyncs, don't do that", when there is
> a 'problem' like the KDE "
On Thu, Oct 25, 2012 at 02:03:25PM +0100, Alan Cox wrote:
>
> I doubt they care. The profit on high end features from the people who
> really need them I would bet far exceeds any other benefit of giving it to
> others. Welcome to capitalism 8)
Yes, but it's a question of pricing. If they had pr
On Thu, 25 Oct 2012, Theodore Ts'o wrote:
On Wed, Oct 24, 2012 at 03:03:00PM -0700, da...@lang.hm wrote:
Like what is being described for sqlite, losing the tail end of the
messages is not a big problem under normal conditions. But there is
a need to be sure that what is there is complete up t
On Thu, 25 Oct 2012, Theodore Ts'o wrote:
On Thu, Oct 25, 2012 at 12:18:47AM -0500, Nico Williams wrote:
By trusting fsync(). And if you don't care about immediate Durability
you can run the fsync() in a background thread and mark the associated
transaction as completed in the next transactio
On Thu, Oct 25, 2012 at 12:18:47AM -0500, Nico Williams wrote:
>
> By trusting fsync(). And if you don't care about immediate Durability
> you can run the fsync() in a background thread and mark the associated
> transaction as completed in the next transaction to be written after
> the fsync() co
On Wed, Oct 24, 2012 at 03:03:00PM -0700, da...@lang.hm wrote:
> Like what is being described for sqlite, losing the tail end of the
> messages is not a big problem under normal conditions. But there is
> a need to be sure that what is there is complete up to the point
> where it's lost.
>
> this
On Tue, Oct 23, 2012 at 03:53:11PM -0400, Vladislav Bolkhovitin wrote:
> Yes, SCSI has full support for ordered/simple commands designed
> exactly for that task: to have steady flow of commands even in case
> when some of them are ordered.
SCSI does, yes --- *if* the device actually implements
On Wed, 24 Oct 2012, Nico Williams wrote:
On Wed, Oct 24, 2012 at 5:03 PM, wrote:
I'm doing some work with rsyslog and its disk-based queues and there is a
similar issue there. The good news is that we can have a version that is
linux specific (rsyslog is used on other OSs, but there is an e
On Wed, 24 Oct 2012, Nico Williams wrote:
Before that happens, people will keep returning again and again with those
simple questions: why the queue must be flushed for any ordered operation?
Isn't it an obvious overkill?
That [cache flushing] is not what's being asked for here. Just a
light-
On Wed, Oct 24, 2012 at 8:04 PM, wrote:
> On Wed, 24 Oct 2012, Nico Williams wrote:
>> COW is "copy on write", which is actually a bit of a misnomer -- all
>> COW means is that blocks aren't over-written, instead new blocks are
>> written. In particular this means that inodes, indirect blocks, d
On Wed, Oct 24, 2012 at 7:17 PM, Simon Slavin wrote:
> A) fsync() doesn't work the way it's meant to on the majority of user
> platforms. It effectively does nothing. Here are typical notes for Windows
> Server and FreeBSD:
Many systems lie, that's true. For example: Virtual Box by default
l
On Wed, Oct 24, 2012 at 5:03 PM, wrote:
> I'm doing some work with rsyslog and its disk-based queues and there is a
> similar issue there. The good news is that we can have a version that is
> linux specific (rsyslog is used on other OSs, but there is an existing queue
> implementation that they
On 24 Oct 2012, at 10:17pm, Nico Williams wrote:
> That [cache flushing] is not what's being asked for here. Just a
> light-weight barrier. My proposal works without having to add new
> system calls: a) use a COW format, b) have background threads doing
> fsync()s, c) in each transaction's roo
On Tue, Oct 23, 2012 at 2:53 PM, Vladislav Bolkhovitin
wrote:
>> As most of the time the order we need do not involve too many blocks
>> (certainly a lot less than all the cached blocks in the system or in
>> the disk's cache), that topological order isn't likely to be very
>> complicated, and I i
杨苏立 Yang Su Li, on 10/11/2012 12:32 PM wrote:
I am not quite sure whether I should ask this question here, but in terms
of light weight barrier/fsync, could anyone tell me why the device
driver / OS provide the barrier interface other than some other
abstractions anyway? I am sorry if this sounds like
Richard Hipp writes:
>
> Fsync() is a very close approximation to a write barrier since (when it
> works as advertised) all pending I/O reaches persistent storage before the
> fsync() returns. And since no subsequent I/Os are issued until after the
> fsync() returns, the requirements above a cle
On Fri, Oct 12, 2012 at 5:14 PM, Simon Slavin wrote:
> I think I understand what you're asking for, but I see no point in being
> informed about D, because I can't see anything useful a program can do if the
> transaction gets marked 'complete' but D doesn't succeed. Either you see D
> as bein
On 12 Oct 2012, at 10:23pm, Nico Williams wrote:
> Here's some more examples of where delayed-D ACKs would be nice:
> distributed services. These are really just a variant of my earlier
> UI example, but still: a server might respond with an ACK as soon as a
> transaction completes with ACI and
On Fri, Oct 12, 2012 at 4:08 PM, Simon Slavin wrote:
> If all you're doing is showing something on a display that's fine. But if
> that's what you're doing I see no point in distinguishing between 'success'
> and 'durable'. As far as I can see your program has nothing to do between
> the two
On 12 Oct 2012, at 10:01pm, Nico Williams wrote:
> On Fri, Oct 12, 2012 at 3:53 PM, Simon Slavin wrote:
>> That's an interesting idea. I have a question. Suppose your program
>> received the 'success' result for a transaction and carried on to do other
>> transactions. Later you test to se
On Fri, Oct 12, 2012 at 3:53 PM, Simon Slavin wrote:
> That's an interesting idea. I have a question. Suppose your program
> received the 'success' result for a transaction and carried on to do other
> transactions. Later you test to see whether the transaction is durable and
> find that it
On 12 Oct 2012, at 6:00pm, Nico Williams wrote:
> I do think that applications should be able to request deferred
> durability *and* find out when a given transaction has indeed become
> durable.
>
> A distinction between success and durability in the API might bleed
> into UIs too. Imagine a
On Fri, Oct 12, 2012 at 2:58 AM, Dan Kennedy wrote:
> On 10/11/2012 11:38 PM, Nico Williams wrote:
>> There is something you can do: [...]
>
> SQLite WAL mode comes close to that if you run your checkpoints
> in the background. [...]
Right. WAL mode comes close to being a COW on-disk format.
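
To make Dan's suggestion concrete, here is a minimal sketch (not code from
this thread) of running SQLite in WAL mode with automatic checkpoints turned
off and a background thread doing passive checkpoints instead. The helper
names open_writer, checkpoint_once and start_checkpointer are hypothetical;
the PRAGMAs are standard SQLite ones. With synchronous=NORMAL in WAL mode a
commit may not yet be durable when it returns, which is roughly the deferred
durability being discussed.

```python
import sqlite3
import threading


def open_writer(path):
    """Open a connection in WAL mode with automatic checkpoints disabled."""
    conn = sqlite3.connect(path)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError("WAL mode not available, got: " + mode)
    # Leave checkpointing to the background thread (assumption of this sketch).
    conn.execute("PRAGMA wal_autocheckpoint=0")
    # Commits append to the WAL without syncing on every transaction.
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn


def checkpoint_once(path):
    """Run one passive checkpoint on its own connection.

    Returns the row reported by SQLite:
    (busy, frames in the WAL, frames checkpointed).
    """
    conn = sqlite3.connect(path)
    try:
        return conn.execute("PRAGMA wal_checkpoint(PASSIVE)").fetchone()
    finally:
        conn.close()


def start_checkpointer(path, interval_s, stop_event):
    """Checkpoint every interval_s seconds until stop_event is set."""
    def loop():
        # Event.wait returns False on timeout and True once the event is set.
        while not stop_event.wait(interval_s):
            checkpoint_once(path)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

A caller would open the writer, start the checkpointer with a
threading.Event, and set that event at shutdown. A passive checkpoint does
not block writers, so it may copy only part of the WAL when readers are
active; that trade-off is a choice made for this sketch.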
>
r.kernel.org; d...@hwaci.com
> Subject: EXT :Re: [sqlite] light weight write barriers
>
> On Thu, Oct 11, 2012 at 11:32:27AM -0500, 杨苏立 Yang Su Li wrote:
>> I am not quite sure whether I should ask this question here, but in terms
>> of light weight barrier/fsync, could anyone
x-ker...@vger.kernel.org; d...@hwaci.com
Subject: EXT :Re: [sqlite] light weight write barriers
On Thu, Oct 11, 2012 at 11:32:27AM -0500, 杨苏立 Yang Su Li wrote:
> I am not quite sure whether I should ask this question here, but in terms
> of light weight barrier/fsync, could anyone tell me why
On Thu, Oct 11, 2012 at 11:32:27AM -0500, 杨苏立 Yang Su Li wrote:
> I am not quite sure whether I should ask this question here, but in terms
> of light weight barrier/fsync, could anyone tell me why the device
> driver / OS provide the barrier interface other than some other
> abstractions anyway?
On 10/11/2012 11:38 PM, Nico Williams wrote:
On Wed, Oct 10, 2012 at 12:48 PM, Richard Hipp wrote:
Could you list the requirements of such a light weight barrier?
i.e. what would it need to do minimally, what's different from
fsync/fdatasync ?
For SQLite, the write barrier needs to involve tw
Lying hardware is a different problem. Richard was asking for something else.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
On 11 Oct 2012, at 10:41pm, Nico Williams wrote:
> On Thu, Oct 11, 2012 at 11:59 AM, Simon Slavin wrote:
>> On 11 Oct 2012, at 5:38pm, Nico Williams wrote:
>>> There is something you can do: use a combination of COW on-disk
>>> formats in such a way that it's possible to detect partially-commi
On Thu, Oct 11, 2012 at 11:59 AM, Simon Slavin wrote:
> On 11 Oct 2012, at 5:38pm, Nico Williams wrote:
>> There is something you can do: use a combination of COW on-disk
>> formats in such a way that it's possible to detect partially-committed
>> transactions and rollback to the last good known
On 11 Oct 2012, at 5:38pm, Nico Williams wrote:
> There is something you can do: use a combination of COW on-disk
> formats in such a way that it's possible to detect partially-committed
> transactions and rollback to the last good known root
This is actually the problem, not the solution. Tra
To expand a bit, the on-disk format needs to allow the roots of N of
the last transactions to be/remain reachable at all times. At open
time you look for the latest transaction, verify that it has been
written[0] completely, then use it, else look for the preceding
transaction, verify it, and so o
On Wed, Oct 10, 2012 at 12:48 PM, Richard Hipp wrote:
>> Could you list the requirements of such a light weight barrier?
>> i.e. what would it need to do minimally, what's different from
>> fsync/fdatasync ?
>
> For SQLite, the write barrier needs to involve two separate inodes. The
> requirement
I am not quite sure whether I should ask this question here, but in terms
of light weight barrier/fsync, could anyone tell me why the device
driver / OS provide the barrier interface other than some other
abstractions anyway? I am sorry if this sounds like a stupid question
or it has been discussed bef
Richard Hipp writes:
>
> We would really, really love to have some kind of write-barrier that is
> lighter than fsync(). If there is some method other than fsync() for
> forcing a write-barrier on Linux that we don't know about, please enlighten
> us.
Could you list the requirements of such a lig
On Wed, Oct 10, 2012 at 1:17 PM, Andi Kleen wrote:
> Richard Hipp writes:
> >
> > We would really, really love to have some kind of write-barrier that is
> > lighter than fsync(). If there is some method other than fsync() for
> > forcing a write-barrier on Linux that we don't know about, please