Hi Benoit,

See my comments below.

On 26/10/2015 01:11, Tellier Benoit wrote:

[...]

Well, I have trouble seeing how to make this work in a distributed
system. Messaging systems do not offer perfect guarantees and we might
get into a lot of trouble in case of network partitions. Double event
delivery, or no event delivery at all, might arise. It is not that bad
with IDLE, but it can leave SelectedMailboxImpl inconsistent.

You can actually choose between double delivery and lost messages: if you go for ACK (synchronous delivery), you can get double delivery, but no message is lost. It introduces latency, but you can be sure you won't lose any message.
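
To make that trade-off concrete, here is a minimal sketch of ACK-based delivery. The network is simulated with a scripted list of lost ACKs, and the names `FlakyConsumer` and `deliver_with_ack` are hypothetical, not taken from James or any messaging library.

```python
class FlakyConsumer:
    """Simulated consumer whose ACK is lost on scripted attempts (hypothetical model)."""

    def __init__(self, lost_acks):
        self.lost_acks = set(lost_acks)  # attempt numbers whose ACK never reaches the sender
        self.attempts = 0
        self.received = []               # every event the consumer actually processed

    def handle(self, event):
        self.attempts += 1
        self.received.append(event)      # the event is processed before the ACK is sent
        return self.attempts not in self.lost_acks  # True means the sender saw the ACK


def deliver_with_ack(consumer, event, max_retries=5):
    """Resend until an ACK comes back: nothing is lost, but duplicates are possible."""
    for _ in range(max_retries):
        if consumer.handle(event):
            return True
    return False


consumer = FlakyConsumer(lost_acks=[1])   # the first ACK is lost in transit
delivered = deliver_with_ack(consumer, "EXPUNGE uid=42")
```

Here the sender ends up with a confirmed delivery, but `consumer.received` holds the same event twice: the event was never lost, at the price of a duplicate and an extra round trip.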

I guess we have several options:

# Go stateless on this (note: only for the distributed implementation)

## Option 1
      - We can recompute the correspondence between UID and Message
Sequence Number each time a message sequence number is used. It might
cost compute and network resources, but no node stores specific data. We
could offer a configuration option that gives the choice between
the two options.
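
As a rough sketch of what "recomputing the correspondence" means: a message sequence number is the 1-based position of a message when the mailbox's UIDs are sorted in ascending order, so the mapping can be derived from the UID list alone. The helper names and the sample UIDs below are made up for illustration.

```python
from bisect import bisect_left


def uid_to_msn(sorted_uids, uid):
    """Return the 1-based message sequence number of `uid`, or None if absent."""
    i = bisect_left(sorted_uids, uid)
    if i < len(sorted_uids) and sorted_uids[i] == uid:
        return i + 1
    return None


def msn_to_uid(sorted_uids, msn):
    """Return the UID at 1-based position `msn`, or None if out of range."""
    if 1 <= msn <= len(sorted_uids):
        return sorted_uids[msn - 1]
    return None


# UIDs currently present in the mailbox, as fetched from the store (example data).
uids = sorted([3, 7, 8, 15, 21])
third_uid = msn_to_uid(uids, 3)     # UID of message number 3
position = uid_to_msn(uids, 15)     # sequence number of UID 15
```

The cost mentioned above shows up in the input: every lookup needs the full, current UID list of the mailbox.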

Depending on the implementation, we get different trade-offs:
Option 1 a)
     On a read request using a Message Sequence Number, select all the
data (all the message content) from the database, and pick the message
we want. It is consistent but highly inefficient.

It's not even consistent, in my opinion.

Note: this works for reads, but not deletions...
Option 1 b)
     On a read request we first fetch the mailbox to get information
about the UID to be used, and then get the message data. But due to the
delay between read and write, our information can become inconsistent.
Thus we can do some serious damage (e.g. delete the wrong message).
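
The "delete the wrong message" case can be reproduced with a few lines. This is only a single-process simulation under my own assumptions: `resolve_msn` is a hypothetical helper, and the concurrent expunge from another node is played by a plain `discard` between the two reads.

```python
def resolve_msn(uids, msn):
    """Map a 1-based message sequence number to a UID using the current UID set."""
    ordered = sorted(uids)
    return ordered[msn - 1] if 1 <= msn <= len(ordered) else None


mailbox = {101, 102, 103, 104}            # UIDs in the store (example data)

# The client last saw four messages, so its "message 3" means UID 103.
intended_uid = resolve_msn(mailbox, 3)

# Another session expunges UID 102 before the stateless server handles the command.
mailbox.discard(102)

# The server recomputes the mapping from the store and resolves "message 3" again.
deleted_uid = resolve_msn(mailbox, 3)
```

After the concurrent expunge, sequence number 3 no longer designates UID 103 but UID 104, so a delete addressed by sequence number would hit a message the client never meant.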

## Option 2
      - Store the message sequence number and update it. To do this in a
consistent way, we need a CP data store with the notion of transactions,
and we attach the Message Sequence Number directly to the Message stored
in the database.

Cassandra is either CP or AP, with configurable consistency, and you don't always need transactions: sometimes atomicity is enough (and you have it on a single row).

It works, but we have inefficient queries like **UPDATE
messages SET seq_number = seq_number - 1 WHERE seq_number > deleted AND
mailboxId=159**. We might have to handle transactions that fail to commit.
Option 2 is still dangerous on databases that lack the notion of
transactions. For example, a process can crash before updating the
sequence numbers. On Cassandra and other AP data stores, we have
consistency problems on concurrent updates of sequence numbers (which
might lead to wrong results, and even to messages sharing the same
sequence number).
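
For reference, here is how the delete-and-renumber step looks when the store does have transactions. It is a sketch using SQLite from the Python standard library, not a James mailbox schema: the `messages` table, its columns and the `expunge` helper are assumptions built around the query quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (mailboxId INTEGER, uid INTEGER, seq_number INTEGER)"
)
conn.executemany(
    "INSERT INTO messages VALUES (159, ?, ?)",
    [(101, 1), (102, 2), (103, 3), (104, 4)],
)
conn.commit()


def expunge(conn, mailbox_id, seq_number):
    """Delete one message and renumber the following ones in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "DELETE FROM messages WHERE mailboxId = ? AND seq_number = ?",
            (mailbox_id, seq_number),
        )
        conn.execute(
            "UPDATE messages SET seq_number = seq_number - 1 "
            "WHERE seq_number > ? AND mailboxId = ?",
            (seq_number, mailbox_id),
        )


expunge(conn, 159, 2)
rows = conn.execute(
    "SELECT uid, seq_number FROM messages WHERE mailboxId = 159 ORDER BY seq_number"
).fetchall()
```

Both statements commit together or not at all, which is the guarantee that goes missing without transactions. The inefficiency is also visible: the UPDATE rewrites every row that sits after the deleted message.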

Another note: adding a message requires knowing the last UID used for a
mailbox that still corresponds to an existing message. Here we have two
options: store it or recompute it. In both cases we run into trouble
without the notion of transactions.

I think you can achieve what you would do with transactions by relying on row atomicity. There are two design tricks you can leverage:

1. Model things with immutability in mind. You can create a row that contains all indices and then change the Mailbox entry to point to this new row (that gives you commit semantics).

2. Use wide rows to get atomicity on a list of things.
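
To show what I mean by the first trick, here is a small in-memory model. It is not Cassandra code: `Store`, `write_index_row` and `swap_pointer` are hypothetical, and the lock only stands in for the assumption that the store can update a single row atomically and conditionally.

```python
import itertools
import threading


class Store:
    """In-memory stand-in for a store with single-row atomicity (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ids = itertools.count(1)
        self.index_rows = {}   # row id -> immutable tuple of UIDs, never modified
        self.mailboxes = {}    # mailbox id -> id of the current index row

    def write_index_row(self, uids):
        """Write a brand new index row; existing rows are left untouched."""
        row_id = next(self._ids)
        self.index_rows[row_id] = tuple(sorted(uids))
        return row_id

    def swap_pointer(self, mailbox_id, expected_row_id, new_row_id):
        """Atomically repoint the mailbox, only if nobody else did so first."""
        with self._lock:
            if self.mailboxes.get(mailbox_id) != expected_row_id:
                return False
            self.mailboxes[mailbox_id] = new_row_id
            return True


def expunge(store, mailbox_id, uid):
    """Build a new index row without `uid`, then commit by swapping the pointer."""
    while True:
        current = store.mailboxes.get(mailbox_id)
        uids = [u for u in store.index_rows[current] if u != uid]
        new_row = store.write_index_row(uids)
        if store.swap_pointer(mailbox_id, current, new_row):
            return new_row   # the swap is the commit point


store = Store()
first = store.write_index_row([101, 102, 103])
store.swap_pointer("INBOX", None, first)
second = expunge(store, "INBOX", 102)
```

Readers see either the old index row or the new one, never a half-updated list, and a writer that loses the race simply rebuilds its row from the latest state and retries.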

[ ... lot of ideas ... ]

I have trouble thinking of any implementation that would work: even with an IDLE channel open, how is an IMAP client supposed to handle message deletion AT ALL? I mean, say I want to delete message 12, but message 11 is deleted at the same time by another user or device: there's no way to be sure which message is number 12 from the client's point of view.

Maybe every single IMAP client just uses UIDs for these reasons?

--
Matthieu Baechler
