Would a distributed in-memory backend such as Hazelcast be an option to start with?

http://docs.hazelcast.org/docs/3.5/manual/html/licenses.html

On 2015-10-26 14:51, Benoit Tellier wrote:


On 26/10/2015 14:08, Matthieu Baechler wrote:


On 26/10/2015 13:53, Benoit Tellier wrote:

[...]

I think we could create a Session row that maps MESSAGE SEQUENCE NUMBERS
to the real messages. Then, for a given session, we never remap things,
we only add messages to this row. In case of EXPUNGE, we create a new
row that won't be used until a new session is open.

Message Sequence Numbers are defined as mutable by the RFC. Making them
immutable does not look RFC compliant to me.

I mean, the idea is nice, but clients using MESSAGE SEQUENCE NUMBERS
won't be designed to work this way, I think.

https://tools.ietf.org/html/rfc3501#section-2.3.1.2

Let's quote it :

    Message sequence numbers can be reassigned during the session.  For
    example, when a message is permanently removed (expunged) from the
    mailbox, the message sequence number for all subsequent messages is
    decremented.  The number of messages in the mailbox is also
    decremented.  Similarly, a new message can be assigned a message
    sequence number that was once held by some other message prior to an
    expunge.

The only way to reassign Message sequence numbers is EXPUNGED messages.

I agree on your last sentence : "The only way to reassign Message
sequence numbers is EXPUNGED messages."

The criticism was on this sentence : "In case of EXPUNGE, we create a
new row that won't be used until a new session is open."

It seems opposed to the RFC, where, from my understanding, the client
can assume the message sequence number for all subsequent messages to be
decremented right after his request.
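
To make the quoted RFC behaviour concrete, here is a minimal Python sketch.
It assumes a message sequence number is simply the 1-based position of a UID
in an ordered list; the helper names `expunge` and `msn_of` are hypothetical
and are not James code.

```python
def expunge(uids, msn):
    """Return a new UID list with the message at 1-based `msn` removed."""
    if not 1 <= msn <= len(uids):
        raise ValueError("no such message sequence number")
    return uids[:msn - 1] + uids[msn:]


def msn_of(uids, uid):
    """A message sequence number is the 1-based position of the UID."""
    return uids.index(uid) + 1


mailbox = [2, 4, 6, 8]
after = expunge(mailbox, 1)  # expunge the first message (UID 2)
```

Before the expunge, UID 6 has sequence number 3; afterwards it has sequence
number 2, which is what a client is entitled to assume right after its
request.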



That way, a client can't use a MESSAGE SEQUENCE NUMBER wrongly because
it always maps to a given message.

With the model you suggest, I understand that a Session Row stops
being used only when the client is disconnected. It represents his view
of the correspondence at the moment he selected the mailbox. He uses it
until he is disconnected.

Is this what you suggest ?

Somehow.

Do you get it ?


Hence, as it is immutable, he cannot address messages added after this
selection using MESSAGE SEQUENCE NUMBERS, as these changes will only be
visible upon re-selection.

Looks like a problem to me...

That's not correct : you can add messages to this view.

You are right. I thought, from what you explained, that messages were only
added to the future session.


Also, you have data races unless you manage APPEND and EXPUNGE
operations in an atomic way. This means MESSAGE SEQUENCE NUMBERS should be
included in the process. This means that, to have a data race free
implementation, we need the mailbox manager to handle MESSAGE SEQUENCE
NUMBER consistency, and not the protocol part, which did it through the
event system. Do we agree on this ?

What race condition are you thinking of ? I mean, any mapping should be
ok, it can even be generated asynchronously as long as you assign an
existing mapping to a given session.

I meant this.

Today's implementation works in two steps :

  1 : Update the database
  2 : From the event generated from the database, update the mapping.

Consider the following mapping :

UID | 2 | 4 | 6 | 8 |
---------------------
MSN | 1 | 2 | 3 | 4 |

Bob issues these commands, in this order :

Command 1 : EXPUNGE 1
Command 2 : EXPUNGE 2

The expected result is :

UID | 4 | 8 |
-------------
MSN | 1 | 2 |

Because the client can assume that after executing command 1 mapping
will be :

UID | 4 | 6 | 8 |
-----------------
MSN | 1 | 2 | 3 |

Now, because the mapping update and the EXPUNGE completion are
dissociated, the mapping update might not have been done yet (delay in
event delivery, a long MailboxListener preceding the SelectedMailboxImpl,
etc.)

Hence the result might look like :

UID | 6 | 8 |
-------------
MSN | 1 | 2 |

We have deleted the wrong message.
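
The scenario above can be reproduced deterministically with a small Python
sketch. It is a simulation under stated assumptions, not James code: the
class `TwoStepMailbox` and the function `run` are hypothetical names, and
the event system is reduced to a queue that is drained explicitly.

```python
from collections import deque


class TwoStepMailbox:
    """Step 1 updates the database; step 2 updates the mapping from events."""

    def __init__(self, uids):
        self.database = list(uids)     # stored messages, by UID
        self.mapping = list(uids)      # session view: index + 1 is the MSN
        self.pending_events = deque()  # events not yet delivered

    def expunge(self, msn):
        uid = self.mapping[msn - 1]      # resolved through a possibly stale view
        self.database.remove(uid)        # step 1: update the database
        self.pending_events.append(uid)  # step 2 only happens on delivery
        return uid

    def deliver_events(self):
        while self.pending_events:
            self.mapping.remove(self.pending_events.popleft())


def run(deliver_between_commands):
    box = TwoStepMailbox([2, 4, 6, 8])
    box.expunge(1)                     # Command 1
    if deliver_between_commands:
        box.deliver_events()           # mapping caught up in time
    box.expunge(2)                     # Command 2
    box.deliver_events()
    return box.database
```

`run(True)` leaves UIDs 4 and 8, the expected result, while `run(False)`
leaves UIDs 6 and 8: the same two commands give different outcomes depending
only on when the event is delivered.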

Now, reading this : https://tools.ietf.org/html/rfc3501#section-5.5

You might argue that these two operations are ambiguous...

However, we have two possible outcomes out of this command sequence
depending on the timing. This is what I call a data race.
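
For comparison, here is a minimal sketch of the alternative mentioned
earlier, where the component that owns the messages also owns the MESSAGE
SEQUENCE NUMBER mapping and changes both under one lock. It assumes a single
process, so it says nothing about the distributed case, and `AtomicMailbox`
is a hypothetical name rather than an existing James class.

```python
import threading


class AtomicMailbox:
    """Messages and their sequence numbers are updated in one critical section."""

    def __init__(self, uids):
        self._lock = threading.Lock()
        self._uids = list(uids)  # index + 1 is the message sequence number

    def expunge(self, msn):
        with self._lock:
            if not 1 <= msn <= len(self._uids):
                raise ValueError("no such message sequence number")
            # Removal and renumbering are the same operation on one list.
            return self._uids.pop(msn - 1)

    def append(self, uid):
        with self._lock:
            self._uids.append(uid)
            return len(self._uids)  # sequence number of the new message

    def snapshot(self):
        with self._lock:
            return list(self._uids)
```

With this shape, EXPUNGE 1 followed by EXPUNGE 2 on UIDs 2, 4, 6, 8 always
removes UIDs 2 and 6, because the second command cannot observe a mapping
that lags behind the first.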

If you want, I can write an MPT test for this.



--------------------

A friend (Erwan Guyomarc'h), who is working on James with me, suggested
another solution. It has its problems, but I put it here as it is
interesting...

Suppose we have an at-least-once delivery distributed event system.

The idea is, instead of maintaining a double index between UID and
Message Sequence Number, we only store on each SelectedMailboxImpl an
uid set.

An ADDED event is an insertion into this set
An EXPUNGED event is a deletion from this set
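
As a rough illustration of this idea, the Python sketch below keeps only a
sorted collection of UIDs and derives the MESSAGE SEQUENCE NUMBER from the
position of the UID. The class `UidSet` and its method names are
hypothetical; the real SelectedMailboxImpl is not shown in this thread.

```python
import bisect


class UidSet:
    """Sorted UIDs; the sequence number of a UID is its 1-based rank."""

    def __init__(self, uids=()):
        self._uids = sorted(set(uids))

    def _find(self, uid):
        i = bisect.bisect_left(self._uids, uid)
        return i, i < len(self._uids) and self._uids[i] == uid

    def added(self, uid):
        i, present = self._find(uid)
        if not present:              # a redelivered event changes nothing
            self._uids.insert(i, uid)

    def expunged(self, uid):
        i, present = self._find(uid)
        if present:                  # a redelivered event changes nothing
            del self._uids[i]

    def msn(self, uid):
        i, present = self._find(uid)
        if not present:
            raise KeyError(uid)
        return i + 1

    def uid(self, msn):
        if not 1 <= msn <= len(self._uids):
            raise IndexError("no such message sequence number")
        return self._uids[msn - 1]
```

Both operations are idempotent, so duplicates caused by at-least-once
delivery are harmless. Reordering is a separate issue.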

Of course, these operations do not form a CRDT.

The problem comes from an ADDED operation being reordered after an
EXPUNGED operation.

ADDED 7
EXPUNGED 7

is not equivalent to

EXPUNGED 7
ADDED 7
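
A few lines of Python show why the order matters. The `replay` function is a
hypothetical helper that applies events to a plain set of UIDs.

```python
def replay(events):
    """Apply (kind, uid) events in order to an initially empty uid set."""
    uids = set()
    for kind, uid in events:
        if kind == "ADDED":
            uids.add(uid)
        elif kind == "EXPUNGED":
            uids.discard(uid)  # deleting an absent uid is silently ignored
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return uids


in_order = replay([("ADDED", 7), ("EXPUNGED", 7)])
reordered = replay([("EXPUNGED", 7), ("ADDED", 7)])
```

The first sequence ends with an empty set, while the second leaves UID 7 in
the set even though the message has been expunged.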

Now assume we have a causal order.
We are also certain to see only one ADDED event per UID.
We then reject EXPUNGE commands if the specified uid is absent from the
SelectedMailboxImpl uid set.

We might have concurrency problems, but with this we have an eventually
consistent MESSAGE SEQUENCE NUMBERS <=> UID correspondence across
servers. Of course, with this solution the difficulty is to have causal
ordering, which means vector clocks...
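
For readers unfamiliar with vector clocks, here is a minimal Python sketch
of the comparison they allow. Clocks are plain dicts from a node name to a
counter; the function names and the server names are hypothetical, and
nothing here reflects how James or any event bus actually stamps events.

```python
def tick(clock, node):
    """Return a copy of `clock` with the counter of `node` incremented."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(a, b):
    """Component-wise maximum, applied when a node receives an event."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def happened_before(a, b):
    """True if `a` causally precedes `b`."""
    nodes = set(a) | set(b)
    no_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    some_less = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_greater and some_less


def concurrent(a, b):
    """True if the clocks differ and neither precedes the other."""
    differ = any(a.get(n, 0) != b.get(n, 0) for n in set(a) | set(b))
    return differ and not happened_before(a, b) and not happened_before(b, a)


# server1 emits ADDED 7; server2 sees it, then emits EXPUNGED 7.
added = tick({}, "server1")
expunged = tick(merge({}, added), "server2")
# server3 emits an unrelated event without having seen either one.
unrelated = tick({}, "server3")
```

Here `added` precedes `expunged`, so a receiver can hold EXPUNGED 7 back
until ADDED 7 has been applied, whereas `added` and `unrelated` are
concurrent and may be applied in either order.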


It means creating the set for every selected mailbox : it looks like a
performance killer, don't you think ?


---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org


--
Eric Charles http://datalayer.io
