Re: Uid Generation Strategy

Eric Charles Mon, 17 May 2010 12:28:58 -0700

Hi Norman,

As discussed, it may be worth rereading the RFC requirements.
I've copied the relevant RFC sections below.

Can we deduce from them that the uid must be incremented by one for each mail arriving in a mailbox?

Tks,
Eric


2.3.1.1.        Unique Identifier (UID) Message Attribute

   A 32-bit value assigned to each message, which when used with the
   unique identifier validity value (see below) forms a 64-bit value
   that MUST NOT refer to any other message in the mailbox or any
   subsequent mailbox with the same name forever.  Unique identifiers
   are assigned in a strictly ascending fashion in the mailbox; as each
   message is added to the mailbox it is assigned a higher UID than the
   message(s) which were added previously.  Unlike message sequence
   numbers, unique identifiers are not necessarily contiguous.

   The unique identifier of a message MUST NOT change during the
   session, and SHOULD NOT change between sessions.  Any change of
   unique identifiers between sessions MUST be detectable using the
   UIDVALIDITY mechanism discussed below.  Persistent unique identifiers
   are required for a client to resynchronize its state from a previous
   session with the server (e.g., disconnected or offline access
   clients); this is discussed further in [IMAP-DISC].

   Associated with every mailbox are two values which aid in unique
   identifier handling: the next unique identifier value and the unique
   identifier validity value.

   The next unique identifier value is the predicted value that will be
   assigned to a new message in the mailbox.  Unless the unique
   identifier validity also changes (see below), the next unique
   identifier value MUST have the following two characteristics.  First,
   the next unique identifier value MUST NOT change unless new messages
   are added to the mailbox; and second, the next unique identifier
   value MUST change whenever new messages are added to the mailbox,
   even if those new messages are subsequently expunged.

        Note: The next unique identifier value is intended to
        provide a means for a client to determine whether any
        messages have been delivered to the mailbox since the
        previous time it checked this value.  It is not intended to
        provide any guarantee that any message will have this
        unique identifier.  A client can only assume, at the time
        that it obtains the next unique identifier value, that
        messages arriving after that time will have a UID greater
        than or equal to that value.

   The unique identifier validity value is sent in a UIDVALIDITY
   response code in an OK untagged response at mailbox selection time.
   If unique identifiers from an earlier session fail to persist in this
   session, the unique identifier validity value MUST be greater than
   the one used in the earlier session.

        Note: Ideally, unique identifiers SHOULD persist at all
        times.  Although this specification recognizes that failure
        to persist can be unavoidable in certain server
        environments, it STRONGLY ENCOURAGES message store
        implementation techniques that avoid this problem.  For
        example:

         1) Unique identifiers MUST be strictly ascending in the
            mailbox at all times.  If the physical message store is
            re-ordered by a non-IMAP agent, this requires that the
            unique identifiers in the mailbox be regenerated, since
            the former unique identifiers are no longer strictly
            ascending as a result of the re-ordering.

         2) If the message store has no mechanism to store unique
            identifiers, it must regenerate unique identifiers at
            each session, and each session must have a unique
            UIDVALIDITY value.

         3) If the mailbox is deleted and a new mailbox with the
            same name is created at a later date, the server must
            either keep track of unique identifiers from the
            previous instance of the mailbox, or it must assign a
            new UIDVALIDITY value to the new instance of the
            mailbox.  A good UIDVALIDITY value to use in this case
            is a 32-bit representation of the creation date/time of
            the mailbox.  It is alright to use a constant such as
            1, but only if it guaranteed that unique identifiers
            will never be reused, even in the case of a mailbox
            being deleted (or renamed) and a new mailbox by the
            same name created at some future time.

         4) The combination of mailbox name, UIDVALIDITY, and UID
            must refer to a single immutable message on that server
            forever.  In particular, the internal date, [RFC-2822]
            size, envelope, body structure, and message texts
            (RFC822, RFC822.HEADER, RFC822.TEXT, and all BODY[...]
            fetch data items) must never change.  This does not
            include message numbers, nor does it include attributes
            that can be set by a STORE command (e.g., FLAGS).

2.3.1.2.        Message Sequence Number Message Attribute

   A relative position from 1 to the number of messages in the mailbox.
   This position MUST be ordered by ascending unique identifier.  As
   each new message is added, it is assigned a message sequence number
   that is 1 higher than the number of messages in the mailbox before
   that new message was added.

   Message sequence numbers can be reassigned during the session.  For
   example, when a message is permanently removed (expunged) from the
   mailbox, the message sequence number for all subsequent messages is
   decremented.  The number of messages in the mailbox is also
   decremented.  Similarly, a new message can be assigned a message
   sequence number that was once held by some other message prior to an
   expunge.

   In addition to accessing messages by relative position in the
   mailbox, message sequence numbers can be used in mathematical
   calculations.  For example, if an untagged "11 EXISTS" is received,
   and previously an untagged "8 EXISTS" was received, three new
   messages have arrived with message sequence numbers of 9, 10, and 11.
   Another example, if message 287 in a 523 message mailbox has UID
   12345, there are exactly 286 messages which have lesser UIDs and 236
   messages which have greater UIDs.
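
As an illustration of the two sections quoted above, here is a minimal Java sketch (the class and method names are made up for this example, they are not James code). It treats a mailbox as a list of UIDs, checks the "strictly ascending" rule, and derives the message sequence number as the 1-based position in that list. Since the RFC says UIDs are "not necessarily contiguous", the check accepts gaps, which suggests that an increment of exactly one is not required.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only.
public class UidOrdering {

    // True if every UID is strictly greater than the previous one (gaps are allowed).
    public static boolean isStrictlyAscending(List<Long> uids) {
        for (int i = 1; i < uids.size(); i++) {
            if (uids.get(i) <= uids.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    // Message sequence number = 1-based position in ascending-UID order, or -1 if the UID is absent.
    // Assumes the list is already sorted in ascending order.
    public static int sequenceNumberOf(List<Long> uids, long uid) {
        int index = Collections.binarySearch(uids, Long.valueOf(uid));
        return index >= 0 ? index + 1 : -1;
    }

    public static void main(String[] args) {
        List<Long> uids = List.of(3L, 4L, 9L, 12L);   // non-contiguous but valid
        int seq = sequenceNumberOf(uids, 9L);
        System.out.println(isStrictlyAscending(uids)); // true
        System.out.println(seq);                       // 3
        // As in the RFC example: seq - 1 messages have lesser UIDs, size - seq have greater UIDs.
        System.out.println((seq - 1) + " lesser, " + (uids.size() - seq) + " greater");
    }
}
```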



Read more: http://www.faqs.org/rfcs/rfc3501.html#ixzz0oDUA57mw




On 05/12/2010 07:46 AM, Norman Maurer wrote:

Hi Eric,

Thanks for following up on this. Comments inline.

2010/5/11 Eric Charles<[email protected]>:

Hi,

Section 2.3.1.1 of IMAP RFC (http://www.faqs.org/rfcs/rfc3501.html) states
that Unique identifiers MUST be strictly ascending in the mailbox at all
times.
This is currently enforced in
org.apache.james.imap.store.mail.model.Mailbox#consumeUid() for the JPA
store.

A JPAStressTest has been set up by Norman to verify that the uids are
correctly generated.
It showed that parallel threads can cause problems in the critical section of
org.apache.james.imap.jpa.JPAMailbox#reserveNextUid(mailboxSession) (the
section between the transaction begin and commit).

Which brings me to a JCRStressTest which fails :/

After chatting and patching with Norman, at least 4 strategies have been
identified:

1. Use the locking mechanism specific to each store. This is what is currently
implemented for the database store (via LockModeType.PESSIMISTIC_WRITE and an
adequate timeout) on JPAMailbox. The JCR store would need something similar.
Simply document reserveNextUid to indicate that a locking mechanism
should be implemented. Currently, this solution works for JPA and uids are
correctly generated.

Going this way would allow us to use the most performant solution per
implementation. This would also allow us to support clustering in JCR
etc. That would not be possible when using a "JVM lock".

Locking in JCR is another problem, but I will do some more research
first before going into the details.

2. Implement some ReentrantLock (or equivalent synchronization) one
abstraction level higher. All specific implementations would benefit from
this mechanism. The lock would be shared by all mailboxes.

See above...
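
To make strategy 2 concrete, here is a rough sketch of a single ReentrantLock guarding uid generation for all mailboxes. LockedUidGenerator and its in-memory map are hypothetical names used only for this example; a real store would read and persist the last uid instead. As noted above, such a lock only protects threads inside one JVM, so it would not help in a clustered setup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper, not part of the James code base.
public class LockedUidGenerator {

    // One lock shared by every mailbox (the "JVM lock" of strategy 2).
    private final ReentrantLock lock = new ReentrantLock();

    // Last uid handed out per mailbox name; kept in memory only for this sketch.
    private final Map<String, Long> lastUid = new HashMap<>();

    public long reserveNextUid(String mailboxName) {
        lock.lock();
        try {
            long next = lastUid.getOrDefault(mailboxName, 0L) + 1;
            lastUid.put(mailboxName, next);
            return next;
        } finally {
            // Always release the lock, even if the body throws.
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        LockedUidGenerator generator = new LockedUidGenerator();
        System.out.println(generator.reserveNextUid("INBOX")); // 1
        System.out.println(generator.reserveNextUid("INBOX")); // 2
        System.out.println(generator.reserveNextUid("Sent"));  // 1
    }
}
```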

3. Change the API in some way (for example, change the
org.apache.james.imap.mailbox.Mailbox#appendMessage signature)
to oblige each implementation to have a thread-safe way of generating the
uid.

What is different here from what we have atm in the abstract method of
StoreMailbox ?

4. Design and implement a more elaborate solution that would, per mailbox,
maintain the last uid and queue all uid generation requests for that mailbox.

Could you give me some more details about how you think this could be
done? At the moment I am thinking about adding an interface called
UidConsumer, which has only one method, like:

long reserveNextUid(Mailbox mailbox, MailboxSession session);

The UidConsumer instance would be created in the
StoreMailboxManager and then passed to the StoreMailbox in its
constructor. So it would be easy for developers to provide their own
strategy.
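
A minimal sketch of how such a pluggable UidConsumer might look is given below. Only the method signature comes from the mail above; the Mailbox and MailboxSession interfaces here are placeholders standing in for the real James types (getName() in particular is an assumption), and InMemoryUidConsumer is a hypothetical strategy that keeps one atomic counter per mailbox. It does not persist anything, so it is an illustration of the plug-in idea rather than a usable store implementation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

public class UidConsumerSketch {

    // Placeholder types; the real James classes look different.
    public interface Mailbox { String getName(); }
    public interface MailboxSession { }

    // The single-method interface proposed in the mail.
    public interface UidConsumer {
        long reserveNextUid(Mailbox mailbox, MailboxSession session);
    }

    // Hypothetical strategy: one atomic counter per mailbox name, in memory only.
    public static class InMemoryUidConsumer implements UidConsumer {
        private final ConcurrentMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

        @Override
        public long reserveNextUid(Mailbox mailbox, MailboxSession session) {
            // computeIfAbsent creates the counter atomically on first use;
            // incrementAndGet then hands out strictly ascending values.
            return counters
                    .computeIfAbsent(mailbox.getName(), name -> new AtomicLong(0))
                    .incrementAndGet();
        }
    }

    public static void main(String[] args) {
        UidConsumer consumer = new InMemoryUidConsumer();
        Mailbox inbox = () -> "INBOX";
        MailboxSession session = new MailboxSession() { };
        System.out.println(consumer.reserveNextUid(inbox, session)); // 1
        System.out.println(consumer.reserveNextUid(inbox, session)); // 2
    }
}
```

A JPA or JCR store could then supply its own UidConsumer that uses store-specific locking, while the calling code stays the same.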

So the question is: which strategy should we go with?

Eric

Bye,
Norman
