So, kind-of "let the store create the uid (range) and use it as nextuid"

In the current implementation, each mail persistence reads the persisted "lastUid" in a locking transaction (for JPA), consumes it (+1) and re-persists it. Every incoming mail has to queue for that lock, which limits performance.
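
To make the bottleneck concrete, here is a minimal in-memory sketch of that read / +1 / re-persist cycle. It is not the actual JPA code: the class name LockedLastUid is hypothetical, the ReentrantLock merely stands in for the locking transaction, and the field stands in for the persisted column.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the persisted per-mailbox "lastUid" value.
final class LockedLastUid {
    // Plays the role of the locking transaction: one holder at a time.
    private final ReentrantLock rowLock = new ReentrantLock();
    // Plays the role of the persisted column.
    private long lastUid;

    LockedLastUid(long initialLastUid) {
        this.lastUid = initialLastUid;
    }

    // Read, consume (+1) and re-persist while holding the lock.
    // Every appending thread has to pass through here one after the other.
    long consumeNextUid() {
        rowLock.lock();
        try {
            long next = lastUid + 1;
            lastUid = next;
            return next;
        } finally {
            rowLock.unlock();
        }
    }

    // What a client would be told as UIDNEXT; also needs the lock.
    long peekNextUid() {
        rowLock.lock();
        try {
            return lastUid + 1;
        } finally {
            rowLock.unlock();
        }
    }
}
```

In the real store the time spent under the lock includes a database round trip, which is why the queueing hurts much more than this in-memory version suggests.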

But we want parallel processing, not serial...

If we let the individual store create the UID for each mail insert (in different threads), I suppose we will still rely on a single component responsible for providing the UIDNEXT. This is why I introduced the notion of a "cache", but it could be any other mechanism.
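
One possible shape for such a component, as a rough sketch only: the store hands out the UIDs and a small tracker remembers the highest one it has been told about. Both class names are hypothetical, and SimulatedStore only imitates an auto-increment column with an in-memory counter.

```java
import java.util.concurrent.atomic.AtomicLong;

// Assumption: imitates a store whose insert returns a generated UID
// (for example an auto_increment column). Not a real store API.
final class SimulatedStore {
    private final AtomicLong sequence = new AtomicLong();

    long insertMessage() {
        return sequence.incrementAndGet();
    }
}

// Hypothetical per-mailbox component that answers UIDNEXT.
final class UidNextTracker {
    private final AtomicLong highestSeen = new AtomicLong();

    // Called by whichever thread just inserted a message; order does not matter.
    void uidAssigned(long uid) {
        highestSeen.accumulateAndGet(uid, Math::max);
    }

    // Highest reported UID plus one.
    long uidNext() {
        return highestSeen.get() + 1;
    }
}
```

This does not cover one case: an insert that received a lower UID may only be reported after a higher one has already been used to answer UIDNEXT. How to handle that depends on the store.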

My point is that if the store (db, ...) creates the UID (range), it must be provided back to the IMAP client reading the mailbox in a managed way.

So, you told us how you see UID generation with auto_increment (for the database store, for example). Could you sketch for us how you see the UIDNEXT management/generation for a mailbox, with mails being inserted in parallel (and their UIDs being generated automatically by the store)?

Tks,

Eric


On 17/08/2010 19:40, Norman Maurer wrote:
Well,

I think it's not the generation of the UIDNEXT which is a performance
problem. It's how it is used at the moment. We currently use it as the uid for the
next message which will get appended to the mailbox. It would be more
performant to use an auto_increment column in JPA, for example. Other
backends have other features which can cover the uid generation.

Bye,
Norman

2010/8/17 Tim-Christian Mundt<d...@tim-erwin.de>:
Eric,

you are right about the UIDVALIDITY, the default shouldn't be a random
number, but the current timestamp which would guarantee that it won't
occur again.
I also thought about checking whether the next uid would wrap the uid
counter to the negative - which would mean we need to regenerate the
uids and create a new UIDVALIDITY. However, it's just too improbable to
have more than 9 quintillion mails arriving in a mailbox.

UIDVALIDITY + UID as id or not simply means: the combination MUST never
refer to any other mail - ever. If for some reason you have to
regenerate the uids or do so by default, the UIDVALIDITY MUST change -
making the previous uids invalid. The "refer forever" part is in a
"STRONGLY ENCOURAGES" section.

Caching NEXTUIDS etc sounds really complicated for a simple matter. I'll
try to understand why this is such a performance blocker. It's just
reading or writing a value, isn't it? Later...

Regards
Tim

On Tuesday, 17.08.2010, 17:56 +0200, Eric Charles wrote:
Hi Norman,

I've read the http://www.rfc-editor.org/rfc/rfc3501.txt (section 2.3.1
Message Numbers) and http://www.rfc-editor.org/rfc/rfc2683.txt (section
3.4.3. UIDs and UIDVALIDITY)

A first point is that the RFC talks about backend servers not being able to store
the UIDs. In this case, the UIDs are to be regenerated each time, with a
different UIDVALIDITY, so there is no risk of confusing mails. I also had
to read twice the sentence and associated explanation: "It seems to
be a common misunderstanding that "the UIDVALIDITY and the UID, taken
together, form a 64-bit identifier that uniquely identifies a message on
a server"." However, it is said in another place: "The combination of
mailbox name, UIDVALIDITY, and UID must refer to a single immutable
message on that server forever".

Finally, this may yield the requirement that the store API should not
prevent implementing a store that isn't capable of storing UIDs
(seems strange, but it is considered in numerous places in the RFCs).
I think the current store API already allows that?

On the UIDVALIDITY, it is now generated for example in JPA with a
Math.abs(RANDOM.nextInt()). RFC states: "A good UIDVALIDITY value to use
in this case is a 32-bit representation of the creation date/time of the
mailbox".
It seems reasonable to provide utility methods such as the existing
randomUidValidity() to the store impl, each store having the freedom to
use it or not.
(I was just wondering what the difference is between the imap-mailbox and
imap-store projects - it is not always obvious at first sight how to define the
responsibilities of each).
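
For the UIDVALIDITY utility mentioned above, a timestamp-based variant next to randomUidValidity() could look roughly like this. The class and method names are hypothetical, not part of the existing API; the value is the creation time in seconds, truncated to 32 bits.

```java
// Hypothetical utility, sketched under the assumption that the store
// keeps UIDVALIDITY in a long and wants a 32-bit unsigned value.
final class TimestampUidValidity {
    private TimestampUidValidity() {
    }

    // Seconds since the epoch, truncated to the low 32 bits.
    static long fromCreationMillis(long creationMillis) {
        return (creationMillis / 1000L) & 0xFFFFFFFFL;
    }

    // Convenience for a mailbox being created right now.
    static long forNewMailbox() {
        return fromCreationMillis(System.currentTimeMillis());
    }
}
```

A mailbox deleted and recreated under the same name within one second would get the same value, so a store may still want an extra check for that case.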

Coming to the UIDNEXT and as you pointed out, I also understand that the
returned UIDNEXT value has nothing to do with the UID that will be given
to the next incoming message. That UID, however, needs to be equal or higher.
I suppose the idea would be to have a per-mailbox cache in memory. That
cache would be used to return the UIDNEXT (that would be the current
cache value), but also to assign the UID for incoming mails (cache+1).
I am wondering how we can ensure, in case of an abrupt shutdown, that the
last value of the cache is stored.
If we can't ensure that, and this will probably be the case, then we can
have a strategy to init the cache with a value recomputed from all
the stored UIDs (something like "give me the highest value from all the
UIDs of that mailbox").
This would need an initial step when the cache is not initialized, but
should not be a penalty for a JPA store, even for mailboxes with many
mails. I don't know about the other stores (jcr, maildir, ...)?
Should we care about the cache time-to-live? If we don't care about that,
we will have growing memory, even if for each mailbox we only need
an Integer (that wouldn't represent many KB). But there is also the
possibility to define a ttl of a few hours, with a scheduled cache
manager that would clean things up.
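
The cache idea could be sketched as below. The names are hypothetical, and the ToLongFunction stands in for whatever "highest stored UID of that mailbox" lookup a given store can offer. In this variant the cache holds the last assigned UID and reports last+1 as UIDNEXT, which is a small deviation from the wording above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.ToLongFunction;

// Hypothetical per-mailbox UID cache, lazily initialised from the store.
final class MailboxUidCache {
    private final ConcurrentHashMap<String, AtomicLong> lastUidByMailbox =
            new ConcurrentHashMap<>();
    // Assumption: returns the highest UID stored for a mailbox, 0 if empty.
    private final ToLongFunction<String> highestStoredUid;

    MailboxUidCache(ToLongFunction<String> highestStoredUid) {
        this.highestStoredUid = highestStoredUid;
    }

    // The lookup runs once per mailbox, on first access after start or eviction.
    private AtomicLong counter(String mailbox) {
        return lastUidByMailbox.computeIfAbsent(
                mailbox, m -> new AtomicLong(highestStoredUid.applyAsLong(m)));
    }

    long uidNext(String mailbox) {
        return counter(mailbox).get() + 1;
    }

    long assignUid(String mailbox) {
        return counter(mailbox).incrementAndGet();
    }

    // Hook for a ttl-based cleanup. Only safe once every UID assigned for
    // this mailbox has reached the store, otherwise UIDs could be reused.
    void evict(String mailbox) {
        lastUidByMailbox.remove(mailbox);
    }
}
```

After an abrupt shutdown nothing needs to be recovered from the cache itself: the first access simply recomputes the starting point from the stored UIDs.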

Tks,

Eric


On 15/08/2010 17:25, Norman Maurer wrote:
Hi there,

After looking over the store api again in the last few days, I think
there is some room for improvement. These improvements will break the
api (again), so I think we should do it now and after that cut the 0.1
release. I will try to explain why I think some
improvements should be made and what my point of view is. Please feel free to
comment ..

NEXTUID (IMAP-193):
The NEXTUID generation / house-keeping is just a big performance
killer. We actually guarantee to use the value of NEXTUID for the next
message which will get saved. That's not needed. We just need to
guarantee it's equal to or greater than the value returned by NEXTUID. So
it's probably more performant to just hold the information in memory and
update it every x writes (or something like that). So the
implementation could use an auto-increment field to generate the
unique uid when storing the message or just an AtomicInteger for
generation. Maybe again with a new abstract class called UIDKeeper?
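
A rough sketch of what such a keeper might look like, not a proposal for the final API. It keeps the counter in memory and writes to the store only once per block of UIDs. The method names, the block-reservation scheme and the in-memory subclass are all assumptions, and a synchronized method is used instead of an AtomicInteger so the reservation and the increment stay together.

```java
// Hypothetical abstract keeper: subclasses say how the reservation is persisted.
abstract class UidKeeper {
    private final int blockSize;
    private boolean initialised;
    private long lastUid;       // last UID handed out
    private long reservedUpTo;  // highest UID covered by a persisted reservation

    protected UidKeeper(int blockSize) {
        this.blockSize = blockSize;
    }

    protected abstract long loadReservedUpTo();

    protected abstract void saveReservedUpTo(long value);

    private void init() {
        if (!initialised) {
            // Resume from the persisted reservation: never below a UID in use.
            lastUid = reservedUpTo = loadReservedUpTo();
            initialised = true;
        }
    }

    // Hands out the next UID; touches the store once per blockSize calls.
    synchronized long nextUid() {
        init();
        if (lastUid == reservedUpTo) {
            reservedUpTo += blockSize;
            saveReservedUpTo(reservedUpTo);
        }
        return ++lastUid;
    }

    // Value to report as NEXTUID; the next UID handed out is exactly this.
    synchronized long uidNext() {
        init();
        return lastUid + 1;
    }
}

// In-memory stand-in for a real backend, for illustration only.
final class InMemoryUidKeeper extends UidKeeper {
    long persisted;
    int writes;

    InMemoryUidKeeper(long persisted, int blockSize) {
        super(blockSize);
        this.persisted = persisted;
    }

    @Override
    protected long loadReservedUpTo() {
        return persisted;
    }

    @Override
    protected void saveReservedUpTo(long value) {
        persisted = value;
        writes++;
    }
}
```

After a crash the keeper resumes from the persisted reservation, so up to one block of UIDs is skipped but none is reused. That matches the requirement that the UID only has to be equal to or greater than NEXTUID.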

Does this sound like something which makes sense?

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org

