New Design Doc - User-data Wavelets and Supplements

David Wang Thu, 02 Dec 2010 19:39:05 -0800

Just thought I'd share the new design doc on UDW that David Hearnden wrote:
https://sites.google.com/a/waveprotocol.org/wave-protocol/protocol/design-proposals/udw


The contents are copied below:

User-data Wavelets and Supplements

   Author: David Hearnden ([email protected])
   Date: 3-Dec-2010

   A wave is a container that holds a collection of wavelets. In any wave,
   state that is specific to a particular user is stored in a wavelet called
   the user-data wavelet (UDW) which only has that user as the participant.
   This state tracks information like what parts of the wave that user has
   read, private gadget state, and thread collapse state. The model that
   describes this collection of information is called the supplement.

   Because the current code is sourced from existing Google Wave code, there
   are many more pieces of information stored in this wavelet that are not used
   by WIAB right now, e.g. folder allocations, indexing state (following and
   archive), seen wavelets, etc. However, we will still describe them here as
   they may be used by WIAB in the future.

   There are three layers to the supplement model:
   - the primitive data - just describes the data structures that are used
      to hold the supplement data. Think of it as simple getters and setters.
   - the semantic, context-free model - semantics of that data in terms
      of conversation-specific queries and actions (like marking blips as read,
      where it ensures read versions only increase). It is context-free in that
      it has no implicit knowledge of any wave data - that information must be
      supplied to each method, i.e. it works with blip ids and not blip objects.
   - the semantic, contextual model - binds the context-free supplement
      with a particular wave, in order to contextualise the supplement queries
      and actions to a particular wave (this is essentially just currying).

   UDW ID
   The user-data wavelet in a wave (domain = D1, id = I) for a particular
   user (domain = D2, id = U) has the id use...@d2. e.g.

   wave://acmewave.com/w+ABCDEFG/[email protected]

   is the UDW for bob from initech.com on the wave w+ABCDEFG at acmewave.com.
   This id construction is defined by:

   wave.model.id.IdGenerator.newUserDataWaveletId(String address)

   The supplement model is agnostic about the lifecycle of the UDW wavelet;
   it is not specified how, when, or if it gets created.

   Primitive layer
   The data model for the supplement is built using wave's toolkit for
   embedding concurrent data types in wavelets. This lets the data be described
   using far more appropriate data types than annotated XML, and the concurrent
   toolkit takes care of embedding those types in XML in a manner that works
   well with concurrency and operational transformation. Example scenarios of
   concurrent writes on a user's UDW are:
   - use by two clients simultaneously;
   - use by a client and a server (e.g., marking blips as read in a
      client while a server is applying a filter or a remote mark-as-read
      request); and
   - bringing back online a client that read content while offline, where
      that content has been marked as read in the interim.
   This section describes the data model in terms of these abstract types,
   with some XML snippets to show their concrete embeddings.

   Read state
   Read state is a map of wavelet ids to 4-tuples:

   WaveletReadState = Tuple (
   blips: MonotonicMap [ String -> Int ],
   participants: MonotonicValue [ Int ],
   tags: MonotonicValue [ Int ],
   wavelet: MonotonicValue [ Int ]
   )

   ReadState = Map [ String -> WaveletReadState ]


   This structure tracks, for each conversation in a wave, the last-read
   version of each blip, the last-read version of that conversation's
   participants, the last-read version of that conversation's tags collection
   (tags are a Google Wave feature not currently supported in Wave-In-A-Box),
   and the last-read version of the conversation as a whole. The monotonicity
   of the data types MonotonicValue and MonotonicMap just means that values are
   only allowed to increase, and that the resolution of concurrent writes is
   the maximum value (rather than last-one-wins or any other resolution
   strategy).
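
   The max-wins behaviour described above can be illustrated with a minimal
   sketch. The classes below are hypothetical stand-ins written for this
   example (plain in-memory Java, not the concurrent toolkit's own types), and
   they only model the rule that a write never lowers a stored value:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of monotonic types; not the Wave toolkit's classes. */
final class MonotonicSketch {

  /** A value that can only increase; null means "not yet set". */
  static final class MonotonicValue {
    private Integer value;

    Integer get() {
      return value;
    }

    /** Applies a write. Writes that would lower the value are ignored. */
    void set(int candidate) {
      if (value == null || candidate > value) {
        value = candidate;
      }
    }
  }

  /** A map whose per-key values can only increase. */
  static final class MonotonicMap {
    private final Map<String, Integer> entries = new HashMap<>();

    Integer get(String key) {
      return entries.get(key);
    }

    /** Keeps the larger of the existing and the incoming value for the key. */
    void put(String key, int candidate) {
      entries.merge(key, candidate, Integer::max);
    }
  }

  public static void main(String[] args) {
    MonotonicMap blips = new MonotonicMap();
    // Two writers mark the same blip as read at different versions.
    blips.put("b+lurL8WUGA", 104);
    blips.put("b+lurL8WUGA", 90); // lower than the stored value, so ignored
    System.out.println(blips.get("b+lurL8WUGA")); // prints 104
  }
}
```

   Because the result is the maximum regardless of arrival order, two writers
   applying the same pair of writes in opposite orders end up with the same
   state.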

   For example, the following read state describes two conversations in a
   wave (a root conversation and a private reply). Three blips in the root
   conversation are read at versions 104, 230 and 250, the entire root
   conversation was marked as read at version 157, and the participants
   collection read at version 7. The private conversation conv+d0jfkt has
   similar read state.

   {
     "acmewave.com!conv+root" → (
       blips: {
         "b+lurL8WUGA" → 104,
         "b+rAUrAGUGA" → 230,
         "b+fF872cQNK" → 250,
       },
       participants: 7,
       wavelet: 157
     ),
     "acmewave.com!conv+d0jfkt" → (
       blips: {
         "b+aA8LAcUGK" → 35,
         "b+cuU78GQGA" → 1440,
       }
     )
   }


   In XML, this is embedded at the document root of the m/read document in
   the UDW. (The supplement data model uses an m/ prefix on all the documents
   it uses in the UDW. This prefix was once necessary but no longer is; it is
   legacy, and is kept only for compatibility with old data.) The read state
   for each wavelet is a top-level element, and the four structures (the blip
   map, participant value, tag value, and wavelet value) are all superimposed
   inside those top-level elements:

   <wavelet i="acmewave.com!conv+root">
   <blip i="b+lurL8WUGA" v="104"/>
   <blip i="b+rAUrAGUGA" v="230"/>
   <blip i="b+fF872cQNK" v="250"/>
   <participants v="7"/>
   <all v="157"/>
   </wavelet>

   <wavelet i="acmewave.com!conv+d0jfkt">
   <blip i="b+aA8LAcUGK" v="35"/>
   <blip i="b+cuU78GQGA" v="1440"/>
   </wavelet>


   Thread state
   Thread state is similar to read state: it maps, for each conversation,
   thread ids to their presentation state. Currently this is just collapsed or
   expanded, but it is open to extension for other thread states in the future
   (like summarised, or partially expanded).

   WaveletThreadState = Map [ String -> (COLLAPSED | EXPANDED) ]
   ThreadState = Map [ String -> WaveletThreadState ]


   Gadget state
   Map [ String -> Map [ String -> String ] ]

   Gadget state is simply a map of gadget id to a key-value map. This allows
   gadgets to record private, per-user information as key/value pairs. The
   Gadget doodad exposes this key/value pair map as part of the Wave Gadget
   API.

   Google-Wave state
   These fields are leftover from use in the Google Wave product, and are
   not currently used by Wave In A Box. The following parts may be of use in
   the future for Wave In A Box.

   - Folders: Set [ Int ]
      This is just a set of folder ids, embedded in m/folder.
   - Indexing: Tuple ( archive: MonotonicMap [ String -> Int ],
      following: Boolean )
      This tracks the versions at which conversations have been archived, and
      an optional bit specifying whether this wave is being followed or not
      (that is, delivered to the user's Inbox when changes occur after it was
      archived). These are embedded in m/archive and m/muted.
   - Seen: Tuple ( seen: MonotonicMap [ String -> HashedVersion ],
      notified: MonotonicMap [ String -> Int ], pending: Boolean )
      The 'seen' map tracks the unforgeable/signed version at which a user has
      'seen' conversations in a wave. Actions interpreted as 'seeing' include
      performing some action on the wave (like marking as read or moving to a
      folder), regardless of whether it has actually been rendered. Seen
      versions are used as proof that a user had access to a particular
      conversation at that version, in order to provide them access to those
      versions for all time in case they later lose access due to being dropped
      as a participant. The 'notified' and 'pending' parts are for external
      gateway notifications (e.g., email), and track when notifications have
      been sent, and whether further notifications are needed. This structure
      has race conditions.

   The rest of the state in the supplement (m/abuse, m/cleared) is obsolete
   and of no interest.

   Liveness
   All the structures in the data model are live and broadcast events when
   they change. This capability comes for free from the concurrent toolkit.

   Code
   The primitive data model is defined in PrimitiveSupplement, with the
   observable extension in ObservablePrimitiveSupplement. There are two
   canonical implementations: one embeds its state in a wavelet
   (WaveletBasedSupplement), and the other in POJO structures
   (PrimitiveSupplementImpl). The POJO version is used for testing, for
   snapshotting supplement state for faster server-side processing, and as a
   fake persistence layer when UDWs are not present and creating them on
   demand is not desirable.

   Context-free semantic layer
   The context-free semantic layer defines the actions and queries that the
   supplement model provides, defined in terms of the primitive data model and
   input wave state. For example, it defines the readness of a blip as:

   a blip is unread if, and only if:
   - the read-version for that blip either does not exist or is less than
      the blip's last-modified version; and
   - the wavelet-override version either does not exist or is less than
      the blip's last-modified version.

   The signature of that query is:

   boolean isBlipUnread(WaveletId waveletId, String blipId, int blipVersion);

   Note that all the relevant wave state for this query (wavelet id, blip
   id, and blip version) is input explicitly, which is why this layer is
   context-free. As well as queries, this layer also defines actions like
   marking blips or wavelets as read, marking threads as expanded/collapsed,
   etc.
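
   As a rough illustration of the unread rule stated above, the following
   sketch implements it over plain in-memory maps. ReadStateSketch and its
   methods are hypothetical names for this example only, wavelet ids are
   simplified to Strings, and this is not the SupplementImpl code:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for the context-free supplement. */
final class ReadStateSketch {
  // Wavelet id -> (blip id -> last-read version of that blip).
  private final Map<String, Map<String, Integer>> blipReadVersions = new HashMap<>();
  // Wavelet id -> version at which the whole wavelet was marked as read.
  private final Map<String, Integer> waveletReadVersions = new HashMap<>();

  /** Records a blip read version; read versions only ever increase. */
  void markBlipAsRead(String waveletId, String blipId, int version) {
    blipReadVersions
        .computeIfAbsent(waveletId, k -> new HashMap<>())
        .merge(blipId, version, Integer::max);
  }

  /** Records the wavelet-override version; it only ever increases. */
  void markWaveletAsRead(String waveletId, int version) {
    waveletReadVersions.merge(waveletId, version, Integer::max);
  }

  /** Unread iff both the blip version and the wavelet override are stale. */
  boolean isBlipUnread(String waveletId, String blipId, int blipVersion) {
    Map<String, Integer> perWavelet = blipReadVersions.get(waveletId);
    Integer blipRead = perWavelet == null ? null : perWavelet.get(blipId);
    Integer waveletRead = waveletReadVersions.get(waveletId);
    boolean blipStale = blipRead == null || blipRead < blipVersion;
    boolean waveletStale = waveletRead == null || waveletRead < blipVersion;
    return blipStale && waveletStale;
  }

  public static void main(String[] args) {
    ReadStateSketch s = new ReadStateSketch();
    String root = "acmewave.com!conv+root";
    s.markBlipAsRead(root, "b+lurL8WUGA", 104);
    s.markWaveletAsRead(root, 157);
    // Modified at 150: the blip entry (104) is stale, but the override (157) covers it.
    System.out.println(s.isBlipUnread(root, "b+lurL8WUGA", 150)); // prints false
    // Modified at 160: newer than both 104 and 157.
    System.out.println(s.isBlipUnread(root, "b+lurL8WUGA", 160)); // prints true
  }
}
```

   Note that every piece of wave state the query needs arrives as a
   parameter; the sketch holds only supplement data.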

   Code
   Interfaces: {Readable,Writeable,Observable}Supplement
   Implementations: SupplementImpl

   Contextual semantic layer
   The contextual semantic layer associates the context-free supplement
   object with a particular wave, in order to curry out all the wave state
   parameters from its queries and actions. For example, the signature of the
   blip read/unread query simply becomes:

   boolean isBlipUnread(ConversationBlip blip);

   The view of conversation state that this layer requires in order to
   contextualize the supplement is defined in SupplementWaveView. This
   interface is more restrictive than the full WaveView interface, and exposes
   only the relevant parts of a wave, mainly version numbers. Defining the
   supplement layer in terms of this smaller interface means that a supplement
   model can run against a wave representation that is cheaper than a full wave
   model. There are two implementations of this layer, one adding observability
   to the other.
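
   The currying idea can be sketched as a thin wrapper that pulls the wave
   state out of a blip object and forwards it to the context-free query. All
   three types below are hypothetical and exist only for this example; in
   particular BlipView is an invented stand-in for the small view of a blip
   the supplement needs, and is not SupplementWaveView or ConversationBlip:

```java
/** Hypothetical context-free query, with wavelet ids simplified to Strings. */
interface ContextFreeSupplement {
  boolean isBlipUnread(String waveletId, String blipId, int blipVersion);
}

/** Hypothetical minimal view of a blip: only ids and a version number. */
interface BlipView {
  String getWaveletId();

  String getId();

  int getLastModifiedVersion();
}

/** Binds a context-free supplement so callers pass blip objects instead of ids. */
final class ContextualSupplementSketch {
  private final ContextFreeSupplement target;

  ContextualSupplementSketch(ContextFreeSupplement target) {
    this.target = target;
  }

  /** Extracts the wave state from the blip and delegates to the context-free layer. */
  boolean isBlipUnread(BlipView blip) {
    return target.isBlipUnread(
        blip.getWaveletId(), blip.getId(), blip.getLastModifiedVersion());
  }

  public static void main(String[] args) {
    // Toy context-free rule for the demo: anything modified after version 100 is unread.
    ContextFreeSupplement contextFree = (waveletId, blipId, version) -> version > 100;
    BlipView blip = new BlipView() {
      @Override public String getWaveletId() { return "acmewave.com!conv+root"; }
      @Override public String getId() { return "b+lurL8WUGA"; }
      @Override public int getLastModifiedVersion() { return 104; }
    };
    System.out.println(new ContextualSupplementSketch(contextFree).isBlipUnread(blip)); // prints true
  }
}
```

   Keeping BlipView this small reflects the point made above: the wrapper
   needs little more than ids and version numbers, so it can run against a
   cheaper wave representation than a full wave model.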

   Code
   Interfaces: {Readable,Writeable,Observable}SupplementedWave
   Implementations: SupplementedWaveImpl, LiveSupplementedWaveImpl

   Example Code
   Given a Wave model wavelet, and its conversation model conversations, the
   live implementation of the supplement model is created with the following
   snippet:

   Wavelet udw = w.getUserData();
   ObservablePrimitiveSupplement data = WaveletBasedSupplement.create(udw);
   ObservableSupplementedWave supplement = new LiveSupplementedWaveImpl(
       data, wavelet, user, DefaultFollow.ALWAYS, conversations);

   Future Work
   This document has described the objective state of the supplement model,
   without explaining its evolutionary history or trajectory. There are a
   number of parts of it that could or should be changed. The supplement's
   structure has a number of legacy concerns that are no longer relevant in
   Google Wave, and/or will never be relevant for Wave-In-A-Box.

   First, there is the obvious task of deleting obsolete parts that only
   exist to interpret old data specific to Google Wave.

   Second, the use case that drove splitting the semantic layer into two
   (the context-free and the contextual) no longer exists, and so code size and
   complexity would be reduced by merging the two semantic layers back into one
   (the contextual supplement).

   Third, the separation of interfaces for readers, writers, and observers
   of supplements, while being semantically satisfying, does add volume to the
   universe of supplement-related types when expressed in Java, and the utility
   of these role-specific views may not be enough to mitigate that complexity
   burden.

   Fourth, version numbers in the supplement model are only 32-bit
   integers, whereas in the wave model proper they are 64-bit integers. This is
   purely a client-specific optimization, because GWT's faithful emulation of
   64-bit numbers in JavaScript (which lacks a native 64-bit integer type) is a
   speed concern. This sacrifice of correctness for the sake of speed is
   something that could be rectified.

   Fifth, although used successfully in production for over a year, there
   are still some race conditions in the supplement logic.

   Finally, the supplement model initially started as just a model for
   read-state, and grew incrementally into a bag of disparate concerns related
   only by the property of being user-specific. Exposing all this data through
   the one sum interface (PrimitiveSupplement) and implementation
   (WaveletBasedSupplement) is not necessarily a good way to scale, and it is
   perhaps time to split the supplement into individual models (e.g., reading,
   indexing, notifications, etc).
