[twitter-dev] Early look at Annotations

Marcel Molina Fri, 16 Apr 2010 10:54:50 -0700

Hey everyone. One of the things we talked about at Chirp is the new
Annotations feature we're working on. In short, it allows you to annotate a
tweet with structured metadata. We're still working on Annotations, but I
wanted to share how we're thinking of doing them with a wider audience than
just those I was able to talk to in person at Chirp.


* What is an annotation, more exactly?

First off let's be clearer about what an annotation is. An annotation is a
namespace, key, value triple. A tweet can have one or more annotations.
Namespaces can have one or more key/value pairs.
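
To make the triple concrete, here is a minimal Python sketch. The Annotation
class and the group_by_namespace helper are illustrative names only, not part
of any Twitter API, and the nested dict is just one way to model "a namespace
holds one or more key/value pairs".

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    """One (namespace, key, value) triple. The class name is illustrative."""
    namespace: str
    key: str
    value: str


def group_by_namespace(annotations):
    """Nest a flat list of triples as {namespace: {key: value}}."""
    grouped = defaultdict(dict)
    for ann in annotations:
        grouped[ann.namespace][ann.key] = ann.value
    return dict(grouped)


triples = [
    Annotation("iso", "isbn", "030759243X"),
    Annotation(
        "amazon",
        "url",
        "http://www.amazon.com/Although-Course-You-Becoming-Yourself/dp/030759243X",
    ),
]
nested = group_by_namespace(triples)
```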

* How do I specify what annotations a tweet should have?

Annotations are specified for a tweet when the tweet is created. When
submitting a POST to /statuses/update, you'll include an "annotations"
parameter with your annotations. We're thinking we'll provide two mechanisms
for specifying what a tweet's annotations are:

  1. JSON
  2. form encoded parameters
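
As a rough sketch of the JSON mechanism, the snippet below builds a
form-encoded POST body in which the "annotations" parameter carries a JSON
string. The exact wire format isn't settled, so treat that as an assumption;
build_update_body is a hypothetical helper and no request is actually sent.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_update_body(status, annotations):
    """Return a form-encoded body for a POST to /statuses/update.

    Assumption: "annotations" is sent as a JSON-encoded string.
    """
    return urlencode({
        "status": status,
        "annotations": json.dumps(annotations, separators=(",", ":")),
    })


body = build_update_body(
    "Just got a new book in the mail.",
    {"iso": {"isbn": "030759243X"}},
)

# Round trip: decode the body the way a server-side form parser might.
decoded = json.loads(parse_qs(body)["annotations"][0])
```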

* How big can an annotation be and how many annotations can I attach to a
tweet?

There is no limit on the size of any given namespace, key or value but the
entire set of all annotations for a given tweet cannot exceed some fixed
byte size. That size isn't set in stone yet. We will be starting small
(probably 512 bytes) and growing it gradually as we incrementally roll out
the feature so we can gauge its scalability at various sizes. We'd like to
(no promises) have it end up around 2K. How you use that 2K is up to you.
You can attach one honking annotation, or a thousand+ tiny ones. You can
attach one namespace with hundreds of key/value pairs, or hundreds of
namespaces with just one key/value pair. We want to keep things as flexible
and open ended as possible.
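
A client could check its annotations against the budget before posting. The
sketch below assumes the limit is measured over the UTF-8 bytes of every
namespace, key and value, with each namespace counted once; how the size
will really be measured hasn't been decided, so both helper names and the
counting rule are assumptions.

```python
def annotations_size(annotations):
    """Sum the UTF-8 byte lengths of all namespaces, keys and values.

    Assumption: each namespace is counted once, not once per key.
    """
    total = 0
    for namespace, pairs in annotations.items():
        total += len(namespace.encode("utf-8"))
        for key, value in pairs.items():
            total += len(key.encode("utf-8")) + len(value.encode("utf-8"))
    return total


def fits_budget(annotations, limit=512):
    """Check against a byte budget; 512 is the probable starting size."""
    return annotations_size(annotations) <= limit


book = {"iso": {"isbn": "030759243X"}}
size = annotations_size(book)   # 3 + 4 + 10 bytes
ok = fits_budget(book)
```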

* What kind of data can go into an annotation?

We'd like to allow for any arbitrary data to be stored in an annotation.
Arbitrary Unicode? Sure. MIDI? Go for it. Emoji? Yes please! There might be
some tricky edge cases though. Skip the rest of this paragraph if you don't
care about the details of edge cases... For one, since these annotations
will be serialized to, among other formats, XML, and we'd like to keep the
XML succinct, the namespace and key components of an annotation triple would
likely each be an XML tag, with the value as, well, its value. If that's the
case then the namespace and key must each be valid XML tag names. This
greatly limits what they can contain (no spaces, for example). If allowing
all three elements
of the triple to contain any arbitrary data is more important than a
succinct XML payload then we'll design a more verbose XML payload. Up to you
all really. I've included examples of both options below. Make a case for
another proposal if you have strong opinions.
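
To get a feel for how restrictive the succinct XML form would be, here is a
deliberately conservative check. It accepts only an ASCII subset of what XML
allows in tag names (real XML names may also contain many non-ASCII
characters and colons), and it rejects names starting with "xml", which the
XML spec reserves. is_safe_xml_tag is a hypothetical helper.

```python
import re

# ASCII-only subset of the XML name rules: a letter or underscore first,
# then letters, digits, underscores, dots or hyphens.
_ASCII_XML_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_safe_xml_tag(name):
    """Return True if name is usable as a tag under this narrow rule."""
    if not _ASCII_XML_NAME.fullmatch(name):
        return False
    # Names beginning with "xml" (any case) are reserved by the XML spec.
    return not name.lower().startswith("xml")


checks = {
    "isbn": is_safe_xml_tag("isbn"),
    "book title": is_safe_xml_tag("book title"),  # contains a space
    "2nd": is_safe_xml_tag("2nd"),                # starts with a digit
}
```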

* What constitutes a valid annotation?

Aside from the size and data type restrictions listed above, another
requirement is that namespaces and keys be non-empty values. Values, on the
other hand, may be empty. In this way the namespace/key pair can be treated
like a flag of sorts. It should be noted: I'd encourage everyone to always
think of a namespace as a namespace, to think of a key as a key and to think
of a value as a value. Don't take the fact that a value can be empty to mean
that you can skip out on the whole namespace thing and morph the namespace
into a key and the key into a value. While open-endedness and flexibility are
the qualities of the Annotations feature that I'm most excited about for the
developer community, this kind of approach seems prone to causing confusion
by undermining namespaces.
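
The non-empty rule is simple enough to check client side. A minimal sketch
follows; validation_errors is a hypothetical helper, and the "music" /
"explicit" names are made up to show a flag-style annotation with an empty
value.

```python
def validation_errors(namespace, key, value=""):
    """Return a list of problems; an empty list means the triple is valid.

    The value is accepted as-is because it is allowed to be empty.
    """
    errors = []
    if not namespace:
        errors.append("namespace must be non-empty")
    if not key:
        errors.append("key must be non-empty")
    return errors


# A flag-style annotation: namespace and key present, value empty.
flag_errors = validation_errors("music", "explicit", "")

# Not valid: dropping the namespace and shifting everything left.
shifted_errors = validation_errors("", "explicit", "true")
```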

* What namespaces can I write to? What namespaces can I read from?

Anyone can write to or read from any namespace. We aren't planning on
enforcing any policy that stops someone else from adding an annotation with
"your" namespace, or that makes annotations visible only to people logged in
with a certain account. In the absence of some really compelling reason to do
that, we want to err on the side of making this feature as flexible and open
ended as possible. Namespaces aren't intended as a way for people to claim
their little slice of the tweet space. Rather they are intended to
dramatically increase the possible significance of a given key/value pair.
If you want a given key to mean one thing and someone else wants that same
key to mean something else, and someone else still wants another meaning,
consumers of your annotations are put in a tricky spot trying to figure out
how to interpret a given annotation without the disambiguation of a
namespace.

* How do we consume annotations?

For convenience, we plan on including annotations for a tweet directly
embedded into that tweet's payload. The XML payload of a tweet I just
inspected at random came out to about 2K in size. The "worst case"
annotation would slightly more than double that payload, to probably about
5K. We're erring on the side of thinking that the moderate increase in
payload size for tweets with annotations, even on slow connections, is both
more convenient and faster than the latency and inconvenience incurred by
adding another HTTP round trip. Though we'd like to provide an embedded and
non embedded option, the maintenance cost and fragment cache space increase
makes supporting both likely unrealistic so we're going with what we think
satisfies the 80% case. Push back as appropriate.

* What will the payloads look like?

This isn't final. The payloads could end up wildly different after we noodle
around in things like RDF and the semantic web's literature and all that
kind of stuff. You can't see me but my hands are waving vigorously.

Given a hypothetical tweet, "Just got 'Although Of Course You End Up
Becoming Yourself' in the mail. Hopeful. Heart broken."

  JSON

"annotations":
{
  "iso":
  {
    "isbn": "030759243X"
  },
  "amazon":
  {
    "url": "http://www.amazon.com/Although-Course-You-Becoming-Yourself/dp/030759243X"
  }
}

  XML option #1 which is succinct but restricts the possible values of
namespaces and keys

  <annotations>
    <iso>
      <isbn>030759243X</isbn>
    </iso>
    <amazon>
      <url>http://www.amazon.com/Although-Course-You-Becoming-Yourself/dp/030759243X</url>
    </amazon>
  </annotations>

  XML option #2 which is more verbose but allows for namespaces and keys to
contain arbitrary data

  <annotations>
    <annotation>
      <namespace>iso</namespace>
      <key>isbn</key>
      <value>030759243X</value>
    </annotation>
    <annotation>
      <namespace>amazon</namespace>
      <key>url</key>
      <value>http://www.amazon.com/Although-Course-You-Becoming-Yourself/dp/030759243X</value>
    </annotation>
  </annotations>

If we went with XML option #2 it may or may not be a problem that it isn't
"symmetrical" with the JSON representation. On the other hand, JSON and XML
tend to be culturally at opposite sides of the Pithiness Spectrum.
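
To compare the two XML shapes side by side, the sketch below renders the
same nested dict both ways with Python's standard xml.etree.ElementTree. The
function names are illustrative and the element names simply mirror the
examples above; none of this is a final payload format.

```python
import xml.etree.ElementTree as ET


def to_succinct_xml(annotations):
    """Option #1: namespaces and keys become tag names."""
    root = ET.Element("annotations")
    for namespace, pairs in annotations.items():
        ns_el = ET.SubElement(root, namespace)
        for key, value in pairs.items():
            ET.SubElement(ns_el, key).text = value
    return ET.tostring(root, encoding="unicode")


def to_verbose_xml(annotations):
    """Option #2: one <annotation> element per triple."""
    root = ET.Element("annotations")
    for namespace, pairs in annotations.items():
        for key, value in pairs.items():
            ann = ET.SubElement(root, "annotation")
            ET.SubElement(ann, "namespace").text = namespace
            ET.SubElement(ann, "key").text = key
            ET.SubElement(ann, "value").text = value
    return ET.tostring(root, encoding="unicode")


book = {"iso": {"isbn": "030759243X"}}
succinct = to_succinct_xml(book)
verbose = to_verbose_xml(book)

# Only the verbose form can carry a key that is not a legal tag name.
odd_key = to_verbose_xml({"books": {"book title": "a<b"}})
```

Note that the verbose form escapes arbitrary text for you (the "<" above
comes out as "&lt;"), which is exactly the flexibility option #2 buys.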

* Can I add annotations to a tweet after the tweet has been created?

No. Like the text of a tweet, its annotations are also immutable. They can
only be specified when the tweet they are being attached to is created. For
talking purposes, though, if you want to add annotations to a tweet after
the fact, you could retweet the original tweet and attach annotations to the
retweet.

* Ok, great. What should I use annotations for though?

We don't know! That's the cool thing. Annotations are a blank slate that
lend themselves to myriad divergent use cases. We want to provide open-ended
utility for all the developers to innovate on top of. Some of us have
initial ideas of cool potential use cases that I'm sure we'll start to
share just to seed the conversation as we get closer to launch. Developers
will experiment with annotations. Certain ideas and approaches will catch
on. Certain annotations will become standards democratically because
everyone agrees. Some might have diverging opinions. It's something that we
hope will grow organically and be driven by sociological and cultural
forces.

* Ok, great. How are we going to figure out what Joe Random's annotations
actually mean?

That's something we need to figure out as a community. But here is an early
idea: People could add some agreed upon "meta-annotation" that points to
something which *describes* the annotation or annotations that person is
using. Think something sort of like XML DTD, though not necessarily machine
readable. This meta annotation could point to a URL that simply has an HTML
document that gives a description with some examples of the various
annotations you're experimenting with or standardizing on.

* Will it be in search? Streaming? Mobile? My toaster?

We hope so! When we launch you will at minimum be able to attach annotations
to a tweet and consume annotations from a tweet's payload via the REST API.
Of course it would be awesome to be able to say to search or the streaming
API, "give me all tweets with this namespace", or "give me all tweets with
this namespace and key", and so on. We're working with the Search, Streaming
and other teams to make all this happen. We can't promise it'll be ready by
launch but we know it's killer and a must have and are trying to get it
ready soon.
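
Until search and streaming support exists, the same questions can be asked
client side over tweets fetched via the REST API. This sketch assumes each
tweet has been parsed into a dict with an optional "annotations" entry
shaped like the JSON example in this post; has_annotation is a hypothetical
helper and the sample tweets are made up.

```python
def has_annotation(tweet, namespace, key=None):
    """True if the tweet has the namespace and, when given, the key."""
    pairs = (tweet.get("annotations") or {}).get(namespace)
    if pairs is None:
        return False
    return key is None or key in pairs


tweets = [
    {"text": "New book arrived.", "annotations": {"iso": {"isbn": "030759243X"}}},
    {"text": "No annotations here."},
    {"text": "Another one.", "annotations": {"amazon": {"url": "http://www.amazon.com/"}}},
]

with_iso = [t["text"] for t in tweets if has_annotation(t, "iso")]
with_isbn = [t["text"] for t in tweets if has_annotation(t, "iso", "isbn")]
with_asin = [t["text"] for t in tweets if has_annotation(t, "amazon", "asin")]
```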

* When is it going to launch?

This is, pretty much, the only thing a couple of us are going to be working
on until it's launched. We really can't wait to get it in your hands to see
all the cool things you'll do with it, so we're cranking to get it out as
soon as possible. If I had to provide a guesstimate, I'd wave my hands in the
direction of 2 months for an early, incremental roll out. We not only need to
implement all the functionality, but we also need to productionize it in a
measured and responsible way to ensure its quality of service is high.

In closing:

We're really excited about Annotations. Annotations mark one of the first of
many departures from keeping in lock step with features on the web site. To
truly be a platform, we want to expose high-leverage general purpose utility
for the developer community to innovate on top of. Annotations is just the
first of several high-leverage, general-purpose utility features we're hoping
to get to.

Think big. Blow our minds.

-- 
Marcel Molina
Twitter Platform Team
http://twitter.com/noradio


