Hi Martin,

Thanks for the reply.

On Mon, Mar 31, 2014 at 4:49 PM, Martin Kleppmann <mkleppm...@linkedin.com> wrote:
> Say you make a change to the schema. Your database now contains some
> records that were written before the schema change (i.e. encoded with
> schema v1) and some records that were written afterwards (encoded with
> schema v2). Ideally, an application should be able to read them all
> transparently and not have to care which schema version is used in the
> underlying store.

Absolutely.

> How does Gora handle this? I looked through the website but couldn't find
> a clear answer.

Right now we maintain only the writer's schema, which as I mentioned is appended within the generated Persistent Java bean. In my own experience (and as you've hinted at :) ) this has caused us problems in the past. For example, we added a new (pretty innocent) string field 'batchId' to our WebPage schema [0] over in Nutch, meaning that new records being written included it and older records already within the data set did not.

    {"name": "batchId", "type": "string"}

This inevitably threw an NPE when certain tools attempted to access records in which the batchId field and its value were absent.

So, taking a bit of advice from a well-recognized voice in this area (uh hum ;))

"If you're storing records in a database one-by-one, you may end up with different schema versions written at different times, and so you have to annotate each record with its schema version. If storing the schema itself is too much overhead, you can use a hash of the schema, or a sequential schema version number. You then need a schema registry where you can look up the exact schema definition for a given version number."

Fortunately, in the above example this particular schema has changed only once in some 2 or 3 years. However, it HAS changed. Looks like I am also taking a lesson from this thread, and we have a bit more work to do on Gora to address the above points. This is of course unless I have missed something!

[0] https://svn.apache.org/repos/asf/nutch/branches/2.x/src/gora/webpage.avsc
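
To make the quoted advice a bit more concrete, here is a rough Python sketch of tagging each stored record with a hash of its writer's schema and resolving it against the reader's schema via a registry. This is not how Gora works today, and it does not use the Avro library: the two cut-down schemas, the SchemaRegistry class, the write_record/read_record helpers, and the "default" on batchId are all made up for illustration, and the hash is taken over sorted JSON rather than Avro's own canonical form.

```python
import hashlib
import json

# Hypothetical, cut-down schemas; NOT the real webpage.avsc.
SCHEMA_V1 = {"type": "record", "name": "WebPage",
             "fields": [{"name": "baseUrl", "type": "string"}]}
# Assumption: batchId is given a default so old records can still be read.
SCHEMA_V2 = {"type": "record", "name": "WebPage",
             "fields": [{"name": "baseUrl", "type": "string"},
                        {"name": "batchId", "type": "string", "default": ""}]}


def fingerprint(schema):
    """Short hash of the schema's sorted JSON form, used as a version tag."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class SchemaRegistry:
    """In-memory lookup from version tag to the exact schema definition."""

    def __init__(self):
        self._schemas = {}

    def register(self, schema):
        tag = fingerprint(schema)
        self._schemas[tag] = schema
        return tag

    def lookup(self, tag):
        return self._schemas[tag]


def write_record(registry, schema, values):
    """Store the values together with the tag of the writer's schema."""
    return {"schema": registry.register(schema), "data": dict(values)}


def read_record(registry, stored, reader_schema):
    """Resolve a stored record against the schema the reader expects."""
    writer_schema = registry.lookup(stored["schema"])
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            result[name] = stored["data"][name]
        elif "default" in field:
            # Field was added after this record was written.
            result[name] = field["default"]
        else:
            raise ValueError("no value or default for field %r" % name)
    return result


if __name__ == "__main__":
    registry = SchemaRegistry()
    old = write_record(registry, SCHEMA_V1, {"baseUrl": "http://example.com/"})
    # The old record is read with the new schema; batchId falls back to "".
    print(read_record(registry, old, SCHEMA_V2))
```

With something along these lines, a tool reading an old record would get the default for batchId instead of a null, and a reader schema that adds a field without a default fails loudly at read time rather than with an NPE later on.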