I sent this to the Avro list but got no reply, so I thought I'd try here. Is it possible to name string elements in the schema of an array? Specifically, below I want to name the email addresses in the from/to/cc/bcc/reply_to fields, so they don't get auto-named ARRAY_ELEM by Pig's AvroStorage. I know I can probably fix this in Java in the Pig AvroStorage UDF, but I'm hoping I can also fix it more easily in the schema. Last time I read Avro's array docs in this context, my hit-points dropped by a third, so pardon me if I haven't RTFM this time :)
Complete description of what I'm doing follows:

Avro schema for my emails:

    {
      "namespace": "agile.data.avro",
      "name": "Email",
      "type": "record",
      "fields": [
        {"name":"message_id", "type": ["string", "null"]},
        {"name":"from","type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"to","type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"cc","type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"bcc","type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"reply_to", "type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"in_reply_to", "type": [{"type":"array", "items":"string"}, "null"]},
        {"name":"subject", "type": ["string", "null"]},
        {"name":"body", "type": ["string", "null"]},
        {"name":"date", "type": ["string", "null"]}
      ]
    }

Pig to publish my Avros:

    grunt> emails = load '/me/tmp/emails' using AvroStorage();
    grunt> describe emails
    emails: {
      message_id: chararray,
      from: { PIG_WRAPPER: (*ARRAY_ELEM*: chararray) },
      to: { PIG_WRAPPER: (*ARRAY_ELEM*: chararray) },
      cc: { PIG_WRAPPER: (*ARRAY_ELEM*: chararray) },
      bcc: { PIG_WRAPPER: (*ARRAY_ELEM*: chararray) },
      reply_to: { PIG_WRAPPER: (*ARRAY_ELEM*: chararray) },
      in_reply_to: { PIG_WRAPPER: (*ARRAY_ELEM*: chararray) },
      subject: chararray,
      body: chararray,
      date: chararray
    }
    grunt> store emails into 'mongodb://localhost/agile_data.emails' using MongoStorage();

My emails in MongoDB:

    > db.emails.findOne()
    {
      "_id" : ObjectId("4f738a35414e113e75707b97"),
      "message_id" : "<4f71abddc19ec_145449e389847...@li169-134.mail>",
      "from" : [ { "*ARRAY_ELEM*" : "da...@jobchangealerts.com" } ],
      "to" : [ { "*ARRAY_ELEM*" : "russell.jur...@gmail.com" } ],
      "cc" : null,
      "bcc" : null,
      "reply_to" : null,
      "in_reply_to" : null,
      "subject" : "Daily Job Change Alerts from SalesLoft",
      "body" : "Daily Job Change Alerts from SalesLoft",
      "date" : "2012-03-27T08:00:29"
    }

My email on screen:

[image: Inline image 1]

My face when I see ARRAY_ELEM, because it means more complex presentation code: *:(*

What I really want is just an array of strings. Is this possible?

--
Russell Jurney
twitter.com/rjurney
russell.jur...@gmail.com
datasyndrome.com
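
For reference, the extra presentation code that ARRAY_ELEM forces looks roughly like the sketch below. It is only a workaround on the consuming side, not a fix in the schema or in AvroStorage. It assumes the documents have already been read out of MongoDB as plain Python dicts and that the wrapper field is literally named ARRAY_ELEM (the asterisks above are treated as emphasis); the function name and the sample addresses are made up for illustration.

```python
# Assumed name of the wrapper field that AvroStorage generates.
ARRAY_ELEM_KEY = "ARRAY_ELEM"


def unwrap_array_elems(doc, key=ARRAY_ELEM_KEY):
    """Return a copy of doc where lists of single-field {key: value}
    wrappers are flattened into plain lists of values."""
    flat = {}
    for field, value in doc.items():
        is_wrapped_list = isinstance(value, list) and all(
            isinstance(item, dict) and set(item) == {key} for item in value
        )
        if is_wrapped_list:
            # [{"ARRAY_ELEM": "a"}, {"ARRAY_ELEM": "b"}] becomes ["a", "b"]
            flat[field] = [item[key] for item in value]
        else:
            # Scalars, nulls, and anything not matching the wrapper shape
            # are passed through unchanged.
            flat[field] = value
    return flat


if __name__ == "__main__":
    # Hypothetical document shaped like the findOne() output above.
    email = {
        "from": [{"ARRAY_ELEM": "sender@example.com"}],
        "to": [{"ARRAY_ELEM": "recipient@example.com"}],
        "cc": None,
        "subject": "Daily Job Change Alerts from SalesLoft",
    }
    print(unwrap_array_elems(email))
```

Passing a different `key` covers the case where the stored field name really does include the asterisks.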