selecting JSON nested field in storage plugin

Jiang Wu Fri, 04 Mar 2016 17:05:00 -0800

We are working on a custom Drill storage plugin to retrieve data from a 
proprietary JSON-based store.  The Drill version being used is 1.4.0.  
Assume the data is JSON that looks like this:


     {
          ...
          "topping": {
               "id": "5001, 5002, ...",
               ...
          }
          ...
     }

When we submit this query:

select t.topping.id from meld.project1.event.table1 t;

The plugin receives a SchemaPath object for "topping.id".  The plugin then 
creates an output vector with the provided SchemaPath as the field name, e.g.

// Assuming we always use this type.
final MajorType type = Types.optional(MinorType.VARCHAR);
// schemaPath is the SchemaPath given to the plugin.
final MaterializedField field = MaterializedField.create(schemaPath, type);

final Class<? extends ValueVector> clazz = (Class<? extends ValueVector>)
    TypeHelper.getValueVectorClass(type.getMinorType(), type.getMode());

ValueVector vector = output.addField(field, clazz);

The data is then added to the vector and returned.  However, the returned 
field does not match what Drill expects, so we get something like this:

0: jdbc:drill:zk=local> select t.topping.id from meld.project1.event.table1 t;
+---------+
| EXPR$0  |
+---------+
| null    |
| null    |
| null    |
| null    |
| null    |
| null    |
+---------+

We expected to get this, which is what the backtick-quoted form of the query 
returns:

0: jdbc:drill:zk=local> select t.`topping.id` from meld.project1.event.table1 t;
+-------------------------------------------+
|                topping.id                 |
+-------------------------------------------+
| 5001, 5002, 5003, 5004                    |
| 5001, 5002, 5005, 5007, 5006, 5003, 5004  |
| 5001, 5002, 5003, 5004                    |
| 5001, 5002, 5005, 5007, 5006, 5003, 5004  |
| 5001, 5002, 5005, 5003, 5004              |
| 5001, 5002, 5005, 5003, 5004              |
+-------------------------------------------+

In the second query, the SchemaPath is a nested path.  Our plugin accepts this 
specification and retrieves the same results.  So what are we doing wrong here?  
How do we correctly return values for a nested JSON field?
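
One possible reading of the difference between the two queries, sketched in 
plain Java with no Drill classes.  It assumes that t.topping.id is resolved as 
two path segments ("topping", then "id") while t.`topping.id` is a single 
name.  That is an assumption about Drill's behavior, not something confirmed 
here.  The class NestedPathSketch and its methods are hypothetical and exist 
only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NestedPathSketch {

    /** Walks a record one path segment at a time; returns null if a segment is missing. */
    public static Object resolve(Map<String, Object> record, List<String> segments) {
        Object current = record;
        for (String segment : segments) {
            if (!(current instanceof Map)) {
                return null; // cannot descend into a non-map (or missing) value
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    /** Roughly what the plugin returns today: one top-level field named "topping.id". */
    public static Map<String, Object> flatRecord() {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("topping.id", "5001, 5002");
        return record;
    }

    /** A nested layout: a "topping" map holding an "id" child. */
    public static Map<String, Object> nestedRecord() {
        Map<String, Object> topping = new LinkedHashMap<>();
        topping.put("id", "5001, 5002");
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("topping", topping);
        return record;
    }

    public static void main(String[] args) {
        List<String> nestedPath = Arrays.asList("topping", "id"); // t.topping.id
        List<String> flatPath = Arrays.asList("topping.id");      // t.`topping.id`

        System.out.println(resolve(flatRecord(), nestedPath));   // no "topping" key to descend into
        System.out.println(resolve(flatRecord(), flatPath));     // single name matches the flat field
        System.out.println(resolve(nestedRecord(), nestedPath)); // nested layout matches the nested path
    }
}
```

Under that assumption, a top-level field named "topping.id" can only be found 
by the single-name lookup, which would be consistent with the null column in 
the first query and the populated column in the second.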

Thanks.

-- Jiang
