Re: Schema import dependencies

Doug Cutting Wed, 28 May 2014 12:10:52 -0700

Your userInfo.avsc is not a standalone schema since it depends on
mailing_address already being defined.  A schema included in a data file is
always standalone, and would include the mailing_address schema definition
within the userInfo schema's "address" field.
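Below is a minimal Python sketch of the standalone form, using only the standard library. It places the full `mailing_address` definition in the `address` field's `"type"`. The helper `inline_named_type` is hypothetical and is not part of the Avro Python API. The field lists are shortened for brevity.

```python
import copy
import json

# userInfo refers to mailing_address by name only (field list shortened).
user_info = {
    "type": "record",
    "name": "userInfo",
    "namespace": "my.example",
    "fields": [
        {"name": "username", "type": "string", "default": "NONE"},
        {"name": "address", "type": "mailing_address"},
    ],
}

# The record that userInfo depends on (field list shortened).
mailing_address = {
    "type": "record",
    "name": "mailing_address",
    "fields": [
        {"name": "street", "type": "string", "default": "NONE"},
        {"name": "city", "type": "string", "default": "NONE"},
    ],
}


def inline_named_type(schema, dependency):
    """Return a copy of `schema` whose first top-level field typed by the
    name of `dependency` carries the full definition instead."""
    standalone = copy.deepcopy(schema)
    inlined = False
    for field in standalone["fields"]:
        if field["type"] == dependency["name"] and not inlined:
            field["type"] = copy.deepcopy(dependency)
            inlined = True  # later references keep the name: define a type once
    return standalone


print(json.dumps(inline_named_type(user_info, mailing_address), indent=2))
```

A named type may be defined only once within a schema, so the sketch expands only the first reference. Any later references can keep using the name.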


Some tools will process such non-standalone schemas in separate files.  For
example, the Java schema compiler will accept multiple schema files on the
command line, and those later on the command line may reference types
defined earlier.  Avro's Maven tasks for Java also permit references to other
files, but these are probably not of interest to a Python developer.
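The following standard-library Python sketch illustrates the ordering rule. It is not the Java schema compiler and not an Avro API. `check_definition_order` and `PRIMITIVES` are hypothetical names written for this example, and namespaces are ignored. The sketch processes schemas in the order given and registers each named type as it is seen. Any reference to a name that has not yet been defined is treated as an error, much as when a dependent file is listed before its dependency.

```python
import json

# Avro primitive type names; any other bare string is a reference by name.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def check_definition_order(schema_texts):
    """Walk JSON schema texts in order and raise ValueError on any named type
    used before it has been defined. Returns the set of defined names."""
    defined = set()

    def walk(node):
        if isinstance(node, str):
            if node not in PRIMITIVES and node not in defined:
                raise ValueError(f"type {node!r} is used before it is defined")
        elif isinstance(node, list):
            for branch in node:  # a union: check every branch
                walk(branch)
        else:
            kind = node["type"]
            if kind in ("record", "error", "enum", "fixed"):
                # Register the name first so a record may refer to itself.
                defined.add(node["name"])
            if kind in ("record", "error"):
                for field in node["fields"]:
                    walk(field["type"])
            elif kind == "array":
                walk(node["items"])
            elif kind == "map":
                walk(node["values"])
            elif kind not in ("enum", "fixed"):
                walk(kind)  # e.g. {"type": "string"}: the value is itself a schema

    for text in schema_texts:
        walk(json.loads(text))
    return defined


address = '{"type": "record", "name": "mailing_address", "fields": [{"name": "street", "type": "string"}]}'
user = '{"type": "record", "name": "userInfo", "fields": [{"name": "address", "type": "mailing_address"}]}'

print(sorted(check_definition_order([address, user])))  # prints ['mailing_address', 'userInfo']
try:
    check_definition_order([user, address])  # dependent schema first
except ValueError as err:
    print(err)  # prints type 'mailing_address' is used before it is defined
```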

The IDL tool uses the JVM as its runtime but is not Java-specific.

Doug


On Wed, May 28, 2014 at 11:53 AM, Wai Yip Tung wrote:

> I want to extend this question somewhat. I'm beginning to realize that Avro
> can compose a schema from user-defined types. I want to check whether I
> understand this correctly, and what the proper way to use it is.
>
> I took a single, two-level nested schema from the web (see the "using an
> embedded record" example):
>
> http://docs.oracle.com/cd/E26161_02/html/GettingStartedGuide/avroschemas.html
>
> I broke it down into two separate records, the main `userInfo` record and
> the embedded `mailing_address` record, as two separate JSON objects.
>
>
> ------------------------------------------------------------------------
> userInfo.avsc
>
> {
> "type" : "record",
> "name" : "userInfo",
> "namespace" : "my.example",
> "fields" : [{"name" : "username",
>              "type" : "string",
>              "default" : "NONE"},
>
>             {"name" : "age",
>              "type" : "int",
>              "default" : -1},
>
>             {"name" : "phone",
>              "type" : "string",
>              "default" : "NONE"},
>
>             {"name" : "housenum",
>              "type" : "string",
>              "default" : "NONE"},
>
>             {"name" : "address",
>              "type" : "mailing_address",   <--- user-defined type
>              "default" : {}}
> ]
> }
>
> ------------------------------------------------------------------------
> mailing_address.avsc
>
> {
>  "type" : "record",
>  "name" : "mailing_address",                 <--- defined here
>  "fields" : [
>     {"name" : "street",
>      "type" : "string",
>      "default" : "NONE"},
>
>     {"name" : "city",
>      "type" : "string",
>      "default" : "NONE"},
>
>     {"name" : "state_prov",
>      "type" : "string",
>      "default" : "NONE"},
>
>     {"name" : "country",
>      "type" : "string",
>      "default" : "NONE"},
>
>     {"name" : "zip",
>      "type" : "string",
>      "default" : "NONE"}
>     ]
> }
> ------------------------------------------------------------------------
>
> Is this a valid composite Avro schema definition?
>
> My second question is how we can actually use this in practice. If we
> have two separate files, is there a standard API that loads them both?
> Hrishikesh P mentions the Avro Maven plugin. I mainly use the Python API, so
> I am unfamiliar with it. Does a comparable API exist?
>
> I understand that the IDL form supports explicit linking of schema files. I
> will look into it next.
>
> Wai Yip
>
>
>   Doug Cutting
>  Thursday, May 22, 2014 2:57 PM
> You might instead use Avro IDL to define your schemas. It permits you to
> define multiple schemas in a single file, so that you can determine
> the order they're defined in. It also permits ordered inclusion of
> types from other files, both IDL files and schema files.
>
> Doug
>
> On Thu, May 22, 2014 at 10:46 AM, Hrishikesh P
>
>   Hrishikesh P
>  Thursday, May 22, 2014 10:46 AM
> I have a few Avro schemas that I am generating code from using the
> Avro Maven plugin. I have dependencies in the schemas, which I was able to
> resolve by putting the schemas in separate folders and/or prefixing the
> schema file names with 01-, 02-, etc. so that the dependencies get
> compiled first. However, this only works on Mac but not on RHEL (probably
> because the directories are read in a different order on each?). Does
> anybody know the best way to handle schema dependencies? If I list the
> individual schema files in the imports section of the POM, the schemas
> compile, but I have listed the folders instead and would like to avoid
> listing individual files if possible.
>
> Here's a related issue: https://issues.apache.org/jira/browse/AVRO-1367
>
> Thanks in advance.
>
>
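One way around the directory-order problem is to derive the order from the schemas themselves rather than from file names or prefixes. The following minimal Python sketch assumes plain JSON schemas, ignores namespaces, and expects each file's text already keyed by file name. `names_in` and `dependency_order` are hypothetical helpers, not features of the Avro Maven plugin or the Avro Python library. `graphlib` requires Python 3.9 or later.

```python
import json
from graphlib import TopologicalSorter

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def names_in(node, defined, used):
    """Collect the named types one schema defines and the names it references."""
    if isinstance(node, str):
        if node not in PRIMITIVES:
            used.add(node)
    elif isinstance(node, list):  # a union
        for branch in node:
            names_in(branch, defined, used)
    else:
        kind = node["type"]
        if kind in ("record", "error", "enum", "fixed"):
            defined.add(node["name"])
        if kind in ("record", "error"):
            for field in node["fields"]:
                names_in(field["type"], defined, used)
        elif kind == "array":
            names_in(node["items"], defined, used)
        elif kind == "map":
            names_in(node["values"], defined, used)
        elif kind not in ("enum", "fixed"):
            names_in(kind, defined, used)


def dependency_order(files):
    """Given {file_name: schema_text}, return the file names ordered so each
    file follows the files defining the types it references. Raises KeyError
    if a referenced type is defined in no file, CycleError on a cycle."""
    owner = {}  # type name -> file defining it
    uses = {}   # file name -> type names it needs from other files
    for file_name, text in files.items():
        defined, used = set(), set()
        names_in(json.loads(text), defined, used)
        for name in defined:
            owner[name] = file_name
        uses[file_name] = used - defined  # same-file references need no ordering
    graph = {f: {owner[n] for n in names} for f, names in uses.items()}
    return list(TopologicalSorter(graph).static_order())


files = {
    "userInfo.avsc": '{"type": "record", "name": "userInfo", "fields": [{"name": "address", "type": "mailing_address"}]}',
    "mailing_address.avsc": '{"type": "record", "name": "mailing_address", "fields": [{"name": "zip", "type": "string"}]}',
}
print(dependency_order(files))  # prints ['mailing_address.avsc', 'userInfo.avsc']
```

The resulting list could then be passed to any tool that expects dependencies first, so the outcome no longer depends on how a particular operating system lists a directory.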
