Re: Schema import dependencies

Doug Cutting Wed, 28 May 2014 15:55:28 -0700

IDL is a language-independent way to merge two schema files into one
standalone schema file.


Doug
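
[Editor's sketch] For readers who would rather do the merge by hand in Python, as the quoted message below suggests, the following is a minimal sketch using only the standard `json` module. It is not part of the Avro API. The helper names `full_name` and `inline`, and the `defs` mapping, are hypothetical and introduced only for illustration. It assumes both records declare the same namespace (`my.example`) so that the bare reference `mailing_address` resolves to the definition, and it omits the string default on the `address` field because a record-typed field cannot take a string default.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED = {"record", "error", "enum", "fixed"}


def full_name(name, namespace):
    """Qualify a bare name with the enclosing namespace (hypothetical helper)."""
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


def inline(schema, defs, namespace=None, seen=None):
    """Return a copy of `schema` where the first reference to each type in
    `defs` is replaced by its definition; later references stay as names."""
    seen = set() if seen is None else seen
    if isinstance(schema, str):
        name = full_name(schema, namespace)
        if schema not in PRIMITIVES and name in defs and name not in seen:
            return inline(defs[name], defs, namespace, seen)
        return schema
    if isinstance(schema, list):  # a union: process each branch
        return [inline(branch, defs, namespace, seen) for branch in schema]
    out = dict(schema)
    kind = schema.get("type")
    if isinstance(kind, str) and kind in NAMED:
        # Remember this definition so it is never emitted twice.
        namespace = schema.get("namespace", namespace)
        seen.add(full_name(schema["name"], namespace))
    if "fields" in schema:
        out["fields"] = [
            {**field, "type": inline(field["type"], defs, namespace, seen)}
            for field in schema["fields"]
        ]
    for key in ("items", "values"):  # array items and map values
        if key in schema:
            out[key] = inline(schema[key], defs, namespace, seen)
    return out


# Shortened versions of the two schemas from this thread (assumed contents).
mailing_address = {
    "type": "record", "name": "mailing_address", "namespace": "my.example",
    "fields": [
        {"name": "street", "type": "string", "default": "NONE"},
        {"name": "city", "type": "string", "default": "NONE"},
    ],
}
user_info = {
    "type": "record", "name": "userInfo", "namespace": "my.example",
    "fields": [
        {"name": "username", "type": "string", "default": "NONE"},
        {"name": "address", "type": "mailing_address"},
    ],
}

defs = {"my.example.mailing_address": mailing_address}
standalone = inline(user_info, defs)
print(json.dumps(standalone, indent=2))
```

In practice the two dictionaries would come from `json.load` on the two `.avsc` files, and the resulting JSON string could then be handed to whichever Avro schema parser is in use.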


On Wed, May 28, 2014 at 3:40 PM, Wai Yip Tung <w...@tungwaiyip.info> wrote:

> Let's say we want to keep 2 schema files because they come from 2
> separate organizations. When we generate a data file they need to be merged
> into one standalone schema. The maven plugin does this. Otherwise we have
> to merge them ourselves. The merge is not too hard. I just want to make sure
> I'm not missing some existing tool or API.
>
> Wai Yip
>
>   Doug Cutting <cutt...@apache.org>
>  Wednesday, May 28, 2014 12:09 PM
> Your userInfo.avsc is not a standalone schema since it depends on
> mailing_address already being defined.  A schema included in a data file is
> always standalone, and would include the mailing_address schema definition
> within the userInfo schema's "address" field.
>
> Some tools will process such non-standalone schemas in separate files.
>  For example, the Java schema compiler will accept multiple schema files on
> the command line, and those later on the command line may reference types
> defined earlier.  Java's maven tasks also permit references to other files,
> but these are probably not of interest to a Python developer.
>
> The IDL tool uses the JVM as its runtime but is not Java-specific.
>
> Doug
>
>
>
>   Wai Yip Tung <w...@tungwaiyip.info>
>  Wednesday, May 28, 2014 11:53 AM
>  I want to extend this question somewhat. I have begun to realize that Avro
> accommodates composing a schema from user-defined types. I want to check
> whether I understand it correctly, and what the proper way to use it is.
>
> I took a single, two-level nested schema from the web (see "using an
> embedded record").
>
> http://docs.oracle.com/cd/E26161_02/html/GettingStartedGuide/avroschemas.html
>
> I broke it down into two separate records: the main `userInfo` record and
> the embedded `mailing_address` record, as two separate JSON objects.
>
>
> ------------------------------------------------------------------------
> userInfo.avsc
>
> {
> "type" : "record",
> "name" : "userInfo",
> "namespace" : "my.example",
> "fields" : [{"name" : "username",
>              "type" : "string",
>              "default" : "NONE"},
>
>             {"name" : "age",
>              "type" : "int",
>              "default" : -1},
>
>              {"name" : "phone",
>               "type" : "string",
>               "default" : "NONE"},
>
>              {"name" : "housenum",
>               "type" : "string",
>               "default" : "NONE"},
>
>              {"name" : "address",
>               "type" : "mailing_address",   <--- user defined type
>               "default" : "NONE"}
> ]
> }
>
> ------------------------------------------------------------------------
> mailing_address.avsc
>
> {
>  "type" : "record",
>  "name" : "mailing_address",                 <--- defined here
>  "fields" : [
>     {"name" : "street",
>      "type" : "string",
>      "default" : "NONE"},
>
>     {"name" : "city",
>      "type" : "string",
>      "default" : "NONE"},
>
>     {"name" : "state_prov",
>      "type" : "string",
>      "default" : "NONE"},
>
>     {"name" : "country",
>      "type" : "string",
>      "default" : "NONE"},
>
>     {"name" : "zip",
>      "type" : "string",
>      "default" : "NONE"}
>     ]
> }
> ------------------------------------------------------------------------
>
> Is this a valid composite avro schema definition?
>
> The second question is how we can actually use this in practice. If we
> have two separate files, is there a standard API that loads them both?
> Hrishikesh P mentions the avro maven plugin. I mainly use the Python API so I
> am unfamiliar with it. Does a comparable API exist?
>
> I understand the IDL form has explicit linking of schema files. I will
> look into it next.
>
> Wai Yip
>
>
>   Doug Cutting <cutt...@apache.org>
>  Thursday, May 22, 2014 2:57 PM
> You might instead use Avro IDL to define your schemas. It permits you
> to define multiple schemas in a single file, so that you can determine
> the order they're defined in. It also permits ordered inclusion of
> types from other files, both IDL files and schema files.
>
> Doug
>
> On Thu, May 22, 2014 at 10:46 AM, Hrishikesh P
>
>   Hrishikesh P <hrishi.engin...@gmail.com>
>  Thursday, May 22, 2014 10:46 AM
> I have a few avro schemas that I am generating the code from using the
> avro maven plugin. I have dependencies in the schemas which I was able to
> resolve by putting the schemas in separate folders and/or renaming the
> schema file names with 01-, 02-, ...etc so that the dependencies get
> compiled first. However, this works on Mac but not on RHEL (probably
> because of the different ways the directories are read on them?). Does
> anybody know the best way to handle schema dependencies? If I specify individual
> schema names in the POM in the imports section, the schemas get compiled
> but I have listed the folders and I would like to avoid listing individual
> files if possible.
>
> Here's a related issue: https://issues.apache.org/jira/browse/AVRO-1367
>
> Thanks in advance.
>
>
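
[Editor's sketch] The ordering problem in the original question (dependencies must be processed before the schemas that reference them) does not have to rely on file names or directory listing order. The following is a minimal Python sketch that derives an order from the schemas themselves. It is not how the avro maven plugin behaves; the function names `scan` and `compile_order` are hypothetical, the schema contents are shortened assumptions, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED = {"record", "error", "enum", "fixed"}


def qualify(name, namespace):
    """Qualify a bare name with the enclosing namespace (hypothetical helper)."""
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


def scan(schema, namespace, defined, referenced):
    """Collect the full names a schema defines and the names it refers to."""
    if isinstance(schema, str):
        if schema not in PRIMITIVES:
            referenced.add(qualify(schema, namespace))
    elif isinstance(schema, list):  # a union
        for branch in schema:
            scan(branch, namespace, defined, referenced)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if isinstance(kind, str) and kind in NAMED:
            namespace = schema.get("namespace", namespace)
            defined.add(qualify(schema["name"], namespace))
        for field in schema.get("fields", []):
            scan(field["type"], namespace, defined, referenced)
        for key in ("items", "values"):  # array items and map values
            if key in schema:
                scan(schema[key], namespace, defined, referenced)


def compile_order(schemas):
    """Given {file name: parsed schema}, return file names, dependencies first."""
    info = {}
    for fname, schema in schemas.items():
        defined, referenced = set(), set()
        scan(schema, None, defined, referenced)
        info[fname] = (defined, referenced - defined)
    # Map each defined type name to the file that defines it.
    owner = {name: fname for fname, (defined, _) in info.items() for name in defined}
    # Each file depends on the files that define the names it needs.
    graph = {
        fname: {owner[name] for name in needs if name in owner}
        for fname, (_, needs) in info.items()
    }
    return list(TopologicalSorter(graph).static_order())


# Shortened versions of the two schemas from this thread (assumed contents).
schemas = {
    "userInfo.avsc": {
        "type": "record", "name": "userInfo", "namespace": "my.example",
        "fields": [
            {"name": "username", "type": "string"},
            {"name": "address", "type": "mailing_address"},
        ],
    },
    "mailing_address.avsc": {
        "type": "record", "name": "mailing_address", "namespace": "my.example",
        "fields": [{"name": "street", "type": "string"}],
    },
}

order = compile_order(schemas)
print(order)
```

A circular dependency between files would make `static_order()` raise `graphlib.CycleError`, which is a reasonable signal that the schemas need to be merged into one file instead.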
