IDL is a language-independent way to merge two schema files into one standalone schema file.
Doug

On Wed, May 28, 2014 at 3:40 PM, Wai Yip Tung <w...@tungwaiyip.info> wrote:
> Let's say we are interested in keeping 2 schema files because they come from 2
> separate organizations. When we generate a data file they need to be merged
> into one standalone schema. The maven plugin does this. Otherwise we have
> to merge them ourselves. This is not too hard to merge. I just want to make
> sure I'm not missing some existing tool or API.
>
> Wai Yip
>
> Doug Cutting <cutt...@apache.org>
> Wednesday, May 28, 2014 12:09 PM
> Your userInfo.avsc is not a standalone schema since it depends on
> mailing_address already being defined. A schema included in a data file is
> always standalone, and would include the mailing_address schema definition
> within the userInfo schema's "address" field.
>
> Some tools will process such non-standalone schemas in separate files.
> For example, the Java schema compiler will accept multiple schema files on
> the command line, and those later on the command line may reference types
> defined earlier. Java's maven tasks also permit references to other files,
> but these are probably not of interest to a Python developer.
>
> The IDL tool uses the JVM as its runtime but is not Java-specific.
>
> Doug
>
> Wai Yip Tung <w...@tungwaiyip.info>
> Wednesday, May 28, 2014 11:53 AM
> I want to extend this question somewhat. I have begun to realize that avro
> has accommodation for composing a schema from user defined types. I want to
> check if I understand it correctly and also the proper way to use it.
>
> I take a single, two level nested schema from the web (see "using an
> embedded record").
>
> http://docs.oracle.com/cd/E26161_02/html/GettingStartedGuide/avroschemas.html
>
> I break it down into two separate records: the main `userInfo` record and
> the embedded `mailing_address` record, as two separate JSON objects.
>
> ------------------------------------------------------------------------
> userInfo.avsc
>
> {
>     "type" : "record",
>     "name" : "userInfo",
>     "namespace" : "my.example",
>     "fields" : [{"name" : "username",
>                  "type" : "string",
>                  "default" : "NONE"},
>
>                 {"name" : "age",
>                  "type" : "int",
>                  "default" : -1},
>
>                 {"name" : "phone",
>                  "type" : "string",
>                  "default" : "NONE"},
>
>                 {"name" : "housenum",
>                  "type" : "string",
>                  "default" : "NONE"},
>
>                 {"name" : "address",
>                  "type" : "mailing_address",    <--- user defined type
>                  "default" : "NONE"},
>     ]
> }
>
> ------------------------------------------------------------------------
> mailing_address.avsc
>
> {
>     "type" : "record",
>     "name" : "mailing_address",    <--- defined here
>     "fields" : [
>                 {"name" : "street",
>                  "type" : "string",
>                  "default" : "NONE"},
>
>                 {"name" : "city",
>                  "type" : "string",
>                  "default" : "NONE"},
>
>                 {"name" : "state_prov",
>                  "type" : "string",
>                  "default" : "NONE"},
>
>                 {"name" : "country",
>                  "type" : "string",
>                  "default" : "NONE"},
>
>                 {"name" : "zip",
>                  "type" : "string",
>                  "default" : "NONE"}
>     ]}
> }
> ------------------------------------------------------------------------
>
> Is this a valid composite avro schema definition?
>
> The second question is how we can actually use this in practice. If we
> have two separate files, is there a standard API that loads them both?
> Hrishikesh P mentions the avro maven plugin. I mainly use the Python API so
> I am unfamiliar with this. Does a comparable API exist?
>
> I understand the IDL form has explicit linking of schema files. I will
> look into it next.
>
> Wai Yip
>
> Doug Cutting <cutt...@apache.org>
> Thursday, May 22, 2014 2:57 PM
> You might instead use Avro IDL to define your schemas. It permits you to
> define multiple schemas in a single file, so that you can determine
> the order they're defined in. It also permits ordered inclusion of
> types from other files, both IDL files and schema files.
>
> Doug
>
> On Thu, May 22, 2014 at 10:46 AM, Hrishikesh P
>
> Hrishikesh P <hrishi.engin...@gmail.com>
> Thursday, May 22, 2014 10:46 AM
> I have a few avro schemas that I am generating code from using the
> avro maven plugin. I have dependencies in the schemas which I was able to
> resolve by putting the schemas in separate folders and/or renaming the
> schema files with 01-, 02-, ... prefixes so that the dependencies get
> compiled first. However, this only works on mac but not on RHEL (probably
> because of the different ways the directories are read on them?). Does
> anybody know the best way to handle schema dependencies? If I specify
> individual schema names in the POM in the imports section, the schemas get
> compiled, but I have listed the folders and I would like to avoid listing
> individual files if possible.
>
> Here's a related issue: https://issues.apache.org/jira/browse/AVRO-1367
>
> Thanks in advance.
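
For Python users who want to do the manual merge mentioned in the thread, here is a minimal sketch using only the standard library. It is not an Avro API: the function name `inline_named_types` and the `definitions` mapping are hypothetical, the schemas are trimmed versions of the two files quoted above, and names are matched exactly as written, with no namespace resolution. It replaces the first reference to each named type with its full definition and leaves later references as plain names, which is the shape of a standalone schema as described in the thread.

```python
import json


def inline_named_types(schema, definitions, seen=None):
    """Return a copy of `schema` in which the first reference to each name in
    `definitions` is replaced by its full definition (hypothetical helper)."""
    if seen is None:
        seen = set()
    if isinstance(schema, str):
        # A bare string is either a primitive or a reference to a named type.
        if schema in definitions and schema not in seen:
            seen.add(schema)
            return inline_named_types(definitions[schema], definitions, seen)
        return schema
    if isinstance(schema, list):
        # A JSON array is a union; resolve each branch in order.
        return [inline_named_types(branch, definitions, seen) for branch in schema]
    if isinstance(schema, dict):
        resolved = dict(schema)  # shallow copy so the input is not mutated
        kind = schema.get("type")
        if kind == "record":
            seen.add(schema["name"])  # later references stay as names
            resolved["fields"] = [
                {**field, "type": inline_named_types(field["type"], definitions, seen)}
                for field in schema["fields"]
            ]
        elif kind == "array":
            resolved["items"] = inline_named_types(schema["items"], definitions, seen)
        elif kind == "map":
            resolved["values"] = inline_named_types(schema["values"], definitions, seen)
        return resolved
    return schema


# Trimmed versions of the two schema files from the thread.
mailing_address = json.loads("""
{"type": "record", "name": "mailing_address",
 "fields": [{"name": "street", "type": "string", "default": "NONE"},
            {"name": "city", "type": "string", "default": "NONE"}]}
""")
user_info = json.loads("""
{"type": "record", "name": "userInfo", "namespace": "my.example",
 "fields": [{"name": "username", "type": "string", "default": "NONE"},
            {"name": "address", "type": "mailing_address"}]}
""")

definitions = {mailing_address["name"]: mailing_address}
standalone = inline_named_types(user_info, definitions)
print(json.dumps(standalone, indent=2))
```

The resulting `standalone` dictionary can be written out as a single .avsc file. Because the nested `mailing_address` record declares no namespace of its own, it takes the namespace of the enclosing `userInfo` record once it is inlined.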