On Thu, 16 Jan 2020 at 17:21, Ryan Skraba <r...@skraba.com> wrote:

> didn't find anything currently in the avro-tools that uses both
> reader and writer schemas while deserializing data...  It should be a
> pretty easy feature to add as an option to the DataFileReadTool
> (a.k.a. tojson)!
>

Thanks for that suggestion. I've been delving into that code a bit and
trying to understand what's going on.

At the heart of it is this code:

    GenericDatumReader<Object> reader = new GenericDatumReader<>();
    try (DataFileStream<Object> streamReader = new DataFileStream<>(inStream, reader)) {
      Schema schema = streamReader.getSchema();
      DatumWriter<Object> writer = new GenericDatumWriter<>(schema);
      JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out, pretty);

I'm trying to work out where the best place to put the specific reader
schema (taken from a command line flag) might be.

Would it be best to do it when creating the DatumReader (it looks like
there might be a way to create that with a generic writer schema and a
specific reader schema, although I can't quite see how to do that atm), or
when creating the DatumWriter?
Or perhaps there's a better way?
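
For concreteness, here is a minimal sketch of the first option (setting
the reader schema on the DatumReader). It assumes that calling
GenericDatumReader.setExpected before the DataFileStream is opened is
enough, since DataFileStream then supplies the writer schema from the
file header. The class name, the toJson and demo helpers and the Person
schemas are made up for illustration; this is not the actual
DataFileReadTool code, and it needs the avro jar on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

// Hypothetical class for illustration; not part of avro-tools.
final class ReaderSchemaToJson {

  // Decodes an Avro container stream and writes each datum as JSON.
  // readerSchema may be null, in which case the file's own schema is used.
  static void toJson(InputStream in, OutputStream out, Schema readerSchema, boolean pretty)
      throws IOException {
    GenericDatumReader<Object> reader = new GenericDatumReader<>();
    if (readerSchema != null) {
      // Set the expected (reader) schema up front. DataFileStream later calls
      // reader.setSchema(...) with the writer schema from the file header.
      reader.setExpected(readerSchema);
    }
    try (DataFileStream<Object> streamReader = new DataFileStream<>(in, reader)) {
      // Decoded data has the reader schema's shape, so the JSON side
      // (DatumWriter and JsonEncoder) uses the reader schema as well.
      Schema schema = readerSchema != null ? readerSchema : streamReader.getSchema();
      DatumWriter<Object> writer = new GenericDatumWriter<>(schema);
      JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out, pretty);
      for (Object datum : streamReader) {
        writer.write(datum, encoder);
      }
      encoder.flush();
    }
  }

  // Writes one record with a one-field schema, then reads it back with a
  // reader schema that adds a defaulted "age" field. Schemas are made up.
  static String demo() {
    Schema writerSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Person\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"}]}");
    Schema readerSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Person\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\",\"default\":0}]}");
    try {
      ByteArrayOutputStream file = new ByteArrayOutputStream();
      try (DataFileWriter<GenericRecord> w =
          new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(writerSchema))) {
        w.create(writerSchema, file);
        GenericRecord rec = new GenericData.Record(writerSchema);
        rec.put("name", "rog");
        w.append(rec);
      }
      ByteArrayOutputStream json = new ByteArrayOutputStream();
      toJson(new ByteArrayInputStream(file.toByteArray()), json, readerSchema, false);
      return new String(json.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

In the real tool, the schema parsed from the command line flag would
presumably be passed where readerSchema is used here. The point of the
sketch is that the same schema probably has to go to both the
DatumReader and the DatumWriter/JsonEncoder, rather than to only one of
them.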

Thanks for any guidance.

   cheers,
    rog.

>
> You are correct about running ./build.sh dist in the java directory --
> it fails with JDK 11 (likely fixable:
> https://issues.apache.org/jira/browse/MJAVADOC-562).
>
> You should probably do a simple mvn clean install instead and find the
> jar in lang/java/tools/target/avro-tools-1.10.0-SNAPSHOT.jar.  That
> should work with JDK11 without any problem (well-tested in the build).
>
> Best regards, Ryan
>
>
>
> On Thu, Jan 16, 2020 at 5:49 PM roger peppe <rogpe...@gmail.com> wrote:
> >
> > Update: I tried running `build.sh dist` in `lang/java` and it failed (at least, it looks like a failure message) after downloading a load of Maven deps with the following errors:
> > https://gist.github.com/rogpeppe/df05d993254dc5082253a5ef5027e965
> >
> > Any hints on what I should do to build the avro-tools jar?
> >
> >   cheers,
> >     rog.
> >
> > On Thu, 16 Jan 2020 at 16:45, roger peppe <rogpe...@gmail.com> wrote:
> >>
> >>
> >> On Thu, 16 Jan 2020 at 13:57, Ryan Skraba <r...@skraba.com> wrote:
> >>>
> >>> Hello!  Is it because you are using brew to install avro-tools?  I'm
> >>> not entirely familiar with how it packages the command, but using a
> >>> direct bash-like solution instead might solve this problem of mixing
> >>> stdout and stderr.  This could be the simplest (and right) solution
> >>> for piping.
> >>
> >>
> >> No, I downloaded the jar and am directly running it with "java -jar ~/other/avro-tools-1.9.1.jar".
> >> I'm using Ubuntu Linux 18.04 FWIW - the binary comes from Debian package openjdk-11-jre-headless.
> >>
> >> I'm going to try compiling avro-tools myself to investigate but I'm a total Java ignoramus - wish me luck!
> >>
> >>>
> >>> alias avrotoolx='java -jar ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar'
> >>> avrotoolx tojson x.out 2> /dev/null
> >>>
> >>> (As Fokko mentioned, the 2> /dev/null isn't even necessary -- the
> >>> warnings and logs should not be piped along with the normal content.)
> >>>
> >>> Otherwise, IIRC, there is no way to disable the first illegal
> >>> reflective access warning when running in Java 9+, but you can "fix"
> >>> these module errors, and deactivate the NativeCodeLoader logs with an
> >>> explicit log4j.properties:
> >>>
> >>> java -Dlog4j.configuration=file:///tmp/log4j.properties --add-opens java.security.jgss/sun.security.krb5=ALL-UNNAMED -jar ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar tojson x.out
> >>
> >>
> >> Thanks for that suggestion! I'm afraid I'm not familiar with log4j properties files though. What do I need to put in /tmp/log4j.properties to make this work?
> >>
> >>> None of that is particularly satisfactory, but it could be a
> >>> workaround for your immediate use.
> >>
> >>
> >> Yeah, not ideal, because if something goes wrong, stdout will be corrupted, but at least some noise should go away :)
> >>
> >>> I'd also like to see a more unified experience with the CLI tool for
> >>> documentation and usage.  The current state requires a bit of Avro
> >>> expertise to use, but it has some functions that would be pretty
> >>> useful for a user working with Avro data.  I raised
> >>> https://issues.apache.org/jira/browse/AVRO-2688 as an improvement.
> >>>
> >>> In my opinion, a schema compatibility tool would be a useful and
> >>> welcome feature!
> >>
> >>
> >> That would indeed be nice, but in the meantime, is there really nothing in the avro-tools commands that uses a chosen schema to read a data file written with some other schema? That would give me what I'm after currently.
> >>
> >> Thanks again for the helpful response.
> >>
> >>    cheers,
> >>      rog.
> >>
> >>>
> >>> Best regards, Ryan
> >>>
> >>>
> >>>
> >>> On Thu, Jan 16, 2020 at 12:25 PM roger peppe <rogpe...@gmail.com> wrote:
> >>> >
> >>> > Hi Fokko,
> >>> >
> >>> > Thanks for your swift response!
> >>> >
> >>> > Stdout and stderr definitely seem to be merged on this platform at least. Here's a sample:
> >>> >
> >>> > % avrotool random --count 1 --schema '"int"'  x.out
> >>> > % avrotool tojson x.out > x.json
> >>> > % cat x.json
> >>> > 125140891
> >>> > WARNING: An illegal reflective access operation has occurred
> >>> > WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/rog/other/avro-tools-1.9.1.jar) to method sun.security.krb5.Config.getInstance()
> >>> > WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
> >>> > WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> >>> > WARNING: All illegal access operations will be denied in a future release
> >>> > 20/01/16 11:00:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >>> > %
> >>> >
> >>> > I've just verified that it's not a problem with the java executable itself (I ran a program that printed to System.err and the text correctly goes to the standard error).
> >>> >
> >>> > > Regarding the documentation, the CLI itself contains info on all the available commands. Also, there are excellent online resources: https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/ Is there anything specific that you're missing?
> >>> >
> >>> > There's the single line summary produced for each command by running "avro-tools" with no arguments, but that's not as much info as I'd ideally like. For example, it often doesn't say what file format is being written or read. For some commands, the purpose is not very clear.
> >>> >
> >>> > For example the description of the recodec command is "Alters the codec of a data file". It doesn't describe how it alters it or how one might configure the alteration parameters. I managed to get some usage help by passing it more than two parameters (specifying "--help" gives an exception), but that doesn't provide much more info:
> >>> >
> >>> > % avro-tools recodec a b c
> >>> > Expected at most an input file and output file.
> >>> > Option             Description
> >>> > ------             -----------
> >>> > --codec <String>   Compression codec (default: null)
> >>> > --level <Integer>  Compression level (only applies to deflate and xz) (default: -1)
> >>> >
> >>> > For the record, I'm wondering if it might be possible to get avrotool to tell me if one schema is compatible with another so that I can check hypotheses about schema-checking in practice without having to write Java code.
> >>> >
> >>> >   cheers,
> >>> >     rog.
> >>> >
> >>> >
> >>> > On Thu, 16 Jan 2020 at 10:30, Driesprong, Fokko <fo...@driesprong.frl> wrote:
> >>> >>
> >>> >> Hi Rog,
> >>> >>
> >>> >> This is actually a warning produced by the Hadoop library that we're using. Please note that this isn't part of the stdout:
> >>> >>
> >>> >> $ find /tmp/tmp
> >>> >> /tmp/tmp
> >>> >> /tmp/tmp/._SUCCESS.crc
> >>> >> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
> >>> >> /tmp/tmp/.part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro.crc
> >>> >> /tmp/tmp/_SUCCESS
> >>> >>
> >>> >> $ avro-tools tojson /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
> >>> >> 20/01/16 11:26:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >>> >> {"line_of_text":{"string":"Hello"}}
> >>> >> {"line_of_text":{"string":"World"}}
> >>> >>
> >>> >> $ avro-tools tojson /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro > /tmp/tmp/data.json
> >>> >> 20/01/16 11:26:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >>> >>
> >>> >> $ cat /tmp/tmp/data.json
> >>> >> {"line_of_text":{"string":"Hello"}}
> >>> >> {"line_of_text":{"string":"World"}}
> >>> >>
> >>> >> So when you pipe the data, it doesn't include the warnings.
> >>> >>
> >>> >> Regarding the documentation, the CLI itself contains info on all the available commands. Also, there are excellent online resources: https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/ Is there anything specific that you're missing?
> >>> >>
> >>> >> Hope this helps.
> >>> >>
> >>> >> Cheers, Fokko
> >>> >>
> >>> >> On Thu, 16 Jan 2020 at 09:30, roger peppe <rogpe...@gmail.com> wrote:
> >>> >>>
> >>> >>> Hi,
> >>> >>>
> >>> >>> I've been trying to use avro-tools to verify Avro implementations, and I've come across an issue. Perhaps someone here might be able to help?
> >>> >>>
> >>> >>> When I run avro-tools with some subcommands, it prints a bunch of warnings (see below) to the standard output. Does anyone know a way to disable this? I'm using openjdk 11.0.5 under Ubuntu 18.04 and avro-tools 1.9.1.
> >>> >>>
> >>> >>> The warnings are somewhat annoying because they can corrupt the output of tools that print to the standard output, such as recodec.
> >>> >>>
> >>> >>> Aside: is there any documentation for the commands in avro-tools? Some seem to have some command-line help (though unfortunately there doesn't seem to be a standard way of showing it), but often that help doesn't describe what the command actually does.
> >>> >>>
> >>> >>> Here's the output that I see:
> >>> >>>
> >>> >>> WARNING: An illegal reflective access operation has occurred
> >>> >>> WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/rog/other/avro-tools-1.9.1.jar) to method sun.security.krb5.Config.getInstance()
> >>> >>> WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
> >>> >>> WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> >>> >>> WARNING: All illegal access operations will be denied in a future release
> >>> >>> 20/01/16 08:12:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >>> >>>
> >>> >>>   cheers,
> >>> >>>     rog.
> >>> >>>
>
