In the doc you refer to:

    // The Avro records get converted to Spark types, filtered, and
    // then written back out as Avro records
    val df = spark.read.avro("/tmp/episodes.avro")
    df.filter("doctor > 5").write.avro("/tmp/output")
Alternatively you can specify the format to use instead:

    val df = spark.read
      .format("com.databricks.spark.avro")
      .load("/tmp/episodes.avro")

As far as I know, spark-avro is not built into Spark 2.x. That is not the
problem, because the same Databricks doc also says: *"At the moment, it
ignores docs, aliases and other properties present in the Avro file."*

Regards.

2017-11-06 22:29 GMT+01:00 Gourav Sengupta <gourav.sengu...@gmail.com>:

> Hi,
>
> I may be wrong about this, but when you are using format("....") you are
> basically using old Spark classes, which still exist because of backward
> compatibility.
>
> Please refer to the following documentation to take advantage of the
> recent changes in Spark:
> https://docs.databricks.com/spark/latest/data-sources/read-avro.html
>
> Kindly let us know how things are going.
>
> Regards,
> Gourav Sengupta
>
> On Mon, Nov 6, 2017 at 8:04 PM, Gaspar Muñoz <gmu...@datiobd.com> wrote:
>
>> Of course,
>>
>> right now I'm trying it locally with Spark 2.2.0 and spark-avro 4.0.0.
>> I've just uploaded a snippet:
>> https://gist.github.com/gasparms/5d0740bd61a500357e0230756be963e1
>>
>> Basically, my Avro schema has a field with an alias, and in the last
>> part of the code spark-avro is not able to read old data with the old
>> name using the alias.
>>
>> The spark-avro library README says this is not supported, and I am
>> asking whether any of you has a workaround, or how you manage schema
>> evolution.
>>
>> Regards.
>>
>> 2017-11-05 20:13 GMT+01:00 Gourav Sengupta <gourav.sengu...@gmail.com>:
>>
>>> Hi Gaspar,
>>>
>>> Can you please provide details regarding the environment, versions,
>>> libraries and code snippets?
>>>
>>> For example: Spark version, OS, distribution, running on YARN, etc.,
>>> and all other details.
>>>
>>> Regards,
>>> Gourav Sengupta
>>>
>>> On Sun, Nov 5, 2017 at 9:03 AM, Gaspar Muñoz <gmu...@datiobd.com> wrote:
>>>
>>>> Hi there,
>>>>
>>>> I use the Avro format to store historical data because of Avro schema
>>>> evolution. I manage external schemas and read them using the
>>>> avroSchema option, so we have been able to add and delete columns.
>>>>
>>>> The problem is that when I introduced aliases, the Spark process
>>>> didn't work as expected, and then I read in the spark-avro library:
>>>> "At the moment, it ignores docs, aliases and other properties present
>>>> in the Avro file".
>>>>
>>>> How do you manage aliases and column renaming? Is there any
>>>> workaround?
>>>>
>>>> Thanks in advance.
>>>>
>>>> Regards
>>>>
>>>> --
>>>> Gaspar Muñoz Soria
>>>>
>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>> 28224 Pozuelo de Alarcón, Madrid
>>>> Tel: +34 91 828 6473

--
Gaspar Muñoz Soria

Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473
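[Editor's note: since spark-avro ignores Avro aliases, one common workaround
(not from this thread; a sketch only, assuming Spark 2.2.0 with spark-avro
4.0.0, and a hypothetical old field name "doctor_id" aliased to "doctor") is
to apply the alias mapping manually after reading, via withColumnRenamed:]

    import com.databricks.spark.avro._
    import org.apache.spark.sql.SparkSession

    object AliasWorkaround {
      // Hypothetical mapping from old (written) field names to current names.
      // In practice this could be derived from the aliases declared in the
      // current Avro schema, since spark-avro does not apply them itself.
      val renames = Map("doctor_id" -> "doctor")

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("alias-workaround")
          .getOrCreate()

        // Read old files with whatever column names they were written with...
        val raw = spark.read.avro("/tmp/old-episodes.avro")

        // ...then fold the rename map over the DataFrame, renaming only the
        // columns that are actually present in this batch of files.
        val df = renames.foldLeft(raw) { case (acc, (oldName, newName)) =>
          if (acc.columns.contains(oldName))
            acc.withColumnRenamed(oldName, newName)
          else acc
        }

        df.write.avro("/tmp/episodes-renamed")
        spark.stop()
      }
    }

[This needs a Spark runtime to execute; the fold keeps the code robust when
reading a mix of old and new files, since withColumnRenamed is a no-op
decision made per column here.]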