Hi Lewis,

Thanks for pointing out [0]! I guess it makes sense, and there might be some performance to be gained by doing the transformation directly from Avro to Arrow.
Yes, Lewis, I totally agree with you that having Gora serialize all Hadoop metrics would be an awesome project. Is that a project for GSoC already? Are you planning to mentor any projects?
Also, regarding this project integration topic, have you thought about providing Any23 a way to read/write XML, HTML, and JSON objects through Gora? Do you think that would be an interesting project for the Any23 community?

Best,

Renato M.

2018-03-15 8:26 GMT+01:00 lewis john mcgibbney <lewi...@apache.org>:

> I should also say, ALL of the projects below which I have named require
> the Gora dependency to be upgraded.
> Lewis
>
> On Thu, Mar 15, 2018 at 12:24 AM, lewis john mcgibbney <lewi...@apache.org> wrote:
>
>> Hi Renato,
>>
>> On Wed, Mar 14, 2018 at 3:22 PM, Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com> wrote:
>>
>>> Hey guys,
>>>
>>> There might not be an integration/converter from Arrow to Avro (and/or
>>> vice versa) because there are Parquet readers that can take Avro, and once
>>> stuff is in Parquet, Arrow can be used directly.
>>>
>>
>> Yes there might not be. I actually raised this issue [0] a wee while ago
>> on the Arrow list. At that time I was told, "...The use case you outline
>> makes a lot of sense for Arrow to help out with. We don't yet have an AVRO
>> <> Arrow converter written but it is something that would be great to
>> have." So maybe that would be something to keep in mind.
>>
>> [0] https://s.apache.org/2GwS
>>
>>
>>> Regarding an integration of Parquet with Gora, I think it would be
>>> interesting to make it easier for people to read and write Parquet files by
>>> providing a higher-level API, as Gora does. However, for you @Talat,
>>> who knows Gora pretty well, maybe you could take another project that
>>> helps Gora more. For example, fixing the integration with Nutch. There are
>>> multiple loose ends in Nutch 2.x and Gora that we have neglected as a
>>> community.
>>> IMHO that should be a GSoC project.
>>>
>>
>> ACK, other existing projects which consume Gora are (off the top of my
>> head),
>>
>>    - Chukwa - https://s.apache.org/cW6a
>>    - Giraph - https://github.com/apache/giraph/tree/trunk/giraph-gora
>>    - Camel - https://camel.apache.org/gora.html
>>    - Nutch 2.X - https://github.com/apache/nutch/tree/2.x
>>
>> An interesting idea I had where Gora could be implemented would be in
>> Hadoop metrics
>>
>> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Metrics.html
>>
>> This would provide a textbook usage for Gora to store Hadoop
>> metrics in some datastore which would then be exposed for query and
>> analysis.
>>
>>> I can't mentor it because I do not have enough insight into this, but
>>> @Lewis and @Talat you can probably tackle this as mentor and student. This
>>> would be an awesome contribution to the project as there are quite a lot of
>>> people going over Nutch and trying to use it with Gora.
>>> Just my 2c
>>>
>>>
>> Understood Renato, no biggie. Thanks for your input. I know you are
>> working with Parquet a lot these days so your input is appreciated.
>> Lewis
>>
>
>
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>