Re: GSoC Ideas

Renato Marroquín Mogrovejo Thu, 15 Mar 2018 03:00:13 -0700

Hi Lewis,

Thanks for pointing out [0]! I guess it makes sense, and there might be
some performance to be gained when doing the transformation directly from
Avro to Arrow.


Yes, Lewis, I totally agree with you that having Gora serialize all
Hadoop metrics would be an awesome project. Is that already a GSoC
project? Are you planning to mentor any projects?

Also, regarding this project integration topic, have you thought about
providing Any23 with a way to read/write XML, HTML, and JSON objects
through Gora? Do you think that would be an interesting project for the
Any23 community?


Best,

Renato M.

2018-03-15 8:26 GMT+01:00 lewis john mcgibbney <lewi...@apache.org>:

> I should also say, ALL of the projects below which I have named require
> the Gora dependency to be upgraded.
> Lewis
>
> On Thu, Mar 15, 2018 at 12:24 AM, lewis john mcgibbney <lewi...@apache.org
> > wrote:
>
>> Hi Renato,
>>
>> On Wed, Mar 14, 2018 at 3:22 PM, Renato Marroquín Mogrovejo <
>> renatoj.marroq...@gmail.com> wrote:
>>
>>> Hey guys,
>>>
>>> There might not be an integration or converter between Arrow and Avro
>>> (in either direction) because there are Parquet readers that can take
>>> Avro, and once the data is in Parquet, Arrow can be used directly.
>>>
>>
>> Yes, there might not be. I actually raised this issue [0] a wee while ago
>> on the Arrow list. At that time I was told, "...The use case you outline
>> makes a lot of sense for Arrow to help out with. We don't yet have an AVRO
>> <> Arrow converter written but it is something that would be great to
>> have." So maybe that would be something to keep in mind.
>>
>> [0] https://s.apache.org/2GwS
>>
>>
>>> Regarding an integration of Parquet with Gora, I think it would be
>>> interesting to make it easier for people to read and write Parquet files
>>> by providing a higher-level API like the one Gora provides. However,
>>> @Talat, since you know Gora pretty well, maybe you could take on another
>>> project that helps Gora more, for example, fixing the integration with
>>> Nutch. There are multiple loose ends in Nutch 2.x and Gora that we have
>>> neglected as a community.
>>> IMHO that should be a GSoC project.
>>>
>>
>> ACK, other existing projects which consume Gora are (off the top of my
>> head),
>>
>>    - Chukwa - https://s.apache.org/cW6a
>>    - Giraph - https://github.com/apache/giraph/tree/trunk/giraph-gora
>>    - Camel - https://camel.apache.org/gora.html
>>    - Nutch 2.X - https://github.com/apache/nutch/tree/2.x
>>
>> One interesting place I thought Gora could be used is Hadoop
>> metrics:
>>
>> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Metrics.html
>>
>> This would provide a textbook use case for Gora: storing Hadoop
>> metrics in some datastore, which would then be exposed for query and
>> analysis.
>>
>>> I can't mentor it because I do not have enough insight into this, but
>>> @Lewis and @Talat, you can probably tackle this as mentor and student.
>>> This would be an awesome contribution to the project, as there are quite
>>> a lot of people looking at Nutch and trying to use it with Gora.
>>> Just my 2c
>>>
>>>
>> Understood, Renato, no biggie. Thanks for your input. I know you are
>> working with Parquet a lot these days, so your input is appreciated.
>> Lewis
>>
>
>
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>
