Re: GSoC Ideas

Kevin Ratnasekera Mon, 26 Mar 2018 22:15:35 -0700

Hi Talat,

Please make sure you submit a proposal before the deadline. The deadline is
March 27, 16:00 UTC.


Regards
Kevin

On Fri, Mar 16, 2018 at 6:18 AM, lewis john mcgibbney <lewi...@apache.org>
wrote:

> Hi Talat,
> In all honesty I don't have the same time I used to, to look into this.
> I have been experimenting using Arrow with multi-dimensional array-based
> data but nothing else.
> I would therefore be learning probably as much as you if this project was
> to go ahead.
> Lewis
>
> On Thu, Mar 15, 2018 at 3:46 PM, Talat Uyarer <ta...@uyarer.com> wrote:
>
>> @Lewis I found a PR [0] on the Arrow Git repo. I guess they stuck with the
>> avro-c library. Do you know whether they need to implement the same thing for
>> every language they support, or whether they just need to implement a wrapper?
>>
>> If we can use Arrow for our internal serialization, Gora will be super
>> fast with zero copy support. :)
>>
>> [0] https://github.com/apache/arrow/pull/1026
>>
>> My 2 cents
>>
>> On Thu, Mar 15, 2018 at 12:24 AM, lewis john mcgibbney <
>> lewi...@apache.org> wrote:
>>
>>> Hi Renato,
>>>
>>> On Wed, Mar 14, 2018 at 3:22 PM, Renato Marroquín Mogrovejo <
>>> renatoj.marroq...@gmail.com> wrote:
>>>
>>>> Hey guys,
>>>>
>>>> There might not be an integration/converter from Arrow to Avro (and/or
>>>> vice versa) because there are Parquet readers that can take Avro, and once
>>>> the data is in Parquet, Arrow can be used directly.
>>>>
>>>
>>> Yes there might not be. I actually raised this issue [0] a wee while ago
>>> on the Arrow list. At that time I was told, "...The use case you outline
>>> makes a lot of sense for Arrow to help out with. We don't yet have an AVRO
>>> <> Arrow converter written but it is something that would be great to
>>> have." So maybe that would be something to keep in mind.
>>>
>>> [0] https://s.apache.org/2GwS
>>>
>>>
>>>> Regarding an integration of Parquet with Gora, I think it would be
>>>> interesting to make it easier for people to read and write Parquet files by
>>>> providing a higher-level API, as Gora does. However, @Talat, since you
>>>> know Gora pretty well, maybe you could take on another project that
>>>> helps Gora more. For example, fixing the integration with Nutch. There are
>>>> multiple loose ends in Nutch 2.x and Gora that we have neglected as a
>>>> community.
>>>> IMHO that should be a GSoC project.
>>>>
>>>
>>> ACK, other existing projects which consume Gora are (off the top of my
>>> head),
>>>
>>>    - Chukwa - https://s.apache.org/cW6a
>>>    - Giraph - https://github.com/apache/giraph/tree/trunk/giraph-gora
>>>    - Camel - https://camel.apache.org/gora.html
>>>    - Nutch 2.X - https://github.com/apache/nutch/tree/2.x
>>>
>>> One interesting place I thought Gora could be used is
>>> Hadoop metrics:
>>>
>>> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Metrics.html
>>>
>>> This would provide a textbook use case for Gora: storing Hadoop
>>> metrics in some datastore, which would then be exposed for query and
>>> analysis.
>>>
>>>> I can't mentor it because I do not have enough insight into this, but
>>>> @Lewis and @Talat, you can probably tackle this as mentor and student. This
>>>> would be an awesome contribution to the project as there are quite a lot of
>>>> people going over Nutch and trying to use it with Gora.
>>>> Just my 2c
>>>>
>>>>
>>> Understood Renato, no biggie. Thanks for your input. I know you are
>>> working with Parquet a lot these days so your input is appreciated.
>>> Lewis
>>>
>>
>>
>>
>> --
>> Talat UYARER
>> Website: http://talat.uyarer.com
>> Twitter: http://twitter.com/talatuyarer
>> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>>
>
>
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>
