Re: Want to contribute to Beam project

2017-04-01 Thread Jean-Baptiste Onofré
Hi Tarush, welcome aboard! You can take a look at https://beam.apache.org/contribute/. Any contribution is valuable (not only code): documentation, etc. I suggest you take a look at the Jira, experiment with Beam to find new features/improvements, and get involved on the mailing list.

Re: [PROPOSAL] ORC support

2017-04-01 Thread Jean-Baptiste Onofré
+1 By the way, on the same topic, I'm working on Apache CarbonData support (http://carbondata.apache.org/). Regards JB On 04/01/2017 05:31 PM, Tibor Kiss wrote: Hello, Recently the Optimized Row Columnar (ORC) file format was spun off from Hive and became a top-level Apache project:

Want to contribute to Beam project

2017-04-01 Thread tarush grover
Hi Members, Let me introduce myself: I am Tarush Grover, and I have 3 years of experience working with big data technologies as a senior software engineer. I find Apache Beam to be an exciting project, and I would like the community members to involve me in this exciting journey. Please guide me on where and how to start

Re: [PROPOSAL] ORC support

2017-04-01 Thread Ismaël Mejía
+1 From my previous work experience, ORC performs better than Parquet in certain cases and really deserves to be supported. On Sat, Apr 1, 2017 at 5:58 PM, Ted Yu wrote: > +1 >> On Apr 1, 2017, at 8:31 AM, Tibor Kiss wrote: >> Hello,

Re: Update of Pei in Alibaba

2017-04-01 Thread Ismaël Mejía
Excellent news, Pei; it would be great to have a new runner. I am curious how the Storm implementations differ from one another, considering that there are already three 'versions': Storm, JStorm and Heron. I wonder if one runner could translate to an API that would cover all of them (of

Re: [PROPOSAL] ORC support

2017-04-01 Thread Ted Yu
+1 > On Apr 1, 2017, at 8:31 AM, Tibor Kiss wrote: > Hello, > Recently the Optimized Row Columnar (ORC) file format was spun off from Hive and became a top-level Apache project: https://orc.apache.org/ > It is similar to Parquet in the sense that it uses column

Re: Update of Pei in Alibaba

2017-04-01 Thread Tibor Kiss
Exciting times, looking forward to trying it out! I should mention that Taylor Goetz also started creating a Beam runner using Storm. His work is available in the Storm repo: https://github.com/apache/storm/commits/beam-runner Maybe it's worthwhile to take a peek and see if something is reusable

Re: Call for help: let's add Splittable DoFn to Spark, Flink and Apex runners

2017-04-01 Thread Eugene Kirpichov
Hey all, The Flink PR has been merged, and thus - Flink becomes the first distributed runner to support Splittable DoFn!!! Thank you, Aljoscha! Looking forward to Spark and Apex, and continuing work on Dataflow. I'll also send proposals about a couple of new ideas related to SDF next week. On