Thanks @zhangminglei for replying. I agree, Hive on Flink would be a big project.
By the way, I looked at the JIRA ticket related to the ORC format which you shared. A couple of comments/requests about the pull request in the ticket:
1. Sorry for nitpicking, but meatSchema is misspelled. I think it should be metaSchema.
2. Will you be able to add more unit tests to the commit? E.g., writing some example data with a simple schema, initializing an OrcWriter object, and sinking it to a local HDFS node?
3. Are there plans to add support for other data types?

Thanks,
Sagar

On Sun, Jun 17, 2018 at 6:45 AM zhangminglei <18717838...@163.com> wrote:
> But if we do Hive on Flink, I think it would be a very big project.
>
> > On Jun 17, 2018, at 9:36 PM, Will Du <will...@gmail.com> wrote:
> >
> > Agree. Two missing pieces that I think could make Flink more competitive against Spark SQL/Streaming and Kafka Streams:
> > 1. Flink over Hive, or Flink SQL Hive table source and sink
> > 2. Flink ML on streams
> >
> >> On Jun 17, 2018, at 8:34 AM, zhangminglei <18717838...@163.com> wrote:
> >>
> >> Actually, I have had an idea: how about supporting Hive on Flink? Lots of business logic is written in Hive SQL, and users want to move from MapReduce to Flink without changing the SQL.
> >>
> >> Zhangminglei
> >>
> >>> On Jun 17, 2018, at 8:11 PM, zhangminglei <18717838...@163.com> wrote:
> >>>
> >>> Hi, Sagar
> >>>
> >>> There are already related JIRAs for ORC and Parquet; you can take a look here:
> >>>
> >>> https://issues.apache.org/jira/browse/FLINK-9407 and https://issues.apache.org/jira/browse/FLINK-9411
> >>>
> >>> For the ORC format, currently only basic data types are supported, such as Long, Boolean, Short, Integer, Float, Double, and String.
> >>>
> >>> Best
> >>> Zhangminglei
> >>>
> >>>> On Jun 17, 2018, at 11:11 AM, sagar loke <sagar...@gmail.com> wrote:
> >>>>
> >>>> We are eagerly waiting for:
> >>>>
> >>>> - Extend Streaming Sinks:
> >>>>   - Bucketing Sink should support S3 properly (compensate for eventual consistency), work with Flink's shaded S3 file systems, and efficiently support formats that compress/index across individual rows (Parquet, ORC, ...)
> >>>>
> >>>> Especially the ORC and Parquet sinks, since we are planning to use Kafka-jdbc to move data from an RDBMS to HDFS.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> On Sat, Jun 16, 2018 at 5:08 PM Elias Levy <fearsome.lucid...@gmail.com> wrote:
> >>>> One more, since we have to deal with it often:
> >>>>
> >>>> - Idling sources (Kafka in particular) and proper watermark propagation: FLINK-5018 / FLINK-5479
> >>>>
> >>>> On Fri, Jun 8, 2018 at 2:58 PM, Elias Levy <fearsome.lucid...@gmail.com> wrote:
> >>>> Since wishes are free:
> >>>>
> >>>> - Standalone cluster job isolation: https://issues.apache.org/jira/browse/FLINK-8886
> >>>> - Proper sliding window joins (not overlapping hopping window joins): https://issues.apache.org/jira/browse/FLINK-6243
> >>>> - Sharing state across operators: https://issues.apache.org/jira/browse/FLINK-6239
> >>>> - Synchronizing streams: https://issues.apache.org/jira/browse/FLINK-4558
> >>>>
> >>>> Seconded:
> >>>> - Atomic cancel-with-savepoint: https://issues.apache.org/jira/browse/FLINK-7634
> >>>> - Support dynamically changing CEP patterns: https://issues.apache.org/jira/browse/FLINK-7129
>>>>
> >>>>
> >>>> On Fri, Jun 8, 2018 at 1:31 PM, Stephan Ewen <se...@apache.org> wrote:
> >>>> Hi all!
> >>>>
> >>>> Thanks for the discussion and good input. Many suggestions fit well with the proposal above.
> >>>>
> >>>> Please bear in mind that with a time-based release model, we would release whatever is mature by the end of July.
> >>>> The good thing is we could schedule the next release not too far after that, so the features that did not quite make it will not be delayed too long.
> >>>> In some sense, you could read this as a "what to do first" list, rather than "this goes in, other things stay out".
> >>>>
> >>>> Some thoughts on some of the suggestions:
> >>>>
> >>>> Kubernetes integration: An opaque integration with Kubernetes should be supported through the "as a library" mode. For a deeper integration, I know that some committers have experimented with some PoC code. I would let Till add some thoughts; he has worked the most on the deployment parts recently.
> >>>>
> >>>> Per-partition watermarks with idleness: Good point. Could one implement that on the current interface, with a periodic watermark extractor?
> >>>>
> >>>> Atomic cancel-with-savepoint: Agreed, this is important. Making this work with all sources needs a bit more work. We should have this in the roadmap.
> >>>>
> >>>> Elastic Bloom filters: This seems like an interesting new feature - the suggested feature set above was more about addressing some longer-standing issues/requests. However, nothing should prevent contributors from working on that.
> >>>>
> >>>> Best,
> >>>> Stephan
> >>>>
> >>>> On Wed, Jun 6, 2018 at 6:23 AM, Yan Zhou [FDS Science] <yz...@coupang.com> wrote:
> >>>> +1 on https://issues.apache.org/jira/browse/FLINK-5479
> >>>> [FLINK-5479] Per-partition watermarks in ...
> >>>> issues.apache.org
> >>>> Reported in ML: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-topic-partition-skewness-causes-watermark-not-being-emitted-td11008.html
> >>>> It's normally not a common case to have Kafka partitions not producing any data, but it'll probably be good to handle this as well. I ...
> >>>>
> >>>> From: Rico Bergmann <i...@ricobergmann.de>
> >>>> Sent: Tuesday, June 5, 2018 9:12:00 PM
> >>>> To: Hao Sun
> >>>> Cc: d...@flink.apache.org; user
> >>>> Subject: Re: [DISCUSS] Flink 1.6 features
> >>>>
> >>>> +1 on K8s integration
> >>>>
> >>>> On Jun 6, 2018, at 00:01, Hao Sun <ha...@zendesk.com> wrote:
> >>>>
> >>>>> Adding my vote to K8s Job mode; maybe it is this?
> >>>>>> Smoothen the integration in Container environment, like "Flink as a Library", and easier integration with Kubernetes services and other proxies.
> >>>>>
> >>>>> On Mon, Jun 4, 2018 at 11:01 PM Ben Yan <yan.xiao.bin.m...@gmail.com> wrote:
> >>>>> Hi Stephan,
> >>>>>
> >>>>> Will [ https://issues.apache.org/jira/browse/FLINK-5479 ] (Per-partition watermarks in FlinkKafkaConsumer should consider idle partitions) be included in 1.6? We are seeing more users with this issue on the mailing lists.
> >>>>>
> >>>>> Thanks.
> >>>>> Ben
> >>>>>
> >>>>> 2018-06-05 5:29 GMT+08:00 Che Lui Shum <sh...@us.ibm.com>:
> >>>>> Hi Stephan,
> >>>>>
> >>>>> Will FLINK-7129 (Support dynamically changing CEP patterns) be included in 1.6?
> >>>>> There were discussions about possibly including it in 1.6:
> >>>>> http://mail-archives.apache.org/mod_mbox/flink-user/201803.mbox/%3cCAMq=ou7gru2o9jtowxn1lc1f7nkcxayn6a3e58kxctb4b50...@mail.gmail.com%3e
> >>>>>
> >>>>> Thanks,
> >>>>> Shirley Shum
> >>>>>
> >>>>> Stephan Ewen ---06/04/2018 02:21:47 AM---Hi Flink Community! The release of Apache Flink 1.5 has happened (yay!) - so it is a good time
> >>>>>
> >>>>> From: Stephan Ewen <se...@apache.org>
> >>>>> To: d...@flink.apache.org, user <user@flink.apache.org>
> >>>>> Date: 06/04/2018 02:21 AM
> >>>>> Subject: [DISCUSS] Flink 1.6 features
> >>>>>
> >>>>> Hi Flink Community!
> >>>>>
> >>>>> The release of Apache Flink 1.5 has happened (yay!) - so it is a good time to start talking about what to do for release 1.6.
> >>>>>
> >>>>> == Suggested release timeline ==
> >>>>>
> >>>>> I would propose to release around the end of July (that is 8-9 weeks from now).
> >>>>>
> >>>>> The rationale behind that: There was a lot of effort in release testing automation (end-to-end tests, scripted stress tests) as part of release 1.5. You may have noticed the big set of new modules under "flink-end-to-end-tests" in the Flink repository. It delayed the 1.5 release a bit, and needs to continue as part of the coming release cycle, but should help make releasing more lightweight from now on.
> >>>>>
> >>>>> (Side note: There are also some nightly stress tests that we created and run at data Artisans, and we are looking at whether and in which way it would make sense to contribute them to Flink.)
> >>>>>
> >>>>> == Features and focus areas ==
> >>>>>
> >>>>> We had a lot of big and heavy features in Flink 1.5, with FLIP-6, the new network stack, recovery, SQL joins and client, ... Following something like a "tick-tock" model, I would suggest focusing the next release more on integrations, tooling, and reducing user friction.
> >>>>>
> >>>>> Of course, this does not mean that no other pull request gets reviewed and no other topic will be examined - it is simply meant as a help to understand where to expect more activity during the next release cycle. Note that these are really the coarse focus areas - don't read this as a comprehensive list.
> >>>>>
> >>>>> This list is my first suggestion, based on discussions with committers, users, and mailing list questions.
> >>>>>
> >>>>> - Support Java 9 and Scala 2.12
> >>>>>
> >>>>> - Smoothen the integration in container environments, like "Flink as a Library", and easier integration with Kubernetes services and other proxies.
> >>>>>
> >>>>> - Polish the remaining parts of the FLIP-6 rewrite
> >>>>>
> >>>>> - Improve state backends with asynchronous timer snapshots, efficient timer deletes, state TTL, and broadcast state support in RocksDB.
> >>>>>
> >>>>> - Extend Streaming Sinks:
> >>>>>   - Bucketing Sink should support S3 properly (compensate for eventual consistency), work with Flink's shaded S3 file systems, and efficiently support formats that compress/index across individual rows (Parquet, ORC, ...)
> >>>>>   - Support ElasticSearch's new REST API
> >>>>>
> >>>>> - Smoothen State Evolution to support type conversion on snapshot restore
> >>>>>
> >>>>> - Enhance Stream SQL and CEP
> >>>>>   - Add support for "update by key" Table Sources
> >>>>>   - Add more table sources and sinks (Kafka, Kinesis, Files, K/V stores)
> >>>>>   - Expand the SQL client
> >>>>>   - Integrate CEP and SQL, through the MATCH_RECOGNIZE clause
> >>>>>   - Improve CEP performance of SharedBuffer on RocksDB
> >>>>
> >>>> --
> >>>> Cheers,
> >>>> Sagar
> >>>
> >>
> >

--
Cheers,
Sagar