Hi Sagar,

There are already related JIRAs for ORC and Parquet; you can take a look here:
https://issues.apache.org/jira/browse/FLINK-9407 and
https://issues.apache.org/jira/browse/FLINK-9411

For the ORC format, only basic data types are currently supported, such as Long, Boolean, Short, Integer, Float, Double, and String.

Best,
Zhangminglei

> On June 17, 2018, at 11:11 AM, sagar loke <sagar...@gmail.com> wrote:
>
> We are eagerly waiting for
>
> - Extend Streaming Sinks:
>   - Bucketing Sink should support S3 properly (compensate for eventual
>     consistency), work with Flink's shaded S3 file systems, and efficiently
>     support formats that compress/index across individual rows (Parquet, ORC, ...)
>
> Especially for ORC and Parquet sinks, since we are planning to use
> Kafka-jdbc to move data from RDBMS to HDFS.
>
> Thanks,
>
> On Sat, Jun 16, 2018 at 5:08 PM Elias Levy <fearsome.lucid...@gmail.com> wrote:
> One more, since we have to deal with it often:
>
> - Idling sources (Kafka in particular) and proper watermark propagation:
>   FLINK-5018 / FLINK-5479
>
> On Fri, Jun 8, 2018 at 2:58 PM, Elias Levy <fearsome.lucid...@gmail.com> wrote:
> Since wishes are free:
>
> - Standalone cluster job isolation:
>   https://issues.apache.org/jira/browse/FLINK-8886
> - Proper sliding window joins (not overlapping hopping window joins):
>   https://issues.apache.org/jira/browse/FLINK-6243
> - Sharing state across operators:
>   https://issues.apache.org/jira/browse/FLINK-6239
> - Synchronizing streams: https://issues.apache.org/jira/browse/FLINK-4558
>
> Seconded:
> - Atomic cancel-with-savepoint:
>   https://issues.apache.org/jira/browse/FLINK-7634
> - Support
dynamically changing CEP patterns:
> https://issues.apache.org/jira/browse/FLINK-7129
>
>
> On Fri, Jun 8, 2018 at 1:31 PM, Stephan Ewen <se...@apache.org> wrote:
> Hi all!
>
> Thanks for the discussion and good input. Many suggestions fit well with the
> proposal above.
>
> Please bear in mind that with a time-based release model, we would release
> whatever is mature by end of July.
> The good thing is we could schedule the next release not too far after that,
> so that the features that did not quite make it will not be delayed too long.
> In some sense, you could read this as a "what to do first" list, rather than
> "this goes in, other things stay out".
>
> Some thoughts on some of the suggestions:
>
> Kubernetes integration: An opaque integration with Kubernetes should be
> supported through the "as a library" mode. For a deeper integration, I know
> that some committers have experimented with some PoC code. I would let Till
> add some thoughts; he has worked the most on the deployment parts recently.
>
> Per partition watermarks with idleness: Good point, could one implement that
> on the current interface, with a periodic watermark extractor?
>
> Atomic cancel-with-savepoint: Agreed, this is important. Making this work
> with all sources needs a bit more work. We should have this in the roadmap.
>
> Elastic Bloomfilters: This seems like an interesting new feature - the above
> suggested feature set was more about addressing some longer standing
> issues/requests. However, nothing should prevent contributors from working on that.
>
> Best,
> Stephan
>
>
> On Wed, Jun 6, 2018 at 6:23 AM, Yan Zhou [FDS Science] <yz...@coupang.com> wrote:
> +1 on https://issues.apache.org/jira/browse/FLINK-5479
> [FLINK-5479] Per-partition watermarks in ...
> issues.apache.org
> Reported in ML:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-topic-partition-skewness-causes-watermark-not-being-emitted-td11008.html
> It's normally not a common case to have Kafka partitions not producing any
> data, but it'll probably be good to handle this as well. I ...
>
> From: Rico Bergmann <i...@ricobergmann.de>
> Sent: Tuesday, June 5, 2018 9:12:00 PM
> To: Hao Sun
> Cc: d...@flink.apache.org; user
> Subject: Re: [DISCUSS] Flink 1.6 features
>
> +1 on K8s integration
>
>
> On 06.06.2018 at 00:01, Hao Sun <ha...@zendesk.com> wrote:
>
>> Adding my vote to K8S Job mode, maybe it is this?
>> > Smoothen the integration in Container environment, like "Flink as a
>> > Library", and easier integration with Kubernetes services and other
>> > proxies.
>>
>>
>> On Mon, Jun 4, 2018 at 11:01 PM Ben Yan <yan.xiao.bin.m...@gmail.com> wrote:
>> Hi Stephan,
>>
>> Will [ https://issues.apache.org/jira/browse/FLINK-5479 ] (Per-partition
>> watermarks in FlinkKafkaConsumer should consider idle partitions) be
>> included in 1.6? We are seeing more users with this issue on the mailing
>> lists.
>>
>> Thanks.
>> Ben
>>
>> 2018-06-05 5:29 GMT+08:00 Che Lui Shum <sh...@us.ibm.com>:
>> Hi Stephan,
>>
>> Will FLINK-7129 (Support dynamically changing CEP patterns) be included in
>> 1.6?
There were discussions about possibly including it in 1.6:
>> http://mail-archives.apache.org/mod_mbox/flink-user/201803.mbox/%3cCAMq=ou7gru2o9jtowxn1lc1f7nkcxayn6a3e58kxctb4b50...@mail.gmail.com%3e
>>
>> Thanks,
>> Shirley Shum
>>
>> Stephan Ewen ---06/04/2018 02:21:47 AM---Hi Flink Community! The release of
>> Apache Flink 1.5 has happened (yay!) - so it is a good time
>>
>> From: Stephan Ewen <se...@apache.org>
>> To: d...@flink.apache.org, user <user@flink.apache.org>
>> Date: 06/04/2018 02:21 AM
>> Subject: [DISCUSS] Flink 1.6 features
>>
>>
>> Hi Flink Community!
>>
>> The release of Apache Flink 1.5 has happened (yay!) - so it is a good time
>> to start talking about what to do for release 1.6.
>>
>> == Suggested release timeline ==
>>
>> I would propose to release around end of July (that is 8-9 weeks from now).
>>
>> The rationale behind that: There was a lot of effort in release testing
>> automation (end-to-end tests, scripted stress tests) as part of release 1.5.
>> You may have noticed the big set of new modules under
>> "flink-end-to-end-tests" in the Flink repository. It delayed the 1.5 release
>> a bit, and needs to continue as part of the coming release cycle, but should
>> help make releasing more lightweight from now on.
>>
>> (Side note: There are also some nightly stress tests that we created and run
>> at data Artisans, and where we are looking whether and in which way it would
>> make sense to contribute them to Flink.)
>>
>> == Features and focus areas ==
>>
>> We had a lot of big and heavy features in Flink 1.5, with FLIP-6, the new
>> network stack, recovery, SQL joins and client, ...
Following something like
>> a "tick-tock-model", I would suggest focusing the next release more on
>> integrations, tooling, and reducing user friction.
>>
>> Of course, this does not mean that no other pull request gets reviewed, and
>> no other topic will be examined - it is simply meant as a help to understand
>> where to expect more activity during the next release cycle. Note that these
>> are really the coarse focus areas - don't read this as a comprehensive list.
>>
>> This list is my first suggestion, based on discussions with committers,
>> users, and mailing list questions.
>>
>> - Support Java 9 and Scala 2.12
>>
>> - Smoothen the integration in Container environment, like "Flink as a
>>   Library", and easier integration with Kubernetes services and other proxies.
>>
>> - Polish the remaining parts of the FLIP-6 rewrite
>>
>> - Improve state backends with asynchronous timer snapshots, efficient
>>   timer deletes, state TTL, and broadcast state support in RocksDB.
>>
>> - Extend Streaming Sinks:
>>   - Bucketing Sink should support S3 properly (compensate for eventual
>>     consistency), work with Flink's shaded S3 file systems, and efficiently
>>     support formats that compress/index across individual rows (Parquet, ORC,
>>     ...)
>>   - Support ElasticSearch's new REST API
>>
>> - Smoothen State Evolution to support type conversion on snapshot restore
>>
>> - Enhance Stream SQL and CEP
>>   - Add support for "update by key" Table Sources
>>   - Add more table sources and sinks (Kafka, Kinesis, Files, K/V stores)
>>   - Expand SQL client
>>   - Integrate CEP and SQL, through MATCH_RECOGNIZE clause
>>   - Improve CEP Performance of SharedBuffer on RocksDB
>>
>
>
> --
> Cheers,
> Sagar