Re: [External] Re: From Kafka Stream to Flink

2019-08-16 Thread Fabian Hueske
Hi Ruben, Work on this feature has already started [1], but it stalled a bit (probably due to the effort of merging the new Blink query processor). Hequn (in CC) is the one working on upsert table ingestion; maybe he can share the status of this feature. Best, Fabian [1] https://github.com/

Re: Customize file assignments logic in flink application

2019-08-16 Thread Zhu Zhu
Hi Lu, I think it's OK to choose either way as long as it works, though I've no idea how you would extend SplittableIterator in your case. The underlying implementation is ParallelIteratorInputFormat, and its processing is not tied to a certain subtask index. Thanks, Zhu Zhu On Fri, Aug 16, 2019 at 12:48 AM, Lu Niu wrote: >
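
For reference, a sketch of the SplittableIterator path discussed above, using the built-in NumberSequenceIterator in place of a custom iterator (an assumed setup, not Lu's actual code):

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.util.NumberSequenceIterator

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // fromParallelCollection wraps the iterator in a ParallelIteratorInputFormat;
    // each subtask consumes one split, but which split lands on which subtask
    // is not under your control, as Zhu Zhu points out
    val numbers = env.fromParallelCollection(new NumberSequenceIterator(1L, 10000L))
    numbers.print()
    env.execute("parallel-iterator-example")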

Re: processing avro data source using DataSet API and output to parquet

2019-08-16 Thread Zhenghua Gao
Flink allows using Hadoop (MapReduce) OutputFormats in Flink jobs [1]. You can try the Parquet OutputFormat [2]. And if you can switch to the DataStream API, StreamingFileSink + ParquetBulkWriter meets your requirement [3][4]. [1] https://github.com/apache/flink/blob/master/flink-connectors/flink-hadoop
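
For reference, a minimal sketch of the DataStream route, assuming an Avro-generated SpecificRecord class named MyRecord and an HDFS output path (both invented for illustration):

    import org.apache.flink.core.fs.Path
    import org.apache.flink.formats.parquet.avro.ParquetAvroWriters
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
    import org.apache.flink.streaming.api.scala._

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // bulk formats roll part files on checkpoints, so checkpointing must be enabled
    env.enableCheckpointing(60000)

    val records: DataStream[MyRecord] = ??? // your Avro source goes here

    val sink = StreamingFileSink
      .forBulkFormat(new Path("hdfs:///tmp/parquet-out"),
        ParquetAvroWriters.forSpecificRecord(classOf[MyRecord]))
      .build()

    records.addSink(sink)
    env.execute("avro-to-parquet")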

Re: End of Window Marker

2019-08-16 Thread Fabian Hueske
Hi Padarn, What you describe is essentially publishing Flink's watermarks to an outside system. Flink processes time windows by waiting for a watermark that's past the window end time. When it receives such a WM, it processes and emits all ended windows and forwards the watermark. When a sink rece
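
Not the full proposal, but as a sketch of the building block involved: an operator can read its current watermark next to each element with the stock ProcessFunction API (the tuple output type is an assumption):

    import org.apache.flink.streaming.api.functions.ProcessFunction
    import org.apache.flink.util.Collector

    // emits each element together with the operator's current watermark;
    // a custom sink could use the watermark to publish end-of-window markers
    class WatermarkTagger[T] extends ProcessFunction[T, (T, Long)] {
      override def processElement(value: T,
          ctx: ProcessFunction[T, (T, Long)]#Context,
          out: Collector[(T, Long)]): Unit = {
        out.collect((value, ctx.timerService().currentWatermark()))
      }
    }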

Re: I'm not able to make a stream-stream Time windows JOIN in Flink SQL

2019-08-16 Thread Fabian Hueske
Hi Theo, The main problem is that the semantics of your join (join all events that happened on the same day) are not yet well supported by Flink. In terms of true streaming joins, Flink supports the time-windowed join (with the BETWEEN predicate) and the time-versioned table join (which does not
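
For readers following along, the supported time-windowed join looks roughly like this (table and column names are invented; tableEnv is assumed to be a StreamTableEnvironment with both tables registered and event-time attributes declared):

    val result = tableEnv.sqlQuery(
      """
        |SELECT o.orderId, s.shipTime
        |FROM Orders o, Shipments s
        |WHERE o.orderId = s.orderId
        |  AND s.shipTime BETWEEN o.orderTime AND o.orderTime + INTERVAL '1' HOUR
      """.stripMargin)

Note that the bound is relative to each row's timestamp, which is why exact "same calendar day" semantics do not map cleanly onto this join.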

Re: Kafka producer failed with InvalidTxnStateException when performing commit transaction

2019-08-16 Thread Fabian Hueske
Hi Tony, I'm sorry I cannot help you with this issue, but Becket (in CC) might have an idea what went wrong here. Best, Fabian On Wed., Aug 14, 2019 at 07:00, Tony Wei wrote: > Hi, > > Currently, I was trying to update our kafka cluster with a larger `transaction.max.timeout.ms`. The > o

Re: How to implement Multi-tenancy in Flink

2019-08-16 Thread Fabian Hueske
Hi Ahmad, The ProcessFunction should not rely on new records arriving (i.e., doing the processing in the processElement() method) but rather register a timer every 5 minutes and perform the processing when the timer fires in onTimer(). Essentially, you'd only collect the data in `processElement()` an
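
A minimal sketch of that timer pattern, assuming a keyed stream of String events and a 5-minute processing-time interval (all names are illustrative):

    import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction
    import org.apache.flink.util.Collector
    import scala.collection.JavaConverters._

    class PeriodicProcessor extends KeyedProcessFunction[String, String, String] {
      // lazily initialized so getRuntimeContext is available by first use
      private lazy val buffer: ListState[String] = getRuntimeContext.getListState(
        new ListStateDescriptor[String]("buffer", classOf[String]))

      override def processElement(value: String,
          ctx: KeyedProcessFunction[String, String, String]#Context,
          out: Collector[String]): Unit = {
        buffer.add(value) // only collect here, no processing
        // align to the next 5-minute boundary; a timer registered twice
        // for the same timestamp fires only once per key
        val interval = 5 * 60 * 1000L
        val next = (ctx.timerService().currentProcessingTime() / interval + 1) * interval
        ctx.timerService().registerProcessingTimeTimer(next)
      }

      override def onTimer(timestamp: Long,
          ctx: KeyedProcessFunction[String, String, String]#OnTimerContext,
          out: Collector[String]): Unit = {
        val batch = buffer.get().asScala.toList
        buffer.clear()
        out.collect(s"key=${ctx.getCurrentKey}: processed ${batch.size} events")
      }
    }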

Re: [ANNOUNCE] Andrey Zagrebin becomes a Flink committer

2019-08-16 Thread Terry Wang
Congratulations Andrey! Best, Terry Wang > On Aug 15, 2019, at 9:27 PM, Hequn Cheng wrote: > > Congratulations Andrey! > > On Thu, Aug 15, 2019 at 3:30 PM Fabian Hueske > wrote: > Congrats Andrey! > > On Thu., Aug 15, 2019 at 07:58, Gary Yao wrote

Fwd: Full GC (System.gc())rmi.dgc

2019-08-16 Thread Andrew Lin
Regular full GC happens because of RMI DGC, which triggers once per hour by default (3600000 ms). We can change the default value by setting the following parameters; it is recommended to increase them: -Dsun.rmi.dgc.client.gcInterval=7200000 -Dsun.rmi.dgc.server.gcInterval=7200000. The Sun ONE Application Server uses RMI in the Administratio

Re: Flink job parallelism

2019-08-16 Thread Biao Liu
Hi Vishwas, Regardless of the available task managers, what does your job really look like? Is it running with parallelism 2 or 1? It's hard to say what happened based on your description. Could you reproduce the scenario? If the answer is yes, could you provide more details? Like a screenshot, logs, co

Re: Understanding job flow

2019-08-16 Thread Vishwas Siravara
I did not find this to be true. Here is my code snippet. object DruidStreamJob extends Job with SinkFn { private[flink] val druidConfig = DruidConfig.current private[flink] val decryptMap = ExecutionEnv.loadDecryptionDictionary // TODO: Add this to sbt jvm, this should be set in sbt fo

Re: processing avro data source using DataSet API and output to parquet

2019-08-16 Thread Lian Jiang
Thanks. Which API (DataSet or DataStream) is recommended for file handling (no window operation required)? We have a similar scenario for real-time processing. Might it make sense to use the DataStream API for both batch and real-time for uniformity? Sent from my iPhone > On Aug 16, 2019, at 00:38, Zh

Update Checkpoint and/or Savepoint Timeout of Running Job without restart?

2019-08-16 Thread Aaron Levin
Hello, Question: Is it possible to update the checkpoint and/or savepoint timeout of a running job without restarting it? If not, is this something that would be a welcome contribution (not sure how easy it would be)? Context: sometimes we have jobs that are making progress but get into a state

Window Function that releases when downstream work is completed

2019-08-16 Thread Steven Nelson
Hello! I think I know the answer to this, but I thought I would go ahead and ask. We have a process that emits messages to our stream. These messages can include duplicates based on a certain key (we'll call it TheKey). Our Flink job reads the messages, keys by TheKey, and enters a window function
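
For context, the pipeline described is roughly the following (class and field names assumed):

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
    import org.apache.flink.streaming.api.windowing.time.Time

    case class Message(theKey: String, payload: String)

    val stream: DataStream[Message] = ??? // the upstream source

    // keep one record per TheKey per window to drop duplicates; the open
    // question above is when to release the result, not how to deduplicate
    val deduped = stream
      .keyBy(_.theKey)
      .window(TumblingProcessingTimeWindows.of(Time.minutes(5)))
      .reduce((first, _) => first)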

Product Evaluation

2019-08-16 Thread Ajit Saluja
Hi, We have a client requirement to implement a Complex Event Processing tool. We would like to evaluate whether Apache Flink meets our requirements. If there is any client/group who has implemented it, we would like to have a discussion to understand the capabilities of this tool better. If you may prov

Re: Product Evaluation

2019-08-16 Thread Oytun Tez
Hi Ajit, I believe many of us use CEP – we use it extensively. If you can be more specific, we'll try to respond more directly. --- Oytun Tez *M O T A W O R D* The World's Fastest Human Translation Platform. oy...@motaword.com — www.motaword.com On Fri, Aug 16, 2019 at 5:06 PM Ajit Saluja wro
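
For a flavor of what Flink CEP usage looks like (a generic sketch, not MotaWord's code; the event type and thresholds are invented):

    import org.apache.flink.cep.scala.CEP
    import org.apache.flink.cep.scala.pattern.Pattern
    import org.apache.flink.streaming.api.scala._

    case class Reading(deviceId: String, temperature: Double)

    val readings: DataStream[Reading] = ??? // your source

    // a warning-level reading followed directly by a critical one, per device
    val pattern = Pattern.begin[Reading]("warning").where(_.temperature > 80)
      .next("critical").where(_.temperature > 100)

    val alerts = CEP.pattern(readings.keyBy(_.deviceId), pattern)
      .select(m => s"device ${m("critical").head.deviceId} went critical")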

Using S3 as a sink (StreamingFileSink)

2019-08-16 Thread Swapnil Kumar
Hello, We are using Flink to process input events, aggregate them, and write the output of our streaming job to S3 using StreamingFileSink, but whenever we try to restore the job from a savepoint, the restoration fails with a missing part files error. As per my understanding, S3 deletes those part (intermittent)
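
For reference, a minimal sketch of the setup being described, with the checkpointing that StreamingFileSink depends on (bucket name and interval are invented):

    import org.apache.flink.api.common.serialization.SimpleStringEncoder
    import org.apache.flink.core.fs.Path
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
    import org.apache.flink.streaming.api.scala._

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // part files move from in-progress/pending to finished on checkpoint completion
    env.enableCheckpointing(60000)

    val aggregated: DataStream[String] = ??? // the aggregated output of the job

    val sink: StreamingFileSink[String] = StreamingFileSink
      .forRowFormat(new Path("s3://my-bucket/output"), new SimpleStringEncoder[String]("UTF-8"))
      .build()

    aggregated.addSink(sink)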

Re: Using S3 as a sink (StreamingFileSink)

2019-08-16 Thread Oytun Tez
Hi Swapnil, I am not familiar with the StreamingFileSink; however, this sounds like a checkpointing issue to me. FileSink should keep its sink state, and remove from the state the files that it *really successfully* sinks (perhaps you may want to add a validation here with S3 to check file integrit

Stale watermark due to unconsumed Kafka partitions

2019-08-16 Thread Eduardo Winpenny Tejedor
Hi all, It was a bit tricky to figure out what was going wrong here; hopefully someone can add the missing piece to the puzzle. I have a Kafka source with a custom AssignerWithPeriodicWatermarks timestamp assigner. It's a copy of the AscendingTimestampExtractor with a log statement printing each
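
For concreteness, such an assigner looks roughly like this (modeled on AscendingTimestampExtractor; the event type and timestamp field are assumptions):

    import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
    import org.apache.flink.streaming.api.watermark.Watermark
    import org.slf4j.LoggerFactory

    case class MyEvent(ts: Long, value: String)

    class LoggingAscendingAssigner extends AssignerWithPeriodicWatermarks[MyEvent] {
      @transient private lazy val log = LoggerFactory.getLogger(classOf[LoggingAscendingAssigner])
      private var maxTs: Long = Long.MinValue + 1

      override def extractTimestamp(e: MyEvent, previousTs: Long): Long = {
        maxTs = math.max(maxTs, e.ts)
        log.info(s"extracted timestamp ${e.ts}") // the log statement mentioned above
        e.ts
      }

      // the watermark trails the max seen timestamp by 1 ms, as in AscendingTimestampExtractor
      override def getCurrentWatermark: Watermark = new Watermark(maxTs - 1)
    }

Worth keeping in mind: when watermarks are generated per Kafka partition, the source's watermark is the minimum across its partitions, so an empty or unconsumed partition holds the overall watermark back indefinitely.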

Re: Understanding job flow

2019-08-16 Thread Victor Wong
Hi Vishwas, Since `DruidStreamJob` is a Scala `object` and the initialization of your sds client is not within any method, it will run every time `DruidStreamJob` is loaded (like a static block in Java). Your taskmanagers are different JVM processes, and `DruidStreamJob` needs to be
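
To illustrate Victor's point (the client type is a hypothetical stand-in):

    import org.apache.flink.configuration.Configuration
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction

    class SdsClient { def send(v: String): Unit = () } // hypothetical client

    // fields in a Scala object's body run on first class load, once per JVM,
    // i.e. once on each TaskManager, like a static initializer in Java
    object ClientHolder {
      val client = new SdsClient()
    }

    // safer: create per-subtask resources in open(), which runs on the TaskManager
    class DruidSink extends RichSinkFunction[String] {
      @transient private var client: SdsClient = _
      override def open(parameters: Configuration): Unit = {
        client = new SdsClient()
      }
      override def invoke(value: String): Unit = client.send(value)
    }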

Re: End of Window Marker

2019-08-16 Thread Padarn Wilson
Hi Fabian, thanks for your input. Exactly. Actually, my first instinct was to see if it was possible to publish the watermarks somehow; my initial idea was to insert regular watermark messages into each partition of the stream, but exposing this seemed quite troublesome. > In that case, you could