date:20210319

Re: PyFlink java.io.EOFException at java.io.DataInputStream.readInt

2021-03-19 Thread Yik San Chan

Hi Dian, I simplify the question in https://stackoverflow.com/questions/66687797/pyflink-java-io-eofexception-at-java-io-datainputstream-readfully. You can also find the updated question below: I have a PyFlink job that reads from a file, filter based on a condition, and print. This is a `tree` v

Re: PyFlink java.io.EOFException at java.io.DataInputStream.readInt

2021-03-19 Thread Yik San Chan

I got why regarding the simplified question - the dummy parser should return key(string)-value(string), otherwise it fails the result_type spec On Fri, Mar 19, 2021 at 3:37 PM Yik San Chan wrote: > Hi Dian, > > I simplify the question in > https://stackoverflow.com/questions/66687797/pyflink-jav

Re: The Role of TimerService in ProcessFunction

2021-03-19 Thread Dawid Wysakowicz

Hi Chirag, I agree it might be a little bit confusing. Let me try to explain the reasoning. To do that I'll first try to rephrase the reasoning from FLINK-8560 for introducing the KeyedProcessFunction. It was introduced so that users have a typed access to the current key via Context and OnTimerC

Re: Flink minimum resource recommendation on k8s cluster

2021-03-19 Thread Dawid Wysakowicz

I'd say no. It depends on your job. You can refer to a very good presentation from Robert on how to calculate resource requirements[1]. [1] https://www.youtube.com/watch?v=8l8dCKMMWkw On 18/03/2021 11:37, Amit Bhatia wrote: > Hi, > > Is there any minimum resource ( CPU & Memory) recommendation to

Re: Parameter to config read frequency in Kafka SQL connector

2021-03-19 Thread Dawid Wysakowicz

Hi, Unfortunately I have no experience with this. You can pass any Kafka client parameters through the properties.* option[1] and see if the setting works for you. Best, Dawid [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/connectors/kafka.html#properties On 18/03/2

Re: Understanding Max Parallelism

2021-03-19 Thread Dawid Wysakowicz

Hi Aeden, The maxParallelism option defines the number of key groups that will be created within the keyed state and thus define the maximum parallelism that a Flink keyed job can scale up to as each key group must be atomically assigned to a single task. You can read more on how the rescaling wor

Re: Eliminating Shuffling Under FlinkSQL

2021-03-19 Thread Dawid Wysakowicz

Your understanding of a group by is correct. It is equivalent to a key by. I agree it would be a great feature to keep the Source's partitioning but unfortunately as of now it is not yet supported. Best, Dawid On 18/03/2021 18:28, Aeden Jameson wrote: > It's my understanding that a group by is a

Re: Flink History server ( running jobs )

2021-03-19 Thread Matthias Pohl

Hi Vishal, yes, as the documentation explains [1]: Only jobs that reached a globally terminal state are archived into Flink's history server. State information about running jobs can be retrieved through Flink's REST API. Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-release-

Re: PyFlink java.io.EOFException at java.io.DataInputStream.readInt

2021-03-19 Thread Xingbo Huang

Yes, you need to ensure that the key and value types of the Map are determined Best, Xingbo Yik San Chan 于2021年3月19日周五下午3:41写道： > I got why regarding the simplified question - the dummy parser should > return key(string)-value(string), otherwise it fails the result_type spec > > On Fri, Mar 19

Re: inputFloatingBuffersUsage=0?

2021-03-19 Thread Piotr Nowojski

Hi Alexey, Have you looked at the documentation [1]? > inPoolUsage An estimate of the input buffers usage. (ignores LocalInputChannels) Gauge > inputFloatingBuffersUsage An estimate of the floating input buffers usage. (ignores LocalInputChannels) Gauge > inputExclusiveBuffersUsage An estimate of

Re: Flink SQL : Interval Outer/Left Join not working as expected

2021-03-19 Thread Benchao Li

Hi Aneesha, For the interval join operator will output the data with NULL when it confirms that there will no data coming before the watermark. And there is an optimization for reducing state access, which may add more time to trigger the output of these data. For your case, it's almost 30 + 30 =

Re: inputFloatingBuffersUsage=0?

2021-03-19 Thread Piotr Nowojski

Hi, clarification about the 2nd part. Required memory is one single exclusive buffer per channel, so if you are running low on memory, floating buffers are one of the first to go, hence you could observe some `0` for floating buffers. For analysing backpressure I would recommend to use a differen

Re: PyFlink java.io.EOFException at java.io.DataInputStream.readInt

2021-03-19 Thread Dian Fu

Good finding! I think we should handle this case more friendly as I guess this issue should be very common for Python users since Python is dynamic language. I have created https://issues.apache.org/jira/browse/FLINK-21876 to follow up with t

Re: Flink History server ( running jobs )

2021-03-19 Thread Vishal Santoshi

Thank you for the confirmation. On Fri, Mar 19, 2021 at 5:37 AM Matthias Pohl wrote: > Hi Vishal, > yes, as the documentation explains [1]: Only jobs that reached a globally > terminal state are archived into Flink's history server. State information > about running jobs can be retrieved through

Re: Saved State in FSstate Backend

2021-03-19 Thread Abdullah bin Omar

Hi, I use the checkpoint configuration 1 ms. env.getCheckpointConfig().setCheckpointTimeout(1); used FSstatebackend to save the state env.setStateBackend(*new* FsStateBackend( "file:///Users/username/Documents/checkpoint-save-java/checkpointing.txt", *false*)); //here I tried both t

Re: Python API + Unit Testing

2021-03-19 Thread Kevin Lam

Hi Dian Fu, I meant testing in application development. When I'm developing a Pyflink Pipeline, are there any recommended approaches to testing the Flink application? For instance, how should we test applications end-to-end? Individual operators? I'm interested in the Datastream API. One approach

Re: Flink SQL : Interval Outer/Left Join not working as expected

2021-03-19 Thread Aneesha Kaushal

Hello Benchao, Thanks for clarifying! The issue was I send very few records so I could not test how the watermark is progressing. Today after trying on continuous stream I was able to get the results. On Fri, Mar 19, 2021 at 5:24 PM Benchao Li wrote: > Hi Aneesha, > > For the interval join ope

OOM issues with Python Objects

2021-03-19 Thread Kevin Lam

Hi all, I've encountered an interesting issue where I observe an OOM issue in my Flink Application when I use a DataStream of Python Objects, but when I make that Python Object a Subclass of pyflink.common.types.Row and provide TypeInformation, there is no issue. For the OOM scenario, no elements

RocksDB StateBuilder unexpected exception

2021-03-19 Thread dhanesh arole

Hello Hivemind, We are running a stateful streaming job. Each task manager instance hosts around ~100GB of data. During restart of task managers we encountered following errors, because of which the job is not able to restart. Initially we thought it might be due to failing status checks of attach

Re: PyFlink java.io.EOFException at java.io.DataInputStream.readInt

2021-03-19 Thread Yik San Chan

Hi Dian, Thank you for your help! Best, Yik San On Fri, Mar 19, 2021 at 9:33 PM Dian Fu wrote: > Good finding! > > I think we should handle this case more friendly as I guess this issue > should be very common for Python users since Python is dynamic language. I > have created https://issues.a

Re: inputFloatingBuffersUsage=0?

2021-03-19 Thread Alexey Trenikhun

Hi Piotrek, Thank for information, looks like isBackPressured ( and in future backPressuredTimeMsPerSecond) is more useful for our simple monitoring purposes. Looking forward for updated blog post Thanks, Alexey From: Piotr Nowojski Sent: Friday, March 19, 202

Flink on Minikube

2021-03-19 Thread Sandeep khanzode

Hello, I have a fat JAR compiled using the Man Shade plugin and everything works correctly when I deploy it on a standalone local cluster i.e. one job and one task manager node. But I installed Minikube and the same JAR file packaged into a docker image fails with weird serialization errors:

Capturing Statement Execution / Results within JdbcSink

2021-03-19 Thread Rion Williams

Hey all, I've been working with JdbcSink and it's really made my life much easier, but I had two questions about it that folks might be able to answer or provide some clarity around. *Accessing Statement Execution / Results* Is there any mechanism in place (or out of the box) to support reading

Re: Flink application has slightly data loss using Processing Time

2021-03-19 Thread Rainie Li

Hi Arvid, After increasing producer.kafka.request.timeout.ms from 9 to 12. The job has been running fine for almost 5 days, but one of the tasks failed again recently for the same timeout error. (attached stack trace below) Should I keep increasing producer.kafka.request.timeout.ms value?

Histogram

2021-03-19 Thread Alexey Trenikhun

Hello, Is any way to expose from Flink Histogram metric, not Summary ? Thanks, Alexey

Re: Flink SQL : Interval Outer/Left Join not working as expected

2021-03-19 Thread Benchao Li

Glad to hear that you resolved the issue. Aneesha Kaushal 于2021年3月19日周五下午10:49写道： > Hello Benchao, > > Thanks for clarifying! > The issue was I send very few records so I could not test how the > watermark is progressing. Today after trying on continuous stream I was > able to get the results.

Re: PyFlink java.io.EOFException at java.io.DataInputStream.readInt

Re: PyFlink java.io.EOFException at java.io.DataInputStream.readInt

Re: The Role of TimerService in ProcessFunction

Re: Flink minimum resource recommendation on k8s cluster

Re: Parameter to config read frequency in Kafka SQL connector

Re: Understanding Max Parallelism

Re: Eliminating Shuffling Under FlinkSQL

Re: Flink History server ( running jobs )

Re: PyFlink java.io.EOFException at java.io.DataInputStream.readInt

Re: inputFloatingBuffersUsage=0?

Re: Flink SQL : Interval Outer/Left Join not working as expected

Re: inputFloatingBuffersUsage=0?

Re: PyFlink java.io.EOFException at java.io.DataInputStream.readInt

Re: Flink History server ( running jobs )

Re: Saved State in FSstate Backend

Re: Python API + Unit Testing

Re: Flink SQL : Interval Outer/Left Join not working as expected

OOM issues with Python Objects

RocksDB StateBuilder unexpected exception

Re: PyFlink java.io.EOFException at java.io.DataInputStream.readInt

Re: inputFloatingBuffersUsage=0?

Flink on Minikube

Capturing Statement Execution / Results within JdbcSink

Re: Flink application has slightly data loss using Processing Time

Histogram

Re: Flink SQL : Interval Outer/Left Join not working as expected

26 matches

Site Navigation

Mail list logo

Footer information