A little doubt about the usage of checkpointLock

2018-08-14 Thread aitozi
Hi community, I see the checkpointLock is used in StreamTask to ensure that we don't have concurrent method calls that would void consistent checkpoints. As far as I know, it is used for data consumption, state interaction, and the timerService. But I wonder: if an application doesn't enable the checkpoi

Re: Managed Keyed state update

2018-08-14 Thread Fabian Hueske
Hi, It is recommended to always call update(). Modifying state by mutating objects is only possible because the heap-based backends do not serialize or copy records, to avoid additional costs. Hence, this is rather a side effect than a provided API. As soon as you change the state backend, st
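
To illustrate Fabian's advice, here is a minimal sketch of reading keyed state, modifying it, and writing it back with update(); the event type (a per-key Long count) and the surrounding job are assumptions, not part of the original thread.

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    // Sketch only: a per-key counter that always writes modified state back via update(),
    // so the logic keeps working when the state backend serializes values (e.g. RocksDB).
    public class CountPerKey extends RichFlatMapFunction<Long, Long> {

        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Long.class));
        }

        @Override
        public void flatMap(Long value, Collector<Long> out) throws Exception {
            Long current = count.value();           // may be null for a new key
            long updated = (current == null ? 0L : current) + 1;
            count.update(updated);                  // explicit write-back, backend-independent
            out.collect(updated);
        }
    }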

Re: Limit on number of files to read for Dataset

2018-08-14 Thread Fabian Hueske
Hi, Flink InputFormats generate their InputSplits sequentially on the JobManager. These splits are stored in the heap of the JM process and handed out to SourceTasks when they request them lazily. Split assignment is done by an InputSplitAssigner, which can be customized. FileInputFormats typically

Re: flink telemetry/metrics

2018-08-14 Thread Chesnay Schepler
How often is the warning logged? The default reporting interval is 10 seconds; if a report is interrupted it can take a while for metrics to show up. Could this also be caused by the MAX_CREATES_PER_MINUTE setting in carbon.conf being set too low? On 13.08.2018 21:31, John O wrote: I have

Re: Flink REST api for cancel with savepoint on yarn

2018-08-14 Thread Gary Yao
Hi Vipul, We are aware of YARN-2031. There are some ideas on how to work around it, which are tracked here: https://issues.apache.org/jira/browse/FLINK-9478 At the moment you have the following options: 1. Find out the master's address from ZooKeeper [1] and issue the HTTP request against t
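
For reference, a minimal sketch of the HTTP request Gary mentions, assuming the Flink 1.5+ REST endpoint POST /jobs/:jobid/savepoints; the master address, job id, and savepoint directory below are placeholders.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    // Sketch only: trigger a savepoint and cancel the job via the REST API.
    // Resolve the leading JobManager's address (e.g. from ZooKeeper) before calling this.
    public class CancelWithSavepoint {
        public static void main(String[] args) throws Exception {
            String master = "http://jobmanager-host:8081";             // placeholder
            String jobId = "00000000000000000000000000000000";         // placeholder
            URL url = new URL(master + "/jobs/" + jobId + "/savepoints");

            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);

            String body = "{\"target-directory\": \"hdfs:///savepoints\", \"cancel-job\": true}";
            try (OutputStream os = conn.getOutputStream()) {
                os.write(body.getBytes(StandardCharsets.UTF_8));
            }

            System.out.println("HTTP " + conn.getResponseCode());
            // The response carries a trigger id; poll /jobs/<jobid>/savepoints/<triggerid> for the result.
        }
    }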

Re: Standalone cluster instability

2018-08-14 Thread Piotr Nowojski
Hi, Good that we are more or less on track with this problem :) But the problem here is not that the heap size is too small, but that your kernel is running out of memory and starts killing processes. Either: 1. some other process is using the available memory 2. Increase memory allocation on your

Re: A little doubt about the usage of checkpointLock

2018-08-14 Thread Andrey Zagrebin
Hi, there are at least 3 main players which use the lock to sync state access between each other: the thread processing records in user code, the checkpointing thread, and the thread processing timer callbacks. I would still recommend following the contract and using the lock where required (e.g. custom sour
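
As a concrete example of the contract Andrey describes, here is a minimal sketch of a custom source that emits records under the checkpoint lock; the record type and emission loop are illustrative only.

    import org.apache.flink.streaming.api.functions.source.SourceFunction;

    // Sketch only: state updates and record emission happen atomically with respect
    // to checkpoints because both are done while holding the checkpoint lock.
    public class CounterSource implements SourceFunction<Long> {

        private volatile boolean running = true;
        private long counter = 0;

        @Override
        public void run(SourceContext<Long> ctx) throws Exception {
            while (running) {
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(counter);
                    counter++;
                }
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }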

how to assign issue to someone

2018-08-14 Thread Guibo Pan
Hello, I am a new user of the Flink Jira. I reported an issue and would like to fix it; however, I found I could not assign it to myself, or to anyone. Could anyone tell me how to do this? Thanks.

Re: how to assign issue to someone

2018-08-14 Thread Fabian Hueske
Hi, I've given you Contributor permissions for Jira and assigned the issue to you. You can now also assign other issues to yourself. Looking forward to your contribution. Best, Fabian 2018-08-14 19:45 GMT+02:00 Guibo Pan : > Hello, I am a new user for flink jira. I reported an issue and would like >

Re: Limit on number of files to read for Dataset

2018-08-14 Thread Darshan Singh
Thanks all for your responses. I am now much clearer on this. Thanks On Tue, Aug 14, 2018 at 9:46 AM, Fabian Hueske wrote: > Hi, > > Flink InputFormats generate their InputSplits sequentially on the > JobManager. > These splits are stored in the heap of the JM process and handed out to > S

Re: Limit on number of files to read for Dataset

2018-08-14 Thread Darshan Singh
Thanks for the details. I got it working. I have around one directory for each month and I am running over 12-15 months of data. So I created a dataset from each month and did a union. However, when I run I get an HTTP timeout issue. I am reading more than 120K files in total across all months. I am usin
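
A minimal sketch of the "one DataSet per month, then union" pattern described above; the directory layout and the final count() are assumptions for illustration.

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;

    // Sketch only: build one DataSet per month directory and union them into a single DataSet.
    public class UnionMonths {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            String[] monthDirs = {
                    "hdfs:///data/2017-09",
                    "hdfs:///data/2017-10",
                    // ... one entry per month ...
                    "hdfs:///data/2018-08"
            };

            DataSet<String> all = null;
            for (String dir : monthDirs) {
                DataSet<String> month = env.readTextFile(dir);
                all = (all == null) ? month : all.union(month);
            }

            // count() triggers execution; replace with the actual transformations and sink.
            System.out.println("Records: " + all.count());
        }
    }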

Re: 1.5.1

2018-08-14 Thread Gary Yao
Hi Juho, It seems in your case the JobMaster did not receive a heartbeat from the TaskManager in time [1]. Heartbeat requests and answers are sent over the RPC framework, and RPCs of one component (e.g., TaskManager, JobMaster, etc.) are dispatched by a single thread. Therefore, the reasons for he

ClassCastException when writing to a Kafka stream with Flink + Python

2018-08-14 Thread Joe Malt
Hi, I'm trying to write to a Kafka stream in a Flink job using the new Python streaming API. My program looks like this: def main(factory): props = Properties() props.setProperty("bootstrap.servers",configs['kafkaBroker']) consumer = FlinkKafkaConsumer010([configs['kafkaReadTopic']

watermark does not progress

2018-08-14 Thread John O
I am noticing that the watermark does not progress as expected when running locally in the IDE. It just stays at Long.MIN_VALUE. I am using EventTime processing and have tried both of these timestamp extractors. * assignAscendingTimestamps ... * assignTimestampsAndWatermarks(BoundedOutOfOrdernessTime
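
For comparison, a minimal sketch of a bounded-out-of-orderness extractor; MyEvent and its getEventTime() accessor are placeholders for the actual event type. Note that watermarks only advance while records are flowing, so an idle source (or idle Kafka partitions) can keep them at Long.MIN_VALUE.

    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // Sketch only: emit watermarks that trail the largest seen timestamp by 10 seconds.
    // MyEvent is a placeholder type.
    public class MyTimestampExtractor extends BoundedOutOfOrdernessTimestampExtractor<MyEvent> {

        public MyTimestampExtractor() {
            super(Time.seconds(10));
        }

        @Override
        public long extractTimestamp(MyEvent event) {
            return event.getEventTime();  // must be epoch milliseconds
        }
    }

    // Usage: stream.assignTimestampsAndWatermarks(new MyTimestampExtractor())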

Re: ClassCastException when writing to a Kafka stream with Flink + Python

2018-08-14 Thread vino yang
Hi Joe, ping Chesnay for you, please wait for the reply. Thanks, vino. Joe Malt wrote on Wed, Aug 15, 2018 at 7:16 AM: > Hi, > > I'm trying to write to a Kafka stream in a Flink job using the new Python > streaming API. > > My program looks like this: > > def main(factory): > > props = Properties() >

Re: watermark does not progress

2018-08-14 Thread vino yang
Hi John, In local mode, it should also work. When you debug, you can set a breakpoint in the getCurrentWatermark method to see if you can enter the method and whether the behavior is what you expect. What is your source? If you post your code, it might be easier to locate the problem. In addition, for positioning

How to submit flink job on yarn by java code

2018-08-14 Thread spoon_lz
My project is to automatically generate a Flink job jar, submit it to the YARN cluster for execution, and get the ApplicationId. I find that after execution, an error is reported. Then I searched for the error on Google and found that the reason for the error was that I did not introduce

docker, error NoResourceAvailableException..

2018-08-14 Thread shyla deshpande
Hello all, Trying to use docker as a single-node Flink cluster. docker run --name flink_local -p 8081:8081 -t flink local I submitted a job to the cluster using the Web UI. The job failed. I see this error message in the docker logs. org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvaila

CoFlatMapFunction with more than two input streams

2018-08-14 Thread Averell
Hi, I have stream_A of type "Dog", which needs to be transformed using data from stream_C of type "Name_Mapping". As stream_C is a slow one (data is not updated frequently), to do the transformation I connect the two streams, do a keyBy, and then use a RichCoFlatMapFunction in which mapping data

Re: CoFlatMapFunction with more than two input streams

2018-08-14 Thread vino yang
Hi Averell, As far as these two solutions are concerned, I think you can only choose option 2, because as you have stated, the current Flink DataStream API does not support the replacement of one of the input stream types of CoFlatMapFunction. Another choice: 1. Split it into two separate jobs. B

Re: ClassCastException when writing to a Kafka stream with Flink + Python

2018-08-14 Thread Chesnay Schepler
As seen in the stacktrace every sink added via StreamExEnv#add_source is wrapped in a PythonSinkFunction which internally converts things to PyObjects, that's why the mapper had no effect. Currently we don't differentiate between java/python sinks, contrary to sources where we have an explicit

Re: CoFlatMapFunction with more than two input streams

2018-08-14 Thread Xingcan Cui
Hi Averell, I am also in favor of option 2. Besides, you could use CoProcessFunction instead of CoFlatMapFunction and try to wrap elements of stream_A and stream_B using the `Either` class. Best, Xingcan > On Aug 15, 2018, at 2:24 PM, vino yang wrote: > > Hi Averell, > > As far as these two
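
A minimal sketch of Xingcan's suggestion; Dog, Cat, NameMapping, and EnrichedEvent are placeholder types (only the "Dog" stream and the mapping stream appear in the original question).

    import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
    import org.apache.flink.types.Either;
    import org.apache.flink.util.Collector;

    // Sketch only: the two data streams are wrapped as Either.Left / Either.Right and unioned,
    // then connected with the mapping stream and processed by one CoProcessFunction.
    // Dog, Cat, NameMapping and EnrichedEvent are placeholder types.
    // Wiring (sketch):
    //   streamA.map(d -> Either.<Dog, Cat>Left(d)).union(streamB.map(c -> Either.<Dog, Cat>Right(c)))
    //          .connect(mappingStream).keyBy(...).process(new EnrichWithMapping());
    public class EnrichWithMapping
            extends CoProcessFunction<Either<Dog, Cat>, NameMapping, EnrichedEvent> {

        @Override
        public void processElement1(Either<Dog, Cat> value, Context ctx,
                                    Collector<EnrichedEvent> out) throws Exception {
            if (value.isLeft()) {
                // enrich the Dog element using the mapping kept in keyed state
            } else {
                // enrich the Cat element
            }
        }

        @Override
        public void processElement2(NameMapping mapping, Context ctx,
                                    Collector<EnrichedEvent> out) throws Exception {
            // update the mapping held in keyed state
        }
    }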