Re: Kryo exception

2018-08-29 Thread Alexander Smirnov
Thanks Hequn! On Thu, 30 Aug 2018 at 04:49, Hequn Cheng wrote: > Hi Alex, > > It seems a bug. There is a discussion here > > . > Best, Hequn > > On Wed, Aug 29, 2018 at 10:19 PM Alexander Smirnov <

Re: Custom java home on YARN

2018-08-29 Thread Paul Lam
Hi tison, This information is very helpful. Thank you! Best, Paul Lam > On Aug 30, 2018, at 10:41, 陈梓立 wrote: > > Hi Paul, > > For your information, `yarn.taskmanager.env.JAVA_HOME` is deprecated; you > can set the taskmanager env using a key similar to the jobmanager one, that is >

Re: Custom java home on YARN

2018-08-29 Thread vino yang
Glad to hear this message! On Thu, Aug 30, 2018 at 10:35 AM, Paul Lam wrote: > Hi vino, > > Yes, I’m trying to set up a Java 8 environment for the Flink job, but YARN is > running on Java 7 which I can’t change. > > I found the solution in the mailing list you referred to. It turns out that I > was using the wrong key to set

Re: withFormat(Csv) is undefined for the type BatchTableEnvironment

2018-08-29 Thread vino yang
Hi francois, Maybe you can refer to the comments in this source code?[1] https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/api/BatchTableEnvironment.scala#L143 Thanks, vino. On Wed, Aug 29, 2018 at 10:54 PM, françois lacombe wrote: > Hi Vino, > >

Re: Custom java home on YARN

2018-08-29 Thread Paul Lam
Hi vino, Yes, I’m trying to set up a Java 8 environment for the Flink job, but YARN is running on Java 7 which I can’t change. I found the solution in the mailing list you referred to. It turns out that I was using the wrong key to set Java home for the jobmanager, and the right key should be

Re: checkpoint timeout

2018-08-29 Thread vino yang
Hi John, The checkpoint timeout is set through this API. The default timeout for checkpoints is 10 minutes [1], not one minute. So, I think it must be something else. You can set the log level of the JM and TM to DEBUG, and then see more checkpoint details. If there is no way to analyze it, you

Re: Custom java home on YARN

2018-08-29 Thread vino yang
Hi Paul, This exception means that the JVM running the code is older than the JDK version the class was compiled for. Major version 52 means that the class was compiled for JDK 1.8. Flink 1.4 reportedly removed support for Java 7, so your current best choice is to use JDK 1.8.[1] [1]:
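
The version number in that error comes from the class file header, so the mapping vino describes can be checked directly. The sketch below is not part of Flink; `class_file_java_version` is a hypothetical helper that reads the first eight bytes of a `.class` file (magic number, minor version, major version) and applies the usual rule that Java SE N uses major version N + 44.

```python
import struct

CLASS_MAGIC = 0xCAFEBABE  # first four bytes of every JVM class file


def class_file_java_version(header: bytes) -> int:
    """Return the Java SE release a class file targets (e.g. 8 for major 52)."""
    if len(header) < 8:
        raise ValueError("need at least the first 8 bytes of the class file")
    # Big-endian layout: u4 magic, u2 minor_version, u2 major_version.
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a JVM class file")
    return major - 44


# Header of a class compiled for Java 8: minor version 0, major version 52 (0x34).
java8_header = bytes.fromhex("cafebabe00000034")
print(class_file_java_version(java8_header))  # → 8
```

A Java 7 runtime accepts major versions only up to 51, which is why a class compiled at 52 fails there with UnsupportedClassVersionError.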

Re: Custom java home on YARN

2018-08-29 Thread Paul Lam
Hi vino, Thanks for your suggestion. I believe it’s working. But I still get this exception in jobmanager.err: > java.lang.UnsupportedClassVersionError: > org/apache/flink/yarn/entrypoint/YarnSessionClusterEntrypoint : Unsupported > major.minor version 52.0 I tried setting Java home for the

Re: Kryo exception

2018-08-29 Thread Hequn Cheng
Hi Alex, It seems a bug. There is a discussion here . Best, Hequn On Wed, Aug 29, 2018 at 10:19 PM Alexander Smirnov < alexander.smirn...@gmail.com> wrote: > Hi, > > A job fell into a restart loop

Problem with querying state on Flink 1.6.

2018-08-29 Thread Joe Olson
I'm having a problem with querying state on Flink 1.6. I put a project on GitHub that is my best representation of the very simple client example outlined in the 'querying state' section of the 1.6 documentation at

checkpoint timeout

2018-08-29 Thread John O
I have a Flink job with a state big enough to make checkpointing slow (~70 seconds). I have configured the checkpoint timeout to 180 seconds (setCheckpointTimeout(18)). But as you can see from the following logs, the timeout seems to be ~60 seconds. Is there another timeout configuration I
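
For reference, `setCheckpointTimeout` on Flink's `CheckpointConfig` takes its argument in milliseconds. The sketch below is plain Python rather than Flink code, and `checkpoint_timeout_millis` is a hypothetical helper; it only shows the unit conversion for the 180-second timeout mentioned above and compares the result with the 10-minute default cited in the reply to this thread.

```python
from datetime import timedelta

# Default checkpoint timeout of 10 minutes, as stated in the reply to this thread.
DEFAULT_CHECKPOINT_TIMEOUT_MS = 10 * 60 * 1000


def checkpoint_timeout_millis(timeout: timedelta) -> int:
    """Convert a timeout to the whole-millisecond value the Java API expects."""
    millis = timeout // timedelta(milliseconds=1)
    if millis <= 0:
        raise ValueError("checkpoint timeout must be positive")
    return millis


wanted = checkpoint_timeout_millis(timedelta(seconds=180))
print(wanted)  # → 180000
# In a Java job this value would be passed as:
#   env.getCheckpointConfig().setCheckpointTimeout(180000L);
```

Since 180000 ms is well below the default, a timeout observed at roughly 60 seconds would point at some other setting, which matches the reply's conclusion.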

Re: Backpressure? for Batches

2018-08-29 Thread Zhijiang(wangzhijiang999)
You can check the log for the stack trace of the OOM; maybe it will let us confirm the cause. Or you can dump the heap to analyze memory usage after the OOM. Best, Zhijiang -- From: Darshan Singh Sent: Wednesday, Aug 29, 2018, 19:22

[ANNOUNCE] Weekly community update #35

2018-08-29 Thread Till Rohrmann
Dear community, this is the weekly community update thread #35. Please post any news and updates you want to share with the community to this thread. # Flink 1.7 community roadmap The Flink community started discussing which features will be included in the next upcoming major release. Join the

Re: withFormat(Csv) is undefined for the type BatchTableEnvironment

2018-08-29 Thread françois lacombe
Hi Vino, Thanks for this answer. I can't find where the docs cover BatchTableDescriptor https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/table/connect.html#csv-format It sounds like the withFormat method is applied to the TableEnvironment object on this page. All the best

Kryo exception

2018-08-29 Thread Alexander Smirnov
Hi, A job fell into a restart loop with the following exception. Is it something known? What could cause it? Flink 1.4.2 16 Aug 2018 13:43:00,835 INFO org.apache.flink.runtime.taskmanager.Task - Source: Custom Source -> (Filter -> Timestamps/Watermarks -> Map, Filter -> Timestamps/Watermarks ->

Re: When a jobmanager fails, it doesn't restart because it tries to restart non existing tasks

2018-08-29 Thread Till Rohrmann
Thanks for the update Gerard. Fixing the resource cleanup in the case of standby Dispatchers/JobMasters has a high priority. We will hopefully fix the problem with the next bug fix release. Until then, the JobGraph entry must be removed from ZooKeeper manually. Cheers, Till On Wed, Aug 29, 2018

Re: When a jobmanager fails, it doesn't restart because it tries to restart non existing tasks

2018-08-29 Thread Gerard Garcia
Hi Till, Sorry for the late reply, I was waiting to update to Flink 1.6.0 to see if the problem got fixed but I still experience the first issue (jobgraph not deleted from zookeeper when task is canceled). The second issue (taskmanagers unable to register to the new elected jobmanager) was

Taskmanager process memory increasing always

2018-08-29 Thread YennieChen88
Hello, My case is counting the number of successful and failed logins within 1 hour, 10 min, 5 min, 3 min, 1 min, 10 seconds and 1 second, keyed by login IP or device ID. Based on previous counting results of different time dimensions, predict the compliance of the next login. After

Re: JobGraphs not cleaned up in HA mode

2018-08-29 Thread Till Rohrmann
Hi Encho, thanks for sending the first part of the logs. What I would actually be interested in are the complete logs because somewhere in the jobmanager-2 logs there must be a log statement saying that the respective dispatcher gained leadership. I would like to see why this happens but for this

Re: Data loss when restoring from savepoint

2018-08-29 Thread Andrey Zagrebin
Hi Juho, > only when the 24-hour window triggers, BucketingSink gets a burst of input This is of course totally true, my understanding is the same. We cannot rule out a problem there for sure; it is just that savepoints are used a lot without problem reports and BucketingSink is known to be problematic with s3.

Filescheme GS not found sometimes - inconsistent exceptions for reading from GCS

2018-08-29 Thread Encho Mishinev
Hello, I am using Flink 1.5.3 and executing jobs through Apache Beam 2.6.0. One of my jobs involves reading from Google Cloud Storage which uses the file scheme "gs://". Everything was fine but once in a while I would get an exception that the scheme is not recognised. Now I've started seeing

Re: JobGraphs not cleaned up in HA mode

2018-08-29 Thread Encho Mishinev
Hi Till, I will use the approach with a k8s deployment and HA mode with a single job manager. Nonetheless, here are the logs I just produced by repeating the aforementioned experiment, hope they help in debugging: *- Starting Jobmanager-1:* Starting Job Manager sed: cannot rename

share the big data structure between slots

2018-08-29 Thread zhen li
Hi all: When I broadcast the big config stream, every parallel instance has to store the data, which wastes memory and time. Is there some way to make the slots share the big data structure within the taskmanager? Here is the doc: By adjusting the number of task slots, users

Re: Backpressure? for Batches

2018-08-29 Thread Darshan Singh
Thanks, Now back to my question again. How can I read from HDFS no faster than, say, my map or group by can consume? Is there some sort of configuration which says read only 1 rows and then stop and then reread etc. Otherwise the source will keep on sending the data or keeping it in some sort

Re: Data loss when restoring from savepoint

2018-08-29 Thread Juho Autio
Andrey, thank you very much for the debugging suggestions, I'll try them. In the meanwhile two more questions, please: > Just to keep in mind this problem with s3 and exclude it for sure. I would also check whether the size of missing events is around the batch size of BucketingSink or not.

Re: Backpressure? for Batches

2018-08-29 Thread Chesnay Schepler
The semantics for LAZY_FROM_SOURCE are that tasks are scheduled /when there is data to be consumed/, i.e. once the first record was emitted by the previous operator. As such back-pressure exists in batch just like in streaming. On 29.08.2018 11:39, Darshan Singh wrote: Thanks, My job is

Re: Backpressure? for Batches

2018-08-29 Thread Darshan Singh
Thanks, My job is simple. I am using the Table API 1. Read from HDFS 2. Deserialize JSON to POJO and convert to table. 3. Group by some columns. 4. Convert back to dataset and write back to HDFS. In the WebUI I can see at least the first 3 running concurrently which sort of makes sense. From your answer

Re: JobGraphs not cleaned up in HA mode

2018-08-29 Thread Till Rohrmann
Hi Encho, it sounds strange that the standby JobManager tries to recover a submitted job graph. This should only happen if it has been granted leadership. Thus, it seems as if the standby JobManager thinks that it is also the leader. Could you maybe share the logs of the two

Re: JobGraphs not cleaned up in HA mode

2018-08-29 Thread Encho Mishinev
Hello, Since two job managers don't seem to be working for me I was thinking of just using a single job manager in Kubernetes in HA mode with a deployment ensuring its restart whenever it fails. Is this approach viable? The High-Availability page mentions that you use only one job manager in an

Re: Backpressure? for Batches

2018-08-29 Thread Zhijiang(wangzhijiang999)
Backpressure occurs when downstream and upstream are running concurrently and the downstream is slower than the upstream. In a streaming job, the schedule mode schedules both sides concurrently, so backpressure may occur. As for a batch job, the default schedule mode is LAZY_FROM_SOURCE

Re: checkpoint failed due to s3 exception: request timeout

2018-08-29 Thread Tony Wei
Hi Andrey, Cool! I will add it to my flink-conf.yaml. However, I'm still wondering if anyone is familiar with this problem or has any idea how to find the root cause. Thanks. Best, Tony Wei 2018-08-29 16:20 GMT+08:00 Andrey Zagrebin : > Hi, > > the current Flink 1.6.0 version uses Presto Hive s3

Re: anybody can start flink with job mode?

2018-08-29 Thread Till Rohrmann
Great to hear :-) On Tue, Aug 28, 2018, 14:59 Hao Sun wrote: > Thanks Till for the follow up, I can run my job now. > > On Tue, Aug 28, 2018, 00:57 Till Rohrmann wrote: > >> Hi Hao, >> >> Vino is right, you need to specify the -j/--job-classname option which >> specifies the job name you want

Backpressure? for Batches

2018-08-29 Thread Darshan Singh
I faced an issue with back pressure in streams. I was wondering if we could face the same with batches as well. In theory it should be possible. But in the backpressure tab of the Web UI for batches I saw that it was just showing the task status and no status like "OK" etc. So I was

Re: checkpoint failed due to s3 exception: request timeout

2018-08-29 Thread Andrey Zagrebin
Hi, the current Flink 1.6.0 version uses Presto Hive s3 connector 0.185 [1], which has this option: S3_MAX_CLIENT_RETRIES = "presto.s3.max-client-retries"; If you add "s3.max-client-retries" to flink conf, flink-s3-fs-presto [2] should automatically prefix it and configure PrestoS3FileSystem
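
The key translation Andrey describes can be pictured as follows. This is only an illustration in Python, not the code in flink-s3-fs-presto: `mirror_s3_keys` is a hypothetical function, and the assumption that every key starting with `s3.` is forwarded with a `presto.` prefix is taken from the description above rather than from the Flink source.

```python
PRESTO_PREFIX = "presto."  # assumed prefix, matching "presto.s3.max-client-retries"


def mirror_s3_keys(flink_conf: dict) -> dict:
    """Return the options a Presto-style S3 client would see.

    Every key that starts with "s3." is copied under the "presto." prefix;
    all other keys are left out.
    """
    return {
        PRESTO_PREFIX + key: value
        for key, value in flink_conf.items()
        if key.startswith("s3.")
    }


flink_conf = {
    "s3.max-client-retries": "10",  # example value, not a recommendation
    "state.backend": "rocksdb",     # unrelated key, not forwarded
}
print(mirror_s3_keys(flink_conf))
# → {'presto.s3.max-client-retries': '10'}
```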

Re: JobGraphs not cleaned up in HA mode

2018-08-29 Thread Encho Mishinev
Hi, Unfortunately the thing I described does indeed happen every time. As mentioned in the first email, I am running on Kubernetes so certain things could be different compared to just a standalone cluster. Any ideas for workarounds are welcome, as this problem basically prevents me from using

Re: Why don't operations on KeyedStream return KeyedStream?

2018-08-29 Thread Fabian Hueske
Hi Elias, Your assumption is correct. An operation on a KeyedStream results in a regular DataStream because the operation might change the data type or the key field. Hence, it is not guaranteed that the same keys can be extracted from the output of the keyed operation. However, there is a way

Re: Custom java home on YARN

2018-08-29 Thread vino yang
Hi Paul, You can try: -yD yarn.taskmanager.env.JAVA_HOME=xx in the command line. Thanks, vino. On Wed, Aug 29, 2018 at 3:27 PM, Paul Lam wrote: > Hi, > > I’m trying to run Flink on a YARN cluster that’s running on JDK 7, and I > think it’s a quite common scenario, but it seems that currently there’s no > way

Re: Queryable state and state TTL

2018-08-29 Thread Andrey Zagrebin
Hi, Fabian is right, support of TTL for queryable state needs extra effort because of some specifics of its interaction with state objects, but there is no fundamental problem. It is on the roadmap for future releases. Best, Andrey > On 29 Aug 2018, at 09:30, Fabian Hueske wrote: > >

Re: Queryable state and state TTL

2018-08-29 Thread Fabian Hueske
Hi, I guess that this is not a fundamental problem but just a limitation in the current implementation. Andrey (in CC) who implemented the TTL support should be able to give more insight on this issue. Best, Fabian On Wed, Aug 29, 2018 at 04:06, vino yang wrote: > Hi Elias, > > From the

Custom java home on YARN

2018-08-29 Thread Paul Lam
Hi, I’m trying to run Flink on a YARN cluster that’s running on JDK 7, and I think it’s quite a common scenario, but it seems that currently there’s no way to pass the JAVA_HOME environment variable to YARN. Am I missing something? And should I create an issue requesting that? Best, Paul

Re: checkpoint failed due to s3 exception: request timeout

2018-08-29 Thread vino yang
Hi Tony, Maybe you can look at the documentation for this class, which comes from flink-s3-fs-presto.[1] [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.6/api/java/org/apache/hadoop/conf/Configuration.html Thanks, vino. On Wed, Aug 29, 2018 at 2:18 PM, Tony Wei wrote: > Hi

Re: checkpoint failed due to s3 exception: request timeout

2018-08-29 Thread Tony Wei
Hi Vino, I thought this config is for the AWS S3 client, but this client is internal to flink-s3-fs-presto. So, I guess I should find a way to pass this config to this library. Best, Tony Wei 2018-08-29 14:13 GMT+08:00 vino yang : > Hi Tony, > > Sorry, I just saw the timeout, I thought they were

Re: checkpoint failed due to s3 exception: request timeout

2018-08-29 Thread vino yang
Hi Tony, Sorry, I just saw the timeout, I thought they were similar because they both happened on AWS S3. Regarding this setting, isn't "s3.max-client-retries: xxx" set for the client? Thanks, vino. On Wed, Aug 29, 2018 at 1:17 PM, Tony Wei wrote: > Hi Vino, > > Thanks for your quick reply, but I think these