date:20180829

Re: Kryo exception

2018-08-29 Thread Alexander Smirnov

Thanks Hequn! On Thu, 30 Aug 2018 at 04:49, Hequn Cheng wrote: > Hi Alex, > > It seems a bug. There is a discussion here > > . > Best, Hequn > > On Wed, Aug 29, 2018 at 10:19 PM Alexander Smirnov <

Re: Custom java home on YARN

2018-08-29 Thread Paul Lam

Hi tison, This information is very helpful. Thank you! Best, Paul Lam > 在 2018年8月30日，10:41，陈梓立写道： > > Hi Paul, > > For your information, `yarn.taskmanager.env.JAVA_HOME` is deprecated, you > can set env of taskmanager using the key similar to jobmanager, that is >

Re: Custom java home on YARN

2018-08-29 Thread vino yang

Glad to hear this message! Paul Lam 于2018年8月30日周四上午10:35写道： > Hi vino, > > Yes, I’m trying to set Java 8 environment for Flink job, but YARN is > running on Java 7 which I can’t change. > > I find the solution in the mail list you referred to. It turns out that I > was using a wrong key to set

Re: withFormat(Csv) is undefined for the type BatchTableEnvironment

2018-08-29 Thread vino yang

Hi francois, Maybe you can refer to the comments of this source code?[1] https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/api/BatchTableEnvironment.scala#L143 Thanks, vino. françois lacombe 于2018年8月29日周三下午10:54写道： > Hi Vino, > >

Re: Custom java home on YARN

2018-08-29 Thread Paul Lam

Hi vino, Yes, I’m trying to set Java 8 environment for Flink job, but YARN is running on Java 7 which I can’t change. I find the solution in the mail list you referred to. It turns out that I was using a wrong key to set Java home for the jobmanager, and the right key should be

Re: checkpoint timeout

2018-08-29 Thread vino yang

Hi John, Setting the checkpoint timeout is through this API. The default timeout for checkpoints is 10 minutes [1], not one minute. So, I think it must be something else. You can set the log level of JM and TM to Debug, and then see more checkpoint details. If there is no way to analyze it, you

Re: Custom java home on YARN

2018-08-29 Thread vino yang

Hi Paul, This exception means that the jdk version of the execution code is lower than the compiled jdk version. 52 means that the JDK version that compiles it is 1.8. There is information that Flink 1.4 has removed support for Java 7, so your current best choice is to use JDK 1.8.[1] [1]:

Re: Custom java home on YARN

2018-08-29 Thread Paul Lam

Hi vino, Thanks for your suggestion. I believe it’s working. But I still get this exception in jobmanager.err: > java.lang.UnsupportedClassVersionError: > org/apache/flink/yarn/entrypoint/YarnSessionClusterEntrypoint : Unsupported > major.minor version 52.0 I tried setting Java home for the

Re: Kryo exception

2018-08-29 Thread Hequn Cheng

Hi Alex, It seems a bug. There is a discussion here . Best, Hequn On Wed, Aug 29, 2018 at 10:19 PM Alexander Smirnov < alexander.smirn...@gmail.com> wrote: > Hi, > > A job fell into a restart loop

Problem with querying state on Flink 1.6.

2018-08-29 Thread Joe Olson

I'm having a problem with querying state on Flink 1.6. I put a project in Github that is my best representation of the very simple client example outlined in the 'querying state' section of the 1.6 documentation at

checkpoint timeout

2018-08-29 Thread John O

I have a flink job with a big enough state that makes checkpointing long ( ~ 70 seconds). I have configured the checkpoint timeout to 180 seconds (setCheckpointTimeout(18)) But as you can see from the following logs, timeout seems to be ~60 seconds. Is there another timeout configuration I

回复：Backpressure? for Batches

2018-08-29 Thread Zhijiang(wangzhijiang999)

You can check the log to show the related stack in OOM, maybe we can confirm some reasons. Or you can dump the heap to analyze the memory usages after OOM. Best, Zhijiang -- 发件人：Darshan Singh 发送时间：2018年8月29日(星期三) 19:22

[ANNOUNCE] Weekly community update #35

2018-08-29 Thread Till Rohrmann

Dear community, this is the weekly community update thread #35. Please post any news and updates you want to share with the community to this thread. # Flink 1.7 community roadmap The Flink community started discussing which features will be included in the next upcoming major release. Join the

Re: withFormat(Csv) is undefined for the type BatchTableEnvironment

2018-08-29 Thread françois lacombe

Hi Vino, Thanks for this answer. I can't find in the docs where it's about BatchTableDescriptor https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/table/connect.html#csv-format It sounds like the withFormat method is applied on TableEnvironment object on this page. All the best

Kryo exception

2018-08-29 Thread Alexander Smirnov

Hi, A job fell into a restart loop with the following exception. Is it something known? What could cause it? Flink 1.4.2 16 Aug 2018 13:43:00,835 INFO org.apache.flink.runtime.taskmanager.Task - Source: Custom Source -> (Filter -> Timestamps/Watermarks -> Map, Filter -> Timestamps/Watermarks ->

Re: When a jobmanager fails, it doesn't restart because it tries to restart non existing tasks

2018-08-29 Thread Till Rohrmann

Thanks for the update Gerard. Fixing the resource cleanup in the case of standby Dispatchers/JobMasters has a high priority. We will hopefully fix the problem with the next bug fix release. Until then, the JobGraph entry must be removed from ZooKeeper manually. Cheers, Till On Wed, Aug 29, 2018

Re: When a jobmanager fails, it doesn't restart because it tries to restart non existing tasks

2018-08-29 Thread Gerard Garcia

Hi Till, Sorry for the late reply, I was waiting to update to Flink 1.6.0 to see if the problem got fixed but I still experience the first issue (jobgraph not deleted from zookeeper when task is canceled). The second issue (taskmanagers unable to register to the new elected jobmanager) was

Taskmanager process memory increasing always

2018-08-29 Thread YennieChen88

Hello, My case is counting the number of successful login and failures within 1 hour, 10 min, 5 min, 3 min, 1 min, 10 second and 1 second, keyBy login ip or device id. Based on previous counting results of different time dimensions, predict the complicance of the next login. After

Re: JobGraphs not cleaned up in HA mode

2018-08-29 Thread Till Rohrmann

Hi Encho, thanks for sending the first part of the logs. What I would actually be interested in are the complete logs because somewhere in the jobmanager-2 logs there must be a log statement saying that the respective dispatcher gained leadership. I would like to see why this happens but for this

Re: Data loss when restoring from savepoint

2018-08-29 Thread Andrey Zagrebin

Hi Juho, > only when the 24-hour window triggers, BucketingSink gets a burst of input This is of course totally true, my understanding is the same. We cannot exclude problem there for sure, just savepoints are used a lot w/o problem reports and BucketingSink is known to be problematic with s3.

Filescheme GS not found sometimes - inconsistent exceptions for reading from GCS

2018-08-29 Thread Encho Mishinev

Hello, I am using Flink 1.5.3 and executing jobs through Apache Beam 2.6.0. One of my jobs involves reading from Google Cloud Storage which uses the file scheme "gs://". Everything was fine but once in a while I would get an exception that the scheme is not recognised. Now I've started seeing

Re: JobGraphs not cleaned up in HA mode

2018-08-29 Thread Encho Mishinev

Hi Till, I will use the approach with a k8s deployment and HA mode with a single job manager. Nonetheless, here are the logs I just produced by repeating the aforementioned experiment, hope they help in debugging: *- Starting Jobmanager-1:* Starting Job Manager sed: cannot rename

share the big data structure between slots

2018-08-29 Thread zhen li

Hi all: When I broadcast the big config stream,every parallel instance should store the data, cause waste of the momery and time. Is there some method to make the slots to share the big data structure within the taskmanager？ Here is the doc: By adjusting the number of task slots, users

Re: Backpressure? for Batches

2018-08-29 Thread Darshan Singh

Thanks, Now back to my question again. How can I say read at less speed from hdfs than my say map or group by can consume? Is there some sort of configuration which says read only 1 rows and then stop and then reread etc. Otherwise source will keep on sending the data or keeping in some sort

Re: Data loss when restoring from savepoint

2018-08-29 Thread Juho Autio

Andrey, thank you very much for the debugging suggestions, I'll try them. In the meanwhile two more questions, please: > Just to keep in mind this problem with s3 and exclude it for sure. I would also check whether the size of missing events is around the batch size of BucketingSink or not.

Re: Backpressure? for Batches

2018-08-29 Thread Chesnay Schepler

The semantics for LAZY_FROM_SOURCE are that tasks are scheduled /when there is data to be consumed/, i.e. one the first record was emitted by the previous operator. As such back-pressure exists in batch just like in streaming. On 29.08.2018 11:39, Darshan Singh wrote: Thanks, My job is

Re: Backpressure? for Batches

2018-08-29 Thread Darshan Singh

Thanks, My job is simple. I am using table Api 1. Read from hdfs 2. Deserialize json to pojo and convert to table. 3. Group by some columns. 4. Convert back to dataset and write back to hdfs. In the WebUI I can see at least first 3 running concurrently which sort of makes sense. From your answer

Re: JobGraphs not cleaned up in HA mode

2018-08-29 Thread Till Rohrmann

Hi Encho, it sounds strange that the standby JobManager tries to recover a submitted job graph. This should only happen if it has been granted leadership. Thus, it seems as if the standby JobManager thinks that it is also the leader. Could you maybe share the logs of the two

Re: JobGraphs not cleaned up in HA mode

2018-08-29 Thread Encho Mishinev

Hello, Since two job managers don't seem to be working for me I was thinking of just using a single job manager in Kubernetes in HA mode with a deployment ensuring its restart whenever it fails. Is this approach viable? The High-Availability page mentions that you use only one job manager in an

回复：Backpressure? for Batches

2018-08-29 Thread Zhijiang(wangzhijiang999)

The backpressure is caused when downstream and upstream are running concurrently, and the downstream is slower than the upstream. In stream job, the schedule mode will schedule both sides concurrently, so the backpressure may exist. As for batch job, the default schedule mode is LAZY_FROM_SOURCE

Re: checkpoint failed due to s3 exception: request timeout

2018-08-29 Thread Tony Wei

Hi Andrey, Cool! I will add it in my flink-conf.yaml. However, I'm still wondering if anyone is familiar with this problem or has any idea to find the root cause. Thanks. Best, Tony Wei 2018-08-29 16:20 GMT+08:00 Andrey Zagrebin : > Hi, > > the current Flink 1.6.0 version uses Presto Hive s3

Re: anybody can start flink with job mode?

2018-08-29 Thread Till Rohrmann

Great to hear :-) On Tue, Aug 28, 2018, 14:59 Hao Sun wrote: > Thanks Till for the follow up, I can run my job now. > > On Tue, Aug 28, 2018, 00:57 Till Rohrmann wrote: > >> Hi Hao, >> >> Vino is right, you need to specify the -j/--job-classname option which >> specifies the job name you want

Backpressure? for Batches

2018-08-29 Thread Darshan Singh

I faced the issue with back pressure in streams. I was wondering if we could face the same with the batches as well. In theory it should be possible. But in Web UI for backpressure tab for batches I was seeing that it was just showing the tasks status and no status like "OK" etc. So I was

Re: checkpoint failed due to s3 exception: request timeout

2018-08-29 Thread Andrey Zagrebin

Hi, the current Flink 1.6.0 version uses Presto Hive s3 connector 0.185 [1], which has this option: S3_MAX_CLIENT_RETRIES = "presto.s3.max-client-retries”; If you add “s3.max-client-retries” to flink conf, flink-s3-fs-presto [2] should automatically prefix it and configure PrestoS3FileSystem

Re: JobGraphs not cleaned up in HA mode

2018-08-29 Thread Encho Mishinev

Hi, Unfortunately the thing I described does indeed happen every time. As mentioned in the first email, I am running on Kubernetes so certain things could be different compared to just a standalone cluster. Any ideas for workarounds are welcome, as this problem basically prevents me from using

Re: Why don't operations on KeyedStream return KeyedStream?

2018-08-29 Thread Fabian Hueske

Hi Elias, Your assumption is correct. An operation on a KeyedStream results in a regular DataStream because the operation might change the data type or the key field. Hence, it is not guaranteed that the same keys can be extracted from the output of the keyed operation. However, there is a way

Re: Custom java home on YARN

2018-08-29 Thread vino yang

Hi Paul, You can try: -yD yarn.taskmanager.env.JAVA_HOME=xx in the command line. Thanks, vino. Paul Lam 于2018年8月29日周三下午3:27写道： > Hi, > > I’m trying to run Flink on a YARN cluster that’s running on JDK 7, and I > think it’s a quite common scenario, but it seems that currently there’s no > way

Re: Queryable state and state TTL

2018-08-29 Thread Andrey Zagrebin

Hi, Fabian is right, support of TTL for queryable state needs an extra effort because of some specifics of its interaction with state objects, but there is no fundamental problem. It is on the roadmap for the future realises. Best, Andrey > On 29 Aug 2018, at 09:30, Fabian Hueske wrote: > >

Re: Queryable state and state TTL

2018-08-29 Thread Fabian Hueske

Hi, I guess that this is not a fundamental problem but just a limitation in the current implementation. Andrey (in CC) who implemented the TTL support should be able to give more insight on this issue. Best, Fabian Am Mi., 29. Aug. 2018 um 04:06 Uhr schrieb vino yang : > Hi Elias, > > From the

Custom java home on YARN

2018-08-29 Thread Paul Lam

Hi, I’m trying to run Flink on a YARN cluster that’s running on JDK 7, and I think it’s a quite common scenario, but it seems that currently there’s no way to pass the JAVA_HOME environment variable to YARN. Am I missing something? And should I create an issue requesting for that? Best, Paul

Re: checkpoint failed due to s3 exception: request timeout

2018-08-29 Thread vino yang

Hi Tony, Maybe you can consider looking at the doc information for this class, this class comes from flink-s3-fs-presto.[1] [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.6/api/java/org/apache/hadoop/conf/Configuration.html Thanks, vino. Tony Wei 于2018年8月29日周三下午2:18写道： > Hi

Re: checkpoint failed due to s3 exception: request timeout

2018-08-29 Thread Tony Wei

Hi Vino, I thought this config is for aws s3 client, but this client is inner flink-s3-fs-presto. So, I guessed I should find a way to pass this config to this library. Best, Tony Wei 2018-08-29 14:13 GMT+08:00 vino yang : > Hi Tony, > > Sorry, I just saw the timeout, I thought they were

Re: checkpoint failed due to s3 exception: request timeout

2018-08-29 Thread vino yang

Hi Tony, Sorry, I just saw the timeout, I thought they were similar because they both happened on aws s3. Regarding this setting, isn't "s3.max-client-retries: xxx" set for the client? Thanks, vino. Tony Wei 于2018年8月29日周三下午1:17写道： > Hi Vino, > > Thanks for your quick reply, but I think these

43 matches

Mail list logo