RocksDB checkpointing dir per TM

2018-10-25 Thread Taher Koitawala
Hi All, Our current cluster configuration uses one HDD, which is mainly for root, and another NVMe disk per node. [1] We want to make sure all TMs write their own RocksDB files to the NVMe disk only; how do we do that? [2] Is it also possible to specify multiple directories per TM so that we
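A hedged configuration sketch: assuming the NVMe disk is mounted at a path like /mnt/nvme (a placeholder, not from the thread), the RocksDB working directories can be pointed at it in flink-conf.yaml; the option also accepts a comma-separated list of directories:

```yaml
# flink-conf.yaml (sketch; /mnt/nvme is a placeholder mount point)
state.backend: rocksdb
# One or more local directories for RocksDB working files, comma-separated.
state.backend.rocksdb.localdir: /mnt/nvme/flink/rocksdb
# Other local I/O (spill files etc.) can be redirected as well:
io.tmp.dirs: /mnt/nvme/flink/tmp
```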

Re: Queryable state when key is UUID - getting Kryo Exception

2018-10-25 Thread Jayant Ameta
MapStateDescriptor descriptor = new MapStateDescriptor<>("rulePatterns", UUID.class, String.class); Jayant Ameta On Fri, Oct 26, 2018 at 8:19 AM bupt_ljy wrote: > Hi, > >Can you show us the descriptor in the codes below? > > client.getKvState(JobID.fromHexString( > "c7b8af14b8afacf

Re: Queryable state when key is UUID - getting Kryo Exception

2018-10-25 Thread bupt_ljy
Hi, Can you show us the descriptor in the codes below? client.getKvState(JobID.fromHexString("c7b8af14b8afacf4fac16cdd0da7e997"), "rule", UUID.fromString("3b3f17a0-d81a-11e8-bb91-7fd1412de84d"), TypeInformation.of(new TypeHintUUID() {}), descriptor); Jiayi Liao, Best Original Message Sen

Re: How to tune Hadoop version in flink shaded jar to Hadoop version actually used?

2018-10-25 Thread vino yang
Hi Henry, When running Flink on YARN, from ClusterEntrypoint the system environment info is printed out. One piece of the info is "Hadoop version: 2.4.1", I think it is from the flink-shaded-hadoop2 jar. But actually the system Hadoop version is 2.7.2. Is it OK if the version is different?

Re: Parallelize an incoming stream into 5 streams with the same data

2018-10-25 Thread Hequn Cheng
Hi Vijay, Option 1 is the right answer. `keyBy1` and `keyBy2` contain all data in `inputStream`, while option 2 replicates all data to each task and option 3 splits data into smaller groups without duplication. Best, Hequn On Fri, Oct 26, 2018 at 7:01 AM Vijay Balakrishnan wrote: > Hi, > I need
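Hequn's point can be illustrated with a plain-Java analogy (this is not the Flink DataStream API, just a sketch with made-up event fields): grouping the same input by two different keys still gives each grouping every record, which is what several `keyBy()` views of one stream do.

```java
import java.util.*;
import java.util.stream.*;

// Plain-Java analogy for keying the same stream two different ways:
// each grouping still sees every record, mirroring how several keyBy()
// views of one DataStream each receive all elements.
public class KeyByDemo {
    static final class Event {
        final String user;
        final String region;
        Event(String user, String region) { this.user = user; this.region = region; }
    }

    static final List<Event> INPUT = List.of(
        new Event("alice", "us"),
        new Event("bob", "eu"),
        new Event("alice", "eu"));

    static Map<String, List<Event>> byUser() {
        return INPUT.stream().collect(Collectors.groupingBy(e -> e.user));
    }

    static Map<String, List<Event>> byRegion() {
        return INPUT.stream().collect(Collectors.groupingBy(e -> e.region));
    }

    static int countAll(Map<String, List<Event>> grouped) {
        return grouped.values().stream().mapToInt(List::size).sum();
    }

    public static void main(String[] args) {
        // Both keyings cover the full input of 3 records.
        System.out.println(countAll(byUser()) + " " + countAll(byRegion()));
    }
}
```

Neither keying drops or duplicates anything relative to the input; only the partitioning differs.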

Re: Accumulating a batch

2018-10-25 Thread Hequn Cheng
Hi Austin, You can use a GroupBy Window[1], such as a TUMBLE window. The size of the window can be defined either as a time or a row-count interval. You can also define your own User-Defined Aggregate Functions[2] to be used in a window. Best, Hequn [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/ta

How to tune Hadoop version in flink shaded jar to Hadoop version actually used?

2018-10-25 Thread 徐涛
Hi Experts, When running Flink on YARN, from ClusterEntrypoint the system environment info is printed out. One piece of the info is "Hadoop version: 2.4.1", I think it is from the flink-shaded-hadoop2 jar. But actually the system Hadoop version is 2.7.2. I want to know whether it is OK if

Re: RocksDB State Backend Exception

2018-10-25 Thread Ning Shi
Hi Andrey, Thank you for the explanation. I think you are right. It is either kStaleFile or kNoSpace. We found the cause of the issue, even though we still don't know how to explain it. We set the java.io.tmpdir to an EBS-backed drive instead of the default and the exception started happening. Th

Parallelize an incoming stream into 5 streams with the same data

2018-10-25 Thread Vijay Balakrishnan
Hi, I need to broadcast/parallelize an incoming stream (inputStream) into 5 streams with the same data. Each stream is keyed by different keys to do various grouping operations on the set. Do I just use inputStream.keyBy(5 diff keys) and then just use the DataStream to perform windowing/grouping op

Flink on Azure HDInsight cluster with WASB storage

2018-10-25 Thread Fakrudeen Ali Ahmed
Hi there, https://stackoverflow.com/questions/52996054/resource-changed-on-src-filesystem-in-azure-flink We are unable to start flink on Azure Hadoop cluster [on top of WASB]. This throws: Application application_1539730571763_0046 failed 1 times (global limit =5; local limit is =1) due to AM

Accumulating a batch

2018-10-25 Thread Austin Cawley-Edwards
Hi there, Is it possible to use an AggregationFunction to accumulate n values in a buffer until a threshold is met, then stream down all records in the batch? Thank you! Austin Cawley-Edwards
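The accumulate-then-flush logic Austin describes can be sketched in plain Java (this is deliberately NOT the Flink AggregateFunction API, just the core batching behavior such a function would implement; the class name and threshold are made up):

```java
import java.util.*;
import java.util.function.Consumer;

// Hypothetical sketch of the batching idea: buffer elements until a
// threshold is reached, then emit the whole batch downstream.
public class BatchingBuffer<T> {
    private final int threshold;
    private final List<T> buffer = new ArrayList<>();
    private final Consumer<List<T>> downstream;

    public BatchingBuffer(int threshold, Consumer<List<T>> downstream) {
        this.threshold = threshold;
        this.downstream = downstream;
    }

    public void add(T element) {
        buffer.add(element);
        if (buffer.size() >= threshold) {
            downstream.accept(new ArrayList<>(buffer)); // flush a copy of the batch
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> batches = new ArrayList<>();
        BatchingBuffer<Integer> buf = new BatchingBuffer<>(3, batches::add);
        for (int i = 1; i <= 7; i++) {
            buf.add(i);
        }
        // 7 elements with threshold 3: two full batches emitted, one element still buffered.
        System.out.println(batches); // [[1, 2, 3], [4, 5, 6]]
    }
}
```

In a real Flink job the same idea would live in a window or process function, with the buffered elements kept in managed state so they survive failures.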

Re: Queryable state when key is UUID - getting Kryo Exception

2018-10-25 Thread Jayant Ameta
Also, I haven't provided any custom serializer in my flink job. Shouldn't the same configuration work for queryable state client? Jayant Ameta On Thu, Oct 25, 2018 at 4:15 PM Jayant Ameta wrote: > Hi Gordon, > Following is the stack trace that I'm getting: > > *Exception in thread "main" java.

Re: Checkpoint acknowledge takes too long

2018-10-25 Thread Hequn Cheng
Hi Henry, Thanks for letting us know. On Thu, Oct 25, 2018 at 7:34 PM 徐涛 wrote: > Hi Hequn & Kien, > Finally the problem is solved. > It is due to a slow sink write. Because the job only has 2 tasks, I checked > the backpressure, found that the source has high backpressure, so I tried > to improve

Flink yarn -kill

2018-10-25 Thread Mikhail Pryakhin
Hi Flink community, Could you please help me clarify the following question: When a streaming job running in YARN gets manually killed via yarn -kill command is there any way to make a savepoint or other clean up actions before the job manager is killed? Kind Regards, Mike Pryakhin smime.p7s
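There is no pre-kill hook in yarn -kill itself; a common workaround (a sketch only, IDs and paths are placeholders) is to cancel the job with a savepoint through the Flink CLI before the YARN application is torn down:

```shell
# Take a savepoint and cancel the job first, then let the YARN app shut down.
# <jobId>, <yarnApplicationId> and the HDFS path are placeholders.
bin/flink cancel -s hdfs:///flink/savepoints <jobId> -yid <yarnApplicationId>
```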

Flink-1.6.1 :: HighAvailability :: ZooKeeperRunningJobsRegistry

2018-10-25 Thread Mikhail Pryakhin
Hi Flink experts! When a streaming job with ZooKeeper HA enabled gets cancelled, all the job-related ZooKeeper nodes are not removed. Is there a reason behind that? I noticed that ZooKeeper paths are created of type "Container Node" (an ephemeral node that can have nested nodes) and fall back to

Re: Question over Incremental Snapshot vs Full Snapshot in rocksDb state backend

2018-10-25 Thread Andrey Zagrebin
Hi Chandan, > 1. Why did we take 2 different approaches using different RocksDB APIs? > We could have used the Checkpoint API of RocksDB for fullSnapshot as well. The reason here is partially historical. Full snapshot in the RocksDB backend was implemented before incremental and rescaling for incremen

Re: RocksDB State Backend Exception

2018-10-25 Thread Andrey Zagrebin
Hi Ning, The problem here first of all is that RocksDB java JNI client diverged from RocksDB cpp code in status.h, as mentioned in the Flink issue you refer to. Flink 1.6 uses RocksDB 5.7.5 java client. The JNI code there misses these status subcodes: kNoSpace = 4, kDeadlock = 5, kStaleFile = 6

Re: Java Table API and external catalog bug?

2018-10-25 Thread Fabian Hueske
IIRC, that was recently fixed. Might come out with 1.6.2 / 1.7.0. Cheers, Fabian Flavio Pompermaier schrieb am Do., 25. Okt. 2018, 14:09: > Ok thanks! I wasn't aware of this..that would be undoubtedly useful ;) > On Thu, Oct 25, 2018 at 2:00 PM Timo Walther wrote: > >> Hi Flavio, >> >> the ex

Re: Java Table API and external catalog bug?

2018-10-25 Thread Flavio Pompermaier
Ok thanks! I wasn't aware of this..that would be undoubtedly useful ;) On Thu, Oct 25, 2018 at 2:00 PM Timo Walther wrote: > Hi Flavio, > > the external catalog support is not feature complete yet. I think you can > only specify the catalog when reading from a table but `insertInto` does > not co

Re: BucketingSink capabilities for DataSet API

2018-10-25 Thread Andrey Zagrebin
Hi Rafi, At the moment I do not see any support of Parquet in DataSet API except HadoopOutputFormat, mentioned in stack overflow question. I have cc’ed Fabian and Aljoscha, maybe they could provide more information. Best, Andrey > On 25 Oct 2018, at 13:08, Rafi Aroch wrote: > > Hi, > > I'm

Re: Java Table API and external catalog bug?

2018-10-25 Thread Timo Walther
Hi Flavio, the external catalog support is not feature complete yet. I think you can only specify the catalog when reading from a table, but `insertInto` does not consider the catalog name. Regards, Timo On 25.10.18 at 10:04, Flavio Pompermaier wrote: Any other help here? Is this a bug or

Re: Checkpoint acknowledge takes too long

2018-10-25 Thread 徐涛
Hi Hequn & Kien, Finally the problem is solved. It is due to a slow sink write. Because the job only has 2 tasks, I checked the backpressure, found that the source has high backpressure, so I tried to improve the sink write. After that the end-to-end duration is below 1s and the che

BucketingSink capabilities for DataSet API

2018-10-25 Thread Rafi Aroch
Hi, I'm writing a Batch job which reads Parquet, does some aggregations and writes back as Parquet files. I would like the output to be partitioned by year, month, day by event time. Similarly to the functionality of the BucketingSink. I was able to achieve the reading/writing to/from Parquet by

Re: HighAvailability :: FsNegativeRunningJobsRegistry

2018-10-25 Thread Chesnay Schepler
The release process for 1.6.2 is currently ongoing and will hopefully be finished within the next few days. In the meantime you could use the 1.6.2-rc1 artifacts: binaries: https://dist.apache.org/repos/dist/dev/flink/flink-1.6.2/ maven: https://repository.apache.org/content/repositories/orgapacheflink

Re: Queryable state when key is UUID - getting Kryo Exception

2018-10-25 Thread Jayant Ameta
Hi Gordon, Following is the stack trace that I'm getting: *Exception in thread "main" java.util.concurrent.ExecutionException: java.lang.RuntimeException: Failed request 0.* * Caused by: java.lang.RuntimeException: Failed request 0.* * Caused by: java.lang.RuntimeException: Error while processing

Re: HighAvailability :: FsNegativeRunningJobsRegistry

2018-10-25 Thread Mikhail Pryakhin
Hi all, I’ve realised that the feature I requested information about hasn’t been released yet. Could you please reveal when approximately the release-1.6.2-rc1 is going to be rolled out? Thank you. Kind Regards, Mike Pryakhin >

Re: Queryable state when key is UUID - getting Kryo Exception

2018-10-25 Thread bupt_ljy
Hi Jayant, There should be a Serializer parameter in the constructor of the StateDescriptor, you should create a new serializer like this: new GenericTypeInfo(classOf[UUID]).createSerializer(env.getConfig) By the way, can you show us your kryo exception like what Gordon said? Jiayi Liao, B
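The fix Jiayi suggests boils down to the client building its serializer from the same type information as the job. As a plain-Java analogy (not the Flink API; the wire format below is made up for illustration), a value can only be decoded by a reader that agrees with the writer on the serialization format:

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Plain-Java analogy: bytes written by one serializer can only be read back
// by a reader using the same format. This is why the queryable state client
// must build its key serializer from the same type information as the job,
// e.g. new GenericTypeInfo<>(UUID.class).createSerializer(env.getConfig).
public class UuidSerDemo {
    // One possible wire format: most-significant long, then least-significant long.
    static byte[] write(UUID id) {
        return ByteBuffer.allocate(16)
            .putLong(id.getMostSignificantBits())
            .putLong(id.getLeastSignificantBits())
            .array();
    }

    static UUID read(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new UUID(buf.getLong(), buf.getLong());
    }

    public static void main(String[] args) {
        UUID key = UUID.fromString("3b3f17a0-d81a-11e8-bb91-7fd1412de84d");
        // Round trip succeeds only because writer and reader agree on the format.
        System.out.println(read(write(key)).equals(key)); // true
    }
}
```

A client that used a different framing (e.g. Kryo's generic object format) against these 16 bytes would fail to reconstruct the key, which matches the symptom in this thread.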

Re: Queryable state when key is UUID - getting Kryo Exception

2018-10-25 Thread Tzu-Li (Gordon) Tai
Hi Jayant, What is the Kryo exception message that you are getting? Cheers, Gordon On 25 October 2018 at 5:17:13 PM, Jayant Ameta (wittyam...@gmail.com) wrote: Hi, I've not configured any serializer in the descriptor. (Neither in flink job, nor in state query client). Which serializer should

Re: Queryable state when key is UUID - getting Kryo Exception

2018-10-25 Thread Jayant Ameta
Hi, I've not configured any serializer in the descriptor. (Neither in flink job, nor in state query client). Which serializer should I use? Jayant Ameta On Thu, Oct 25, 2018 at 2:13 PM bupt_ljy wrote: > Hi, > >It seems that your codes are right. Are you sure that you’re using the > same Se

Re: Queryable state when key is UUID - getting Kryo Exception

2018-10-25 Thread bupt_ljy
Hi, It seems that your codes are right. Are you sure that you’re using the same Serializer as the Flink program does? Could you show the serializer in the descriptor? Jiayi Liao, Best Original Message Sender: Jayant Ameta wittyam...@gmail.com Recipient: useru...@flink.apache.org Date: Thursday, Oct 2

Re: Java Table API and external catalog bug?

2018-10-25 Thread Flavio Pompermaier
Any other help here? Is this a bug or something wrong in my code? On Tue, Oct 23, 2018 at 9:02 AM Flavio Pompermaier wrote: > I've tried with t2, test.t2 and test.test.t2. > > On Mon, 22 Oct 2018, 19:26 Zhang, Xuefu, wrote: > >> Have you tried "t2" instead of "test.t2"? There is a possibility t

Re: Task manager count goes the expand then converge process when running flink on YARN

2018-10-25 Thread Till Rohrmann
Hi Henry, since version 1.5 you don't need to specify the number of TaskManagers to start, because the system will figure this out. Moreover, in versions 1.5.x and 1.6.x it is recommended to set the number of slots per TaskManager to 1, since we did not support multi-task-slot TaskManagers properly.
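Following that recommendation, a minimal flink-conf.yaml fragment (a sketch of the setting Till mentions):

```yaml
# flink-conf.yaml: one slot per TaskManager, as recommended for 1.5.x/1.6.x
taskmanager.numberOfTaskSlots: 1
```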