Re: Thrift object serialization

2017-05-15 Thread Tzu-Li (Gordon) Tai
Hi Flavio! I believe [1] has what you are looking for. Have you taken a look at that? Cheers, Gordon [1]  https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/custom_serializers.html On 15 May 2017 at 9:08:33 PM, Flavio Pompermaier (pomperma...@okkam.it) wrote: Hi to all, in my Flin

Re: [BUG?] Cannot Load User Class on Local Environment

2017-05-15 Thread Matt
Hi Till, I just tried with Flink 1.4 by compiling the current master branch on GitHub (as of this morning) and I still see the same problem as before. If I'm not wrong, your PR was merged already, so your fixes should be part of the binary. I hope you have time to have a look at the test case in

Re: static/dynamic lookups in flink streaming

2017-05-15 Thread Jain, Ankit
What if we copy the big data set to HDFS when the cluster starts (e.g. EMR if using AWS) and then use that to build distributed operator state in Flink instead of calling the external store? How do Flink contributors feel about that? Thanks Ankit On 5/14/17, 8:17 PM, "yunfan123" wrote: The 1

Re: Storage options for RocksDBStateBackend

2017-05-15 Thread Jain, Ankit
Also, I hope state & checkpointing writes to S3 happen asynchronously, without impacting the actual job execution graph? If so, will there still be a performance impact from using S3? Thanks Ankit From: Ayush Goyal Date: Thursday, May 11, 2017 at 11:21 PM To: Stephan Ewen, Till Rohrmann Cc: user Subject

Re: memory usage in task manager when run and cancel a job

2017-05-15 Thread Sendoh
Could I also ask whether this behavior is the same on the master? I saw that when the master needs more than 100% of memory (starting a new job uses 35%, and the master already uses 70%), Ubuntu shuts down and restarts. Best, Sendoh -- View this message in context: http://apache-flink-user-mailing-list-archive.2336

Re: memory usage in task manager when run and cancel a job

2017-05-15 Thread Sendoh
Hi Fabian, Thank you for the quick reply. I run the job in a streaming environment. So, if I understand correctly, in a streaming environment memory is allocated up to the configured amount and never returned until Flink is shut down, as you said. Best, Sendoh -- View this message in context: http://apache

Re: memory usage in task manager when run and cancel a job

2017-05-15 Thread Fabian Hueske
Hi Sendoh, are you running batch or streaming jobs? In batch mode, workers allocate managed memory lazily by default. Memory is allocated up to the configured amount and never returned until Flink is shut down. So, this would not indicate a memory leak. Best, Fabian 2017-05-15 18:01 GMT+02:00 Sen

memory usage in task manager when run and cancel a job

2017-05-15 Thread Sendoh
Hi Flink users, How does the memory usage percentage change when starting a first job, starting a second job, and cancelling a job, all of which use the same jar? I found that the first job uses much more memory than the second job when it starts. The first job uses around 20% and the second one may use only 5%.

Checkpoint ?

2017-05-15 Thread Jim Langston
Hi all, I have a long-running streaming app saving checkpoints to the file system. What is the layout of the checkpoint directory? My current checkpoint directory has >2000 directories in it, similar to this: chk-4645. Also, the directory has grown to >3GB. I have a small cluster, and all w
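A minimal sketch for taking stock of a checkpoint directory like the one described above. It assumes the directory is on a local file system and that the subdirectories follow the chk-<number> naming seen in the message. The class and method names are made up for illustration and are not part of Flink.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not a Flink class.
final class CheckpointDirInventory {

    // Assumption: checkpoint subdirectories are named "chk-<number>", e.g. "chk-4645".
    private static final Pattern CHK_DIR = Pattern.compile("chk-\\d+");

    /** Returns the chk-<number> subdirectories directly under root. */
    public static List<Path> listCheckpointDirs(Path root) throws IOException {
        try (Stream<Path> children = Files.list(root)) {
            return children
                    .filter(Files::isDirectory)
                    .filter(p -> CHK_DIR.matcher(p.getFileName().toString()).matches())
                    .collect(Collectors.toList());
        }
    }

    /** Sums the sizes of all regular files under dir, recursively. */
    public static long sizeInBytes(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files
                    .filter(Files::isRegularFile)
                    .mapToLong(p -> p.toFile().length())
                    .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // The checkpoint root is taken from the first argument, defaulting to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> dirs = listCheckpointDirs(root);
        long total = 0;
        for (Path dir : dirs) {
            total += sizeInBytes(dir);
        }
        System.out.printf("%d checkpoint directories, %d bytes in total%n", dirs.size(), total);
    }
}
```

Running it with the checkpoint path as the only argument prints how many chk-<number> directories exist and how much space they occupy. It only reports; it does not decide which directories are safe to delete.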

Re: questions about Flink's HashJoin performance

2017-05-15 Thread weijie tong
The Flink version is 1.2.0 On Mon, May 15, 2017 at 10:24 PM, weijie tong wrote: > @Till thanks for your reply. > > My code is similar to HashTableITCase.testInMemoryMutableHashTable(). It just uses the MutableHashTable class; there is no other Flink configuration. The main code body is

Re: questions about Flink's HashJoin performance

2017-05-15 Thread weijie tong
@Till thanks for your reply. My code is similar to HashTableITCase.testInMemoryMutableHashTable(). It just uses the MutableHashTable class; there is no other Flink configuration. The main code body is: this.recordBuildSideAccessor = RecordSerializer.get(); this.recordProbeSideAccessor =

Re: Sink - Cassandra

2017-05-15 Thread nragon
Hi Nick, I'm trying to integrate HBase with streaming. Did you achieve good results (writes/s) with Phoenix? Can you share your code? Thanks, Nuno -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Sink-Cassandra-tp4107p13138.html Sent from

Thrift object serialization

2017-05-15 Thread Flavio Pompermaier
Hi to all, in my Flink job I create a DataSet using HadoopInputFormat in this way: HadoopInputFormat<Void, MyThriftObj> inputFormat = new HadoopInputFormat<>( new ParquetThriftInputFormat(), Void.class, MyThriftObj.class, job); FileInputFormat.addInputPath(job, new org.apache.hadoop.fs.Path(inputPath)); *Dat

Re: questions about Flink's HashJoin performance

2017-05-15 Thread Till Rohrmann
Hi Weijie, it might be the case that batching the processing of multiple rows gives you better performance than single-row processing. Maybe you could share with us the exact benchmark baseline results and the code you use to test Flink's MutableHashTable. Also the Flink configura

Re: Why not add flink-connectors to flink dist?

2017-05-15 Thread Chesnay Schepler
You can either package the connector into the user-jar or place it in the /lib directory of the distribution. On 15.05.2017 11:09, yunfan123 wrote: So how can I use it? Should every jar file I submit contain the specific connector class? Can I package it into flink-dist? -- View this mess

Why not add flink-connectors to flink dist?

2017-05-15 Thread yunfan123
So how can I use it? Should every jar file I submit contain the specific connector class? Can I package it into flink-dist? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Why-not-add-flink-connectors-to-flink-dist-tp13134.html Sent from