RE: 1.2 release date

2017-02-07 Thread Anton Solovev
Hi Robert, I thought this list https://cwiki.apache.org/confluence/display/FLINK/List+of+contributors would be updated. But after the announcement it's not necessary, I think. From: Robert Metzger [mailto:rmetz...@apache.org] Sent: Tuesday, February 7, 2017 7:58 PM To: user@flink.apache.org Subject: R

complete digraph

2017-02-07 Thread Chen Qin
Hi there, I don't think this is an urgent topic, but it definitely seems like an interesting one to me. Is a Flink topology able to run a complete digraph (excluding sources and sinks)? The use case is more around supporting event-based arbitrary state transiti
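For context, Flink dataflows are DAGs; the only built-in way to get feedback edges is an explicit iteration. A minimal DataStream iteration sketch (the loop logic here is hypothetical):

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.IterativeStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IterationSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Long> input = env.generateSequence(0, 1000);

        // Open a feedback edge in the otherwise acyclic dataflow.
        IterativeStream<Long> loop = input.iterate();

        // Loop body (hypothetical): halve each value.
        DataStream<Long> body = loop.map(new MapFunction<Long, Long>() {
            @Override
            public Long map(Long v) { return v / 2; }
        });

        // Values above 1 are fed back to the loop head...
        loop.closeWith(body.filter(v -> v > 1));
        // ...while the rest leave the cycle and continue downstream.
        body.filter(v -> v <= 1).print();

        env.execute("iteration sketch");
    }
}
```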

Re: To get Schema for jdbc database in Flink

2017-02-07 Thread Punit Tandel
Hi Robert, thanks for the response. So in a near-future release of Flink, is this functionality going to be implemented? Thanks. On 02/07/2017 04:12 PM, Robert Metzger wrote: Currently, there is no streaming JDBC connector. Check out this thread from last year: http://apache-flink

Re: FieldForwarding hints

2017-02-07 Thread Fabian Hueske
The correct annotation would be: @ForwardedField("*->f1") The asterisk / wildcard addresses the complete input type. The DataSet API also performs a type-based validation. If the types of the fields on the left and right are not correct, it should fail. Best, Fabian 2017-02-07 23:13 GMT+01:00 N
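A minimal self-contained sketch of the wildcard forwarding Fabian describes; the input/output types here are illustrative, not Billy's actual Tuplator (note that the annotation class in Flink's Java API is FunctionAnnotation.ForwardedFields):

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.functions.FunctionAnnotation.ForwardedFields;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

// "*->f1": the complete input value is forwarded unchanged into field f1 of the output.
@ForwardedFields("*->f1")
public class WildcardForwardExample implements FlatMapFunction<String, Tuple2<Integer, String>> {
    @Override
    public void flatMap(String value, Collector<Tuple2<Integer, String>> out) {
        // f0 is computed, f1 carries the input through untouched,
        // which is exactly what the annotation promises to the optimizer.
        out.collect(new Tuple2<>(value.length(), value));
    }
}
```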

FieldForwarding hints

2017-02-07 Thread Newport, Billy
For the following map, what would the hint be: @ForwardedField("f0->f1") // Correct? public class Tuplator extends FlatMapBase> { /** * */ private static final long serialVersionUID = 4443299154253252672L; public Tuplator(SerializableAvroRecordBuilder avroRecordBuilder)

RE: Strange filter performance with parquet

2017-02-07 Thread Newport, Billy
We've fixed that: we wrote a custom Kryo serializer which just writes a long fingerprint, and we have the usual bidirectional schema map. This reduces the memory usage, according to Flink metrics, by 20x. It does require us to ship the schema info around with the DFG so that that cache is primed b

RE: Strange filter performance with parquet

2017-02-07 Thread Newport, Billy
Clicked send too early: the serializer we wrote also caches DatumReaders/Writers for each known schema and serializes the GenericRecords using the Avro encoder/decoder, so it's not slow, but not as fast as custom ones. From: Fabian Hueske [mailto:fhue...@gmail.com] Sent: Tuesday, February 07, 2017
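A sketch of the fingerprint approach Billy describes, assuming a process-wide registry keyed by Avro's 64-bit parsing fingerprint that is primed on every TaskManager before records flow (class and registry names are hypothetical):

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import org.apache.avro.Schema;
import org.apache.avro.SchemaNormalization;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FingerprintAvroSerializer extends Serializer<GenericRecord> {
    // Hypothetical registry: fingerprint -> schema, primed before
    // deserialization happens (the "cache priming" Billy mentions).
    private static final Map<Long, Schema> SCHEMAS = new ConcurrentHashMap<>();

    public static void register(Schema schema) {
        SCHEMAS.put(SchemaNormalization.parsingFingerprint64(schema), schema);
    }

    @Override
    public void write(Kryo kryo, Output output, GenericRecord record) {
        try {
            Schema schema = record.getSchema();
            // Ship the 8-byte fingerprint instead of the full schema JSON.
            output.writeLong(SchemaNormalization.parsingFingerprint64(schema));
            BinaryEncoder encoder = EncoderFactory.get().directBinaryEncoder(output, null);
            // A production version would cache one DatumWriter per schema, as Billy notes.
            new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
            encoder.flush();
        } catch (IOException e) {
            throw new RuntimeException("Avro encoding failed", e);
        }
    }

    @Override
    public GenericRecord read(Kryo kryo, Input input, Class<GenericRecord> type) {
        try {
            Schema schema = SCHEMAS.get(input.readLong());
            BinaryDecoder decoder = DecoderFactory.get().directBinaryDecoder(input, null);
            return new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
        } catch (IOException e) {
            throw new RuntimeException("Avro decoding failed", e);
        }
    }
}
```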

Re: Strange filter performance with parquet

2017-02-07 Thread Fabian Hueske
I'm not familiar with the details of Parquet and Avro, but I know that the handling of GenericRecord is very inefficient in Flink. The reason is that they are serialized using Kryo and always contain the full Avro schema. If you can provide a specific record to the InputFormat, Flink will serialize

Re: Cogroup hints/performance

2017-02-07 Thread Fabian Hueske
Hi Billy, A CoGroup does not have any freedom in its execution strategy. It requires that both inputs are partitioned on the grouping keys and then performs a local sort-merge join, i.e., both inputs are sorted. Existing partitioning or sort orders can be reused. Since there is only one execut
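For reference, a minimal coGroup sketch in the DataSet API; the merge rule in the function body is hypothetical:

```java
import org.apache.flink.api.common.functions.CoGroupFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class CoGroupSketch {
    // Both inputs are hash-partitioned on field 0 and sorted, then the
    // function is called once per key, as Fabian describes.
    public static DataSet<Tuple2<Long, String>> merge(
            DataSet<Tuple2<Long, String>> larger,
            DataSet<Tuple2<Long, String>> small) {
        return larger.coGroup(small)
                .where(0)
                .equalTo(0)
                .with(new CoGroupFunction<Tuple2<Long, String>, Tuple2<Long, String>, Tuple2<Long, String>>() {
                    @Override
                    public void coGroup(Iterable<Tuple2<Long, String>> left,
                                        Iterable<Tuple2<Long, String>> right,
                                        Collector<Tuple2<Long, String>> out) {
                        // Hypothetical rule: records from the small side win the key.
                        boolean hasRight = false;
                        for (Tuple2<Long, String> r : right) { out.collect(r); hasRight = true; }
                        if (!hasRight) {
                            for (Tuple2<Long, String> l : left) { out.collect(l); }
                        }
                    }
                });
    }
}
```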

RE: Strange filter performance with parquet

2017-02-07 Thread Newport, Billy
We read them like this: Job job = Job.getInstance(); AvroParquetInputFormat inputFormat = new AvroParquetInputFormat(); AvroParquetInputFormat.setAvroReadSchema(job, getOutputSchema(datasetName)); String storeN
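For reference, this kind of snippet typically continues by wrapping the Hadoop input format in Flink's mapreduce wrapper; a hedged sketch, assuming parquet-avro's generic AvroParquetInputFormat (path and schema handling are placeholders):

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.parquet.avro.AvroParquetInputFormat;

public class ParquetReadSketch {
    public static DataSet<Tuple2<Void, GenericRecord>> readParquet(
            ExecutionEnvironment env, String path) throws Exception {
        Job job = Job.getInstance();
        FileInputFormat.addInputPath(job, new Path(path)); // placeholder path
        AvroParquetInputFormat<GenericRecord> parquetIF = new AvroParquetInputFormat<>();
        // Optionally restrict the read schema, as in Billy's snippet:
        // AvroParquetInputFormat.setAvroReadSchema(job, readSchema);
        return env.createInput(
                new HadoopInputFormat<>(parquetIF, Void.class, GenericRecord.class, job));
    }
}
```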

Re: Strange filter performance with parquet

2017-02-07 Thread Fabian Hueske
Hmm, the plan you posted does not look like it would need to spill data to avoid a deadlock. Not sure what's causing the slowdown. How do you read Parquet files? If you use the HadoopIF wrapper, this might add some overhead. A dedicated Flink InputFormat for Parquet might help here. 2017-02-07 21

Cogroup hints/performance

2017-02-07 Thread Newport, Billy
We have a coGroup where sometimes we coGroup like this: DataSet z = larger.coGroup(small).where... The strategy is printed as a hash on one key and a sort (asc) on the other key. Which is which? Naively, we'd want to hash the larger and sort the smaller? Or is that wrong? What factors would impact the perfo

RE: Strange filter performance with parquet

2017-02-07 Thread Newport, Billy
It's kind of like this: DataSet live = (from previous); DataSet newRecords = (Avro read); DataSet mergedLive = live.coGroup(newRecords); DataSet result = mergedLive.union(deadRecords); store result to Parquet. BTW, on another point: reading Parquet files seems very slow to me. Writing is very fast in com

Re: Strange filter performance with parquet

2017-02-07 Thread Fabian Hueske
Hi Billy, this might depend on what you are doing with the live and dead DataSets later on. For example, if you join both data sets, Flink might need to spill one of them to disk and read it back to avoid a deadlock. This happens for instance if the join strategy is a HashJoin which blocks one inp

Strange filter performance with parquet

2017-02-07 Thread Newport, Billy
We're reading a Parquet file (550m records). We want to split the Parquet data using a filter into 2 sets, live and dead. DataSet a = read parquet; DataSet live = a.filter(liveFilter); DataSet dead = a.filter(deadFilter); is slower than: DataSet a = read parquet; DataSet live = a.filter(liveFilter); Data

Re: Issue Faced: Not able to get the consumer offsets from Kafka when using Flink with Flink-Kafka Connector

2017-02-07 Thread MAHESH KUMAR
Thanks for the prompt reply On Tue, Feb 7, 2017 at 10:38 AM, Robert Metzger wrote: > Hi Mahesh, > > this is a known limitation of Apache Kafka: https://www.mail- > archive.com/us...@kafka.apache.org/msg22595.html > You could implement a tool that is manually retrieving the latest offset > for th

Dense Matrix Multiplication

2017-02-07 Thread Ekram Ali
I could not find a dense matrix multiplication or addition function in Flink; please suggest how to perform multiplication and addition. For example, given [1, 2, 4], [2, 5, 7], I want to multiply the whole matrix by 5 so that every element is multiplied by 5; the same in the case of addition. Please suggest.
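If no ready-made function fits, element-wise operations are easy to express with a map; a minimal sketch of the scaling asked about, using the illustrative choice of one double[] row per DataSet element:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class ScaleMatrix {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // The matrix [[1, 2, 4], [2, 5, 7]], one row per element.
        DataSet<double[]> rows = env.fromElements(
                new double[]{1, 2, 4},
                new double[]{2, 5, 7});

        // Multiply every element by 5; element-wise addition works the same way.
        DataSet<double[]> scaled = rows.map(new MapFunction<double[], double[]>() {
            @Override
            public double[] map(double[] row) {
                double[] out = new double[row.length];
                for (int i = 0; i < row.length; i++) {
                    out[i] = row[i] * 5; // use row[i] + 5 for addition
                }
                return out;
            }
        });

        scaled.print();
    }
}
```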

Re: JavaDoc 404

2017-02-07 Thread Robert Metzger
I've filed a JIRA for the issue: https://issues.apache.org/jira/browse/FLINK-5736 On Tue, Feb 7, 2017 at 5:00 PM, Robert Metzger wrote: > Yes, I'll try to fix it asap. Sorry for the inconvenience. > > On Mon, Feb 6, 2017 at 4:43 PM, Ufuk Celebi wrote: > >> Thanks for reporting this. I think Rob

Re: Issue Faced: Not able to get the consumer offsets from Kafka when using Flink with Flink-Kafka Connector

2017-02-07 Thread Robert Metzger
Hi Mahesh, this is a known limitation of Apache Kafka: https://www.mail-archive.com/users@kafka.apache.org/msg22595.html You could implement a tool that is manually retrieving the latest offset for the group from the __offsets topic. On Tue, Feb 7, 2017 at 6:24 PM, MAHESH KUMAR wrote: > Hi Team

SparseMatrix Multiplication

2017-02-07 Thread Ekram Ali
I could not find a SparseMatrix multiplication or addition function in Flink; please suggest how to perform multiplication and addition. For example, given [1, 2, 4], [2, 5, 7], I want to multiply the whole matrix by 5 so that every element is multiplied by 5; the same in the case of addition. Please suggest.

Re: Netty issues while deploying Flink with Yarn on MapR

2017-02-07 Thread Robert Metzger
Hi, cool! Yes, creating a JIRA for the problem is a good idea. Once you've found a way to fix the issue, you can open a pull request referencing the issue. Regards, Robert On Tue, Feb 7, 2017 at 6:20 PM, ani.desh1512 wrote: > Thanks Robert. > I would love to try to solve this problem so that

Issue Faced: Not able to get the consumer offsets from Kafka when using Flink with Flink-Kafka Connector

2017-02-07 Thread MAHESH KUMAR
Hi Team, Kindly let me know if I am doing something wrong. Kafka Version - kafka_2.11-0.10.1.1 Flink Version - flink-1.2.0 Using the latest Kafka Connector - FlinkKafkaConsumer010 - flink-connector-kafka-0.10_2.11 Issue Faced: Not able to get the consumer offsets from Kafka when using Flink with
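For reference, a minimal consumer setup matching the versions listed (topic, servers, and group id are placeholders); as the replies in this thread note, the offsets Flink commits are not always visible through Kafka's group tooling:

```java
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaSourceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "my-flink-group");          // placeholder

        // Flink tracks offsets in its own checkpoints; committing them back
        // to Kafka is a side channel, not Kafka group management.
        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props));

        stream.print();
        env.execute("kafka source sketch");
    }
}
```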

Re: Netty issues while deploying Flink with Yarn on MapR

2017-02-07 Thread ani.desh1512
Thanks Robert. I would love to try to solve this problem so that future MapR and Flink users do not face these issues. Should I create a JIRA for it? Let me know how I can be of help.

Re: allowed lateness on windowed join?

2017-02-07 Thread Aljoscha Krettek
And to add to that: yes, this is what I was suggesting. :-) On Mon, 6 Feb 2017 at 09:58 Fabian Hueske wrote: > Hi, > > Union is a super cheap operator in Flink. It does not scan the records, > but just merges the streams. So the effort is very low. > The built-in join operator works in the same

Re: Netty issues while deploying Flink with Yarn on MapR

2017-02-07 Thread Robert Metzger
Hi Aniket, great analysis of the problem! Thank you for looking so well into it! Would you be interested in trying to solve the problem for Flink? We could try to provide a maven build profile that sets the correct versions and excludes. We could maybe also provide a MapR specific release of Flink

Re: logback

2017-02-07 Thread Robert Metzger
Hi Dmitry, Did you also check out this documentation page? https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring/best_practices.html#use-logback-when-running-flink-on-a-cluster On Tue, Feb 7, 2017 at 1:07 PM, Dmitry Golubets wrote: > Hi, > > documentation says: "Users willing to

Re: Dealing with latency in Sink

2017-02-07 Thread Robert Metzger
Hi Mohit, Flink doesn't allow dynamic up- or downscaling of parallel operator instances at runtime. However, you can stop and restore from a savepoint with a different parallelism. This way, you can adapt to workload changes. Flink's handling of backpressure is very implicit. If you want to thrott

Re: To get Schema for jdbc database in Flink

2017-02-07 Thread Robert Metzger
Currently, there is no streaming JDBC connector. Check out this thread from last year: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/JDBC-Streaming-Connector-td10508.html On Mon, Feb 6, 2017 at 5:00 PM, Ufuk Celebi wrote: > I'm not sure how well this works for the streaming AP
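Until a streaming JDBC connector exists, a common stopgap is a custom RichSinkFunction; a minimal sketch (the JDBC URL, credentials, and SQL are placeholders, and batching/retries are omitted):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class JdbcSinkSketch extends RichSinkFunction<String> {
    private transient Connection connection;
    private transient PreparedStatement statement;

    @Override
    public void open(Configuration parameters) throws Exception {
        // One connection per parallel sink instance; placeholder URL and credentials.
        connection = DriverManager.getConnection("jdbc:postgresql://host/db", "user", "pw");
        statement = connection.prepareStatement("INSERT INTO events (payload) VALUES (?)");
    }

    @Override
    public void invoke(String value) throws Exception {
        statement.setString(1, value);
        statement.executeUpdate(); // a real sink would batch and handle retries
    }

    @Override
    public void close() throws Exception {
        if (statement != null) statement.close();
        if (connection != null) connection.close();
    }
}
```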

Re: JavaDoc 404

2017-02-07 Thread Robert Metzger
Yes, I'll try to fix it asap. Sorry for the inconvenience. On Mon, Feb 6, 2017 at 4:43 PM, Ufuk Celebi wrote: > Thanks for reporting this. I think Robert (cc'd) is working in fixing > this, correct? > > On Sat, Feb 4, 2017 at 12:12 PM, Yassine MARZOUGUI > wrote: > > Hi, > > > > The JavaDoc link

Re: 1.2 release date

2017-02-07 Thread Robert Metzger
Hi Anton, which contributors list are you referring to? I've included all release contributors in the rel announcement. On Mon, Feb 6, 2017 at 11:50 AM, Anton Solovev wrote: > Hi, > > Could you update the List of contributors after that? :) > > *Anton Solovev* > *Software Engineer*

Re: scala version of flink mongodb example

2017-02-07 Thread alex.decastro
Hi there, treading in the thread: do you know how to add authentication options to Mongo here? I'm trying to do hdIf.getJobConf.set("user", s"$USER") and hdIf.getJobConf.set("password", s"$PWD"), but I can't find any documentation to support it. Any pointers? Many thanks, Alex
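A hedged sketch: mongo-hadoop conventionally embeds credentials in the connection URI via the mongo.input.uri property rather than separate keys (treat the property name and URI format here as assumptions):

```java
import org.apache.hadoop.mapred.JobConf;

public class MongoAuthSketch {
    // Assumption: mongo-hadoop reads credentials from the connection URI
    // ("mongo.input.uri"), not from separate "user"/"password" keys.
    public static void configure(JobConf conf, String user, String pwd) {
        conf.set("mongo.input.uri",
                "mongodb://" + user + ":" + pwd + "@localhost:27017/mydb.mycollection");
    }
}
```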

Re: Netty issues while deploying Flink with Yarn on MapR

2017-02-07 Thread Ufuk Celebi
Thanks for reporting the solution. @Robert: Is that a general issue we have with Flink YARN on MapR? On 7 February 2017 at 15:50:17, ani.desh1512 (ani.desh1...@gmail.com) wrote: > In case anyone is having similar issues with Flink on Yarn on MapR, I > managed to solve > > this issue with help fr

Re: Regarding Flink as a web service

2017-02-07 Thread Robert Metzger
There's also a nice documentation page on the feature now: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/queryable_state.html On Tue, Jan 31, 2017 at 6:18 PM, Aljoscha Krettek wrote: > +u...@apache.org Because he implemented queryable state. > > There is also queryable

Re: Netty issues while deploying Flink with Yarn on MapR

2017-02-07 Thread ani.desh1512
In case anyone is having similar issues with Flink on Yarn on MapR, I managed to solve this issue with help from the MapR community.

logback

2017-02-07 Thread Dmitry Golubets
Hi, documentation says: "Users willing to use logback instead of log4j can just exclude log4j (or delete it from the lib/ folder)." But then Flink just doesn't start. I added logback-classic 1.10 to its lib folder, but still get NoClassDefFoundError: ch/qos/logback/core/joran/spi/JoranException

Questions about the V-C Iteration in Gelly

2017-02-07 Thread Xingcan Cui
Hi all, I have some questions about the vertex-centric iteration in Gelly. a) It seems the postSuperstep method is called before the superstep barrier (I got different aggregate values for the same superstep in this method). Is this a bug? Or is the design just like that? b) There is no setHalt m