Hi,
To expand on Fabian's answer, there are a few APIs for joins.
* connect - you have to provide a CoProcessFunction.
* window join/cogroup - you provide key selector functions, a time window and
a join/cogroup function.
With the first method, you have to write more code, in exchange for much
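Conceptually, the second option (a keyed window join) pairs up records from two inputs that share a key within the same window. A minimal sketch in plain Java, not the Flink API; the `Event` type and the string-concatenating join function are illustrative only:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WindowJoinSketch {
    // Illustrative record type; in Flink the key would come from a key selector.
    record Event(String key, String value) {}

    // Join all records of one window: pair left and right records with equal keys.
    static List<String> join(List<Event> left, List<Event> right) {
        Map<String, List<String>> rightByKey = new HashMap<>();
        for (Event e : right) {
            rightByKey.computeIfAbsent(e.key(), k -> new ArrayList<>()).add(e.value());
        }
        List<String> out = new ArrayList<>();
        for (Event e : left) {
            for (String v : rightByKey.getOrDefault(e.key(), List.of())) {
                out.add(e.key() + ":" + e.value() + "|" + v); // the "join function"
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(join(
            List.of(new Event("a", "click"), new Event("b", "click")),
            List.of(new Event("a", "install"))));
        // only key "a" appears on both sides, so only "a" records are paired
    }
}
```

With connect/CoProcessFunction you would instead keep per-key state yourself and decide when to emit, which is the extra code the first method requires.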
Hi Pedro,
Seems like a memory leak. The only issue I’m currently aware of that may be
related is [1]. Could you tell if this JIRA relates to what you are bumping
into?
The JIRA mentions Kafka 09, but a fix is only available for Kafka 010 once we
bump our Kafka 010 dependency to the latest
No. This is the thread that answers my question -
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Does-FlinkKafkaConsumer010-care-about-consumer-group-td14323.html
Moiz
—
sent from phone
On 19-Jul-2017, at 10:04 PM, Ted Yu wrote:
This was the other
Hi Fran,
Does the DateTimeBucketer act like a memory buffer that isn't managed by Flink's
state? If so, then I think the problem is not about Kafka, but about the
DateTimeBucketer. Flink won't take a snapshot of the DateTimeBucketer if it
isn't part of any state.
Best,
Sihua Zhou
At 2017-07-20
Hi,
Gordon (in CC) knows the details of Flink's Kafka consumer.
He might know how to solve this issue.
Best, Fabian
2017-07-19 20:23 GMT+02:00 PedroMrChaves :
> Hello,
>
> Whenever I submit a job to Flink that retrieves data from Kafka the memory
> consumption
Hi,
unfortunately, it is not possible to convert a DataStream into a DataSet.
Flink's DataSet and DataStream APIs are distinct APIs that cannot be used
together.
The FlinkML library is only available for the DataSet API.
There is some ongoing work to add a machine learning library for streaming
Hi Fran,
did you observe actual data loss due to the problem you are describing or
are you discussing a possible issue based on your observations?
AFAIK, Flink's Kafka consumer keeps track of the offsets itself and
includes these in the checkpoints. In case of a recovery, it does not rely
on the
Hi,
there are basically two operations to merge streams.
1. Union simply merges the input streams such that the resulting stream has
the records of all input streams. Union is a built-in operator in the
DataStream API. For that, all streams must have the same data type.
2. Join connects records
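The semantics of option 1 can be sketched in plain Java (this is not the Flink API; the helper below just illustrates what union yields):

```java
import java.util.ArrayList;
import java.util.List;

public class UnionSketch {
    // Union simply concatenates the inputs: every record of every input
    // ends up in the result, with no pairing and no key involved.
    public static List<String> union(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) {
        List<String> clicks = List.of("click-1", "click-2");
        List<String> installs = List.of("install-1");
        // Both inputs have the same type (String here), as union requires.
        System.out.println(union(clicks, installs));
    }
}
```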
Hello,
Whenever I submit a job to Flink that retrieves data from Kafka the memory
consumption continuously increases. I've increased the max heap size from
2 GB to 4 GB and even 6 GB, but the memory consumption keeps reaching the
limit.
An example of a simple Job that shows this behavior is
Thanks. Seems this has been fixed before in FLINK-5109
Sent from my iPhone
> On Jul 19, 2017, at 07:47, Chesnay Schepler wrote:
>
> Hello,
>
> Looks like you stumbled upon a bug in our REST API and use a client that is
> stricter than others.
>
> I will create a JIRA
Hello -
I've been successful working with Flink in Java, but have some trouble trying
to leverage the ML library, specifically with KNN.
From my understanding, this is easier in Scala [1], so I've been converting my
code.
One issue I've encountered is - How do I get a DataSet[Vector] from a
This was the other thread, right ?
http://search-hadoop.com/m/Flink/VkLeQ0dXIf1SkHpY?subj=Re+Does+job+restart+resume+from+last+known+internal+checkpoint+
On Wed, Jul 19, 2017 at 9:02 AM, Moiz Jinia wrote:
> Yup! Thanks.
>
> Moiz
>
> —
> sent from phone
>
> On 19-Jul-2017,
Yup! Thanks.
Moiz
—
sent from phone
On 19-Jul-2017, at 9:21 PM, Aljoscha Krettek [via Apache Flink User Mailing
List archive.] wrote:
This was now answered in your other Thread, right?
Best,
Aljoscha
>> On 18. Jul 2017, at 11:37, Moiz Jinia <[hidden
This was now answered in your other thread, right?
Best,
Aljoscha
> On 18. Jul 2017, at 11:37, Moiz Jinia wrote:
>
> Aljoscha Krettek wrote
>> Hi,
>> zero-downtime updates are currently not supported. What is supported in
>> Flink right now is a savepoint-shutdown-restore
Raw state can only be used when implementing an operator, not a function.
For functions you have to use Managed Operator State. Your function will
have to implement
the CheckpointedFunction interface, and create a ValueStateDescriptor
that you register in initializeState.
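The lifecycle described above can be sketched in plain Java, with no Flink dependency. The method names mirror Flink's CheckpointedFunction, but the types are simplified and illustrative: the framework calls initializeState() before processing starts (handing back any restored snapshot) and snapshotState() on each checkpoint.

```java
import java.util.HashMap;
import java.util.Map;

class CountingFunction {
    private long count; // the state this function owns

    // Called once on start or recovery; a previously snapshotted value
    // comes back here (empty map on a fresh start).
    void initializeState(Map<String, Long> restored) {
        count = restored.getOrDefault("count", 0L);
    }

    // Called on each checkpoint; the returned map would be persisted.
    Map<String, Long> snapshotState() {
        Map<String, Long> snapshot = new HashMap<>();
        snapshot.put("count", count);
        return snapshot;
    }

    void processElement(String ignored) { count++; }

    long current() { return count; }
}
```

Restoring the snapshot into a fresh instance yields the same count, which is exactly the recovery guarantee the real interface provides.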
On 19.07.2017
Thanks for the reply, but I am not using it for managed state, rather for raw
state.
In my implementation I have the following:
class DataProcessorKeyed extends CoProcessFunction[WineRecord, ModelToServe,
Double]{
// The managed keyed state see
Hello,
Looks like you stumbled upon a bug in our REST API and use a client
that is stricter than others.
I will create a JIRA for this.
Regards,
Chesnay
On 19.07.2017 13:31, Will Du wrote:
Hi folks,
I am using a java rest client - unirest lib to GET from flink rest API to get a
Job
Hi folks,
I am using a Java REST client (the Unirest library) to GET a job status from
the Flink REST API. I got an exception: unsupported content encoding - UTF8.
Do you know how to resolve it? The Postman client works fine.
Thanks,
Will
Hello,
I assume you're passing the class of your serializer in a
StateDescriptor constructor.
If so, you could add a breakpoint in
StateDescriptor#initializeSerializerUnlessSet,
and check what typeInfo is created and which serializer is created as a
result.
One thing you could try right
Hello,
this problem is described in
https://issues.apache.org/jira/browse/FLINK-6689.
Basically, if you want to use the LocalFlinkMiniCluster you should use a
TestStreamEnvironment instead.
The RemoteStreamEnvironment only works with a proper Flink cluster.
Regards,
Chesnay
On 14.07.2017
Hi Vishnu,
I took a look into the code. Actually, we should support it. However,
those types might be mapped to Java Objects that will be serialized with
our generic Kryo serializer. Have you tested it?
Regards,
Timo
Am 19.07.17 um 06:30 schrieb Martin Eden:
Hey Vishnu,
For those of us
Great thanks that was very helpful.
One last question -
> If your job code hasn’t changed across the restores, then it should be
> fine even if you didn’t set the UID.
What kind of code change? What if the operator pipeline is still the same
but there's some business logic change?
On Wed,
Does this mean I can use the same consumer group G1 for the newer version A'?
And in spite of the same consumer group, will A' receive messages from all
partitions when it is started from the savepoint?
Yes. That’s true. Flink internally uses static partition assignment, and the
clients are assigned
Does this mean I can use the same consumer group G1 for the newer version
A'? And in spite of the same consumer group, will A' receive messages from all
partitions when it is started from the savepoint?
I am using Flink 1.2.1. Does the above plan require setting uid on the
Kafka source in the job?
Thanks,
Hi!
The only occasions on which the consumer group is used are:
1. When committing offsets back to Kafka. Since Flink 1.3, this can be disabled
completely (both when checkpointing is enabled or disabled). See [1] on details
about that.
2. When starting fresh (not starting from some savepoint), if
Below is a plan for a downtime-free upgrade of a Flink job. The downstream
consumer of the Flink job is duplicate-proof.
Scenario 1 -
1. Start Flink job A with consumer group G1 (12 slot job)
2. While job A is running, take a savepoint AS.
3. Start newer version of Flink job A' from savepoint AS
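Steps 2-3 map roughly onto Flink's CLI as sketched below. The job ID and paths are placeholders, and flag details can differ between Flink versions, so treat this as an outline rather than exact commands:

```shell
# 2. take a savepoint of the running job A
bin/flink savepoint <jobIdOfA> hdfs:///flink/savepoints

# 3. start the new version A' from the savepoint path printed above
bin/flink run -s <savepointPath> new-version.jar

# once A' is confirmed running, cancel the old job A
bin/flink cancel <jobIdOfA>
```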
I have three data streams
1. app exposed and click
2. app download
3. app install
How can I merge the streams to create a unified stream, then compute over
time-based windows?
Thanks