Hi Cliff,
you are right.
The CsvTableSink and the CsvInputFormat are not in sync. It would be great
if you could open a JIRA ticket for this issue.
As a workaround, you could implement your own CsvTableSink to add a
delimiter after the last field.
The code is straightforward, less than 150 lines.
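For illustration, the core of such a sink's formatting logic could look like this (a minimal, hypothetical sketch; the helper class is made up and not Flink's actual CsvTableSink code):

```java
public class TrailingDelimiterCsv {

    // Joins the fields with the delimiter and, unlike the current
    // CsvTableSink, also appends the delimiter after the last field.
    public static String formatRow(String[] fields, String delimiter) {
        return String.join(delimiter, fields) + delimiter;
    }
}
```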
Hi Colin,
thanks for pointing out this gap in the docs!
I created FLINK-8303 [1] to extend the table and updated the release
process documentation [2] so that the page is updated for new releases.
Thank you,
Fabian
[1] https://issues.apache.org/jira/browse/FLINK-8303
[2]
ove from a PriorityQueue is expensive
>> ? Trigger Context does expose another version that has removal abilities
>> so was wondering why this dissonance...
>>
>> On Tue, Dec 19, 2017 at 4:53 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>>
>>> Hi Visha
rather than off an operator
>>>> the precedes the Window ? This is doable using ProcessWindowFunction using
>>>> state but only in the case of non mergeable windows.
>>>>
>>>>The best API option I think is a TimeBaseTrigger that fires every
>>
esults belonging to the
> same window?
>
> 2017-12-18 18:51 GMT+08:00 Fabian Hueske <fhue...@gmail.com>:
> > If you define a keyed window (use keyBy()), the results are not merged.
> > For each key, the window is individually evaluated and all results of
> > windows for
>
> 2017-12-18 18:02 GMT+08:00 Fabian Hueske <fhue...@gmail.com>:
> > Hi,
> >
> > timestamps are handled as meta-data in Flink's DataStream API.
> > This means that Flink automatically maintains the timestamps and ensures
> > that all records which w
Hi Vishal,
the Trigger is not designed to augment records but just to control when a
window is evaluated.
I would recommend using a ProcessFunction to enrich records with the
current watermark before passing them into the window operator.
The context object of the processElement() method gives
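A sketch of this pattern (untested; `Event` and its `watermark` field are hypothetical):

```java
// Enrich each record with the operator's current watermark before windowing.
DataStream<Event> enriched = events
    .process(new ProcessFunction<Event, Event>() {
        @Override
        public void processElement(Event value, Context ctx, Collector<Event> out) {
            // the Context exposes the timer service and the current watermark
            value.watermark = ctx.timerService().currentWatermark();
            out.collect(value);
        }
    });
```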
Hi,
TriggerContext.getWaterMark() returns the current watermark (i.e.,
event-time) of the window operator.
An operator tracks for each of its inputs the maximum received watermark
and computes its own watermark as the minimum of all these maximums.
As long as an operator has not received watermarks
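In plain Java, the described bookkeeping amounts to the following (a simplified stand-in, not Flink's internal implementation):

```java
import java.util.Arrays;

public class WatermarkTracker {

    // highest watermark received so far on each input channel
    private final long[] maxPerInput;

    public WatermarkTracker(int numInputs) {
        this.maxPerInput = new long[numInputs];
        Arrays.fill(this.maxPerInput, Long.MIN_VALUE);
    }

    public void onWatermark(int input, long watermark) {
        maxPerInput[input] = Math.max(maxPerInput[input], watermark);
    }

    // the operator's own watermark: the minimum of the per-input maximums
    public long currentWatermark() {
        return Arrays.stream(maxPerInput).min().getAsLong();
    }
}
```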
Hi,
timestamps are handled as meta-data in Flink's DataStream API.
This means that Flink automatically maintains the timestamps and ensures
that all records which were aligned with the watermarks (i.e., not late)
are still aligned.
If records are aggregated in a time window, the aggregation
Hi Andrew,
I'm not aware of such a plan.
Another way to address such issues is to run multiple TaskManagers with a
single slot. In that case, only one subtask is executed per TM process.
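For example, in flink-conf.yaml (the key below is the standard slot setting):

```yaml
# one slot per TaskManager: each parallel subtask gets its own TM process
taskmanager.numberOfTaskSlots: 1
```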
Best, Fabian
2017-12-15 22:23 GMT+01:00 Andrew Roberts :
> Hello,
>
> I’m writing a
Thanks for reporting the issue.
I've filed FLINK-8278 [1] to fix the issue.
Best, Fabian
[1] https://issues.apache.org/jira/browse/FLINK-8278
2017-12-14 14:04 GMT+01:00 Kien Truong :
> That syntax is incorrect, should be.
>
> @transient private var counter:Counter = _
Hi,
In case you haven't seen it yet, here's an analysis and response to
Databricks' benchmark [1].
Best, Fabian
[1]
https://data-artisans.com/blog/curious-case-broken-benchmark-revisiting-apache-flink-vs-databricks-runtime
2017-11-13 11:44 GMT+01:00 G.S.Vijay Raajaa :
Hi Ron,
chaining windows as shown in the example was also possible before 1.4.0.
So you can keep using Flink 1.3.2 if this is the only reason to
update to 1.4.0.
Best, Fabian
2017-12-15 1:14 GMT+01:00 Ron Crocker :
> In the 1.4 docs I stumbled on this section:
Thanks for reporting back!
2017-12-15 4:52 GMT+01:00 杨光 :
> Yes , i'm using Java8 , and i found the 1.4 version provided new
> parameters : "containerized.master.env.ENV_VAR1" and
> "containerized.taskmanager.env".
> I change my start command from "-yD
t trigger to fire in regular intervals
> (e.g. every 5 seconds) using table API?
>
>
> On 14.12.2017 17:57, Fabian Hueske wrote:
>
> Hi,
>
> you are using a BoundedOutOfOrdernessTimestampExtractor to generate
> watermarks.
> The BoundedOutOfOrdernessTimestampExtractor is a
Hi,
you are using a BoundedOutOfOrdernessTimestampExtractor to generate
watermarks.
The BoundedOutOfOrdernessTimestampExtractor is a periodic watermark
assigner and only generates watermarks if a watermark interval is
configured.
Without watermarks, the query cannot "make progress" and only
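For example (a sketch; the interval value is arbitrary):

```java
// Periodic assigners only emit watermarks if an interval is configured.
// Setting the time characteristic to event time sets a default of 200 ms.
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
// or configure the interval explicitly:
env.getConfig().setAutoWatermarkInterval(200L);
```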
Hi Elias,
thanks for reporting this issue.
I created FLINK-8260 [1] to extend the documentation.
Best, Fabian
[1] https://issues.apache.org/jira/browse/FLINK-8260
2017-12-14 1:07 GMT+01:00 Elias Levy :
> Looks like the Flink Kafka connector page, in the Producer
Hi Seth,
that's not possible with the current interface.
There have been some discussions about how to address issues of idle
sources (or partitions).
Aljoscha (in CC) should know more about that.
Best, Fabian
2017-12-13 18:13 GMT+01:00 Seth Wiesman :
> Quick follow up
Bowen Li (in CC) closed the issue but there is no fix (or at least it is
not linked in the JIRA).
Maybe it was resolved as part of another issue or can be resolved differently.
@Bowen, can you comment on how to fix this problem? Will it work in Flink
1.4.0?
Thank you,
Fabian
2017-12-13 5:28 GMT+01:00
Hi,
you are right.
The purpose of a KeyedStream is to process all events/records with the same
key by the same operator task (which runs in a single thread). The operator
itself can have a greater parallelism, such that different keys are
processed by different tasks.
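The routing can be illustrated with a simplified stand-in for Flink's key-based assignment (not the actual key-group logic):

```java
import java.util.Objects;

public class KeyPartitioning {

    // Every record with the same key maps to the same task index, so a
    // single task (thread) sees all records of that key.
    public static int taskForKey(Object key, int parallelism) {
        return Math.abs(Objects.hashCode(key) % parallelism);
    }
}
```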
Best, Fabian
2017-12-13
by the
> source?
>
>
> 2017-12-12 19:25 GMT+08:00 Fabian Hueske <fhue...@gmail.com>:
> > Early events are usually not an issue because they can be kept in state
> until
> > they are ready to be processed.
> > Also, depending on the watermark assigner often pus
Thank you Aljoscha for managing the release!
2017-12-12 12:46 GMT+01:00 Aljoscha Krettek :
> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.4.0.
>
> Apache Flink® is an open-source stream processing framework for
> distributed,
early event?
>
>
> 2017-12-12 18:24 GMT+08:00 Fabian Hueske <fhue...@gmail.com>:
> > Hi,
> >
> > this depends on how you generate watermarks [1].
> > You could generate watermarks with a four hour delay and be fine (at the
> > cost of a four hour latenc
Hi,
this depends on how you generate watermarks [1].
You could generate watermarks with a four-hour delay and be fine (at the
cost of a four-hour latency) or have some checks that you don't increment a
watermark by more than x minutes at a time.
These considerations are quite use case specific,
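The fixed-delay option boils down to the following bookkeeping (a plain-Java sketch of what a BoundedOutOfOrdernessTimestampExtractor does, not Flink's actual code):

```java
public class DelayedWatermark {

    private long maxTimestamp = Long.MIN_VALUE;
    private final long maxDelayMillis;

    public DelayedWatermark(long maxDelayMillis) {
        this.maxDelayMillis = maxDelayMillis;
    }

    public void onEvent(long timestampMillis) {
        maxTimestamp = Math.max(maxTimestamp, timestampMillis);
    }

    // The watermark trails the highest timestamp seen by the configured
    // delay, so events up to maxDelayMillis late are still in time.
    public long currentWatermark() {
        return maxTimestamp - maxDelayMillis;
    }
}
```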
Hi Tovi,
testing the behavior of a data flow with respect to the order of records
from different sources is tricky.
Source functions work independently of each other, and it is not
easily possible to control the order in which records are shipped (and
received) across source functions.
You
...@gmail.com>:
> OK, I see.
>
> But what if a window contains no elements? Is it still get fired and
> invoke the window function?
>
> 2017-12-12 15:42 GMT+08:00 Fabian Hueske <fhue...@gmail.com>:
> > Hi,
> >
> > this depends on the window type.
Hi,
sliding windows replicate their records for each window.
If you use an incrementally aggregating function (ReduceFunction,
AggregateFunction) with a sliding window, the space requirement should not be an
issue because each window stores a single value.
However, this also means that each window
Hi,
this depends on the window type. Tumbling and Sliding Windows are (by
default) aligned with the epoch time (1970-01-01 00:00:00).
For example, a tumbling window of 2 hours starts and ends every two hours,
i.e., from 12:00:00 to 13:59:59.999, from 14:00:00 to 15:59:59.999, etc.
The
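The alignment arithmetic can be sketched as follows (hypothetical helper; assumes non-negative millisecond timestamps):

```java
public class WindowAlign {

    // Start of the tumbling window containing `timestamp`, aligned with
    // the epoch (1970-01-01 00:00:00).
    public static long windowStart(long timestamp, long windowSizeMillis) {
        return timestamp - (timestamp % windowSizeMillis);
    }
}
```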
Hi,
I think you are looking for a ProcessFunction with timers [1].
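An untested sketch of the pattern (`Event` is a hypothetical type; timers require a keyed stream):

```java
stream
    .keyBy(Event::getKey)
    .process(new ProcessFunction<Event, String>() {
        @Override
        public void processElement(Event value, Context ctx, Collector<String> out) {
            // fire an event-time timer one minute after this element's timestamp
            ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 60_000L);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
            out.collect("timer fired at " + timestamp);
        }
    });
```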
Best,
Fabian
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/process_function.html
2017-12-11 9:03 GMT+01:00 Marvin777 :
> hi,
>
> I'm new to apache Flink. I want to
gt;
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Fri, Dec 8, 2017 at 9:11 AM, Aljoscha Krettek <aljos...@apache.org>
>> wrote:
>>
>>> Hi,
>>>
>>> If you use an AggregatingFunction in this way (i.e. for a
Hi Sendoh,
it is certainly possible to deserialize nested JSON.
However, the JsonRowDeserializationSchema doesn't support it yet.
You would either have to extend the class or implement a new one.
Best, Fabian
2017-12-08 12:33 GMT+01:00 Sendoh :
> Hi Flink users,
>
>
Hmm, I see...
I'm running out of ideas.
You might be right with your assumption about a bug in the Beam Flink
runner. In this case, this would be an issue for the Beam project which
hosts the Flink runner.
But it might also be an issue on the Flink side.
Maybe Aljoscha (in CC), one of the
Hi,
thanks a lot for investigating this problem and for the results you shared.
This looks like a bug to me. I'm CCing Aljoscha who knows the internals of
the DataStream API very well.
Which Flink version are you using?
Would you mind creating a JIRA issue [1] with all the info you provided so
Yes.
Adding .returns(typeInfo) works as well. :-)
2017-12-08 11:29 GMT+01:00 Fabian Hueske <fhue...@gmail.com>:
> Hi,
>
> you give the TypeInformation to your user code but you don't expose it to
> the DataStream API (the code of the FlatMapFunction is a black box for th
Hi,
you give the TypeInformation to your user code but you don't expose it to
the DataStream API (the code of the FlatMapFunction is a black box for the
API).
Your FlatMapFunction should implement the ResultTypeQueryable interface
and return the TypeInformation.
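For example (a sketch; `MyFlatMap` and its input/output types are made up):

```java
public class MyFlatMap implements FlatMapFunction<String, Row>, ResultTypeQueryable<Row> {

    private final TypeInformation<Row> typeInfo;

    public MyFlatMap(TypeInformation<Row> typeInfo) {
        this.typeInfo = typeInfo;
    }

    @Override
    public void flatMap(String value, Collector<Row> out) {
        // ... emit Row records ...
    }

    @Override
    public TypeInformation<Row> getProducedType() {
        // the DataStream API calls this instead of trying to infer the type
        return typeInfo;
    }
}
```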
Best, Fabian
2017-12-08 11:19
m> wrote:
>
>> Nishu
>>
>> You might consider sideouput with metrics at least after window. I would
>> suggest having that to catch data screw or partition screw in all flink
>> jobs and amend if needed.
>>
>> Chen
>>
>> On Thu, Dec 7, 201
TM in time
> to see what it looks like. But each one I do look at the heap usage is
> ~150MB/6.16GB (with fraction: 0.1)
>
> On Thu, Dec 7, 2017 at 11:59 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hmm, the OOM sounds like a bug to me. Can you provide the stacktrace?
&
Is it possible that the data is dropped due to being late, i.e., records
with timestamps behind the current watermark?
What kind of operations does your program consist of?
Best, Fabian
2017-12-07 10:20 GMT+01:00 Sendoh :
> I would recommend to also print the count of
Hi,
A ClassNotFoundException should not be expected behavior.
Can you post the stacktrace of the exception?
We had a few issues in the past where Flink didn't use the correct
classloader.
So this would not be an unusual bug.
Thanks,
Fabian
2017-12-07 10:44 GMT+01:00 Tugdual Grall
We are currently voting on the third release candidate for 1.4.0.
Feel free to propose this feature on the dev mailing list [1], but I don't
think this will result in cancelling the vote.
If we identify a blocking issue for 1.4.0, it could be included as well.
But we're already a few weeks behind
gt; There are some join functions though, I will look into applying them.
>
> Besides this, can you recommend an initial place in the code where one
> should look to begin studying the optimizer?
>
> Thanks for your time once more,
>
> Best regards,
>
> Miguel E. Coimbra
&g
s it used to be a lot smaller, I broke it out
> manually by adding the sort/partition to see which steps were causing me
> the slowdown, thinking it was my code, I wanted to separate the operations.
>
> Thank you again for your help.
>
> On Thu, Dec 7, 2017 at 4:49 AM, Fabian Hueske
AFAIK, a job keeps its ID in case of a recovery.
Did you observe something else?
2017-12-07 17:32 GMT+01:00 Hao Sun <ha...@zendesk.com>:
> I mean restarted during failure recovery
>
> On Thu, Dec 7, 2017 at 8:29 AM Fabian Hueske <fhue...@gmail.com> wrote:
>
>> W
ed somewhere and
>> expose it through the api as well?
>>
>> On Mon, Dec 4, 2017 at 4:52 AM Fabian Hueske <fhue...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> you can submit jar files and start jobs via the REST interface [1].
>>> When s
ifferent config for taking overall the same amount of ram?
>
>
>
>
> On Wed, Dec 6, 2017 at 11:49 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hi Garrett,
>>
>> data skew might be a reason for the performance degradation.
>>
>> The plan you
Hi Vishal,
you are right, it is not possible to use state in an AggregateFunction
because windows need to be mergeable.
An AggregateFunction knows how to merge its accumulators but merging
generic state is not possible.
I am not aware of an efficient and easy work around for this.
If you want to
";
> ConfigOption akkaConfig = ConfigOptions.key(AkkaOptions.
> FRAMESIZE.key()).defaultValue(akkaFrameSize);
> clientConfig.setString(akkaConfig, akkaFrameSize);
> env = new RemoteEnvironment(remoteAddress, remotePort, clientConfig,
> jarFiles, null);
>
> I have run out of
Hi,
Flink's operators are designed to work in memory as long as possible and
spill to disk once the memory budget is exceeded.
Moreover, Flink aims to run programs in a pipelined fashion, such that
multiple operators can process data at the same time.
This behavior can make it a bit tricky to
Hi,
I haven't done that before either. The query API will change with the next
version (Flink 1.4.0), which is currently being prepared for release.
Kostas (in CC) might be able to help you.
Best, Fabian
2017-12-05 9:52 GMT+01:00 m@xi :
> Hi Fabian,
>
> Thanks for your
d()
> )
>
> val kinesisStream = env.fromCollection(testData)
>
> tableEnv.registerDataStream(streamName, avroStream);
>
> val query = "SELECT nd_key, sum(concept_rank) FROM "+streamName + " GROUP
> BY nd_key"
>
> Thanks,
> Tao
>
> On Mon, Dec 4, 2017
Can you create a JIRA issue to propose the feature?
Thank you,
Fabian
2017-12-04 16:15 GMT+01:00 Hao Sun :
> Thanks. If we can support include configuration dir that will be very
> helpful.
>
> On Mon, Dec 4, 2017, 00:50 Chesnay Schepler wrote:
>
>> You
if
> there are any windows that remain open for a very long time, but in general
> it would be useful IMHO. Or Flink could even commit both (read vs.
> triggered) offsets to kafka for monitoring purposes.
>
> On Mon, Dec 4, 2017 at 3:30 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>
Hi Juho,
the partitions of both topics are independently consumed, i.e., at their
own speed without coordination. With the configuration that Gordon linked,
watermarks are generated per partition.
Each source task maintains the latest (and highest) watermark per partition
and propagates the
Hi,
you can submit jar files and start jobs via the REST interface [1].
When starting a job, you get the jobId. You can link jar files and
savepoints via the jobId.
Best, Fabian
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/rest_api.html#submitting-programs
Hi Georg,
The recommended approach to configure user functions is to pass parameters
as (typesafe) arguments to the constructor.
Flink serializes user function objects using Java serialization and
distributes them to the workers. Hence, the configuration during plan
construction is preserved.
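The mechanism can be illustrated with plain Java serialization, as a simplified stand-in for how Flink ships user functions (class and field names are made up):

```java
import java.io.*;

public class ConfiguredFilter implements Serializable {

    private final int threshold; // set once, at plan construction time

    public ConfiguredFilter(int threshold) {
        this.threshold = threshold;
    }

    public boolean filter(int value) {
        return value > threshold;
    }

    // Simulates shipping the function to a worker: the constructor-set
    // configuration survives the serialization round trip.
    public static ConfiguredFilter roundTrip(ConfiguredFilter f) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(f);
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return (ConfiguredFilter) ois.readObject();
        }
    }
}
```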
Hi Max,
state (keyed or operator state) is always local to the task.
By default it is not accessible (read or write) from the outside or other
tasks of the application.
You can expose keyed state as queryable state [1] to perform key look ups.
This feature was designed for external application
Hi Georg,
I have no experience with SBT's console mode, so I cannot comment on that,
but Flink provides a Scala REPL that might be useful [1].
Best, Fabian
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/scala_shell.html
2017-11-30 23:09 GMT+01:00 Georg Heiler
Hi Rahul,
Flink does not provide a connector for ElasticSearch 6 yet.
There is this JIRA issue to track the development progress [1].
Best, Fabian
[1] https://issues.apache.org/jira/browse/FLINK-8101
2017-12-01 7:22 GMT+01:00 Rahul Raj :
> Hi All,
>
> Is there a Flink
Dear Flink community,
The Call for Presentations for Flink Forward San Francisco 2018 is now
open! Share your experiences and best practices in stream processing,
real-time analytics, and managing mission-critical Flink deployments in
production. We’re happy to receive your talk ideas until
Another example is King's RBEA platform [1] which was built on Flink.
In a nutshell, RBEA runs a single large Flink job, to which users can add
queries that should be computed.
Of course, the query language is restricted because the queries must match
on the structure of the running job.
Hope
t; in the
> operator's execution.
>
> Is this possible? I was hoping I could retrieve this information in the
> Java program itself and avoid processing logs.
>
> Thanks again.
>
> Best regards,
>
>
> Miguel E. Coimbra
> Email: miguel.e.coim...@gmail.com <miguel.e.coim
utions and not just the results
> of the last execution.
>
>
>
> Best regards,
>
> Miguel E. Coimbra
> Email: miguel.e.coim...@gmail.com <miguel.e.coim...@ist.utl.pt>
> Skype: miguel.e.coimbra
>
> On 27 November 2017 at 22:56, Fabian Hueske <fhue...@gmail.c
Hi Miguel,
I'm sorry but AFAIK, the situation has not changed.
Is it possible that you are calling execute() multiple times?
In that case, the 1-st and 2-nd graph would be recomputed before the 3-rd
graph is computed.
That would explain the increasing execution time of 15 seconds.
Best, Fabian
Hi George,
Flink 1.4 will not include a KafkaTableSink for Kafka 0.11 but a DataStream
API SinkFunction (KafkaProducer).
As an alternative to using the Kafka010TableSink, you can also convert the
result Table into a DataStream and use the KafkaProducer for Kafka 0.11 to
emit the DataStream.
We
Hi Mans,
no, non-equi joins are not supported by the relational APIs because they
can be prohibitively expensive to compute.
There's one exception. Cross joins where one of the input tables is
guaranteed to have a single row (because it is the result of a non-grouped
aggregation) are supported in
Hi Ebru,
this case is not supported by Flink's CsvInputFormat. The problem is that
such a file could not be read in parallel because it is not possible to
identify record boundaries if you start reading in the middle of the file.
We have a new CsvInputFormat under development that follows the RFC
Hi Fritz,
the ElasticSearch connector has not been updated for ES6 yet.
There is a JIRA issue [1] to add support for ES6 and somebody working on it
as it seems.
Best, Fabian
[1] https://issues.apache.org/jira/browse/FLINK-8101
2017-11-18 2:24 GMT+01:00 Fritz Budiyanto :
>
Hi Andre,
Do you have a batch or streaming use case?
Flink provides Cassandra Input and OutputFormats for DataSet (batch) jobs
and a Cassandra Sink for DataStream applications. There is no Cassandra
source for DataStream applications.
Regarding your error, this looks more like a Zeppelin
Hi Aviad,
sorry for the late reply.
You can configure the checkpoint directory (which is also used for
externalized checkpoints) when you create the state backend:
env.setStateBackend(new RocksDBStateBackend("hdfs:///checkpoints-data/"));
This configures the checkpoint directory to be
Hi Ebru,
AvroParquetOutputFormat seems to implement Hadoop's OutputFormat interface.
Flink provides a wrapper for Hadoop's OutputFormat [1], so you can try to
wrap AvroParquetOutputFormat in Flink's HadoopOutputFormat.
Hope this helps,
Fabian
[1]
Hi Colin,
thanks for reporting the bug. I had a look at it and it seems that the
wrong classloader is used when compiling the code (both for the batch as
well as the streaming queries).
I have a fix that I need to verify.
It's not necessary to open a new JIRA for that. We can cover all cases
Thanks for the correction! :-)
2017-11-13 13:05 GMT+01:00 Kien Truong <duckientru...@gmail.com>:
> Getting late elements from side-output is already available with Flink 1.3
> :)
>
> Regards,
>
> Kien
> On 11/13/2017 5:00 PM, Fabian Hueske wrote:
>
> Hi Andrea,
Hi Ivan,
I don't have much experience with Avro, but extracting the schema and
creating a writer for each record sounds like a pretty expensive approach.
This might result in significant load and increased GC activity.
Do all records have a different schema or might it make sense to cache the
Hi Andrea,
you are right. Flink's window operators can drop messages which are too
late, i.e., have a timestamp smaller than the last watermark.
This is expected behavior and documented at several places [1] [2].
There are a couple of options how to deal with late elements:
1. Use more
Hi Ashish,
this is a known issue and has been fixed for the next version [1].
Best, Fabian
[1] https://issues.apache.org/jira/browse/FLINK-7100
2017-11-11 16:02 GMT+01:00 Ashish Pokharel :
> All,
>
> Hopefully this is a quick one. I enabled Graphite reporter in my App and
Hi Colin,
Flink's SQL runner does not support handling of late data yet. At the
moment, late events are simply dropped.
We plan to add support for late data in a future release.
The "withIdleStateRetentionTime" parameter only applies to non-windowed
aggregation functions and controls when they
Hi XiangWei,
I don't think this is a public interface, but Till (in CC) might know
better.
Best,
Fabian
2017-11-06 3:27 GMT+01:00 XiangWei Huang :
> Hi Flink users,
> Flink Jobmanager throw a NotSerializableException when i used
> JobMasterGateway to get ExecutionGraph
Hi Seth,
I think the Table API is not there yet to address you use case.
1. Allowed lateness cannot be configured but it is on the list of features
that we plan to add in the future.
2. Custom triggers are not supported. We are planning to add an option to
support your use case (early firing and
Hi Teena,
thanks for reaching out to the mailing list for this issue. This sounds
indeed like a bug in Flink and should be investigated.
We are currently working on a new release 1.4 and the testing phase will
start soon. So it would make sense to include this problem in the testing
and hopefully
s will be ignored.
>
> so basically there has to be an accumulator implemented inside
> AsyncFunction to gather up all results and return them in a single
> .collect() call.
> but how to know when to do so? or I am completely off track here
>
>
>
> On Wed, 1 Nov 2017 at 03:57 Fabian H
Hi Tomasz,
that sounds like a sound design.
You have to make sure that the output of the application is idempotent such
that the reprocessing job overrides all! output data of the earlier job.
Best, Fabian
2017-10-23 16:24 GMT+02:00 Tomasz Dobrzycki :
> Hi all,
>
Hi Tomas,
triggering a batch DataSet job from a DataStream program for each input
record doesn't sound like a good idea to me.
You would have to make sure that the cluster always has sufficient
resources and handle failures.
It would be preferable to have all data processing in a DataStream job.
Hi David,
that's correct. A TM is a single process. A slot is just a virtual concept
in the TM process and runs its program slice in multiple threads.
Besides managed memory (which is split into chunks and assigned to slots)
all other resources (CPU, heap, network, disk) are not isolated and free
Hi David,
please find my answers below:
1. For high utilization, all slots should be filled. Each slot will
process a slice of the program on a slice of the data. In case of
partitioning or changed parallelism, the data is shuffled accordingly.
2. That's a good question. I think the default
Hi Navneeth,
configuring user functions using a Configuration object and setting the
parameters in the open() method of a RichFunction is no longer recommended.
In fact, that only works for the DataSet API and has not been added for the
DataStream API. The open() method with the Configuration
Hi David,
Flink's DataSet API schedules one slice of a program to a task slot. A
program slice is one parallel instance of each operator of a program.
When all operators of your program run with a parallelism of 1, you end up
with only 1 slice that runs on a single slot.
Flink's DataSet API
Hi,
in a MapReduce context, combiners are used to reduce the amount of data 1)
to shuffle and fully sort (to group the data by key) and 2) to reduce the
impact of skewed data.
The question is: why do you need a combiner in your use case?
- To reduce the data to shuffle: You should not use a
nformation to case classes:
> https://issues.apache.org/jira/browse/FLINK-7859
>
> Joshua
>
>
> On Oct 17, 2017, at 3:01 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>
> Hi Joshua,
>
> that's a limitation of the Scala API.
> Row requires to explicitly specify
null values that this error might be thrown?
>
> Thank you,
>
> Joshua
>
> On Oct 25, 2017, at 3:12 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>
> Hi Joshua,
>
> that is correct. Delta iterations cannot spill to disk. The solution set
> is managed in an in-me
Hi Joshua,
that is correct. Delta iterations cannot spill to disk. The solution set is
managed in an in-memory hash table.
Spilling that hash table to disk would have a significant impact on the
performance.
By default the hash table is organized in Flink's managed memory.
You can try to
Hi Lei,
setting explicit operator ID should solve this issue.
As far as I know, the auto-generated operator id also depended on the
operator parallelism in previous versions of Flink (not sure until which
point).
Which version are you running?
Best, Fabian
2017-10-17 3:15 GMT+02:00 Lei Chen
No worries :-) Thanks for the notice.
2017-10-18 15:07 GMT+02:00 Kenny Gorman <ke...@eventador.io>:
> Yep we hung out and got it working. I should have replied sooner! Thx for
> the reply.
>
> -kg
>
> On Oct 18, 2017, at 7:06 AM, Fabian Hueske <fhue...@gma
Hi Hayden,
I tried to reproduce the problem you described and followed the HA setup
instructions of the documentation [1].
For me the instructions worked and start-cluster.sh started two JobManagers
on my local machine (master contained two localhost entries).
The bash scripts tend to be a bit
Hi Kenny,
this look almost correct.
The Table class has a method writeToSink(TableSink) that should address
your use case (so the same as yours but without the TableEnvironment
argument).
Does that work for you?
If not what kind of error and error message do you get?
Best, Fabian
2017-10-18
here any way to "downgrade" or convert a DataSet to a DataStream?
>
> BR
> /Magnus
>
> On 17 Oct 2017, at 10:54, Fabian Hueske <fhue...@gmail.com> wrote:
>
> Hi Magnus,
>
> there is no Split operator on the DataSet API.
>
> As you said, this can be done usi
Hi Stefano,
this is not supported in Flink's SQL and we would need new Group Window
functions (like TUMBLE) for this.
A TUMBLE_COUNT function would be somewhat similar to SESSION, which also
requires checks on the sorted neighboring rows to identify the window of a
row.
Such a function would
Hi Magnus,
there is no Split operator on the DataSet API.
As you said, this can be done using a FilterFunction. This also allows for
non-binary splits:
DataSet setToSplit = ...
DataSet firstSplit = setToSplit.filter(new SplitCondition1());
DataSet secondSplit = setToSplit.filter(new SplitCondition2());
Setting the slot sharing group is Flink's mechanism to solve this issue.
I'd consider this a limitation of the library that provides LEARN and
SELECT.
Did you consider opening an issue at (or contributing to) the library to
support setting the slot sharing group?
2017-10-17 9:38 GMT+02:00
Hi Joshua,
that's a limitation of the Scala API.
Row requires explicitly specifying a TypeInformation[Row], but it is not
possible to inject custom types into a CaseClassTypeInfo, which is
automatically generated by a Scala compiler plugin.
The probably easiest solution is to use Flink's Java
Hi Andrea,
have you looked into assigning slot sharing groups [1]?
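For example (a sketch; `HeavyMapFunction` is a made-up user function):

```java
DataStream<String> result = input
    .map(new HeavyMapFunction())
    .slotSharingGroup("heavy")      // isolates this operator in its own slots
    .filter(v -> v != null);        // downstream operators inherit the group
```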
Best, Fabian
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/datastream_api.html#task-chaining-and-resource-groups
2017-10-16 18:01 GMT+02:00 AndreaKinn :
> Hi all,
> I want to expose