Ok, simply turning up HDFS on the cluster and using it as the state backend
fixed the issue. Thank you both for the help!
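For anyone finding this thread later: the relevant flink-conf.yaml entries look roughly like the sketch below (key names as of the Flink 1.0-era documentation; the HDFS path is a placeholder):

```yaml
# Use the filesystem state backend instead of the default in-memory one
state.backend: filesystem
# Checkpoint data goes here; must be reachable from all workers (placeholder path)
state.backend.fs.checkpointdir: hdfs://namenode:8020/flink/checkpoints
```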
On Mon, Feb 15, 2016 at 5:45 PM, Stefano Baghino <
stefano.bagh...@radicalbit.io> wrote:
> You can find the log of the recovering job manager here:
> https://gist.github.com/s
I also have a couple of use cases where the "pin data sets in memory" feature
would help a lot ;)
On Mon, Feb 15, 2016 at 10:18 PM, Saliya Ekanayake
wrote:
> Thanks, I'll check this.
>
> Saliya
>
> On Mon, Feb 15, 2016 at 4:08 PM, Fabian Hueske wrote:
>
>> I would have a look at the example progr
Hi!
As a bit of background: ZooKeeper only allows you to store very small amounts
of data. We hence persist only the changing checkpoint metadata in ZooKeeper.
To recover a job, some constant data is also needed: the JobGraph and the
jar files. These cannot go to ZooKeeper, but need to go to a reliable
stor
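To illustrate, a sketch of the relevant flink-conf.yaml entries in the 1.0-era naming (key names from memory, so treat them as an assumption; hosts and paths are placeholders):

```yaml
recovery.mode: zookeeper
recovery.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
# JobGraph and jar files are persisted here on a reliable filesystem,
# while only the small checkpoint metadata lives in ZooKeeper itself
recovery.zookeeper.storageDir: hdfs:///flink/recovery
```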
Hi,
No, we don't start a Flink job inside another job. The job creation was done
in a loop, but the next job started only after the previous one finished and
was cleaned up. And we didn't get this exception on my local Flink
installation; it only appears when I run on the cluster.
Thanks & Regards
Biplob
Hi,
In reference to this ticket: https://issues.apache.org/jira/browse/FLINK-3115
When do you think an ElasticSearch 2 streaming connector will become
available? Will it make it into the 1.0 release?
That would be great, as we are planning to use that particular version of
ElasticSearch in the
Found one blocker issue during testing:
- Watermark generators accept negative watermarks (FLINK-3415)
On Mon, Feb 15, 2016 at 8:47 PM, Robert Metzger wrote:
> Hi,
>
> I've now created a "preview RC" for the upcoming 1.0.0 release.
> There are still some blocking issues and important pull req
Hi,
I don’t know about the results, but one problem I can identify is this snippet:
groupBy(0).sum(2).max(2)
The max(2) here is a non-parallel operation since it finds the max over all
elements, not grouped by key. If you want the max to also be per-key you have
to use
groupBy(0).sum(2).andMax(
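The difference between the two can be sketched like this (the input tuples are made up for illustration; andMax is the chained per-group aggregation of the DataSet API):

```scala
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
// illustrative (key, name, value) tuples
val data = env.fromElements((1, "a", 2), (1, "b", 5), (2, "c", 3))

// sum per key, then ONE global max over all summed rows (non-parallel):
val globalMax = data.groupBy(0).sum(2).max(2)

// sum and max both per key, within the same grouped aggregation:
val perKeyMax = data.groupBy(0).sum(2).andMax(2)
```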
Fabian,
I have a quick follow-up question on what you suggested. When streaming the
same data through different maps, were you implying that everything goes as
a single job in Flink, so the data read happens only once?
Thanks,
Saliya
On Mon, Feb 15, 2016 at 3:58 PM, Fabian Hueske wrote:
> It is not po
Yes, if you implement both maps in a single job, data is read once.
2016-02-16 15:53 GMT+01:00 Saliya Ekanayake :
> Fabian,
>
> I've a quick follow-up question on what you suggested. When streaming the
> same data through different maps, were you implying that everything goes as
> single job in F
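As a sketch of what "single job" means here (paths and functions are made up for illustration):

```scala
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
// placeholder path; this source is scanned only once per execution
val input = env.readTextFile("hdfs:///path/to/data")

val lengths  = input.map(_.length)       // first map over the data
val shouting = input.map(_.toUpperCase)  // second map over the same data

// both sinks belong to one job, so a single scan of the input feeds both branches
lengths.writeAsText("hdfs:///out/lengths")
shouting.writeAsText("hdfs:///out/upper")
env.execute("two-maps-one-read")
```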
Fabian,
Not sure if we are on the same page. If I do something like the code below, it
will group by field 0 and each task will write a separate part file in
parallel.
val sink = data1.join(data2)
  .where(1).equalTo(0) { (l, r) => (l._3, r._3) }
  .partitionByHash(0)
  .writeAsCsv(pathBa
Yes, you're right. I did not understand your question correctly.
Right now, Flink does not feature an output format that writes records to
output files depending on a key attribute.
You would need to implement such an output format yourself and append it as
follows:
val data = ...
data.partitionB
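A rough sketch of how such a custom format might be attached (MyKeyedCsvOutputFormat is hypothetical and would have to be implemented yourself, e.g. by extending FileOutputFormat and picking the target file from the key field; output() is the generic sink method of the DataSet API):

```scala
import org.apache.flink.api.scala._

// stands in for the joined DataSet from above
val joined: DataSet[(Int, String)] = ???

joined
  .partitionByHash(0)  // co-locate records with the same key
  .output(new MyKeyedCsvOutputFormat("hdfs:///out/by-key"))  // hypothetical sink
```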
I looked at the samples and I think what you meant is clear, but I didn't
find a solution for my needs. In my case, I want to use the result from the
first map operation before I can apply the second map on the *same* data
set. For simplicity, let's say I have a bunch of short values represented as
my dat
You can use so-called BroadcastSets to send any sufficiently small DataSet
(such as a computed average) to any other function and use it there.
However, in your case you'll end up with a data flow that branches (at the
source) and merges again (when the average is sent to the second map).
Such patt
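The broadcast-set pattern described above looks roughly like this (variable names and the toy average computation are illustrative):

```scala
import org.apache.flink.api.scala._
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

val env = ExecutionEnvironment.getExecutionEnvironment
val numbers: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0)

// small DataSet holding a single computed average
val avg: DataSet[Double] = numbers
  .map(x => (x, 1))
  .reduce((a, b) => (a._1 + b._1, a._2 + b._2))
  .map(t => t._1 / t._2)

val centered = numbers
  .map(new RichMapFunction[Double, Double] {
    private var mean: Double = _
    override def open(parameters: Configuration): Unit = {
      // the broadcast set is materialized on each task manager
      mean = getRuntimeContext.getBroadcastVariable[Double]("avg").get(0)
    }
    override def map(x: Double): Double = x - mean
  })
  .withBroadcastSet(avg, "avg")
```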
Thank you, yes, this makes sense. The broadcasted data in my case would be a
large array of 3D coordinates.
On a side note, how can I take the output from a reduce function? I can see
methods to write it to a given output, but is it possible to retrieve the
reduced result back to the program - like a
Broadcasted DataSets are stored on the JVM heap of each task manager (but
shared among multiple slots on the same TM), hence the size restriction.
There are two ways to retrieve a DataSet (such as the result of a reduce).
1) if you want to fetch the result into your client program use
DataSet.coll
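The client-side fetch mentioned above looks like this (collect() triggers execution and ships the result back to the client program, so it is only suitable for small results):

```scala
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val data = env.fromElements(3, 1, 4, 1, 5)

// reduce produces a DataSet with a single element ...
val sum: DataSet[Int] = data.reduce(_ + _)

// ... and collect() returns it as a local collection in the client program
val local: Seq[Int] = sum.collect()
println(local.head)
```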
Thank you. I'll check this.
On Tue, Feb 16, 2016 at 4:01 PM, Fabian Hueske wrote:
> Broadcasted DataSets are stored on the JVM heap of each task manager (but
> shared among multiple slots on the same TM), hence the size restriction.
>
> There are two ways to retrieve a DataSet (such as the result
Hi,
I have a streaming machine learning job that usually runs with input from
kafka. To tweak the models I need to run on some old data from HDFS.
Unfortunately the data on HDFS is spread out over several subfolders.
Basically I have a datum with one subfolder for each hour; within those are
the a
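One way to pick up nested subfolders with a file source is the recursive enumeration flag (the parameter name is as I recall it from the docs, so treat it as an assumption; the path is a placeholder):

```scala
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration

val env = ExecutionEnvironment.getExecutionEnvironment

// enable scanning of nested directories under the given path
val parameters = new Configuration()
parameters.setBoolean("recursive.file.enumeration", true)

// with the flag set, the hourly subfolders underneath are read as well
val logs = env.readTextFile("hdfs:///data/2016-02-15")
  .withParameters(parameters)
```

Alternatively, you could call readTextFile once per subfolder and union the resulting DataSets.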
Hi, where can I get a summary of the changes between flink-1.0 and flink-0.10?
Thank you in advance!
Best Regards,
Zhijiang Wang
Hi Zhijiang,
We have wiki pages describing the Flink 1.0 release [1] [2], but the pages
are not updated in real time, so it is possible that some changes haven't
been described yet. After 1.0 is released officially, maybe we will post an
article dealing with the changes in 1.0 to the Fl