Small addition: I'm currently running the programs via my IDE, IntelliJ.
I read somewhere that Flink and Gelly should be able to handle graph
algorithms that require more space than the available memory. I'm currently
getting a Java OutOfMemoryError (heap space), and if Flink were using disk
space that wouldn't happen.
Currently my algorithms use dense graphs with 10m edges, the
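A note on the setup described above: a job launched from IntelliJ runs in a
local mini-cluster inside the IDE's JVM, so its heap is capped by the run
configuration's -Xmx VM option, which is usually the first thing to raise.
Below is a minimal sketch of a local environment with more managed memory;
the config key and value are illustrative and should be checked against your
Flink version:

Configuration conf = new Configuration();
// managed memory used for sorting/hashing, in MB (illustrative value)
conf.setInteger("taskmanager.memory.size", 2048);
ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(conf);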
Great, thanks!
On 28 Nov 2016 8:54 p.m., "Fabian Hueske" wrote:
> Hi Flavio,
>
> sure.
> This code should be close to what you need:
>
> public static class BatchingMapper implements MapPartitionFunction<String, String[]> {
>
>    int cnt = 0;
>    String[] batch = new String[1000];
>
>    @Override
>
Hi Flavio,
sure.
This code should be close to what you need:
public static class BatchingMapper implements
    MapPartitionFunction<String, String[]> {
  int cnt = 0;
  String[] batch = new String[1000];
  @Override
  public void mapPartition(Iterable<String> values,
      Collector<String[]> out) throws Exception {
    for (String v : values) {
      batch[cnt++] = v;
      // emit each full batch of 1000 and start a new one
      if (cnt == batch.length) { out.collect(batch); batch = new String[1000]; cnt = 0; }
    }
    // emit the trailing, possibly incomplete batch
    if (cnt > 0) out.collect(java.util.Arrays.copyOf(batch, cnt));
  }
}
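For illustration, the mapper could be applied like this (the environment,
data set name, and path are placeholders, not from the original mail):

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> ids = env.readTextFile("/path/to/ids.csv"); // placeholder path
DataSet<String[]> batches = ids.mapPartition(new BatchingMapper());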
Thanks for the support Fabian!
I think I'll try the tumbling window method; it seems cleaner. Btw, just
for the sake of completeness, can you show me a brief snippet (also in
pseudocode) of a mapPartition that groups together elements into chunks of
size n?
Best,
Flavio
On Mon, Nov 28, 2016 at 8:
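For reference, the tumbling-window variant mentioned above could look roughly
like this on a DataStream; the stream, its element type, and the window
function body are assumptions, not code from the thread:

// given a DataStream<String> ids, group elements into count windows of 1000
DataStream<String[]> batches = ids
    .countWindowAll(1000)
    .apply(new AllWindowFunction<String, String[], GlobalWindow>() {
      @Override
      public void apply(GlobalWindow window, Iterable<String> values,
          Collector<String[]> out) {
        List<String> batch = new ArrayList<>();
        for (String v : values) {
          batch.add(v);
        }
        out.collect(batch.toArray(new String[0]));
      }
    });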
Hi Flavio,
I think the easiest solution is to read the CSV file with the
CsvInputFormat and use a subsequent MapPartition to batch 1000 rows
together.
In each partition, you might end up with an incomplete batch.
However, I don't see yet how you can feed these batches into the
JdbcInputFormat whic
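One way to turn such a batch (a String[] of ids) into a parameterized query,
sketched under the assumption that plain JDBC is used downstream; the table
and column names are placeholders:

// build an IN clause with one '?' placeholder per id in the batch
String placeholders = String.join(",", Collections.nCopies(batch.length, "?"));
String sql = "SELECT * FROM my_table WHERE id IN (" + placeholders + ")";
// the ids in 'batch' are then bound to PreparedStatement positions 1..n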
Hi Diego,
The message shows that two tasks are trying to access the same file concurrently.
Is this message thrown upon recovery after a failure, or at the initialization
of the job?
Could you please check the logs for other exceptions before this?
Can this be related to this issue?
https://www.
Hi to all,
I have a use case where I have to read a huge CSV file containing ids to fetch
from a table in a DB.
The JDBC input format can handle parameterized queries, so I was thinking to
fetch data using 1000 ids at a time. What is the easiest way to divide a
dataset into slices of 1000 ids each (in ord
Hi colleagues,
I am experiencing problems when trying to write events from a stream to HDFS. I
get the following exception:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
failed to create file
/user/biguardian/events/2016-11-28--15/flinkpar
Hi Sendoh,
I have used a custom trigger which is the same as the 1.0.3 EventTimeTrigger,
and set the allowedLateness value to Long.MAX_VALUE.
Because of this change, late elements are not discarded and become
single-element windows.
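A rough sketch of that setup; the window size, the key, and the trigger class
name are placeholders, not the actual code:

public static DataStream<String> withLateElements(DataStream<Tuple2<String, Integer>> in) {
  return in
      .keyBy(0) // placeholder key
      .window(TumblingEventTimeWindows.of(Time.seconds(10))) // placeholder size
      .allowedLateness(Time.milliseconds(Long.MAX_VALUE)) // keep late elements
      .trigger(new CustomEventTimeTrigger()) // hypothetical 1.0.3-style trigger
      .apply(new WindowFunction<Tuple2<String, Integer>, String, Tuple, TimeWindow>() {
        @Override
        public void apply(Tuple key, TimeWindow window,
            Iterable<Tuple2<String, Integer>> values, Collector<String> out) {
          for (Tuple2<String, Integer> v : values) {
            out.collect(v.toString());
          }
        }
      });
}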
Regards,
Vinay Patil
On Mon, Nov 28, 2016 at 5:54 PM, Sendoh [via
Hi Flink users,
Can I ask how to avoid the default allowedLateness(0), so that late events
become single-element windows as in version 1.0.3?
Best,
Sendoh
Hi,
sorry for the late reply.
There is a repository [1] with an example application that uses a custom
trigger [2] (though together with a TimeWindow and not with a GlobalWindow).
I'm not aware of a repo with an example of a GlobalWindow.
Regarding the question about timestamps and watermarks: I
The ZooKeeper-related logs are logged by user code. I finally found the reason
why the TaskManager was lost: I gave the TaskManager a large amount of memory,
and the JobManager identified the TaskManager as down while the TaskManager
was in a Full GC. Thanks for your help.
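For reference, the JobManager's death-watch can be made more tolerant of long
GC pauses via flink-conf.yaml; the keys below are from the Flink 1.x
configuration options, and the values are illustrative:

akka.watch.heartbeat.interval: 10 s
akka.watch.heartbeat.pause: 100 s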
Hi Pedro,
if I read your code correctly, you are not assigning timestamps and
watermarks to the rules stream.
Flink automatically derives watermarks from all streams involved.
If you do not assign a watermark, the default watermark is
Long.MIN_VALUE, which is exactly the value you are observing.
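A minimal sketch of assigning timestamps and watermarks to such a stream; the
Rule type, its timestamp accessor, and the 5-second bound are assumptions:

DataStream<Rule> rulesWithWatermarks = rules.assignTimestampsAndWatermarks(
    new BoundedOutOfOrdernessTimestampExtractor<Rule>(Time.seconds(5)) {
      @Override
      public long extractTimestamp(Rule rule) {
        return rule.getTimestamp(); // assumed accessor
      }
    });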
Hi Anastasios,
that's certainly possible. The most straightforward approach would be a
synchronous call to the database.
Because only one request is active at the same time, you do not need a
thread pool.
You can establish the connection in the open() method of a RichMapFunction.
The problem with
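A minimal sketch of that pattern; the JDBC URL, table, and column names are
placeholders:

public static class DbLookup extends RichMapFunction<String, String> {
  private transient Connection conn;

  @Override
  public void open(Configuration parameters) throws Exception {
    // establish the connection once per parallel task instance
    conn = DriverManager.getConnection("jdbc:mysql://host/db"); // placeholder URL
  }

  @Override
  public String map(String key) throws Exception {
    // one synchronous request at a time, so no thread pool is needed
    try (PreparedStatement stmt = conn.prepareStatement("SELECT v FROM t WHERE k = ?")) {
      stmt.setString(1, key);
      try (ResultSet rs = stmt.executeQuery()) {
        return rs.next() ? rs.getString(1) : null;
      }
    }
  }

  @Override
  public void close() throws Exception {
    if (conn != null) conn.close();
  }
}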
Hi Lydia,
that is certainly possible; however, you need to adapt the algorithm a bit.
The straightforward approach would be to replicate the input data and
assign IDs for each k-means run.
If you have a data point (1, 2, 3) you could replicate it to three data
points (10, 1, 2, 3), (15, 1, 2, 3),
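A minimal sketch of the replication step; the run ids here are simply
0..numRuns-1 rather than the 10, 15, ... of the example above:

public static class ReplicateForRuns
    implements FlatMapFunction<double[], Tuple2<Integer, double[]>> {
  private final int numRuns; // number of parallel k-means runs (assumed parameter)
  public ReplicateForRuns(int numRuns) { this.numRuns = numRuns; }

  @Override
  public void flatMap(double[] point, Collector<Tuple2<Integer, double[]>> out) {
    for (int run = 0; run < numRuns; run++) {
      out.collect(new Tuple2<>(run, point)); // tag each copy with its run id
    }
  }
}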