Talking with Stefano this morning and looking at the DataSourceTask code, we
discovered that the open() and close() methods are both called for every
split, not once per InputFormat instance (maybe open() and close() should be
renamed to openSplit() and closeSplit() to avoid confusion...).
I think that i
Being a generic JDBC input format, I would prefer to stay with Row, letting
the developer manage the cast according to the driver functionalities.
As for the open() and close() issue, I agree with Flavio that we'd need a
better management of the inputformat lifecycle. Perhaps a new interface
exten
Of course there is one already. We'll look into the runtime context.
saluti,
Stefano
2016-04-18 9:41 GMT+02:00 Stefano Bortoli :
> Being a generic JDBC input format, I would prefer to stay with Row,
> letting the developer manage the cast according to the driver
> functionalities.
>
> As for the
There is also InputFormat.configure() which is called before any split
processing happens. But I see your point about a missing close() method
that is called after all input splits have been processed.
On Mon, 18 Apr 2016 at 09:44 Stefano Bortoli wrote:
> Of course there is one already. We'll lo
Yes, I forgot to mention that I could instantiate the connection in the
configure() but then I can't close it (as you confirmed) :(
On Mon, Apr 18, 2016 at 9:46 AM, Aljoscha Krettek
wrote:
> There is also InputFormat.configure() which is called before any split
> processing happens. But I see yo
Yes, I know Janino is a pure Java project. I meant if we add Scala code to
flink-core, we should add Scala dependency to flink-core and it could be
confusing.
Regards,
Chiwan Park
> On Apr 18, 2016, at 2:49 PM, Márton Balassi wrote:
>
> Chiwan, just to clarify Janino is a Java project. [1]
>
Hi,
yes, I'm afraid you need a custom operator for that. (We are working on
providing built-in support for this, though)
I sketched an Operator that does the sorting and also wrote a quick example
that uses it:
SortedWindowOperator:
https://gist.github.com/aljoscha/6600bc1121b7f8a0f68b89988dd341bd
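Independently of the linked gist, the core idea of such an operator can be sketched in a few lines of plain Java (names and structure are mine, not the gist's): buffer elements per tumbling window, then emit each pane sorted when the window fires.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortedWindowSketch {
    // Assign each timestamped element to a tumbling window of `windowSize`,
    // and sort the contents of every pane before it is emitted.
    static Map<Long, List<Long>> sortedWindows(List<Long> timestamps, long windowSize) {
        Map<Long, List<Long>> panes = new TreeMap<>();
        for (long ts : timestamps) {
            panes.computeIfAbsent(ts / windowSize, k -> new ArrayList<>()).add(ts);
        }
        for (List<Long> pane : panes.values()) {
            Collections.sort(pane); // the "sort on window fire" step
        }
        return panes;
    }

    public static void main(String[] args) {
        // Out-of-order input: 5 and 3 fall into window 0, 17 and 12 into window 1.
        System.out.println(sortedWindows(List.of(5L, 17L, 3L, 12L), 10));
        // {0=[3, 5], 1=[12, 17]}
    }
}
```

A real operator would additionally have to buffer in managed state and react to watermarks; this only illustrates the buffering-then-sorting behavior.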
Till Rohrmann created FLINK-3774:
Summary: Flink configuration is not correctly forwarded to
PlanExecutor in ScalaShellRemoteEnvironment
Key: FLINK-3774
URL: https://issues.apache.org/jira/browse/FLINK-3774
Till Rohrmann created FLINK-3775:
Summary: Flink Scala shell does not respect Flink configuration
Key: FLINK-3775
URL: https://issues.apache.org/jira/browse/FLINK-3775
Project: Flink
Issue Ty
Till Rohrmann created FLINK-3776:
Summary: Flink Scala shell does not allow to set configuration for
local execution
Key: FLINK-3776
URL: https://issues.apache.org/jira/browse/FLINK-3776
Project: Flin
This vote is cancelled in favour of RC3, because of the problem raised
by Fabian.
On Fri, Apr 15, 2016 at 1:02 PM, Fabian Hueske wrote:
> There is a request from the Mahout community to include a fix for
> FLINK-3762 in 1.0.2.
> Stephan gave a +1 [1] and I would also like to include it.
>
> I'm r
I agree, a method to close an input format is missing.
InputFormat is an API stable interface, so it is not possible to extend it
(until Flink 2.0). RichInputFormat is API stable as well, but an abstract
class. So it should be possible to add an empty default implementation of a
closeInputFormat() method.
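The backwards-compatible extension discussed here can be sketched with minimal stand-in classes (these are illustrative, not the real Flink types): an empty default method on the abstract class means existing subclasses compile unchanged, while new ones can hook into the per-instance lifecycle.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for RichInputFormat: per-split open()/close() plus new
// per-instance hooks with empty default bodies.
abstract class RichInputFormatSketch {
    public abstract void open(int splitId); // called once per input split
    public abstract void close();           // called once per input split

    // New hooks: empty defaults keep existing subclasses source-compatible.
    public void openInputFormat() {}
    public void closeInputFormat() {}
}

// A JDBC-like format would open its connection once, not once per split.
class JdbcLikeFormat extends RichInputFormatSketch {
    final List<String> log = new ArrayList<>();

    @Override public void openInputFormat()  { log.add("connect"); }    // e.g. open a JDBC connection
    @Override public void closeInputFormat() { log.add("disconnect"); } // e.g. close it
    @Override public void open(int splitId)  { log.add("openSplit " + splitId); }
    @Override public void close()            { log.add("closeSplit"); }
}

public class LifecycleDemo {
    public static void main(String[] args) {
        JdbcLikeFormat f = new JdbcLikeFormat();
        f.openInputFormat();                 // once per task instance
        for (int split = 0; split < 2; split++) {
            f.open(split);                   // once per split
            f.close();
        }
        f.closeInputFormat();                // once per task instance
        System.out.println(f.log);
        // [connect, openSplit 0, closeSplit, openSplit 1, closeSplit, disconnect]
    }
}
```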
Flavio Pompermaier created FLINK-3777:
Summary: Add open and close methods to manage IF lifecycle
Key: FLINK-3777
URL: https://issues.apache.org/jira/browse/FLINK-3777
Project: Flink
Iss
There is also FLINK-3657 which was recently merged to master.
This feature was requested by the Mahout community and the commit changes
the visibility of a method in DataSetUtils.
So strictly speaking, this is not a bug fix but a new feature. On the other
hand, it is a very lightweight change and doe
Hi Fabian, I've just created a JIRA for that (FLINK-3777).
As you said, input splits should not be too fine-grained, but we have a table
with 11 billion rows that can't be queried with ranges greater than
100K rows because it has a lot of JOINs, and increasing this threshold
implies incredibly
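The range-splitting being described can be sketched as follows (a hypothetical helper of my own naming, not Flink's API): cover a key space of `totalRows` rows with parameter pairs for range queries like `WHERE id >= ? AND id < ?`, each at most `rangeSize` rows wide.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSplits {
    // Produce [start, end) pairs covering totalRows rows, each of at most
    // rangeSize rows; each pair becomes one input split / one range query.
    static List<long[]> splits(long totalRows, long rangeSize) {
        List<long[]> result = new ArrayList<>();
        for (long start = 0; start < totalRows; start += rangeSize) {
            result.add(new long[] { start, Math.min(start + rangeSize, totalRows) });
        }
        return result;
    }

    public static void main(String[] args) {
        for (long[] r : splits(250_000, 100_000)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
        // 0 .. 100000
        // 100000 .. 200000
        // 200000 .. 250000
    }
}
```

With 11 billion rows and a 100K range size this yields 110,000 splits, which is why per-split connection setup (the open()/close() issue above) gets expensive.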
I am fine with it since it only touches a utility class. So +1 to
include it as well.
– Ufuk
On Mon, Apr 18, 2016 at 12:07 PM, Fabian Hueske wrote:
> There is also FLINK-3657 which was recently merged to master.
> This feature was requested by the Mahout community and the commit changes
> the vi
+1 from my side as well
On Mon, Apr 18, 2016 at 12:10 PM, Ufuk Celebi wrote:
> I am fine with it since it only touches a utility class. So +1 to
> include it as well.
>
> – Ufuk
>
> On Mon, Apr 18, 2016 at 12:07 PM, Fabian Hueske wrote:
> > There is also FLINK-3657 which was recently merged to
+1, since others are blocked on this
On Mon, 18 Apr 2016 at 12:10 Ufuk Celebi wrote:
> I am fine with it since it only touches a utility class. So +1 to
> include it as well.
>
> – Ufuk
>
> On Mon, Apr 18, 2016 at 12:07 PM, Fabian Hueske wrote:
> > There is also FLINK-3657 which was recently me
OK, thanks for the responses. I'll go ahead and include it for RC3.
On Mon, Apr 18, 2016 at 12:20 PM, Aljoscha Krettek wrote:
> +1, since others are blocked on this
>
> On Mon, 18 Apr 2016 at 12:10 Ufuk Celebi wrote:
>
>> I am fine with it since it only touches a utility class. So +1 to
>> inclu
+1 for not mixing Java and Scala in flink-core.
Maybe it makes sense to implement the code-generated serializers /
comparators as a separate module which can be plugged in. This could be
pure Scala.
In general, I think it would be good to have some kind of "version
management" for serializers in p
Hi everyone,
I am trying to implement savepoint mechanism for my Flink project.
Here is the scenario:
I got the snapshot of the Flink application by using the "flink savepoint <jobId>"
command while the application was running.
After saving the snapshot of the application, I canceled the job from the web UI, then I
cha
Dear Flink community,
Please vote on releasing the following candidate as Apache Flink version 1.0.2.
The commit to be voted on:
d39af152a166ddafaa2466cdae82695880893f3e
Branch:
release-1.0.2-rc3 (see
https://git1-us-west.apache.org/repos/asf/flink/?p=flink.git;a=shortlog;h=refs/heads/release-1.
Can you please share the program before and after the savepoint?
– Ufuk
On Mon, Apr 18, 2016 at 3:11 PM, Ozan DENİZ wrote:
> Hi everyone,
>
> I am trying to implement savepoint mechanism for my Flink project.
>
> Here is the scenario:
>
> I got the snapshot of Flink application by using "flink s
As Flavio underlined, it is not about selecting a certain number of rows,
but about executing queries with sequences of joins on a very large database. We
played around to find the best throughput. Honestly, I prefer to have many
smaller range-queries with more parallel threads than fewer expensive
querie
Till Rohrmann created FLINK-3778:
Summary: ScalaShellRemoteStreamEnvironment cannot be forwarded a
user configuration
Key: FLINK-3778
URL: https://issues.apache.org/jira/browse/FLINK-3778
Project: Fli
Hi!
Yes, window contents are part of savepoints. If you change the topology, it
is crucial that the old window contents can be matched to the new
operators.
If you change the structure of the program, you probably need to assign
persistent names to the operators. See
https://ci.apache.org
Ufuk Celebi created FLINK-3779:
Summary: Add support for queryable state
Key: FLINK-3779
URL: https://issues.apache.org/jira/browse/FLINK-3779
Project: Flink
Issue Type: Improvement
C
Greg Hogan created FLINK-3780:
Summary: Jaccard Similarity
Key: FLINK-3780
URL: https://issues.apache.org/jira/browse/FLINK-3780
Project: Flink
Issue Type: New Feature
Components: Gell
Hi Team,
Problem Description: When I call the *reduce()* method on a keyedStream
object, I get the exception
"*org.apache.flink.api.common.InvalidProgramException: Specifying keys via
field positions is only valid for tuple data types. Type: Integer*".
StreamExecutionEnvironment env
I think then you have to either reconfigure your cluster environment or
wait until we bump the Akka version to 2.4.x which supports having an
internal and external IP address.
Cheers,
Till
On Fri, Apr 15, 2016 at 6:36 PM, star jlong
wrote:
> Hi Till/Ned,
>
> Sorry, I thought this was my post.
>
Ted Yu created FLINK-3781:
Summary: BlobClient may be left unclosed in
BlobCache#deleteGlobal()
Key: FLINK-3781
URL: https://issues.apache.org/jira/browse/FLINK-3781
Project: Flink
Issue Type: Bug
Hello Jitendra,
I am new to the Flink community but may have seen this issue earlier. Can you
try to use DataStream<…> instead of KeyedStream<…> for integers4?
Regards Saikat
On Mon, Apr 18, 2016 at 7:37 PM, Jitendra Agrawal <
jitendra.agra...@northgateps.com> wrote:
> Hi Team,
>
> Problem Description : Wh
Unfortunately, making code generation a separate module would introduce a
cyclic dependency.
Code generation requires the TypeInfo which is available in flink-core and
flink-core requires
the generated serializers from the code generation module. Do you have a
solution for this?
I think if we can com
Not entirely related, but for the special case of writing a parallelized source
that emits records in event time order, I found the MergeIterator to be most
useful. Here's an
example:https://github.com/nupic-community/flink-htm/blob/eb29f97f08f3482b32228db7284f669aad8dce2e/flink-htm-streaming-s
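The merge-iterator idea mentioned above can be sketched without any Flink dependency (names are mine): merge several sources that are each already sorted by event time into one globally ordered stream, using a priority queue of the iterator heads.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class MergeIteratorSketch {
    // Each heap entry is {nextValue, sourceIndex, positionInSource}.
    static List<Long> merge(List<List<Long>> sortedSources) {
        PriorityQueue<long[]> heads = new PriorityQueue<>(Comparator.comparingLong(h -> h[0]));
        for (int i = 0; i < sortedSources.size(); i++) {
            if (!sortedSources.get(i).isEmpty()) {
                heads.add(new long[] { sortedSources.get(i).get(0), i, 0 });
            }
        }
        List<Long> out = new ArrayList<>();
        while (!heads.isEmpty()) {
            long[] h = heads.poll();          // smallest pending timestamp wins
            out.add(h[0]);
            List<Long> src = sortedSources.get((int) h[1]);
            int next = (int) h[2] + 1;
            if (next < src.size()) {          // advance that source's head
                heads.add(new long[] { src.get(next), h[1], next });
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of(List.of(1L, 4L, 9L), List.of(2L, 3L, 10L))));
        // [1, 2, 3, 4, 9, 10]
    }
}
```

This is exactly why a merged, per-partition-sorted source can emit records in global event-time order.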
The exception is thrown by:

if (!type.isTupleType()) {
    throw new InvalidProgramException("Specifying keys via field positions is only valid " +
        "for tuple data types. Type: " + type);
}
So, like Saikat say
If you work on plain Integer (or other non-POJO types) you need to
provide a KeySelector to make it work.
For your case, something like this:

.keyBy(new KeySelector<Integer, Integer>() {
    @Override
    public Integer getKey(Integer value) throws Exception {
        return value;
    }
})
As S
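Outside Flink, the role of a KeySelector is just a function from a value to its key; a minimal stand-alone sketch (names are mine) of grouping a plain Integer stream by an identity key looks like this:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class KeyBySketch {
    // Group values by the key the selector function extracts, preserving
    // first-seen key order (what keyBy does logically before a reduce).
    static <T, K> Map<K, List<T>> keyBy(List<T> values, Function<T, K> keySelector) {
        Map<K, List<T>> groups = new LinkedHashMap<>();
        for (T v : values) {
            groups.computeIfAbsent(keySelector.apply(v), k -> new ArrayList<>()).add(v);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Identity key selector, as in the snippet above: each Integer is its own key.
        System.out.println(keyBy(List.of(1, 2, 1, 3, 2), v -> v));
        // {1=[1, 1], 2=[2, 2], 3=[3]}
    }
}
```

This also shows why field-position keys fail for Integer: there is no tuple field to point at, so the key must come from a function.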
I was trying out the new scala-shell with streaming support...
The following code executes correctly the first time I run it:
val survival = benv.readCsvFile[(String, String, String,
String)]("file:///home/trevor/gits/datasets/haberman/haberman.data")
survival.count()
However, if I call survival
Chenguang He created FLINK-3782:
Summary: ByteArrayOutputStream and ObjectOutputStream should close
Key: FLINK-3782
URL: https://issues.apache.org/jira/browse/FLINK-3782
Project: Flink
Issue T
Hi Stephan and Ufuk,
Thank you for your reply.
I have assigned uids to the "assignTimestampsAndWatermarks", "addSource", and
"apply" operators. However, I couldn't assign a uid to the time window. Therefore
the time window doesn't hold any state regarding timestamps.
For example, I implemented a cust
GaoLun created FLINK-3783:
Summary: Support weighted random sampling with reservoir
Key: FLINK-3783
URL: https://issues.apache.org/jira/browse/FLINK-3783
Project: Flink
Issue Type: Improvement