Aljoscha,

Looks like a potential solution. Feels a bit hacky though.

I didn't quite understand why a list-backed store is used for the static
input buffer. An inner join should emit only one record per key match.

Is it a property of the system to emit a Long.MAX_VALUE watermark when a
finite stream source ends?
If so, can I do something like this to read a static file in parallel?
    val meta = env.readTextFile("S3:///path/to/file").map(...).keyBy(...)
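
To make the question concrete, here is a rough sketch of the behaviour I am
assuming, written in plain Scala rather than against the Flink operator API.
The class and method names (BufferingJoin, onStatic, onStaticWatermark,
onStream) are hypothetical, and treating a Long.MaxValue watermark as "static
input finished" is exactly the assumption I am asking about. It keeps one row
per key on the static side (a map, not a list) and holds back stream records
until the static input is complete.

```scala
import scala.collection.mutable

// Hypothetical model of a two-input join; NOT the Flink operator API.
final class BufferingJoin[K, S, D] {
  // Static (dimension) side: assumes at most one row per key.
  private val table = mutable.Map.empty[K, D]
  // Stream records that arrived before the static side was fully read.
  private val pending = mutable.ArrayBuffer.empty[(K, S)]
  private var staticDone = false

  /** A row from the static input. */
  def onStatic(key: K, row: D): Unit = table.update(key, row)

  /** A watermark from the static input.
    * Assumption: Long.MaxValue means the finite source has ended. */
  def onStaticWatermark(ts: Long): Seq[(S, D)] =
    if (ts == Long.MaxValue && !staticDone) {
      staticDone = true
      // Flush buffered records; inner join drops keys with no match.
      val out = pending.flatMap { case (k, s) => table.get(k).map(d => (s, d)) }.toList
      pending.clear()
      out
    } else Nil

  /** A record from the streaming input. */
  def onStream(key: K, rec: S): Seq[(S, D)] =
    if (staticDone) table.get(key).map(d => (rec, d)).toList
    else { pending += ((key, rec)); Nil }
}
```

In this sketch a stream record that arrives early produces no output until
the final watermark is seen, and both the map and the pending buffer would
need to be part of any checkpointed state.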

Shouldn't we also override the checkpoint handling of the custom operator?
If so, should a checkpoint wait or fail during the initial read phase?

Lohith,
Adding a component like Cassandra just for this feels like overkill. But
if I can't find a suitable way to do this, I might use it (or probably
Redis).

Srikanth



On Fri, Apr 22, 2016 at 12:20 PM, Lohith Samaga M <lohith.sam...@mphasis.com> wrote:

> Hi,
> Cassandra could be used as a distributed cache.
>
> Lohith.
>
>
> ---- Aljoscha Krettek wrote ----
>
>
> Hi Srikanth,
> that's an interesting use case. It's not possible to do something like
> this out of the box, but I'm actually working on an API for such cases.
>
> In the meantime, I programmed a short example that shows how something
> like this can be done using the API that is currently available. It
> requires writing a custom operator but it is still somewhat succinct:
> https://gist.github.com/aljoscha/c657b98b4017282693a67f1238c88906
>
> Please let me know if you have any questions.
>
> Cheers,
> Aljoscha
>
> On Thu, 21 Apr 2016 at 03:06 Srikanth <srikanth...@gmail.com> wrote:
>
>> Hello,
>>
>> I have a fairly typical streaming use case but am not able to figure out
>> how best to implement it in Flink.
>> I want to join records read from a Kafka stream with one (or more)
>> dimension tables which are saved as flat files.
>>
>> As per this JIRA <https://issues.apache.org/jira/browse/FLINK-2320> it's
>> not possible to join a DataStream with a DataSet.
>> These tables are too big to do a collect() and join.
>>
>> It would be good to read these files during startup, do a partitionByHash
>> and keep them cached.
>> On the DataStream, maybe do a keyBy and join.
>> Is something like this possible?
>>
>> Srikanth
>>
>
