Aljoscha,

Your thoughts on this?
Srikanth

On Mon, Apr 25, 2016 at 8:08 PM, Srikanth <srikanth...@gmail.com> wrote:

> Aljoscha,
>
> Looks like a potential solution. It feels a bit hacky, though.
>
> I didn't quite understand why a list-backed store is used for the static
> input buffer. A join (inner) should emit only one record if there is a
> key match.
>
> Is it a property of the system to emit a Long.MAX_VALUE watermark when a
> finite stream source ends?
> If so, can I do something like this to read the static file in parallel?
> val meta = env.readTextFile("S3:///path/to/file").map(...).keyBy(...)
>
> Shouldn't we also override checkpoint handling in the custom operator?
> If so, should the checkpoint wait/fail during the initial read phase?
>
> Lohith,
> Adding a component like Cassandra just for this feels like overkill. But
> if I can't find a suitable way to do it, I might use it (or probably
> Redis).
>
> Srikanth
>
> On Fri, Apr 22, 2016 at 12:20 PM, Lohith Samaga M
> <lohith.sam...@mphasis.com> wrote:
>
>> Hi,
>> Cassandra could be used as a distributed cache.
>>
>> Lohith.
>>
>> ---- Aljoscha Krettek wrote ----
>>
>> Hi Srikanth,
>> that's an interesting use case. It's not possible to do something like
>> this out of the box, but I'm actually working on an API for such cases.
>>
>> In the meantime, I put together a short example that shows how something
>> like this can be programmed using the API that is currently available.
>> It requires writing a custom operator but is still somewhat succinct:
>> https://gist.github.com/aljoscha/c657b98b4017282693a67f1238c88906
>>
>> Please let me know if you have any questions.
>>
>> Cheers,
>> Aljoscha
>>
>> On Thu, 21 Apr 2016 at 03:06 Srikanth <srikanth...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I have a fairly typical streaming use case but can't figure out how
>>> best to implement it in Flink.
>>> I want to join records read from a Kafka stream with one (or more)
>>> dimension tables that are stored as flat files.
>>>
>>> As per this JIRA <https://issues.apache.org/jira/browse/FLINK-2320>,
>>> it's not possible to join a DataStream with a DataSet.
>>> These tables are too big to collect() and join.
>>>
>>> It would be good to read these files during startup, do a
>>> partitionByHash, and keep them cached.
>>> On the DataStream, perhaps do a keyBy and join.
>>> Is something like this possible?
>>>
>>> Srikanth
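For reference, here is a minimal sketch of the join the thread is circling around, built on the public DataStream API (connect plus a keyed RichCoFlatMapFunction) rather than the custom operator in Aljoscha's gist. Everything in it is illustrative: the Meta/Event case classes, the CSV layout, and the EnrichFunction name are assumptions, the S3 path is the placeholder from the thread, and the keyed ValueState/ListState calls assume Flink 1.2 or later. Unlike the gist, which buffers the static input, this variant buffers early stream records per key until their dimension record arrives.

import org.apache.flink.api.common.state.{ListState, ListStateDescriptor, ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector
import scala.collection.JavaConverters._

case class Meta(key: String, label: String)      // illustrative dimension record
case class Event(key: String, payload: String)   // illustrative stream record

object StreamTableJoinSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Read the flat file as a finite stream; each parallel instance reads a split.
    // "S3:///path/to/file" is the placeholder path from the thread.
    val meta: DataStream[Meta] = env
      .readTextFile("S3:///path/to/file")
      .map { line =>
        val f = line.split(',')
        Meta(f(0), f(1))                         // assumed CSV layout: key,label
      }

    // Stand-in for the Kafka source, e.g. env.addSource(new FlinkKafkaConsumer(...)).
    val events: DataStream[Event] = env.fromElements(
      Event("k1", "p1"), Event("k2", "p2"))

    val joined: DataStream[(Event, Meta)] = events
      .connect(meta)
      .keyBy(_.key, _.key)   // co-partition both inputs on the join key
      .flatMap(new EnrichFunction)

    joined.print()
    env.execute("stream/static-table join sketch")
  }
}

// Inner join on the key: the dimension record is held in keyed ValueState;
// events that arrive before their dimension record are parked in ListState
// and flushed once it shows up. If the file has duplicate keys, the last
// record wins (illustrative simplification).
class EnrichFunction extends RichCoFlatMapFunction[Event, Meta, (Event, Meta)] {
  private var metaState: ValueState[Meta] = _
  private var pending: ListState[Event] = _

  override def open(parameters: Configuration): Unit = {
    metaState = getRuntimeContext.getState(
      new ValueStateDescriptor[Meta]("meta", classOf[Meta]))
    pending = getRuntimeContext.getListState(
      new ListStateDescriptor[Event]("pending", classOf[Event]))
  }

  override def flatMap1(e: Event, out: Collector[(Event, Meta)]): Unit = {
    val m = metaState.value()
    if (m != null) out.collect((e, m)) else pending.add(e)
  }

  override def flatMap2(m: Meta, out: Collector[(Event, Meta)]): Unit = {
    metaState.update(m)
    val buffered = pending.get()                 // may be null/empty before any event
    if (buffered != null) buffered.asScala.foreach(e => out.collect((e, m)))
    pending.clear()
  }
}

Because both buffers live in managed keyed state, they are snapshotted with every checkpoint automatically, which is one answer to the checkpoint question in the thread: nothing extra needs to be overridden as long as the buffering uses managed state.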
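On the watermark question: yes, when a finite source such as readTextFile reaches its end, Flink emits a final watermark with timestamp Long.MAX_VALUE on that stream, which is what the gist relies on to detect that the static side is complete. One way to observe this from user code, assuming event time and Flink 1.3+ (KeyedProcessFunction), is to register an event-time timer at Long.MaxValue, which can only fire once that final watermark arrives. The class below is an illustrative sketch, not code from the gist.

import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.util.Collector

// Registers an event-time timer at Long.MaxValue on every element; the timer
// can only fire when the final Long.MAX_VALUE watermark of a finished finite
// source arrives, so onTimer doubles as an end-of-input callback per key.
class EndOfStaticInput[K, T] extends KeyedProcessFunction[K, T, T] {
  override def processElement(value: T,
                              ctx: KeyedProcessFunction[K, T, T]#Context,
                              out: Collector[T]): Unit = {
    ctx.timerService().registerEventTimeTimer(Long.MaxValue)
    out.collect(value)
  }

  override def onTimer(timestamp: Long,
                       ctx: KeyedProcessFunction[K, T, T]#OnTimerContext,
                       out: Collector[T]): Unit = {
    // Reached only after the source emitted its final Long.MAX_VALUE
    // watermark, i.e. the static file has been read completely.
  }
}

Wiring it in would look like meta.keyBy(_.key).process(new EndOfStaticInput[String, Meta]), under the same assumed types as the sketch above.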