Thanks to all for your inputs. Yes, I could put all these common configurations/rules in a DB and the workers could pick them up dynamically at run time. But in that case, do the DB configuration/connection details need to be hard coded? Is there any way a worker can pick up the DB name, credentials, etc. dynamically at run time?

I am going through the feature/API documentation, but how about using a function closure and setGlobalJobParameters()/getGlobalJobParameters()?
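To make the question concrete, below is roughly the pattern I have in mind (just an untested sketch; the parameter names, the socket source, and the RuleEnricher class are placeholders I made up). The DB coordinates are supplied as program arguments at submit time, registered as global job parameters, and then read by each parallel task in open(), so nothing is hard coded in the jar:

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.java.utils.ParameterTool;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class DynamicConfigJob {

        public static void main(String[] args) throws Exception {
            // DB name/credentials are passed on the command line at submit time,
            // e.g. --db.host ... --db.user ..., so they are not hard coded in the jar.
            ParameterTool params = ParameterTool.fromArgs(args);

            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Make the submitted parameters visible to every task via the runtime context.
            env.getConfig().setGlobalJobParameters(params);

            env.socketTextStream("localhost", 9999)   // placeholder source
               .map(new RuleEnricher())
               .print();

            env.execute("dynamic-config-example");
        }

        public static class RuleEnricher extends RichMapFunction<String, String> {
            private transient String dbHost;
            private transient String dbUser;

            @Override
            public void open(Configuration conf) throws Exception {
                // Each parallel task reads the submitted parameters here and could
                // open its own DB connection / load the rules it needs.
                ParameterTool params = (ParameterTool)
                    getRuntimeContext().getExecutionConfig().getGlobalJobParameters();
                dbHost = params.getRequired("db.host");
                dbUser = params.get("db.user", "flink");
                // connect to the DB / load rules here ...
            }

            @Override
            public String map(String value) {
                return value + " (rules loaded from " + dbHost + " as " + dbUser + ")";
            }
        }
    }

Would that be the recommended way, or is there something better for the DataStream API?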
+Baswaraj

On Wed, Aug 24, 2016 at 5:17 PM, Maximilian Michels <m...@apache.org> wrote:
> Hi!
>
> 1. The community is working on adding side inputs to the DataStream
> API. That will allow you to easily distribute data to all of your
> workers.
>
> 2. In the meantime, you could use `.broadcast()` on a DataSet to
> broadcast data to all workers. You still have to join that data with
> another stream though.
>
> 3. The easiest method of all is to simply load your file in the
> RichMapFunction's open() method. The file can reside in a distributed
> file system which is accessible by all workers.
>
> Cheers,
> Max
>
> On Wed, Aug 24, 2016 at 6:45 AM, Jark Wu <wuchong...@alibaba-inc.com> wrote:
> > Hi,
> >
> > I think what Baswaraj wants is exactly something like the Storm Distributed
> > Cache API [1] (if I'm not misunderstanding).
> >
> > The distributed cache feature in Storm is used to efficiently distribute
> > files (or blobs, which is the equivalent terminology for a file in the
> > distributed cache and is used interchangeably in this document) that are
> > large and can change during the lifetime of a topology, such as geo-location
> > data, dictionaries, etc. Typical use cases include phrase recognition,
> > entity extraction, document classification, URL re-writing, location/address
> > detection and so forth. Such files may be several KB to several GB in size.
> > For small datasets that don't need dynamic updates, including them in the
> > topology jar could be fine. But for large files, the startup times could
> > become very large. In these cases, the distributed cache feature can provide
> > fast topology startup, especially if the files were previously downloaded
> > for the same submitter and are still in the cache. This is useful with
> > frequent deployments, sometimes a few times a day with updated jars, because
> > the large cached files will remain available without changes. The large
> > cached blobs that do not change frequently will remain available in the
> > distributed cache.
> >
> > We can look into whether this is a common use case and how to implement
> > it in Flink.
> >
> > [1] http://storm.apache.org/releases/2.0.0-SNAPSHOT/distcache-blobstore.html
> >
> > - Jark Wu
> >
> > On Aug 23, 2016, at 9:45 PM, Lohith Samaga M <lohith.sam...@mphasis.com> wrote:
> >
> > Hi,
> > Maybe you could use Cassandra to store and fetch all such reference data.
> > This way the reference data can be updated without restarting your
> > application.
> >
> > Lohith
> >
> > Sent from my Sony Xperia™ smartphone
> >
> > ---- Baswaraj Kasture wrote ----
> >
> > Thanks Kostas!
> > I am using the DataStream API.
> >
> > I have a few config/property files (key-value text files) and also some
> > business rule files (JSON).
> > These rules and configurations are needed when we process an incoming event.
> > Is there any way to share them with the task nodes from the driver program?
> > I think this is a very common use case and am sure other users face
> > similar issues.
> >
> > +Baswaraj
> >
> > On Mon, Aug 22, 2016 at 4:56 PM, Kostas Kloudas
> > <k.klou...@data-artisans.com> wrote:
> >>
> >> Hello Baswaraj,
> >>
> >> Are you using the DataSet (batch) or the DataStream API?
> >>
> >> If you are using the first, you can use a broadcast variable for your task.
> >> If you are using the DataStream one, then there is no proper support for
> >> that.
> >>
> >> Thanks,
> >> Kostas
> >>
> >> On Aug 20, 2016, at 12:33 PM, Baswaraj Kasture <kbaswar...@gmail.com>
> >> wrote:
> >>
> >> I am running a Flink standalone cluster.
> >>
> >> I have a text file that needs to be shared across tasks when I submit my
> >> application; in other words, put this text file on the class path of the
> >> running tasks.
> >>
> >> How can we achieve this with Flink?
> >>
> >> In Spark, spark-submit has a --jars option that puts all the specified files
> >> on the class path of the executors (executors run in separate JVMs and are
> >> spawned dynamically, so this is possible).
> >>
> >> Flink's task managers run tasks as separate threads inside the task manager
> >> JVM (?), so how can we make this text file accessible to all tasks spawned
> >> by the current application?
> >>
> >> Using HDFS, NFS, or including the file in the program jar is one way that I
> >> know of, but I am looking for a solution that allows me to provide the text
> >> file at run time and still have it accessible in all tasks.
> >> Thanks.
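P.S. To double-check my understanding of option 3 in Max's reply above (loading the file in the RichMapFunction's open() method from a file system reachable by all workers), is something along these lines what you meant? Again only an untested sketch; the hdfs:// path and the key=value rules format are made-up examples.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.core.fs.FSDataInputStream;
    import org.apache.flink.core.fs.FileSystem;
    import org.apache.flink.core.fs.Path;

    /**
     * Loads a key=value rules file from a shared file system in open(), so every
     * parallel task has its own copy before the first event arrives. The HDFS
     * path is only an example; any file system reachable from all task managers
     * (NFS mount, etc.) would work the same way.
     */
    public class RuleLookupMapper extends RichMapFunction<String, String> {

        private final String rulesPath;          // e.g. "hdfs:///shared/rules.properties"
        private transient Map<String, String> rules;

        public RuleLookupMapper(String rulesPath) {
            this.rulesPath = rulesPath;
        }

        @Override
        public void open(Configuration conf) throws Exception {
            rules = new HashMap<>();
            Path path = new Path(rulesPath);
            FileSystem fs = path.getFileSystem();
            try (FSDataInputStream in = fs.open(path);
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    int eq = line.indexOf('=');
                    if (eq > 0) {
                        rules.put(line.substring(0, eq).trim(),
                                  line.substring(eq + 1).trim());
                    }
                }
            }
        }

        @Override
        public String map(String event) {
            // Look up whatever rule applies to this event; identity fallback here.
            return rules.getOrDefault(event, event);
        }
    }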