Hi, how about:

1. have a process that reads the data from your SQL Server and dumps it as files into a directory on your HD
2. use Spark Streaming to read the data from that directory and store it into HDFS
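Step 2 could look roughly like the sketch below. This is just a minimal outline against the Spark 1.x streaming API; the landing directory, batch interval, and HDFS output prefix are all placeholders you would replace with your own:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DirToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("dir-to-hdfs")
    // Hypothetical batch interval of 30s -- tune to your file arrival rate
    val ssc = new StreamingContext(conf, Seconds(30))

    // Watch the landing directory; each new file that appears there
    // becomes part of the next batch of the stream
    val lines = ssc.textFileStream("file:///data/landing")

    // Persist every batch under an HDFS prefix (one directory per batch)
    lines.saveAsTextFiles("hdfs:///user/etl/landing/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```

One caveat with textFileStream: it only picks up files that appear in the directory *after* the stream starts, so the dump process in step 1 should write into a temp name and rename the file into the watched directory once it is complete.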
Also, Spark SQL has a JDBC data source that lets you read from a DB directly, so you might not need to go via Spark Streaming at all. HTH

On Tue, Jun 7, 2016 at 3:09 PM, Ajay Chander <itsche...@gmail.com> wrote:
> Hi Spark users,
>
> Right now we are using Spark for everything (loading the data from
> SQL Server, applying transformations, saving it as permanent tables in
> Hive) in our environment. Everything is done in one Spark application.
>
> The only thing we do before we launch our Spark application through
> Oozie is to load the data from the edge node to HDFS (it is triggered
> through an SSH action from Oozie that runs a shell script on the edge
> node).
>
> My question is: is there any way we can accomplish the edge-to-HDFS
> copy through Spark, so that everything is done in one Spark DAG and
> lineage graph?
>
> Any pointers are highly appreciated. Thanks.
>
> Regards,
> Aj
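To illustrate the direct-read route: the sketch below uses Spark SQL's JDBC data source to pull a table straight out of SQL Server and write it to HDFS, skipping the file dump entirely. The connection URL, table name, credentials, and output path are all hypothetical placeholders, and you would need the Microsoft SQL Server JDBC driver jar on the Spark classpath:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("jdbc-ingest"))
val sqlContext = new SQLContext(sc)

// Read the table over JDBC -- connection details are placeholders
val df = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:sqlserver://dbhost:1433;databaseName=mydb")
  .option("dbtable", "dbo.my_table")
  .option("user", "etl_user")
  .option("password", "secret")
  .load()

// Write straight to HDFS as Parquet, or use saveAsTable for a Hive table
df.write.parquet("hdfs:///user/etl/my_table")
```

Because the read happens inside the Spark application itself, the whole pipeline (load, transform, save to Hive) stays in one Spark DAG, which seems to be what the original question was after.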