Spark Streaming-- for each new file in HDFS

Kappaganthu, Sivaram (ES) Thu, 15 Sep 2016 10:01:03 -0700

Hello,

I am new to Spark and have the requirement below.


Problem statement: A third-party application continuously drops files onto a 
server. Typically about 100 files arrive per hour, and each file is smaller 
than 50 MB. My application has to process those files.

My questions:
1) Is it possible for Spark Streaming to trigger a job as soon as a file is 
placed, instead of triggering a job at a fixed batch interval?
2) If this is not possible with Spark Streaming, can we control it with 
Kafka/Flume?

Thanks,
Sivaram

