Could you explain what you're trying to do? A stream with no data produces no batches, so even if this is possible, it would end up as a no-op.
- Jungtaek Lim (HeartSaVioR)

On Tue, Nov 6, 2018 at 8:29 AM, Arun Manivannan <a...@arunma.com> wrote:

> Hi,
>
> I would like to create a "zero" value for a Structured Streaming DataFrame
> and unfortunately, I couldn't find any leads. With Spark batch, I can do an
> "emptyDataFrame" or "createDataFrame" with "emptyRDD" but with
> Structured Streaming, I am lost.
>
> If I use the "emptyDataFrame" as the zero value, I wouldn't be able to
> join it with any other DataFrames in the program because Spark doesn't
> allow you to mix batch and stream data frames (isStreaming=false for the
> batch ones).
>
> Any clue is greatly appreciated. Here are the alternatives that I have at
> the moment.
>
> *1. Reading from an empty file*
> *Disadvantages: polling is expensive because it involves IO, and it's error
> prone in the sense that someone might accidentally update the file.*
>
> val emptyErrorStream = (spark: SparkSession) => {
>   spark
>     .readStream
>     .format("csv")
>     .schema(DataErrorSchema)
>     .load("/Users/arunma/IdeaProjects/OSS/SparkDatalakeKitchenSink/src/test/resources/dummy1.txt")
>     .as[DataError]
> }
>
> *2. Use MemoryStream*
>
> *Disadvantages: MemoryStream itself is not recommended for production use
> because of the ability to mutate it, but I am converting it to a DS immediately.
> So, I am leaning towards this at the moment.*
>
> val emptyErrorStream = (spark: SparkSession) => {
>   implicit val sqlC = spark.sqlContext
>   MemoryStream[DataError].toDS()
> }
>
> Cheers,
> Arun
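
For reference, the isStreaming difference described in the quoted message can be checked with a minimal sketch like the one below. It is only an illustration under stated assumptions: the `DataError` case class and its fields are hypothetical stand-ins (the original post does not show the type), the object and method names are made up for this example, and a local SparkSession is assumed. The MemoryStream variant mirrors alternative 2 from the quoted message.

```scala
import org.apache.spark.sql.{Dataset, SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream

// Hypothetical stand-in for the DataError type mentioned in the thread.
case class DataError(id: String, message: String)

object EmptyStreamSketch {
  // Batch "zero" value: an empty, non-streaming Dataset.
  def batchZero(spark: SparkSession): Dataset[DataError] = {
    import spark.implicits._
    spark.emptyDataset[DataError]
  }

  // Streaming "zero" value: a MemoryStream that never receives data,
  // converted to a Dataset right away (alternative 2 in the thread).
  def streamZero(spark: SparkSession): Dataset[DataError] = {
    import spark.implicits._
    implicit val sqlContext: SQLContext = spark.sqlContext
    MemoryStream[DataError].toDS()
  }

  def main(args: Array[String]): Unit = {
    // Assumes a local Spark installation on the classpath.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-stream-sketch")
      .getOrCreate()

    println(batchZero(spark).isStreaming)  // false
    println(streamZero(spark).isStreaming) // true

    spark.stop()
  }
}
```

Since no data is ever added to the MemoryStream, a query started on `streamZero` should have nothing to process, which is consistent with the no-op behavior mentioned in the reply.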