Yes, I meant batch interval. Thanks for clarifying. Cheers,
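For readers following the thread: the count-based behavior being requested (close a batch after N elements rather than after a time interval) can be sketched in a few lines of plain Python. `count_batches` is a hypothetical illustration of the idea, not part of the Spark API:

```python
from itertools import islice

def count_batches(stream, batch_size):
    """Yield successive fixed-size batches from an iterable.

    This mimics count-based windowing: each batch closes after
    batch_size elements, not after a time interval. The final
    batch may be smaller if the stream ends early.
    """
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

For example, `list(count_batches(range(7), 3))` produces `[[0, 1, 2], [3, 4, 5], [6]]` — three batches, each closed by element count alone.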
Michael

On Oct 7, 2014, at 11:14 PM, jayant [via Apache Spark User List] <ml-node+s1001560n15904...@n3.nabble.com> wrote:

> Hi Michael,
>
> I think you mean batch interval rather than windowing. It can be helpful for cases when you do not want to process very small batch sizes.
>
> The HDFS sink in Flume has the concept of rolling files based on time, number of events, or size:
> https://flume.apache.org/FlumeUserGuide.html#hdfs-sink
>
> The same could be applied to Spark if a use case demands it. The only major catch is that it would break the window operations Spark already provides.
>
> Thanks,
> Jayant
>
> On Tue, Oct 7, 2014 at 10:19 PM, Michael Allman <[hidden email]> wrote:
> Hi Andrew,
>
> The use case I have in mind is batch data serialization to HDFS, where sizing files to a certain HDFS block size is desired. In my particular use case, I want to process 10 GB batches of data at a time. I'm not sure this is a sensible use case for Spark Streaming, and I was trying to test it. However, I had trouble getting it working, and in the end I decided it was more trouble than it was worth. So I split my task in two: a streaming job on small, time-defined batches of data, and a traditional Spark job aggregating the smaller files into a larger whole. In retrospect, I think this is the right way to go, even if a count-based window specification were possible. Therefore, I can't offer my use case as motivation for a count-based window size.
>
> Cheers,
>
> Michael
>
> On Oct 5, 2014, at 4:03 PM, Andrew Ash <[hidden email]> wrote:
>
>> Hi Michael,
>>
>> I couldn't find anything in JIRA for it:
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20text%20~%20%22window%22%20AND%20component%20%3D%20Streaming
>>
>> Could you or Adrian please file a JIRA ticket explaining the functionality and maybe a proposed API? This will help people interested in count-based windowing understand the state of the feature in Spark Streaming.
>>
>> Thanks!
>> Andrew
>>
>> On Fri, Oct 3, 2014 at 4:09 PM, Michael Allman <[hidden email]> wrote:
>> Hi,
>>
>> I also have a use for count-based windowing. I'd like to process data batches by size as opposed to time. Is this feature on the development roadmap? Is there a JIRA ticket for it?
>>
>> Thank you,
>>
>> Michael
>>
>> --
>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/window-every-n-elements-instead-of-time-based-tp2085p15701.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/window-every-n-elements-instead-of-time-based-tp2085p15905.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
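The size-based rolling that Jayant points to in the Flume HDFS sink (and the block-sized-files goal in Michael's use case) can be sketched independently of Spark or Flume. `roll_by_size` below is a hypothetical helper for illustration only; it groups records into "files" that roll once a byte threshold would be exceeded:

```python
def roll_by_size(records, max_bytes):
    """Group byte-string records into files, rolling to a new file
    once adding the next record would push the current file past
    max_bytes (akin to the Flume HDFS sink's size-based rolling).

    A record larger than max_bytes still gets its own file rather
    than being split, mirroring record-at-a-time sinks.
    """
    files, current, size = [], [], 0
    for rec in records:
        if current and size + len(rec) > max_bytes:
            files.append(current)
            current, size = [], 0
        current.append(rec)
        size += len(rec)
    if current:
        files.append(current)
    return files
```

With `max_bytes` set near the HDFS block size, the same grouping idea is what the second (batch) job in Michael's two-job approach would apply when compacting the streaming job's small files into block-sized ones.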