Yes.

On Wed, Aug 12, 2015 at 12:12 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:

> Thanks! To write to HDFS, do I need to use a saveAs method?
>
> On Wed, Aug 12, 2015 at 12:01 PM, Tathagata Das <t...@databricks.com> wrote:
>
>> This is how Spark does it: it writes the task output to a uniquely-named
>> temporary file and then, after the task completes successfully, atomically
>> renames the temp file to the expected file name <file>/<partition-XXX>.
>>
>> On Tue, Aug 11, 2015 at 9:53 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>>
>>> Thanks for the info. When data is written to HDFS, how does Spark keep
>>> the filenames written by multiple executors unique?
>>>
>>> On Tue, Aug 11, 2015 at 9:35 PM, Hemant Bhanawat <hemant9...@gmail.com> wrote:
>>>
>>>> Posting a comment from my previous mail post:
>>>>
>>>> When data is received from a stream source, the receiver creates blocks
>>>> of data. A new block is generated every blockInterval milliseconds, so
>>>> N blocks are created during each batchInterval, where
>>>> N = batchInterval / blockInterval. An RDD is created on the driver for
>>>> the blocks received during the batchInterval, and those blocks become
>>>> the partitions of that RDD.
>>>>
>>>> If you then want to repartition based on a key, a shuffle is needed.
>>>>
>>>> On Wed, Aug 12, 2015 at 4:36 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>>>>
>>>>> How does partitioning in Spark work when it comes to streaming? What's
>>>>> the best way to partition time-series data grouped by a certain tag,
>>>>> like categories of product (video, music, etc.)?
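Hemant's N = batchInterval / blockInterval rule above can be sketched in plain Python. This is only illustrative arithmetic, not a Spark API; the 200 ms figure is Spark's documented default for spark.streaming.blockInterval.

```python
def partitions_per_batch(batch_interval_ms: int, block_interval_ms: int) -> int:
    """Number of RDD partitions per streaming batch: one per received block,
    i.e. N = batchInterval / blockInterval (per the explanation above)."""
    if block_interval_ms <= 0 or batch_interval_ms % block_interval_ms != 0:
        raise ValueError("batchInterval should be a positive multiple of blockInterval")
    return batch_interval_ms // block_interval_ms

# With the default 200 ms block interval and a 2-second batch interval:
print(partitions_per_batch(2000, 200))  # -> 10
```

This is also why a very small blockInterval inflates the partition count: halving it to 100 ms doubles the partitions per batch.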
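The write-to-a-unique-temp-file-then-rename trick TD describes can be sketched with plain Python on a local filesystem. This is an assumption-laden analogy, not Spark's actual HDFS output committer: `mkstemp` supplies the unique temp name, and `os.rename` is atomic on POSIX within one filesystem.

```python
import os
import tempfile

def atomic_write(path: str, data: str) -> None:
    """Write data to a uniquely-named temp file in the target directory,
    then atomically rename it to the final name (analogous to how a task's
    output becomes <file>/<partition-XXX> only after the task succeeds)."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)  # unique temp file
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.rename(tmp_path, path)  # atomic: readers never see a partial file
    except Exception:
        os.remove(tmp_path)  # a failed task leaves no half-written output
        raise
```

Because two executors working on the same partition each write to their own temp file, only the rename of the winning attempt publishes a result, which is how duplicate filenames are avoided.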