Hi, all.  I would like to begin setting up my workflow in Apache Beam, but
run it only on a local machine until our system administrators have the
capacity to set up an adequate (Spark or Hadoop) cluster.  From the
documentation, I understand that we should be mindful of the memory
requirements of the data sets we use, but is there any way (at the
sacrifice of speed, of course) to work with a larger data set on the
DirectRunner?  Can it be configured to spill to disk, possibly?
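
For concreteness, here is roughly the kind of pipeline I have in mind,
sketched with the Python SDK and run on the DirectRunner (file names and
step labels below are just placeholders):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Run locally on the DirectRunner for now.
    options = PipelineOptions(runner='DirectRunner')

    with beam.Pipeline(options=options) as p:
        (p
         | 'Read' >> beam.io.ReadFromText('input.txt')      # placeholder input
         | 'SplitWords' >> beam.FlatMap(lambda line: line.split())
         | 'PairWithOne' >> beam.Map(lambda word: (word, 1))
         | 'CountPerWord' >> beam.CombinePerKey(sum)
         | 'Format' >> beam.Map(lambda kv: '%s: %d' % kv)
         | 'Write' >> beam.io.WriteToText('counts'))        # placeholder output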

Thanks,
Steve
