Benjamin,

Sorry, the successes and failures are a bit too nuanced for an email.
A quick check on average CAD files says they're around 1 MB. That'd be a
poor use of HDFS.

Thanks,

Jesse

On Mon, May 23, 2016 at 11:08 AM Stadin, Benjamin <[email protected]> wrote:

> Hi Jesse,
>
> Yes, this is what I’m looking for. I want to deploy and run the same code,
> mostly written in Python as well as C++, on different nodes. I also want to
> benefit from the job distribution and job monitoring / administration
> capabilities. I only need parallelization to a minor degree later.
>
> Though I’m hesitant to use HDFS, or any other distributed file system.
> Since I process the data only on one node, it will probably be a big
> disadvantage for this data to be distributed to other nodes as well via
> HDFS.
>
> Could you maybe share some info about the successful implementations and
> configurations of such a distributed job engine?
>
> Thanks
> Ben
>
> From: Jesse Anderson <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Monday, 23 May 2016 at 19:22
> To: "[email protected]" <[email protected]>
> Subject: Re: Force pipe executions to run on same node
>
> Benjamin,
>
> I've had a few students using Big Data frameworks as a distributed job
> engine. They work with varying degrees of success.
>
> With Beam, your success will really depend on the runner as JB said. If I
> understand your use case correctly, if you were using Hadoop MapReduce,
> you'd be using a map-only job. Beam would give you the ability to run the
> same code on several different execution engines. If that isn't your goal,
> you might look elsewhere.
>
> Thanks,
>
> Jesse
>
> On Mon, May 23, 2016 at 6:47 AM Jean-Baptiste Onofré <[email protected]>
> wrote:
>
>> Hi Benjamin,
>>
>> Your data processing doesn't seem to be fully big data oriented and
>> distributed.
>>
>> Maybe Apache Camel is more appropriate for such a scenario. You can always
>> delegate part of the data processing to Beam from Camel (using a Kafka
>> topic for instance).
>>
>> Regards
>> JB
>>
>> On 05/22/2016 11:01 PM, Stadin, Benjamin wrote:
>> > Hi JB,
>> >
>> > None so far. I'm still thinking about how to achieve what I want to do,
>> > and whether Beam makes sense for my usage scenario.
>> >
>> > I'm mostly interested in just orchestrating tasks to individual machines
>> > and service endpoints, depending on their workload. My application is
>> > not so much about Big Data and parallelism, but local data processing
>> > and local parallelization.
>> >
>> > An example scenario:
>> > - A user uploads a set of CAD files
>> > - data from CAD files are extracted in parallel
>> > - a whole bunch of native tools operate on this extracted data set in
>> >   their own pipe. Due to the amount of data generated and consumed, it
>> >   doesn't make sense at all to distribute these tasks to other machines.
>> >   It's very IO bound.
>> > - For the same reason, it doesn't make sense to distribute data using
>> >   RDD. It's rather favorable to do only some tasks (such as CAD data
>> >   extraction) in parallel, otherwise run other data tasks as a group on
>> >   a single node, in order to avoid IO bottlenecks.
>> >
>> > So I don't have typical Big Data processing in mind. What I'm looking
>> > for is rather an integrated environment to provide only some kind of
>> > parallel task execution, and task management and administration, as well
>> > as a message bus and event system.
>> >
>> > Is Beam a choice for such a rather non-Big-Data scenario?
>> >
>> > Regards,
>> > Ben
>> >
>> >
>> > On 21.05.16, 18:59, "Jean-Baptiste Onofré" <[email protected]> wrote:
>> >
>> >> Hi Ben,
>> >>
>> >> it's not SDK related, it depends more on the runner.
>> >>
>> >> What runner are you using?
>> >>
>> >> Regards
>> >> JB
>> >>
>> >> On 05/21/2016 04:22 PM, Stadin, Benjamin wrote:
>> >>> Hi,
>> >>>
>> >>> I need to control beam pipes/filters so that pipe executions that
>> >>> match certain criteria are executed on the same node.
>> >>>
>> >>> In Spring XD this can be controlled by defining groups
>> >>> (http://docs.spring.io/spring-xd/docs/1.2.0.RELEASE/reference/html/#deployment)
>> >>> and then specifying deployment criteria to match this group.
>> >>>
>> >>> Is this possible with Beam?
>> >>>
>> >>> Best
>> >>> Ben
>> >>
>> >> --
>> >> Jean-Baptiste Onofré
>> >> [email protected]
>> >> http://blog.nanthrax.net
>> >> Talend - http://www.talend.com
>> >
>>
>> --
>> Jean-Baptiste Onofré
>> [email protected]
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
