Welcome aboard, and great idea ;) A DoFn is probably easier and more straightforward in your case.
Anyway, it would be a good addition to the SDK (as primitives), or at least to a "Data Integration DSL" (as I first thought).
Regards
JB

On 03/20/2016 09:01 PM, Ismaël Mejía wrote:
Hello,

I agree with you, JB: Log is a more appropriate name for the 'print' case. We can definitely create a richer transform with your ideas, and we will discuss the details later, when we start working together.

The more abstract case, which I called Debug because I didn't find a better name, is a general transform that can be the base of many others that produce side effects but don't change the data in the PCollection. That's why I consider it a different (more abstract) transform in its own right. I implemented the general predicate + function application just to prove my point, and the Log/print case was just a test of one specific case.

Since I am new to the Dataflow model, I don't know what unintended consequences this transform can have (or which good practices a transform with side effects must follow). Additionally, I have not yet thought about how to support more advanced features of the model (e.g. side inputs/outputs). Any ideas?

But well, this is my hello world in the Dataflow model, so we'll see what's to come :)

-Ismaël

On Sun, Mar 20, 2016 at 4:18 PM, Jean-Baptiste Onofré wrote:

Hi,

thanks for the update.

IMHO, I would name the Debug transform Log:

    .apply(Log.withLevel("DEBUG"))
    .apply(Log.withLevel("INFO").withPattern("%d %m ..."))
    .apply(Log.withLevel("WARN").withMessage("Foo").withStream("System.out"))

It would be more flexible and closer to the actual behavior. I would mimic the Camel log component a bit, for instance.

If you don't mind, I will do it with you.
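To make the proposed Log API a bit more concrete, here is a rough plain-Java sketch of its semantics on an in-memory List. It is not Beam code and not an existing API: LogSketch, apply(List) and the "[LEVEL] message: element" output format are hypothetical; the Camel-style withPattern is left out; and withStream takes a PrintStream instead of the "System.out" string. In a real pipeline the same logic would presumably live in a DoFn that logs each element and re-emits it unchanged.

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java model of the proposed Log transform (NOT Beam API).
// Every with* call returns a new immutable instance, like a fluent builder.
final class LogSketch {
    private final String level;
    private final String message;   // optional prefix; empty means "no message"
    private final PrintStream stream;

    private LogSketch(String level, String message, PrintStream stream) {
        this.level = level;
        this.message = message;
        this.stream = stream;
    }

    // Entry point mirroring Log.withLevel("..."); defaults to System.out, no message.
    static LogSketch withLevel(String level) {
        return new LogSketch(level, "", System.out);
    }

    LogSketch withMessage(String message) {
        return new LogSketch(level, message, stream);
    }

    // Assumption: takes a PrintStream rather than the "System.out" string.
    LogSketch withStream(PrintStream stream) {
        return new LogSketch(level, message, stream);
    }

    // Writes one line per element, then hands the input back unchanged.
    <T> List<T> apply(List<T> input) {
        for (T element : input) {
            String prefix = "[" + level + "] " + (message.isEmpty() ? "" : message + ": ");
            stream.println(prefix + element);
        }
        return input;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("foo", "bar");
        LogSketch.withLevel("INFO").apply(words);
        LogSketch.withLevel("WARN").withMessage("Foo").withStream(System.err).apply(words);
    }
}
```

Returning the same List reference is what makes this a "transform that does not transform": the downstream steps see exactly the elements that came in.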
Thanks
Regards
JB

On 03/20/2016 12:07 PM, Ismaël Mejía wrote:

Hi,

The code of the transform is here, in a playground for Beam experiments I created (it is a bit alpha for the moment, and it has no comments):
https://github.com/iemejia/beam-playground/blob/master/src/main/java/org/apache/beam/transforms/Debug.java

Since my initial goal was more of a test scenario in the DirectPipelineRunner, I haven't yet considered more advanced logging capabilities or the possible issues of distribution (serialization, in particular of dependencies, as well as exceptions, etc.), but of course it is something I expect to improve if there is interest.

Do you see any immediate things to improve in order to try it with the distributed runners? (I want to do this, also as an excuse to try the FlinkRunner.)

Best,
-Ismael

On Sun, Mar 20, 2016 at 11:13 AM, Jean-Baptiste Onofré wrote:

By the way, for the "Integration" DSL, in addition to an explicit debug transform, it would make sense to have an implicit "Tracer". It's something that I planned: it would allow us to sample a PCollection if the pipeline tracer is enabled (like we do in a Camel route with the tracer).

Regards
JB

On 03/20/2016 10:14 AM, Ismaël Mejía wrote:

Hello,

I just started playing with Beam and I wanted to debug what happens between transforms in pipelines. I wrote a simple 'Debug' transform for this. The idea is to apply a function, based on a predicate, to any element in a collection without changing the collection; in other words, a transform that does not transform but produces side effects.
The idea is better illustrated with this simple example:

    .apply(FlatMapElements.via((String text) -> Arrays.asList(text.split(" ")))
        .withOutputType(new TypeDescriptor<String>() { }))
    .apply(Debug
        .when((String s) -> s.startsWith("A"))
        .with((String s) -> { System.out.println(s); return null; }))
    .apply(Filter.byPredicate((String text) -> text.length() > 5))
    .apply(Debug.print()); // sugared method: same printing function as above

I think this can be useful (at least for debugging purposes). Is there something like this already in the SDK? If not, can you please give me some feedback/ideas to improve my transform?

Thanks,
-Ismael

ps. You can find the code of the first version of the transform here:
https://github.com/iemejia/beam-playground/blob/master/src/main/java/org/apache/beam/transforms/Debug.java

--
Jean-Baptiste Onofré
http://blog.nanthrax.net
Talend - http://www.talend.com
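The intended semantics of Debug can be modeled without any runner. The sketch below is hypothetical plain Java on an in-memory List (DebugSketch, Builder and apply are illustrative names, not Beam API): when/with run a side effect on the elements that match the predicate and return the input untouched, and a Consumer stands in for the Function that returns null in the example. The main method replays the example pipeline with java.util.stream in place of FlatMapElements and Filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical in-memory model of the proposed Debug transform. NOT Beam code:
// it only shows the intended "side effect, no transformation" semantics.
final class DebugSketch<T> {
    private final Predicate<T> condition;
    private final Consumer<T> action;

    private DebugSketch(Predicate<T> condition, Consumer<T> action) {
        this.condition = condition;
        this.action = action;
    }

    // Builder step mirroring Debug.when(...) in the example.
    static <T> Builder<T> when(Predicate<T> condition) {
        return new Builder<>(condition);
    }

    // Sugar mirroring Debug.print(): print every element.
    static <T> DebugSketch<T> print() {
        return new DebugSketch<T>(x -> true, System.out::println);
    }

    static final class Builder<T> {
        private final Predicate<T> condition;

        private Builder(Predicate<T> condition) {
            this.condition = condition;
        }

        // Mirrors .with(...); a Consumer replaces the Function returning null.
        DebugSketch<T> with(Consumer<T> action) {
            return new DebugSketch<>(condition, action);
        }
    }

    // Runs the side effect on matching elements and returns the input unchanged.
    List<T> apply(List<T> input) {
        for (T element : input) {
            if (condition.test(element)) {
                action.accept(element);
            }
        }
        return input;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Apache Beam Alpha transforms are Awesome".split(" "));
        List<String> seen = new ArrayList<>();
        DebugSketch.when((String s) -> s.startsWith("A")).with(seen::add).apply(words);

        List<String> longWords = words.stream()
                .filter(s -> s.length() > 5)
                .collect(Collectors.toList());
        DebugSketch.<String>print().apply(longWords);
        System.out.println("matched: " + seen);
    }
}
```

Running main prints Apache, transforms and Awesome (the words longer than five characters), then "matched: [Apache, Alpha, Awesome]". On a distributed runner the action would additionally have to be serializable, which is one of the open issues raised in the thread.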
