Welcome aboard and great idea ;)

Probably DoFn is easier and straight forward in your case.

Anyway, it would be a good addition to the SDK (as primitives), or at least in a "Data Integration DSL" (as I thought first).

Regards
JB

On 03/20/2016 09:01 PM, Ismaël Mejía wrote:
Hello,

I agree with you JB, Log is a more appropriate name for the case of 'print',
we can definitely create a richer transform with your ideas, and we will
discuss the details later on when we start to work together.

The more abstract case which I call Debug since I didn't find a better
name is a
general transform that can be the base of many others who produce side
effects
but don't change the data in the PTransform, that's why I consider it a
different (more abstract) Transform per se, and I implemented the general
predicate + function application just to prove my point, and the
Log/print case
was just a test of a specific case.

Since I am new to the Dataflow model I don't know which unintended
consequences
this transform can have (or which good practices a transform that
side-effects must
take care of), aditionally I have not thought about how to support more
advanced features of the model (e.g. side inputs/outputs). Any ideas ?

But well, this is my hello world in the Dataflow model, so we'll see
what's to
come :)

-Ismaël



On Sun, Mar 20, 2016 at 4:18 PM, Jean-Baptiste Onofré <[email protected]
<mailto:[email protected]>> wrote:

    Hi,

    thanks for the update.

    IMHO, I would name Debug transform as Log:

    .apply(Log.withLevel("DEBUG"))
    .apply(Log.withLevel("INFO").withPattern("%d %m ..."))
    .apply(Log.withLevel("WARN").withMessage("Foo").withStream("System.out")

    It would more flexible and related to the actual behavior.

    I would mimic a bit the Camel log component for instance.

    If you don't mind, I will do it with you.

    Thanks
    Regards
    JB

    On 03/20/2016 12:07 PM, Ismaël Mejía wrote:

        Hi,

        The code of the transform is here in a playground for Beam
        experiments I
        created (it is a bit alpha for the moment, and it does not have
        comments):

        
https://github.com/iemejia/beam-playground/blob/master/src/main/java/org/apache/beam/transforms/Debug.java

        Since my initial goal was more of a test scenario in the
        DirectPipelineRunner I haven't considered yet more advanced logging
        capabilities and the possible issues of distribution
        (serialization, in
        particular of dependencies, as well as exceptions, etc), but of
        course
        it is something I expect to improve if there is interest. Do you see
        some immediate things to improve to try it with the distributed
        runners
        (I want to do this, as a excuse also to  try the FlinkRunner).

        Best,
        -Ismael


        On Sun, Mar 20, 2016 at 11:13 AM, Jean-Baptiste Onofré
        <[email protected] <mailto:[email protected]>
        <mailto:[email protected] <mailto:[email protected]>>> wrote:

             By the way, for the "Integration" DSL, in addition of
        explicit debug
             transform, it would make sense to have an implicit
        "Tracer". It's
             something that I planned: it would allow us to have sampling on
             PCollection if the pipeline tracer is enabled (like we do
        in a Camel
             route with the tracer).

             Regards
             JB

             On 03/20/2016 10:14 AM, Ismaël Mejía wrote:

                 ​Hello,

                 I just started playing with Beam and I wanted to debug
        what happens
                 between transforms in pipelines. I wrote a simple 'Debug'
                 transform for
                 this.
                 The idea is to apply a function based on a predicate to any
                 element in a
                 collection without changing the collection, or in other
        words, a
                 transform that
                 does not transform but produces side effects.

                 The idea is better illustrated with this simple example:

                       .apply(FlatMapElements.via((String text) ->
                 Arrays.asList(text.split(" ")))
                         .withOutputType(new TypeDescriptor<String>() {
                        }))
                       .apply(Debug
                         .when((String s) -> s.startsWith("A"))
                         .with((String s) -> {
                           System.out.println(s);
                           return null;
                         }));
                       .apply(Filter.byPredicate((String text) ->
        text.length() > 5))
                       .apply(Debug.print());  // sugared method, same
        as above

                 I think this can be useful (at least for debugging
        purposes), is
                 there
                 something
                 like this already in the SDK ? If this is not the case,
        can you
                 please
                 give me some
                 feedback/ideas to improve my transform.

                 Thanks,
                 -Ismael

                 ps. You can find the code of the first version of the
        transform
                 here:
        
https://github.com/iemejia/beam-playground/blob/master/src/main/java/org/apache/beam/transforms/Debug.java



             --
             Jean-Baptiste Onofré
        [email protected] <mailto:[email protected]>
        <mailto:[email protected] <mailto:[email protected]>>
        http://blog.nanthrax.net
             Talend - http://www.talend.com



    --
    Jean-Baptiste Onofré
    [email protected] <mailto:[email protected]>
    http://blog.nanthrax.net
    Talend - http://www.talend.com



--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com

Reply via email to