I agree with Davor, it's what I tried to express: even if a custom DoFn or PTransform is always possible, maybe it makes sense to provide such functions directly in the SDK.

Regards
JB

On 03/20/2016 08:13 PM, Davor Bonaci wrote:
Of course, this functionality can be implemented manually in a DoFn --
this wouldn't be a primitive. That said, an argument could be made that
a Beam SDK should provide a composite that logs elements in a
PCollection. This can further be augmented with proposals like
"log_every_n_seconds" or "log_every_n_elements", etc. There's no such
concept in the Java SDK right now, but can be easily constructed by users.

Now, there are several counter-arguments as well:

  * An approach where we provide a logging PTransform would be less
    flexible / powerful than a DoFn + direct usage of a logging library.
    Regardless, one might want to optimize the common, simple case.
  * A runner could choose to sample elements of all PCollections to
    provide this automatically, perhaps without any intervention from
    the pipeline writer (and/or enable this behavior on-demand, while
    the pipeline is running).

Regarding runner portability, our thinking is that logging ought to work
across runners, regardless of the choice of logging library. This is
important to be able to use other, non-Beam related libraries, that
neither we or the pipeline developers can control. Basically, a runner
can set up logging bridges between major logging frameworks and standard
streams. Then, the runner gets all logs from Beam SDK, user code, any
random libraries that users may choose, etc. -- it all works transparently.

On Sun, Mar 20, 2016 at 10:55 AM, Dan Halperin <[email protected]
<mailto:[email protected]>> wrote:

    Sorry for short phone email. Check out this Transform from a pull
    request.

    
https://github.com/elibixby/DataflowJavaSDK/blob/51ee732964b3425c6c4a8677d135c41765d9bcdc/contrib/firebaseio/src/main/java/contrib/LogElements.java

    The key subtleties are around making sure you don't log too
    often--logging per element is very expensive.

    Happy to discuss more if you have questions, and again sorry I'm on
    a phone so can't provide a more comprehensive response.

    On Sun, Mar 20, 2016 at 10:21 Dan Halperin <[email protected]
    <mailto:[email protected]>> wrote:

        Hi Ismael,

        I think you can just do this with a normal DoFn. Why do you
        think this needs to be a new primitive?
        On Sun, Mar 20, 2016 at 10:20 Dan Halperin <[email protected]
        <mailto:[email protected]>> wrote:

            Hi is
            On Sun, Mar 20, 2016 at 08:18 Jean-Baptiste Onofré
            <[email protected] <mailto:[email protected]>> wrote:

                Hi,

                thanks for the update.

                IMHO, I would name Debug transform as Log:

                .apply(Log.withLevel("DEBUG"))
                .apply(Log.withLevel("INFO").withPattern("%d %m ..."))
                
.apply(Log.withLevel("WARN").withMessage("Foo").withStream("System.out")

                It would more flexible and related to the actual behavior.

                I would mimic a bit the Camel log component for instance.

                If you don't mind, I will do it with you.

                Thanks
                Regards
                JB

                On 03/20/2016 12:07 PM, Ismaël Mejía wrote:
                 > Hi,
                 >
                 > The code of the transform is here in a playground for
                Beam experiments I
                 > created (it is a bit alpha for the moment, and it
                does not have comments):
                 >
                 >
                
https://github.com/iemejia/beam-playground/blob/master/src/main/java/org/apache/beam/transforms/Debug.java
                 >
                 > Since my initial goal was more of a test scenario in the
                 > DirectPipelineRunner I haven't considered yet more
                advanced logging
                 > capabilities and the possible issues of distribution
                (serialization, in
                 > particular of dependencies, as well as exceptions,
                etc), but of course
                 > it is something I expect to improve if there is
                interest. Do you see
                 > some immediate things to improve to try it with the
                distributed runners
                 > (I want to do this, as a excuse also to  try the
                FlinkRunner).
                 >
                 > Best,
                 > -Ismael
                 >
                 >
                 > On Sun, Mar 20, 2016 at 11:13 AM, Jean-Baptiste
                Onofré <[email protected] <mailto:[email protected]>
                 > <mailto:[email protected] <mailto:[email protected]>>> wrote:
                 >
                 >     By the way, for the "Integration" DSL, in
                addition of explicit debug
                 >     transform, it would make sense to have an
                implicit "Tracer". It's
                 >     something that I planned: it would allow us to
                have sampling on
                 >     PCollection if the pipeline tracer is enabled
                (like we do in a Camel
                 >     route with the tracer).
                 >
                 >     Regards
                 >     JB
                 >
                 >     On 03/20/2016 10:14 AM, Ismaël Mejía wrote:
                 >
                 >         ​Hello,
                 >
                 >         I just started playing with Beam and I wanted
                to debug what happens
                 >         between transforms in pipelines. I wrote a
                simple 'Debug'
                 >         transform for
                 >         this.
                 >         The idea is to apply a function based on a
                predicate to any
                 >         element in a
                 >         collection without changing the collection,
                or in other words, a
                 >         transform that
                 >         does not transform but produces side effects.
                 >
                 >         The idea is better illustrated with this
                simple example:
                 >
                 >               .apply(FlatMapElements.via((String text) ->
                 >         Arrays.asList(text.split(" ")))
                 >                 .withOutputType(new
                TypeDescriptor<String>() {
                 >                }))
                 >               .apply(Debug
                 >                 .when((String s) -> s.startsWith("A"))
                 >                 .with((String s) -> {
                 >                   System.out.println(s);
                 >                   return null;
                 >                 }));
                 >               .apply(Filter.byPredicate((String text)
                -> text.length() > 5))
                 >               .apply(Debug.print());  // sugared
                method, same as above
                 >
                 >         I think this can be useful (at least for
                debugging purposes), is
                 >         there
                 >         something
                 >         like this already in the SDK ? If this is not
                the case, can you
                 >         please
                 >         give me some
                 >         feedback/ideas to improve my transform.
                 >
                 >         Thanks,
                 >         -Ismael
                 >
                 >         ps. You can find the code of the first
                version of the transform
                 >         here:
                 >
                
https://github.com/iemejia/beam-playground/blob/master/src/main/java/org/apache/beam/transforms/Debug.java
                 >
                 >
                 >
                 >     --
                 >     Jean-Baptiste Onofré
                 > [email protected] <mailto:[email protected]>
                <mailto:[email protected] <mailto:[email protected]>>
                 > http://blog.nanthrax.net
                 >     Talend - http://www.talend.com
                 >
                 >

                --
                Jean-Baptiste Onofré
                [email protected] <mailto:[email protected]>
                http://blog.nanthrax.net
                Talend - http://www.talend.com



--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com

Reply via email to