On Mon, Mar 13, 2023 at 11:33 AM Godefroy Clair <godefroy.cl...@gmail.com>
wrote:

> Hi,
> I am wondering about the way `Flatten()` and `FlatMap()` are implemented
> in Apache Beam Python.
> In most functional languages, FlatMap() is the same as composing
> `Flatten()` and `Map()`, as the name indicates, so Flatten() and
> FlatMap() take the same input.
> But in Apache Beam, Flatten() is using _iterable of PCollections_ while
> FlatMap() is working with _PCollection of Iterables_.
>
> If I am not wrong, the signatures of Flatten, Map and FlatMap are:
> ```
> Flatten:: Iterable[PCollections[A]] -> PCollection[A]
> Map:: (PCollection[A], (A-> B)) -> PCollection[B]
> FlatMap:: (PCollection[Iterable[A]], (A->B)) -> [A]
>

FlatMap is actually (PCollection[A], (A->Iterable[B])) -> PCollection[B].


> ```
>
> So my question is: is there another "Flatten-like" function with this
> signature:
> ```
> anotherFlatten:: PCollection[Iterable[A]] -> PCollection[A]
> ```
>
> One reason this would be useful is that when you just want to
> "flatten" a `PCollection` of `iterable` you have to use `FlatMap()` with an
> identity function.
>
> So instead of writing:
> `FlatMap(lambda e: e)`
> I would like to use a function
> `anotherFlatten()`
>

As Reuven mentions, Beam's Flatten could have been called Union, in which
case we'd free up the name Flatten for the PCollection[Iterable[A]] ->
PCollection[A] operation. It's Flatten for historical reasons, and would be
difficult to change now.

FlumeJava uses static constructors to provide Flatten.Iterables:
PCollection[Iterable[A]] -> PCollection[A] vs. Flatten.PCollections:
Iterable[PCollection[A]] -> PCollection[A] [1].

If you want a FlattenIterables in Python, you could easily implement it as
a composite transform [2] whose implementation simply passes the identity
function to FlatMap.

[1]
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/Flatten.html
[2]
https://beam.apache.org/documentation/programming-guide/#composite-transforms
