For info both AvroIO ReadAll/ParseAll and TextIO ReadAll deprecations
were merged into master today and will be part of 2.13.0.
For those working in other SDKs (Python, Go) please pay attention to
not implement such transforms (or deprecate them too if already done)
to keep the API ideas coherent.
+1
Thanks for that Ismaël.
Regards
JB
On 06/02/2019 11:24, Ismaël Mejía wrote:
> Since it seems we have consensus on deprecating both transforms I created
>
> BEAM-6605 Deprecate TextIO.readAll() and TextIO.ReadAll transform
> BEAM-6606 Deprecate AvroIO.readAll() and AvroIO.ReadAll transform
>
Since it seems we have consensus on deprecating both transforms I created
BEAM-6605 Deprecate TextIO.readAll() and TextIO.ReadAll transform
BEAM-6606 Deprecate AvroIO.readAll() and AvroIO.ReadAll transform
Thanks everyone.
On Fri, Feb 1, 2019 at 7:03 PM Chamikara Jayalath wrote:
>
> Python SDK
Python SDK doesn't have FileIO yet so let's keep ReadAllFromFoo transforms
currently available for various file types around till we have that.
Thanks,
Cham
On Fri, Feb 1, 2019 at 7:41 AM Jean-Baptiste Onofré wrote:
> Hi,
>
> readFiles() should be used IMHO. We should remove readAll() to avoid
Hi,
readFiles() should be used IMHO. We should remove readAll() to avoid
confusion.
Regards
JB
On 30/01/2019 17:25, Ismaël Mejía wrote:
> Hello,
>
> A ‘recent’ pattern of use in Beam is to have in file based IOs a
> `readAll()` implementation that basically matches a `PCollection` of
> file pat
+1 to deprecating and not implementing readAll() in transforms where
it is equivalent to matchAll() + readMatches() + readFiles().
It encourages and advertises the use of a nice, composable api reducing the
amount of code to be maintained.
can the Python/Go SDK align with this?
+ 1 to that too.
I want to chime in that I am also +1 to deprecating readAll, is there
anyone strongly pro readAll instead of the explicit composition?
And more important, can the Python/Go SDK align with this (deprecating
ReadAll and implementing ReadFiles)?
On Thu, Jan 31, 2019 at 12:34 AM Chamikara Jayalath
wr
Thanks for the clarification Ismaël and Eugene. +1 for deprecating existing
FooIO.readAll() transforms in favor of FooIO.readFiles().
On Wed, Jan 30, 2019 at 3:25 PM Eugene Kirpichov
wrote:
> TextIO.read() and AvroIO.read() indeed perform better than match() +
> readMatches() + readFiles(), due
TextIO.read() and AvroIO.read() indeed perform better than match() +
readMatches() + readFiles(), due to DWR - so for these two in particular I
would not recommend such a refactoring.
However, new file-based IOs that do not support DWR should only provide
readFiles(). Those that do, should provide
I meant TextIO.readFiles(). As you mention Cham they are in reality the 'same'.
TextIO.readAll = FileIO.match() . FileIO.readMatches . TextIO.readFiles
And this pattern can be repeated to all other file-based IOs. The
whole point of this discussion is to decide if such 'wrapper' tranform
(ReadA
On Wed, Jan 30, 2019 at 2:37 PM Chamikara Jayalath
wrote:
>
>
> On Wed, Jan 30, 2019 at 2:33 PM Ismaël Mejía wrote:
>
>> Ups slight typo, in the first line of the previous email I meant read
>> instead of readAll
>>
>> On Wed, Jan 30, 2019 at 11:32 PM Ismaël Mejía wrote:
>> >
>> > Reuven is rig
On Wed, Jan 30, 2019 at 2:33 PM Ismaël Mejía wrote:
> Ups slight typo, in the first line of the previous email I meant read
> instead of readAll
>
> On Wed, Jan 30, 2019 at 11:32 PM Ismaël Mejía wrote:
> >
> > Reuven is right for the example, readAll at this moment may be faster
> > and also sup
Ups slight typo, in the first line of the previous email I meant read
instead of readAll
On Wed, Jan 30, 2019 at 11:32 PM Ismaël Mejía wrote:
>
> Reuven is right for the example, readAll at this moment may be faster
> and also supports Dynamic Work Rebalancing (DWR), but the performance
> of the
Reuven is right for the example, readAll at this moment may be faster
and also supports Dynamic Work Rebalancing (DWR), but the performance
of the other approach may (and must) be improved to be equal, once the
internal implementation of TextIO.read moves to a SDF version instead
of the FileBasedSo
Yes, this is precisely the goal of SDF.
On Wed, Jan 30, 2019 at 8:41 PM Kenneth Knowles wrote:
>
> So is the latter is intended for splittable DoFn but not yet using it? The
> promise of SDF is precisely this composability, isn't it?
>
> Kenn
>
> On Wed, Jan 30, 2019 at 10:16 AM Jeff Klukas wr
So is the latter is intended for splittable DoFn but not yet using it? The
promise of SDF is precisely this composability, isn't it?
Kenn
On Wed, Jan 30, 2019 at 10:16 AM Jeff Klukas wrote:
> Reuven - Is TextIO.read().from() a more complex case than the topic Ismaël
> is bringing up in this thr
Reuven - Is TextIO.read().from() a more complex case than the topic Ismaël
is bringing up in this thread? I'm surprised to hear that the two examples
have different performance characteristics.
Reading through the implementation, I guess the fundamental difference is
whether a given configuration
Jeff, what you did here is not simply a refactoring. These two are quite
different, and will likely have different performance characteristics.
The first evaluates the wildcard, and allows the runner to pick appropriate
bundling. Bundles might contain multiple files (if they are small), and the
ru
I would prefer we move towards option [2]. I just tried the following
refactor in my own code from:
return input
.apply(TextIO.read().from(fileSpec));
to:
return input
.apply(FileIO.match().filepattern(fileSpec))
.apply(FileIO.readMatches())
.a
Hello,
A ‘recent’ pattern of use in Beam is to have in file based IOs a
`readAll()` implementation that basically matches a `PCollection` of
file patterns and reads them, e.g. `TextIO`, `AvroIO`. `ReadAll` is
implemented by a expand function that matches files with FileIO and
then reads them using
20 matches
Mail list logo