+1
Ran the verification scripts.
Caveats:
- I passed in a GCS bucket that did not exist, expecting it to be created, so
the Dataflow tests failed.
- I also skipped the Python tests that asked to write to GitHub.
- You also have not built, staged, & signed the Python wheels. It is a bit
hidden in
I am not so sure this is a good idea. Here are some systems and their
precision:
Arrow - microseconds
BigQuery - microseconds
New Java instant - nanoseconds
Firestore - microseconds
Protobuf - nanoseconds
Dataflow backend - microseconds
PostgreSQL - microseconds
Pubsub publish time - nanoseconds
M
The Python SDK currently uses timestamps with microsecond resolution, while
the Java SDK, as most would probably expect, uses milliseconds.
This causes a few difficulties with portability (Python coders need to
convert to millis for WindowedValue and Timers, which is related to a bug
I'm looking into:
h
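To make the resolution gap concrete, here is a minimal sketch of the micros-to-millis conversion mentioned above. The helper names are hypothetical and are not part of the Beam SDK; the sketch only shows that sub-millisecond digits are lost when a microsecond timestamp is reduced to millisecond resolution.

```python
MICROS_PER_MILLI = 1000


def micros_to_millis(micros: int) -> int:
    """Reduce a microsecond timestamp to milliseconds (floor division)."""
    return micros // MICROS_PER_MILLI


def millis_to_micros(millis: int) -> int:
    """Expand a millisecond timestamp back to microseconds."""
    return millis * MICROS_PER_MILLI


# Hypothetical timestamp with microsecond resolution.
python_ts = 1_555_400_000_123_456

java_ts = micros_to_millis(python_ts)      # millisecond resolution
round_tripped = millis_to_micros(java_ts)  # back to microseconds
lost = python_ts - round_tripped           # sub-millisecond part that was dropped

print(java_ts, round_tripped, lost)
```

A round trip through milliseconds does not restore the original value, which is why mixing the two resolutions needs explicit care.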
Thanks, Yifan.
1. It appears that there are 32 Jenkins-related instances, 16 cores each,
which consume over 2/3 of available CPU quota.
2. Among the old VMs there are six 1-core VMs that look like
"gke-io-datastores-*" and "gke-metrics-*". They don't consume much quota,
but I am curious why we have
I opened https://github.com/apache/beam/pull/8319 to eliminate the
duplicate yaml file (and cover the timestamp coder for the Python SDK). I
would appreciate it if someone could take a look. (The PR doesn't affect the
StrUtf8Coder subject, but it is required to fix a timer bug.)
Thanks,
Thomas
On Fri, Apr 12
Hm, I am not very familiar with POI, but if its transforms are able to take
in a file descriptor, you should be able to use FileIO.match()[0] to find
your files (local, or in GCS/S3/HDFS); and FileIO.readMatches()[1] to get
file descriptors for these files.
If the POI libraries require the files to
We recently created 16 compute instances for Jenkins. Each one of them
has 16 CPUs, which means they consume 256 CPUs in total. I guess that is why
the CPU usage in us-central1 remains high. We're working on migrating the
rest of the old Jenkins agents, and the old instances will be removed once
finis
FYI, I have recently observed a large number of test failures in Beam test
suites where Dataflow jobs failed due to a lack of CPU quota in the
apache-beam-testing project.
We have been adding new suites for Python 3.x versions, which may have
contributed to this problem.
I have not investigated yet
Not sure: my case is using a nested class and the error is a stack overflow
(or infinite recursion detection is triggered).
It is odd though that they have the same workaround.
This looks very similar to https://github.com/uqfoundation/dill/issues/300,
however we observed that bug on Python 3, and not on Python 2.7.
On Tue, Apr 16, 2019 at 10:58 AM Udi Meiri wrote:
> I was looking at migrating unit tests to pytest and found this test which
> doesn't pass:
> https://gis
> it would be good to have a sort of weekly report on dead links
Seeing as checking for broken external links returns a lot of false
positives, I'd rather not spam everyone with them. However, I don't
know if making it a postcommit will give it sufficient visibility. Not
sure what the best way to
I was looking at migrating unit tests to pytest and found this test which
doesn't pass:
https://gist.github.com/udim/a71fcb278b56a9a5b7962f4588e14efb (stack
overflow)
(requires installing python3.7 and "python3.7 -m pip install pytest".)
The same command passes with python2.7 and python3.5.
I trie
On Tue, Apr 16, 2019 at 9:18 AM Reuven Lax wrote:
> A common request (especially in streaming) is to support sorting values by
> timestamp, not by the full value.
>
On this point, I think an explicit secondary key probably addresses the
need. Naively implemented, the "sort by values" use case wo
This is a good conversation. Some things to consider:
Since Beam is cross language, the "shufflers" can usually only sort by
binary value. This is different from other systems where custom comparators
can be used for sorting. We might need to introduce OrderPreservingCoder,
and mark the coders tha
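To illustrate why sorting by binary value depends on the encoding, here is a minimal sketch using only the standard library. It is not Beam's coder API; the two helper functions are hypothetical, and the example covers non-negative integers only. A fixed-width big-endian encoding keeps byte order and numeric order aligned, while a little-endian encoding does not.

```python
import struct


def encode_big_endian(n: int) -> bytes:
    """Fixed-width unsigned 64-bit, most significant byte first."""
    return struct.pack(">Q", n)


def encode_little_endian(n: int) -> bytes:
    """Fixed-width unsigned 64-bit, least significant byte first."""
    return struct.pack("<Q", n)


values = [256, 1, 65536, 2]

# Sort by the encoded bytes, as a shuffler that only sees binary values would.
sorted_by_big = sorted(values, key=encode_big_endian)
sorted_by_little = sorted(values, key=encode_little_endian)

print(sorted_by_big)     # matches numeric order
print(sorted_by_little)  # does not match numeric order
```

Only encodings of the first kind could safely be marked as order preserving in the sense discussed here.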
1. This is clearly useful, and extensively used. Agree with all that. I
think it can work for batch and streaming equally well if sorting is
required only per "pane", though I might be overlooking something.
2. A transform need not be primitive to be well-defined and executed in a
special way by m
At the moment, portability has a GroupByKey transform. In most data
processing frameworks, such as Hadoop MR and Apache Spark, there is a
concept of secondary sorting during the shuffle phase. Dataflow worker code
has it under the name BatchViewOverrides.GroupByKeyAndSortValuesOnly [1],
it's PTransfor
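As a rough illustration of what secondary sorting means, here is an in-memory sketch in plain Python. It is not how a runner's shuffle implements it, and the function name is hypothetical; it only shows the expected result: values grouped by the primary key, with each group ordered by a secondary key.

```python
from collections import defaultdict


def group_by_key_and_sort_values(records):
    """Group (key, (secondary_key, value)) records and sort each group.

    Values within a group are ordered by their secondary key only.
    """
    grouped = defaultdict(list)
    for key, (secondary, value) in records:
        grouped[key].append((secondary, value))
    return {
        key: sorted(pairs, key=lambda pair: pair[0])
        for key, pairs in grouped.items()
    }


# Hypothetical input: primary key is a user id, secondary key is a sequence number.
records = [
    ("user1", (3, "c")),
    ("user2", (1, "x")),
    ("user1", (1, "a")),
    ("user1", (2, "b")),
]

result = group_by_key_and_sort_values(records)
print(result)
```

A real shuffle would produce this ordering while the data is being moved, rather than sorting each group afterwards in memory.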
Thanks, Ryan, for a great introduction to the topic - it helped a lot! Let
me try to fuse all the discussions we had in this one thread.
You mentioned[1] that you thought of something similar and asked what
problems I faced, so let me explain it here as clearly as I can:
The main trouble I had is
Congratulations!
On Sat, Apr 13, 2019 at 12:53 AM Thomas Weise wrote:
> Congrats!
>
>
> On Thu, Apr 11, 2019 at 6:03 PM Reuven Lax wrote:
>
>> Congratulations Boyuan!
>>
>> On Thu, Apr 11, 2019 at 4:53 PM Ankur Goenka wrote:
>>
>>> Congrats Boyuan!
>>>
>>> On Thu, Apr 11, 2019 at 4:52 PM Mark
+1 to removing link validation for website changes. However, it would be
good to have a sort of weekly report on dead links or another alternative
to be aware of them.
On Tue, Apr 16, 2019 at 2:43 AM Kyle Weaver wrote:
> I agree with Andrew that the external links checks are ultra-flaky and
> sel