Hi Pulsar Community,

Here are the meeting notes from Thursday's community meeting. Thanks to all who participated!
Disclaimer: if something is misattributed or misrepresented, please send a correction to this list.

Source google doc: https://docs.google.com/document/d/19dXkVXeU2q_nHmkG8zURjKnYlvD96TbKf5KjYyASsOE

Thanks,
Michael

2022/09/15 (8:30 AM PST)

Attendees:
- Matteo Merli
- Christophe Bornet
- Asaf Mesika
- Qiang Zhao
- Heesung Sohn
- Andrey Yegorov
- Nicolò Boschi
- Ayman Khalil
- Dave Fisher
- Lari Hotari
- Michael Marshall

Discussions/PIPs/PRs (generally discussed in the order they appear):

- Pulsar CI status and a proposed change to address the resource shortage. CI has been slow for the past three or four weeks. Lari opened a GitHub support ticket; it took a week to get a response, and it wasn't very helpful. It appears our resources have been reduced. Lari made a proposal on the mailing list to update the workflow. The summary: when a PR is opened, we only run a quick/light check and do not proceed further. We then add a "ready to test" label to the PR before running the later tests. PR authors can run the tests in their own fork if they rely on them for feedback. When reviewing, a committer can provide instructions/documentation to the contributor. When tests are passing, we would add a comment to tell the Pulsar bot to run the tests.
  - Matteo: it's unfortunate because it complicates the flow. Could we defer testing until PRs are approved?
  - Lari: the problem is that we need the test feedback to approve them. That's why the contributor should run the tests on their own side.
  - Matteo: we should only merge when tests pass; approval isn't the same as tests passing.
  - Lari: the idea is that the user would run the tests in their own fork.
  - Qiang Zhao: maybe we can run the quick checks first, like checkstyle, spotbugs, etc. After the PR is approved, we can continue with the other tests.
  - Dave: what if they do not have all of the commits in their fork? This is a moving target.
  - Michael: that is currently the case in the apache/pulsar repo too.
  - Dave: what if we only run certain tests for certain use cases?
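For illustration, the label-gated flow Lari describes could look roughly like the following GitHub Actions workflow. This is a hypothetical sketch, not the actual proposal text: the `ready-to-test` label name comes from the discussion above, but the job names and Maven goals are placeholders.

```yaml
# Hypothetical sketch of a label-gated CI workflow (not the real apache/pulsar config).
name: Pulsar CI
on:
  pull_request:
    types: [opened, synchronize, labeled]

jobs:
  quick-checks:
    # Cheap checks always run on every PR push.
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: mvn -B -ntp checkstyle:check spotbugs:check

  full-tests:
    # Heavy unit/integration tests run only after a committer adds the label.
    if: contains(github.event.pull_request.labels.*.name, 'ready-to-test')
    needs: quick-checks
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: mvn -B -ntp verify
```

With this shape, opening or pushing to a PR only triggers `quick-checks`; adding the label fires a `labeled` event, which re-evaluates the `if` condition and starts the full suite.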
  - Lari: we are trying to address that by moving certain modules out of the project.
  - Matteo: I think we should move the C++ and Python clients to a separate repo.
  - Michael: that would be helpful for release management and for letting C++ move at its own pace. We would need to make it work for the docker image, which includes the Python wheels.
  - Matteo: we could make it work by only shipping released versions of the Python client.
  - Michael: that would definitely speed up releases for apache/pulsar and the build for the docker image.
  - Matteo: it would also reduce resource utilization on CI because we wouldn't build the C++.
  - Lari: the proposal to have contributors build their project in their own repo is important for reducing resource utilization. Apache Spark already requires this.
  - Matteo: it could help to have automation that reports build results from other repos. Coming back to my earlier point, why not gate tests on PR approval?
  - Lari: this feature is available out of the box for users without write access to the repo. Why not use the label solution?
  - Matteo: I see the compiler/test validation as separate from human validation/approval. Additionally, if the tests are passing in their fork, why don't we just trust their tests?
  - Lari: I think there could be recency issues when verifying that the other tests are the right tests, because you wouldn't want them to run against an old version of master.
  - Matteo: as long as it is the same branch, they will be the same tests we would run in the apache/pulsar repo.
  - Lari: it's good to be sure that the pipeline is the same as in our repo.
  - Matteo: the pipeline has to be the same because it is tied to the branch. It is not that I have a different configuration in my fork; there will be a diff visible in the PR that shows any points of divergence. A bot could guide users through this process and then also verify that the other repo's tests pass.
  - Lari: that is doable; one way is for the bot to create a link that creates that PR.
  - Matteo: I'm fine with that. After that, we should use the bot to track the status of the other repo's test results and report back to the PR.
  - Lari: this will reduce resource consumption, so it could be faster to start without the bot reporting back.
  - Matteo: I don't think the resource constraints will go away soon.
  - Lari: if we're able to cut 60% of the current consumption, we'll be fine and won't have any delays. There is also work to move Pulsar SQL out of the main repo.
  - Dave: do we need an official PIP for that?
  - Matteo: yes, I think so.
  - Asaf: because the protocol is not thin, do we risk conflicts for the other clients?
  - Matteo: the advantage right now is that if you break the protocol in Java, it will break the C++ tests. The flip side is that we have a humongous repo. As for breaking the protocol: that is why we require PIPs for protocol changes, to get eyes on them. The same goes for the Go client, which already lives outside the repo.
  - Dave: Node.js too.
  - Michael: the cost of the C++ build is high, and we don't change the protocol that often.
  - Matteo: we could use a nightly job to build the Pulsar docker image and then have the Go and C++ clients test against that nightly build. That would help catch issues sooner.
  - Dave: you could also just run those tests against the latest release.
  - Matteo: the problem is that that doesn't guarantee we catch things before upgrading. If we run against master, we'll discover issues earlier.
  - Dave: why not do both, testing against the latest release and current master?
  - Matteo: sure.
  - Dave: that would let us do regression tests. I like the idea of nightly integrations.
  - Matteo: we should publish nightly images for the broker and use those.
  - Lari: one note about Pulsar IO and Pulsar SQL: we'll need to be careful to make sure their tests are run similarly to the client tests just discussed. We'll need a way to build the pulsar-all docker image.
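The nightly-image idea Matteo sketches could be wired up along these lines. This is hypothetical: the cron schedule, the Maven invocation, and the `example/pulsar:nightly` tag are all placeholders, not anything agreed in the meeting.

```yaml
# Hypothetical nightly job: build a broker image from master so the
# out-of-repo clients (Go, C++) can run integration tests against it.
name: nightly-broker-image
on:
  schedule:
    - cron: '0 3 * * *'   # once per night

jobs:
  build-and-publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          ref: master
      - name: Build and push nightly image
        run: |
          mvn -B -ntp install -DskipTests -Pdocker
          docker tag apachepulsar/pulsar:latest example/pulsar:nightly
          docker push example/pulsar:nightly
```

The client repos would then pull `example/pulsar:nightly` in their own CI, alongside the latest released image for the regression comparison Dave suggests.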
    There is currently https://github.com/apache/pulsar-release.
  - Matteo: the all image is often a bit too much. I'd rather keep SQL out of the image. With the connectors, it is practical to have all of them, but it's also a 3 GB image. We could give users a way to easily install connectors.
  - Michael: 3 GB is definitely big.
  - Matteo: there are definitely trade-offs.
- Michael: would it be valuable to move away from the stale label to a label that indicates that a PR needs reviews? Then a label could make it clear whether a PR needs reviews or whether the contributor needs to make changes.
  - Matteo: it could help to make it clearer. There is a lot of duplication of effort; we want multiple reviewers, but we also want to distribute time well to review many PRs.
  - Michael: similarly, issues do not always get responses. We could try to coordinate on this by asking for volunteers and scheduling.
- Michael: sharing an observation about watching the ZooKeeper root znode `/`. The BookKeeper client watches all of these when it gets its lock on its /ledgers/available/bookie-id znode. The bookie doesn't need any other notifications for that client. I know there are reasons we've identified for wanting the persistent recursive watch on /, but it seems like a lot of overhead for the single bookie client.
  - Matteo: there is a lot of complexity here. The main cost is getting notifications and then checking a hash map to see if the client cares about the update. That should be a fast check. The bookie will cache the list of ledgers. Before, it was always fetching those from ZooKeeper, and very frequently; now the caching decreases the network utilization. And it's not just the bookie.
  - Michael: I thought the bookie had multiple sessions, based on an observation.
  - Matteo: a bookie should only have one session. Auto recovery might be adding extra ones. We might be able to improve the bookie by only having a watch on the `/ledgers` znode to cut off a bunch of the other traffic.
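The "fast check" Matteo describes, deciding from a recursive-watch notification whether this client actually cares, amounts to a path-prefix lookup against the set of registered paths. A minimal sketch of that idea (the class and method names here are made up for illustration; this is not BookKeeper's actual code):

```java
import java.util.Set;

// Illustrative only: filter notifications from a persistent recursive watch
// on "/" down to the paths this client registered interest in.
public class WatchFilter {
    private final Set<String> watchedRoots;

    public WatchFilter(Set<String> watchedRoots) {
        this.watchedRoots = watchedRoots;
    }

    // Fast check: is the notified path at or under any registered root?
    public boolean isRelevant(String notifiedPath) {
        for (String root : watchedRoots) {
            if (notifiedPath.equals(root) || notifiedPath.startsWith(root + "/")) {
                return true;
            }
        }
        return false;
    }
}
```

A bookie that only cares about its own lock znode would register something like `/ledgers/available/<bookie-id>` and discard everything else. The check itself is cheap, which is Matteo's point; Michael's point is that every notification still has to arrive over the wire before it can be discarded.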