Hi

I want to explain that if you want me to scale, we need to follow some
rules. Sorry for being somewhat rude sometimes, it's usually not
deserved. In particular I apologize for the last thread to Jenn.

This is really about scalability. There are 100+ try server users. They
run a few try jobs a day each, well, at least the productive people, which
excludes me. Each job usually runs on 3 'builders'. That results in
*thousands* of runs each day. So yes, it does break in awful ways
*every day*.
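
As a rough back-of-the-envelope sketch of that arithmetic: only the
"3 builders" figure comes from above. The 150 users, 5 tries per user
and 1% infrastructure failure rate are illustrative assumptions, not
measured numbers.

```python
def estimate_daily_runs(users: int, tries_per_user: float, builders_per_try: int) -> float:
    """Rough count of individual builder runs per day."""
    return users * tries_per_user * builders_per_try


def expected_daily_failures(runs: float, infra_failure_rate: float) -> float:
    """Expected number of runs per day that break for infrastructure reasons."""
    return runs * infra_failure_rate


# ASSUMPTIONS: 150 users and 5 tries per user are guesses for "100+" and
# "a few"; the 1% failure rate is purely hypothetical.
runs = estimate_daily_runs(users=150, tries_per_user=5, builders_per_try=3)
failures = expected_daily_failures(runs, infra_failure_rate=0.01)
print(f"about {runs:.0f} runs/day, about {failures:.1f} infrastructure failures/day")
```

Even with a small per-run failure rate, the volume alone means something
breaks daily.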

Let me explain why there is this sentence in the try job emails:
"If you think the try slave is broken (it happens!), please REPLY to
this email, don't ask on irc, mailing list or IM."

Some facts to back this sentence:
- chromium-dev@ and the equivalent googler mailing lists have
thousands of people and only a few care.

- The irc channel has 150+ people but if you don't mind not getting
an answer, go ahead. :)

- IMing me (or Nicolas or Brad) seems like a good idea, but if it happens
50 times a day for different problems, we really can't get any work
done since we're constantly interrupted.

- Emailing tryser...@chromium.org is the way to go. I receive it on my
phone and I can batch replies together. In addition, at the end of the
week, I can get an approximation of how much time this took me.
Sometimes I don't reply, especially for flaky tests. Rest assured these
emails are appreciated and help me pinpoint bad slaves.

There's definitely a scale advantage to pinging the smallest number of
folks first and escalating accordingly. If I don't reply fast enough for
you, let's say within 15 minutes to 1 hour, or even sooner if you are
really blocked, feel free to escalate to a mailing list or irc or IM me,
but please give me a chance to fix it first. And to fix it, I need
to know which slaves you are talking about. To know that, I need the
try job URL; it's in every try job email.

If you are still reading, thanks, that means you find this resource
useful or it wastes enough of your time each day for you to bother
reading this email. That is truly appreciated. If you find ways to
improve the functionality or the efficiency, to reduce the maintenance
burden, or to stop working on webkit for a day or two (who on earth
would want to do that?), then when I say "patches are welcome", I say
that in its truest meaning and I'm not trying to be sarcastic. If you
don't have time, please file a bug at http://new.crbug.com and assign it
to me. I'll triage it accordingly.

A fair number of people contributed to the try server, the buildbot
code in general and to the depot tools.

The try server maintenance squad includes Bev and Nicolas. They help
me tremendously.

Contributors to the try server specific code also include Brad, Dan,
Marc-André, Steven and Thomas.

Contributors to the buildbot code in general also include Allen, Antony,
Anthony, Chase, Evan #1, Glenn, Alpha, Huan, Ian, Jungshik, Ken, Rahul,
Mark and Mark, Micheal, Nick, Ojan, Pam, Patrick, Paul, Pavel, Robert,
Randall, Sid, Stuart, Tony, Lei, Timur, Takeshi, Victor and William.

In addition, people contributing to depot_tools also include Albert,
Elliot, Evan #2, Nirnimesh, Pawel, Scott, B.J. and Sverrir.

So that's a lot of people helping improve everyone's workflow. I
really want to thank each of them.

I want to be clear: it is fine to IM me, but please reply to the email first.

---

So, what's next?

The windows try slaves are atrociously slow right now. They even time
out during compile these days. The bottleneck is linking. To give you
an idea, here's a truncated dir list:
(...)
10,223,616 dump_cache.exe
10,309,632 fetch_client.exe
10,600,448 net_perftests.exe
12,406,784 sync_unit_tests.exe
14,966,784 net_unittests.exe
22,118,400 mini_installer.exe
55,029,760 tab_switching_test.exe
55,029,760 url_fetch_test.exe
55,046,144 memory_test.exe
55,066,624 page_cycler_tests.exe
60,858,368 test_shell.exe
63,164,416 test_shell_tests.exe
65,908,736 generate_profile.exe
66,113,536 perf_tests.exe
67,272,704 ui_tests.exe
67,362,816 sync_integration_tests.exe
67,948,544 interactive_ui_tests.exe
72,175,616 unit_tests.exe
(...)
258,968,576 generate_profile.pdb
262,237,184 perf_tests.pdb
262,351,872 chrome_dll.pdb
265,300,992 sync_integration_tests.pdb
266,202,112 ui_tests.pdb
266,333,184 browser_tests.pdb
266,841,088 interactive_ui_tests_dll.pdb
266,882,048 interactive_ui_tests.pdb
281,095,168 unit_tests.pdb
            273 File(s)  7,391,294,890 bytes

Yes, that's 7 gigs of crap. And that doesn't include lib\ and obj\. So
the poor VMs have a bit of a hard time keeping up and a significant
number of them are simply timing out. I'm working on reducing the
injected dependencies in our projects but it takes time. As an
example, syncapi links with net for a single string parsing function.
But pulling in net also pulls in v8, so I'm splitting net in two to
remove the dependency injection. We may go back to disabling PDBs, which
saves several gigs but slightly reduces the usefulness. That's probably
what needs to happen right now.
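
To get a feel for how much of that directory is PDBs, the listing can
be tallied by extension. This is a minimal sketch: the four sample
lines are copied from the truncated dir list above, so the totals only
cover those entries, and the helper name bytes_by_extension is
hypothetical.

```python
from collections import defaultdict

# A few lines copied from the (truncated) dir list above.
LISTING = """\
(...)
72,175,616 unit_tests.exe
67,948,544 interactive_ui_tests.exe
281,095,168 unit_tests.pdb
266,882,048 interactive_ui_tests.pdb
"""


def bytes_by_extension(listing: str) -> dict[str, int]:
    """Sum file sizes per extension from 'SIZE NAME' lines; skip anything else."""
    totals: defaultdict[str, int] = defaultdict(int)
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skips "(...)" markers and summary lines
        size_text, name = parts
        size = int(size_text.replace(",", ""))
        ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        totals[ext] += size
    return dict(totals)


for ext, total in sorted(bytes_by_extension(LISTING).items()):
    print(f".{ext}: {total / 2**20:.0f} MiB")
```

For the entries shown, each PDB is roughly four times the size of its
executable, which is why dropping PDBs is the quickest way to save
several gigs.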

Thanks,

Marc-Antoine

On Thu, Oct 1, 2009 at 9:37 PM, Marc-Antoine Ruel <mar...@chromium.org> wrote:
> [bcc: chromium-dev]
>
> As I said in my previous email (which was blocked by the ML) you
> should have just replied to the email and saved 1600+ people time.
> That is what is written on the try job status email for a reason.
