Hi,

I want to explain that if you want me to scale, we need to follow some rules. Sorry for being somewhat rude sometimes; it's usually not deserved. In particular, I apologize for the last thread to Jenn.

This is really about scalability. There are 100+ try server users. They each run a few try jobs a day (at least the productive people do, which excludes me). Each job usually runs on 3 'builders'. That results in *thousands* of runs each day. So yes, it does break in awful ways *every day*.

Let me explain why this sentence is in the try job emails:

"If you think the try slave is broken (it happens!), please REPLY to this email, don't ask on irc, mailing list or IM."

Some facts to back this sentence:

- chromium-dev@ and the equivalent googler mailing lists have thousands of people, and only a few care.
- The irc channel has 150+ people, but if you don't mind not getting an answer, go ahead. :)
- IMing me (or Nicolas or Brad) seems like a good idea, but if it happens 50 times a day for different problems, we really can't get any work done since we're constantly interrupted.
- Emailing tryser...@chromium.org is the way to go. I receive it on my phone and I can batch replies together. In addition, at the end of the week, I can get an approximation of how much time this took me.

Sometimes I don't reply, especially for flaky tests. Rest assured that these emails are appreciated and help me pinpoint bad slaves.

There's definitely a scale advantage to pinging the smallest number of folks first and escalating accordingly. If I don't reply fast enough for you, let's say within 15 minutes to 1 hour, or even sooner if you are really blocked, feel free to escalate to a mailing list, irc, or IM me, but please give me a chance to fix it first. And to fix it, I need to know which slaves you are talking about. To know that, I need the try job url; it's in every try job email.

If you are still reading, thanks. That means you find this resource useful, or it wastes enough of your time each day for you to bother reading this email. That is truly appreciated.

If you find ways to improve the functionality or the efficiency, to reduce the maintenance burden, or to stop working on webkit for a day or two (who on earth would want to do that?), then when I say "patches are welcome", I mean it in its truest sense and I'm not trying to be sarcastic. If you don't have time, please file a bug at http://new.crbug.com and assign it to me. I'll triage it accordingly.

A fair number of people have contributed to the try server, to the buildbot code in general, and to the depot tools.

The try server maintenance squad includes Bev and Nicolas. They help me tremendously.

Contributors to the try server specific code also include Brad, Dan, Marc-André, Steven and Thomas.

Contributors to the buildbot code in general also include Allen, Antony, Anthony, Chase, Evan #1, Glenn, Alpha, Huan, Ian, Jungshik, Ken, Rahul, Mark and Mark, Micheal, Nick, Ojan, Pam, Patrick, Paul, Pavel, Robert, Randall, Sid, Stuart, Tony, Lei, Timur, Takeshi, Victor and William.

In addition, people contributing to depot_tools include Albert, Elliot, Evan #2, Nirnimesh, Pawel, Scott, B.J. and Sverrir.

So that's a lot of people helping to improve everyone's workflow. I really want to thank each of them.

I want to be clear: it is fine to IM me, but please reply to the email first.

---

So, what's next?

The Windows try slaves are atrociously slow right now. They even time out during compile these days. The bottleneck is linking. To give you an idea, here's a truncated dir list:

(...)
     10,223,616 dump_cache.exe
     10,309,632 fetch_client.exe
     10,600,448 net_perftests.exe
     12,406,784 sync_unit_tests.exe
     14,966,784 net_unittests.exe
     22,118,400 mini_installer.exe
     55,029,760 tab_switching_test.exe
     55,029,760 url_fetch_test.exe
     55,046,144 memory_test.exe
     55,066,624 page_cycler_tests.exe
     60,858,368 test_shell.exe
     63,164,416 test_shell_tests.exe
     65,908,736 generate_profile.exe
     66,113,536 perf_tests.exe
     67,272,704 ui_tests.exe
     67,362,816 sync_integration_tests.exe
     67,948,544 interactive_ui_tests.exe
     72,175,616 unit_tests.exe
(...)
    258,968,576 generate_profile.pdb
    262,237,184 perf_tests.pdb
    262,351,872 chrome_dll.pdb
    265,300,992 sync_integration_tests.pdb
    266,202,112 ui_tests.pdb
    266,333,184 browser_tests.pdb
    266,841,088 interactive_ui_tests_dll.pdb
    266,882,048 interactive_ui_tests.pdb
    281,095,168 unit_tests.pdb
            273 File(s)  7,391,294,890 bytes

Yes, that's 7 gigs of crap. And that doesn't include lib\ and obj\. So the poor VMs have a hard time keeping up, and a significant number of them are simply timing out.

I'm working on reducing the injected dependencies in our projects, but it takes time. As an example, syncapi links with net for a single string parsing function. But pulling in net also pulls in v8, so I'm splitting net in two to remove that injected dependency.

We may go back to disabling PDBs, which saves several gigs but slightly reduces the usefulness. That's probably what needs to happen right now.

Thanks,

Marc-Antoine

On Thu, Oct 1, 2009 at 9:37 PM, Marc-Antoine Ruel <mar...@chromium.org> wrote:
> [bcc: chromium-dev]
>
> As I said in my previous email (which was blocked by the ML) you
> should have just replied to the email and saved 1600+ people time.
> That what is written on the try job status email for a reason.

--~--~---------~--~----~------------~-------~--~----~
Chromium Developers mailing list: chromium-dev@googlegroups.com
View archives, change email options, or unsubscribe:
    http://groups.google.com/group/chromium-dev
-~----------~----~----~----~------~----~------~--~---