All should be well again. This outage is my fault (sort of). I put in place a new load balancer. The previous version just did "round robin" allocation of jobs to buildservers, often resulting in an uneven load on the buildservers. This required us to deploy a lot more buildservers then we really need.
The new load balancer is aware of how many jobs are on each buildserver, so evenly distributes the load. So, at the moment I have only 12 buildservers running, where before I had 21 (and I can go back to 21 if the actual load requires it).
Today's outage appears to be triggered not so much by the new load balancer, but by an old Linux kernel bug that it tripped over. I'm looking into how to avoid the problem.
So, the bug wasn't in the kernel, but with another module I was using. I have installed a mitigation, which hopefully will solve the problem.
ai2-test's buildservers are still not available, that is a seperate issue, which I am addressing. Also, ai2-test is now behind ai2, so I'm going to update it today, when done, it will have everything from ai2, plus the latest changes that we have merged into our source tree.