This week’s work fun: confirmed that Google Cloud’s default K8s Ingress setup pre-1.17 basically guarantees HTTP 502 responses for up to 10 minutes during regular app rollouts. To trigger the issue you just need to have fewer replicas of the app than there are K8s nodes/VMs. When a replica that lives on a node is deleted during a regular rolling update and there isn’t another to replace it Google’s Load Balancers happily continue to send traffic there. You can repro this with a basic setup from their official tutorials. Switching to their new Container-Native load balancers seems to help. It’s wild, though. 1.17 is fairly new in GKE and clusters aren’t auto-upgraded to Container-Native. Google has basically been selling a broken load balancer setup to GKE customers for years.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.