(Almost) Every infrastructure decision I endorse or regret after 4 years running infrastructure at a startup

canpolat@programming.dev · 9 months ago


procesd@lemmy.world · 9 months ago

If I could be bothered to sit and write down a distilled version of my last decade at work it would be something very similar. Any junior SRE can benefit from this.

fluckx@lemmy.world · 3 months ago

Any insight on why you prefer the nginx ingress vs the ALB ingress controller on AWS? You can group/combine ingresses as well, and it will automatically load the correct certificate from ACM if it exists, which means you won’t have to mess around with certbot. Your TLS terminates at the load balancer in that case, though.

EKS managed add-ons now support custom configuration (they might not have when you started out), though maybe not all the custom features you’re looking for are there. It’s not as flexible as the helm chart, obviously, but it usually supports the most basic things you’d want to use.

Interesting read otherwise!

Personally I’ve had issues selling people on gitops/kustomize as they all find helm charts a lot easier.

Lodra@programming.dev · 9 months ago

This is excellent. I may copy the rough format for tracking things internally at my company!

Btw, I agree with most of your decisions in here with just a few exceptions.

kustomize > helm
Argo > flux

My last thought is less clear though. There are good observability solutions besides Datadog. Grafana Cloud is great. Honeycomb has a similar offering. All of them are pretty expensive, though.

If you aren’t using OpenTelemetry, you’re probably doing observability wrong!

Piatro@programming.dev · 9 months ago

I’ve only used helm and hadn’t considered kustomize as an equivalent. What about kustomize makes it better, in your opinion?

Lodra@programming.dev · 9 months ago

First is complexity. A simple helm chart works great, but more elaborate charts can turn into a maintenance problem. This is especially true when you manage a large number of apps and need to establish and maintain standards across them. E.g. you want to add a new label to every helm chart you use. You now get to make 60 PRs for 60 charts. Or you can tie them all together with chart dependencies. This can be done well but almost never is. It’s just too easy to build a bad helm chart. Kustomize allows you to do this from a “top-down” perspective.
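To make the “top-down” idea concrete, here is a rough sketch of a parent kustomization that stamps one label onto everything beneath it. The directory names and the label itself are made up for illustration; `resources` and `labels` are standard kustomization fields.

```yaml
# kustomization.yaml at the top of a (hypothetical) repo layout
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Each entry is a directory containing its own kustomization.yaml.
# The app names are placeholders, not taken from the thread.
resources:
  - apps/app-one
  - apps/app-two

# Applied to every resource pulled in above, so a new label is one
# change here instead of one PR per app.
labels:
  - pairs:
      team: platform        # hypothetical label
    includeSelectors: false # leave existing selectors untouched
```

Running `kustomize build .` from that directory should emit every app's manifests with the label added.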

Second is modifications. Consider as an example that you want to run filebeat as a sidecar container on some pod to capture its logs, but the helm chart you’re using doesn’t include this feature. You have two choices: modify the pod when it’s created with a mutating webhook or similar (a super complicated solution), or copy/fork the chart, add the functionality, and maintain it going forward. Kustomize just doesn’t have this problem. You can just modify a base manifest with overlays.
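As a minimal sketch of that overlay approach, assuming a base that defines a Deployment named `my-app` (a made-up name), the sidecar can be added with a strategic merge patch. Containers are merged by `name`, so the existing app container stays as it is. The image tag is a placeholder, and the volume wiring that filebeat would need to actually read the logs is left out.

```yaml
# overlays/with-filebeat/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base   # assumed location of the unmodified base manifests

patches:
  - target:
      kind: Deployment
      name: my-app          # hypothetical Deployment in the base
    patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: my-app
      spec:
        template:
          spec:
            containers:
              # New entry; merged into the list by container name.
              - name: filebeat
                image: docker.elastic.co/beats/filebeat:8.x.y  # placeholder tag, pin a real one
```

The base stays untouched, so upstream changes to it can still be pulled in without re-applying the modification by hand.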

Last is the nature of Go templates, which helm charts are based on. Everything outside of {{ }} is just plaintext. This leads to a ton of limitations. Got a whitespace issue? You’ll probably find out at runtime. Want your IDE to identify syntax issues, provide IntelliSense, etc. on the final manifest? Good luck! You need to render that chart first. With Kustomize, every manifest is structured text (yaml). So you get the benefits of all standard tooling for yaml data in your IDEs and CI/CD pipelines.
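A small illustration of the whitespace point, as a hypothetical chart fragment. `toYaml` and `nindent` are real Helm template functions; the `podLabels` value is an assumed name. The file is not valid yaml until it has been rendered, which is why editor tooling cannot check it.

```yaml
# templates/deployment.yaml (hypothetical Helm template fragment)
spec:
  template:
    metadata:
      labels:
        # The indent width is a hand-counted number inside the template.
        # 8 nests the rendered keys under "labels"; with 6 they would
        # render as siblings of "labels" under "metadata" instead, and
        # nothing flags that before the chart is rendered.
        {{- toYaml .Values.podLabels | nindent 8 }}
```

The equivalent Kustomize base would be an ordinary yaml file, so a linter or schema-aware editor can validate it directly.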

Honestly, I could keep going (helm releases ugghhhh!). But helm definitely wins on one point, and it’s a big one: Helm is the standard for distributing k8s manifests. So every meaningful project supplies helm charts. Kustomize doesn’t even come close on this one. That said, I think Kustomize manifests are just simpler to build. So having an official base manifest for every project just doesn’t matter too much.