<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Engineering posts on Project Wintermute</title><link>https://wintermutecore.com/posts/</link><description>Recent content in Engineering posts on Project Wintermute</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Thu, 09 Apr 2026 09:00:00 +0200</lastBuildDate><atom:link href="https://wintermutecore.com/posts/index.xml" rel="self" type="application/rss+xml"/><item><title>Migrating Go test helpers from Azure SDK v1 to v2 (track 2)</title><link>https://wintermutecore.com/posts/azure-sdk-v2-go-migration/</link><pubDate>Thu, 09 Apr 2026 09:00:00 +0200</pubDate><guid>https://wintermutecore.com/posts/azure-sdk-v2-go-migration/</guid><description>&lt;p&gt;&lt;strong&gt;TL;DR.&lt;/strong&gt; Track 2 of &lt;code&gt;azure-sdk-for-go&lt;/code&gt; is not a search-and-replace from track 1. Plan for a credential rework, a client factory pass, and a nil-safety sweep on every pointer chain. Lock the patterns in CI lints so the next batch ships cleaner than the previous one.&lt;/p&gt;
&lt;p&gt;Microsoft&amp;rsquo;s &lt;code&gt;azure-sdk-for-go&lt;/code&gt; track 1 packages (rooted at &lt;code&gt;github.com/Azure/azure-sdk-for-go/services/...&lt;/code&gt;) have been on the deprecation path for a while. Track 2 (the &lt;code&gt;armXxx&lt;/code&gt; packages under &lt;code&gt;github.com/Azure/azure-sdk-for-go/sdk/...&lt;/code&gt;) is the supported surface and the only one receiving fixes.&lt;/p&gt;</description></item><item><title>Tag-based AWS resource cleanup: patterns that actually scale</title><link>https://wintermutecore.com/posts/aws-tag-based-resource-cleanup/</link><pubDate>Fri, 03 Apr 2026 11:00:00 +0200</pubDate><guid>https://wintermutecore.com/posts/aws-tag-based-resource-cleanup/</guid><description>&lt;p&gt;&lt;strong&gt;TL;DR.&lt;/strong&gt; Name and time filters are not enough for safe AWS bulk cleanup. Use tags as the primary signal, expect &lt;code&gt;ListTagsForResource&lt;/code&gt; to be your bottleneck, enforce tagging at provisioning time, and run an audit job that flags untagged resources so the policy stays honest.&lt;/p&gt;
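&lt;p&gt;A sketch of tag-first discovery, assuming the AWS SDK for Go v2 and an illustrative &lt;code&gt;env=sandbox&lt;/code&gt; tag. One paginated &lt;code&gt;GetResources&lt;/code&gt; call against the Resource Groups Tagging API can stand in for much of the per-service &lt;code&gt;ListTagsForResource&lt;/code&gt; fan-out:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/resourcegroupstaggingapi"
	"github.com/aws/aws-sdk-go-v2/service/resourcegroupstaggingapi/types"
)

// listDoomed returns the ARN of every resource carrying the cleanup tag.
func listDoomed(ctx context.Context) ([]string, error) {
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return nil, err
	}
	client := resourcegroupstaggingapi.NewFromConfig(cfg)
	input := resourcegroupstaggingapi.GetResourcesInput{
		TagFilters: []types.TagFilter{
			{Key: aws.String("env"), Values: []string{"sandbox"}},
		},
	}
	var arns []string
	p := resourcegroupstaggingapi.NewGetResourcesPaginator(client, &amp;amp;input)
	for p.HasMorePages() {
		page, err := p.NextPage(ctx)
		if err != nil {
			return nil, err
		}
		for _, m := range page.ResourceTagMappingList {
			if m.ResourceARN != nil {
				arns = append(arns, *m.ResourceARN)
			}
		}
	}
	return arns, nil
}&lt;/code&gt;&lt;/pre&gt;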
&lt;p&gt;The &amp;ldquo;delete a lot of AWS resources at once&amp;rdquo; problem shows up in every account: CI sandboxes, expired test estates, forgotten dev environments, ad-hoc reproductions left behind. Bulk cleanup tools that target this exist and work well. Used carelessly, any of them is a footgun. Used carefully with tag filtering, they become one of the most useful pieces of cost discipline you can ship.&lt;/p&gt;</description></item><item><title>Building a daily data pipeline with Dagu, Python, and a JSONL data lake</title><link>https://wintermutecore.com/posts/daily-data-pipeline-dagu-python/</link><pubDate>Wed, 18 Mar 2026 08:00:00 +0200</pubDate><guid>https://wintermutecore.com/posts/daily-data-pipeline-dagu-python/</guid><description>&lt;p&gt;&lt;strong&gt;TL;DR.&lt;/strong&gt; Three stages (ingest, index, alert), one stage per script, JSONL as the index format, a flat dedup state file, Dagu for scheduling. Boring, reliable, and dramatically less work than reaching for a heavyweight orchestrator.&lt;/p&gt;
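&lt;p&gt;The whole schedule fits in one small file. A sketch of the DAG definition, assuming Dagu&amp;rsquo;s YAML step schema; the script names and the 07:00 cron are illustrative:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;schedule: "0 7 * * *"        # one run per day
steps:
  - name: ingest             # fetch the day's records from the API
    command: python ingest.py
  - name: index              # append normalised records to the JSONL lake
    command: python index.py
    depends:
      - ingest
  - name: alert              # diff against the dedup state, notify on new hits
    command: python alert.py
    depends:
      - index&lt;/code&gt;&lt;/pre&gt;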
&lt;p&gt;There is a class of pipeline that does not deserve a Spark cluster, an Airflow deployment, or a multi-tenant orchestrator. It is the &amp;ldquo;fetch a few hundred records from an API every day, index them, alert on the interesting ones&amp;rdquo; job. We have built this kind of thing dozens of times. Here is the shape that has held up.&lt;/p&gt;</description></item><item><title>k3s on Hetzner: notes from running production clusters</title><link>https://wintermutecore.com/posts/k3s-hetzner-production-notes/</link><pubDate>Thu, 05 Mar 2026 14:00:00 +0200</pubDate><guid>https://wintermutecore.com/posts/k3s-hetzner-production-notes/</guid><description>&lt;p&gt;&lt;strong&gt;TL;DR.&lt;/strong&gt; k3s on Hetzner is a strong cost-control move when you are willing to operate the cluster. Mind the Flannel MTU on Hetzner private networks, separate stateless and stateful workloads at the storage layer, keep observability minimal but real, and treat backups as a tested practice rather than a config setting.&lt;/p&gt;
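&lt;p&gt;Hetzner private networks run at an MTU of 1450, and Flannel derives its VXLAN MTU from the interface it binds to, so the usual fix is to point k3s at the private NIC explicitly. A sketch; the interface name &lt;code&gt;ens10&lt;/code&gt; is an assumption and varies by server type:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;# Bind Flannel to the private network interface so the VXLAN MTU it
# computes fits inside the 1450-byte MTU of Hetzner private networks.
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--flannel-iface=ens10" sh -&lt;/code&gt;&lt;/pre&gt;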
&lt;p&gt;A managed Kubernetes service is the right answer for most teams. When it is not the right answer (cost, control, locality of data), self-hosted k3s on a low-cost provider like Hetzner is one of the better options. We have run several clusters of this shape in production for over a year. This post is the set of decisions that have held up.&lt;/p&gt;</description></item><item><title>Speeding up GitHub Actions lint pipelines for large Go codebases</title><link>https://wintermutecore.com/posts/go-ci-lint-pipeline-optimisation/</link><pubDate>Thu, 12 Feb 2026 10:00:00 +0200</pubDate><guid>https://wintermutecore.com/posts/go-ci-lint-pipeline-optimisation/</guid><description>&lt;p&gt;&lt;strong&gt;TL;DR.&lt;/strong&gt; Lint on a large Go monorepo went from 63 seconds to about 25 seconds on warm cache, with macOS skipped on branches. Five changes: concurrency group, conditional OS matrix, combined cache restore and save, explicit &lt;code&gt;go mod download&lt;/code&gt;, and incremental &lt;code&gt;golangci-lint --new-from-rev&lt;/code&gt;. None require a self-hosted runner.&lt;/p&gt;
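&lt;p&gt;Three of the five changes in workflow form. The base rev and ref names are assumptions, and the golangci-lint install step is elided:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;# Cancel superseded runs on the same ref instead of queueing behind them.
concurrency:
  group: lint-${{ github.ref }}
  cancel-in-progress: true

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0        # --new-from-rev needs history to diff against
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      - run: go mod download    # explicit, cacheable module fetch
      # Lint only what changed relative to the base branch.
      - run: golangci-lint run --new-from-rev=origin/main&lt;/code&gt;&lt;/pre&gt;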
&lt;p&gt;A large Go codebase makes the CI lint stage the part developers feel most: every push, on every branch. Lint feedback that takes a minute and a half kills iteration speed and quietly trains people to push less often, which is the opposite of what you want.&lt;/p&gt;</description></item><item><title>Anatomy of a 6-hour Kubernetes ingress outage</title><link>https://wintermutecore.com/posts/kubernetes-ingress-outage-postmortem/</link><pubDate>Mon, 09 Feb 2026 12:00:00 +0200</pubDate><guid>https://wintermutecore.com/posts/kubernetes-ingress-outage-postmortem/</guid><description>&lt;p&gt;&lt;strong&gt;TL;DR.&lt;/strong&gt; A backend deployment lost all healthy pods. nginx active health checks marked the upstream pool empty. The server block fronting that pool was the &lt;code&gt;default_server&lt;/code&gt; for ports 80 and 443, so every unmatched hostname returned 502 for 6 hours and 38 minutes. The trigger was Kubernetes-side. The blast radius was a configuration choice we made years ago. The post-incident fixes were almost all on the nginx side.&lt;/p&gt;
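&lt;p&gt;The fix that generalises best: never let a real application upstream double as the implicit &lt;code&gt;default_server&lt;/code&gt;. Give unmatched hostnames a dedicated catch-all whose behaviour does not depend on any backend being healthy. A sketch; the certificate paths are placeholders:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;# Catch-all for hostnames that no server_name matches.
server {
    listen 80 default_server;
    listen 443 ssl default_server;
    server_name _;
    ssl_certificate     /etc/nginx/fallback.crt;    # placeholder
    ssl_certificate_key /etc/nginx/fallback.key;    # placeholder
    return 444;    # close the connection without a response
}&lt;/code&gt;&lt;/pre&gt;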
&lt;p&gt;We had a P0 outage on a public ingress tier. Two redundant nginx instances, both showing the same symptoms, both serving production traffic to dozens of hostnames. This is the writeup, sanitised and reduced to the parts that generalise.&lt;/p&gt;</description></item></channel></rss>