TL;DR. Track 2 of azure-sdk-for-go is not a search-and-replace from track 1. Plan for a credential rework, a client factory pass, and a nil-safety sweep on every pointer chain. Lock the patterns in CI lints so the next batch ships cleaner than the previous one.
Microsoft azure-sdk-for-go track 1 packages (rooted at github.com/Azure/azure-sdk-for-go/services/...) have been on the deprecation path for a while. Track 2 (the armXxx packages under github.com/Azure/azure-sdk-for-go/sdk/...) is the supported surface and the only one with active fixes.
We have been migrating a large suite of Go test helpers across to track 2 in batches: compute, network, storage, batch, and data services. This post covers what is the same in every batch, because the recurring patterns are the interesting part.
Why this is more than a rename
Track 2 changes the shape of the API in four ways that matter:
- Authentication. autorest is gone. Credentials come from azidentity (DefaultAzureCredential, ClientSecretCredential). Client construction takes azcore.ClientOptions for cloud environment, retry policy, and custom transport.
- Client factories. Each service has an armXxx.NewClientFactory that returns sub-clients. Previously every helper opened its own client. Now you can share a factory per resource group call site and stop recomputing credentials.
- Pointer-heavy response shapes. *armcompute.VirtualMachine has Properties *VirtualMachineProperties, which has StorageProfile *StorageProfile, which has OSDisk *OSDisk. Almost everything is a pointer, and almost any pointer can be nil for resources in transitional states. Track 1 sometimes hid this with concrete structs.
- Pagers everywhere. Listing now uses *runtime.Pager[T]. Consistent across services, which is nice, but you write the loop everywhere (a sketch follows this list).
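The loop itself is mechanical. A minimal sketch against armcompute; the function name, the ctx plumbing, and the resourceGroup parameter are illustrative, and the client would come from a factory as covered under Pattern 2:

func listVirtualMachines(ctx context.Context, client *armcompute.VirtualMachinesClient, resourceGroup string) ([]*armcompute.VirtualMachine, error) {
	var vms []*armcompute.VirtualMachine
	// NewListPager only constructs the pager; NextPage performs the HTTP calls.
	pager := client.NewListPager(resourceGroup, nil)
	for pager.More() {
		page, err := pager.NextPage(ctx)
		if err != nil {
			return nil, err
		}
		vms = append(vms, page.Value...)
	}
	return vms, nil
}

The shape is identical for every service's list operation; only the item type changes.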
Net effect: a “trivial” rename pulls in a credential rework, a client lifecycle rework, and a nil-safety pass. Treat each as its own concern in the diff, with separate commits where possible. Reviewers and bisects need somewhere to bite.
Pattern 1: nil guards on every pointer chain
The most common bug we surface during migration is the nil dereference. Track 2 helpers that look like this:
return *vm.Properties.StorageProfile.OSDisk.Name
panic on real resources. We standardised on the early-return pattern:
func extractVMOSDiskName(vm *armcompute.VirtualMachine) string {
	if vm == nil ||
		vm.Properties == nil ||
		vm.Properties.StorageProfile == nil ||
		vm.Properties.StorageProfile.OSDisk == nil ||
		vm.Properties.StorageProfile.OSDisk.Name == nil {
		return ""
	}
	return *vm.Properties.StorageProfile.OSDisk.Name
}
Verbose, but exactly the kind of code that is easy to review, easy to test, easy to spot when missing. We add one test case per nil level (NilProperties, NilStorageProfile, NilOSDisk) so the test matrix tracks the guard chain literally.
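A sketch of what that matrix looks like for the extractor above, written as a table-driven test; the test name and case set are illustrative, and every nil level expects the empty string:

func TestExtractVMOSDiskName_NilLevels(t *testing.T) {
	cases := map[string]*armcompute.VirtualMachine{
		"NilVM":             nil,
		"NilProperties":     {},
		"NilStorageProfile": {Properties: &armcompute.VirtualMachineProperties{}},
		"NilOSDisk": {Properties: &armcompute.VirtualMachineProperties{
			StorageProfile: &armcompute.StorageProfile{},
		}},
	}
	for name, vm := range cases {
		t.Run(name, func(t *testing.T) {
			if got := extractVMOSDiskName(vm); got != "" {
				t.Fatalf("want empty string, got %q", got)
			}
		})
	}
}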
For list iterations, the pattern is “skip the bad item, do not error”:
for _, v := range page.Value {
	if v == nil || v.Name == nil {
		continue
	}
	names = append(names, *v.Name)
}
Tests stay cheap: one positive case, one nil case per pointer level. This catches the “someone added another * to the chain and forgot a guard” change in code review, without needing live Azure to reproduce.
Pattern 2: consolidated client factories
The first migration batch landed nine helper functions that each constructed credentials, options, and clients inline. We caught this in review for the second batch and fixed it: extract newArmCredential() and newArmClientOptions() once, and have every factory call them.
Before:
func getArmKeyVaultClientFactory(subscriptionID string) (*armkeyvault.ClientFactory, error) {
	cred, err := azidentity.NewDefaultAzureCredential(nil)
	if err != nil {
		return nil, err
	}
	opts := &arm.ClientOptions{
		ClientOptions: azcore.ClientOptions{
			Cloud: cloud.AzurePublic,
		},
	}
	return armkeyvault.NewClientFactory(subscriptionID, cred, opts)
}
After:
func getArmKeyVaultClientFactory(subscriptionID string) (*armkeyvault.ClientFactory, error) {
	cred, err := newArmCredential()
	if err != nil {
		return nil, err
	}
	return armkeyvault.NewClientFactory(subscriptionID, cred, newArmClientOptions())
}
Across nine factories in the compute batch, this removed about 100 lines of duplicated boilerplate. More importantly, it gave us one place to change cloud (Public, Government, China), one place to add a retry policy, one place to inject a custom transport for tests.
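For concreteness, a minimal sketch of what those two shared helpers can look like, assuming DefaultAzureCredential and the public cloud as defaults:

// newArmCredential is the one place every factory gets its credential from.
func newArmCredential() (azcore.TokenCredential, error) {
	return azidentity.NewDefaultAzureCredential(nil)
}

// newArmClientOptions is the one place to change cloud, retry policy, or transport.
func newArmClientOptions() *arm.ClientOptions {
	return &arm.ClientOptions{
		ClientOptions: azcore.ClientOptions{
			Cloud: cloud.AzurePublic,
		},
	}
}

Swapping in ClientSecretCredential, a sovereign cloud, or a custom transport for tests then becomes a one-line change.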
Pattern 3: fixtures over live calls
Migration tempts you into a pile of integration tests. Resist. Most of the fragile code lives in response decoding and nil-safety, and that all runs against fixtures, not against Azure.
We use a small set of representative *armcompute.VirtualMachine literals built in test files, plus generated payloads for unusual shapes; a trimmed example follows the list below. Live integration tests stay narrow:
- One happy path per resource type.
- Gated by env vars.
- Run on schedule, not on every push.
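The fixture literals themselves stay small. A trimmed sketch; the field values are illustrative and to.Ptr is the pointer helper from the SDK's azcore/to package:

func testVMFixture() *armcompute.VirtualMachine {
	return &armcompute.VirtualMachine{
		Name:     to.Ptr("vm-fixture-01"),
		Location: to.Ptr("eastus"),
		Properties: &armcompute.VirtualMachineProperties{
			StorageProfile: &armcompute.StorageProfile{
				OSDisk: &armcompute.OSDisk{
					Name: to.Ptr("vm-fixture-01-osdisk"),
				},
			},
		},
	}
}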
Fixture coverage caught roughly 80 percent of bugs in the compute batch. The remaining 20 percent showed up in integration runs, which is exactly where we want them.
Pattern 4: lint rules to lock the migration in
We expanded the golangci-lint config to flag:
- staticcheck SA1019 for any remaining track 1 import. The migration cannot accidentally regress.
- nilness from the go vet extras to catch obvious nil panics, which surfaces cases where a guard was forgotten.
- errcheck against the new pager close patterns where we wrap responses.
Combined with a focused review checklist for migration PRs, this kept the second batch substantially cleaner than the first. The third batch (network) was reviewable in under an hour.
What we would do differently
- Smaller batches. The compute batch crossed too many files at once. Network and data went in tighter slices and the reviews were faster.
- Fixture generators earlier. We hand-wrote the first round of test fixtures and then realised half of them looked the same. A small helper that builds VM, NIC, and disk fixtures with sane defaults paid back fast.
- Document the nil-guard pattern in CONTRIBUTING up front. A one-paragraph instruction prevents reviewers having to repeat themselves on every PR.
Trade-offs
- More verbose code. The nil-guard pattern adds 4 to 6 lines per extractor. Worth it. The alternative is panics in production tests, which is worse.
- Larger diff. Even with smaller batches, a single SDK migration is hundreds of lines per resource type. Flagging the diff up front (in the PR description) makes the review tractable.
- Coupling to a specific track 2 minor version. SDK minor versions sometimes change types subtly. Pin the SDK version in go.mod and bump it deliberately, not as part of an unrelated change.
Bottom line
A new SDK is a chance to fix a class of latent bugs the old SDK was hiding. The actual API rename is the easy part. The nil-safety rework, the credential consolidation, and the fixture-based tests are where the time goes, and they are worth doing properly. Done well, the next migration after this one (track 2 to track 3, whenever it arrives) is far cheaper because the helpers are structured for it.