OpenShift Origin Activity: TLS Testing Refactor and CI Stability


The openshift/origin repository saw a concentrated burst of activity this week, focused on the reliability of the extended test suite and preparations for the upcoming RHEL 10 transition. Engineers landed several refactors to TLS validation logic while adjusting CI pipelines to handle infrastructure shifts and image reconciliation.

A significant portion of the recent commits targeted the TLS observed config validation logic within the extended test suite. Jan Chaloupka led a series of refactors to test/extended/tls/tls_observed_config.go aimed at reducing code duplication and improving the granularity of error reporting.

Previously, several TLS tests would exit early upon encountering a single failure, which obscured other potential issues in the same run. The new approach, introduced in commits like 456ed57e17, allows tests like testDeploymentTLSEnvVars to continue execution after a check fails. With generic errors replaced by a map-based reporting system in d464588398, operators can more easily identify exactly which TLS check failed during a complex E2E run.

These changes also moved core TLS config checks under a shared function. This refactor enables these validations to be invoked from other test suites, ensuring consistent security posture checks across different components of the OpenShift ecosystem. The deduplication effort significantly cleans up the testing surface area for TLS observed configurations, making the suite more maintainable for the long term.

Stability in the continuous integration pipeline remains a high priority as the project prepares for the RHEL 10 switchover. Stephen Benjamin landed a change to test/extended/ci/job_names.go to temporarily skip the OS version test. This preemptive move avoids false positives in Prow jobs as the underlying build environment shifts toward the next major Red Hat Enterprise Linux release.

Additional CI noise reduction occurred in the Machine Config Operator area. Pablo Rodriguez Nava disabled a stream test linked to MCO-2370 to prevent flaky failures from blocking the main merge queue. While disabling tests is always a trade-off, the move ensures that unrelated payload testing can proceed without interruptions from known transient issues in the stream protocol validation.

The project also saw a revert of a merged pull request by the Chai Bot in 3fdd4d2405. This highlights the automated guardrails in place to protect the main branch when integration tests detect regressions after a merge. Maintaining a clean main branch is critical for the rapid release cycles required by OpenShift.

Efficient payload testing often requires the injection of specific diagnostic images. Jacob See contributed a fix to pkg/cmd/openshift-tests/images/images_command.go to inject the jessie-dnsutils and nginx images at the correct indices within the image mapping logic. The change also deduplicates entries so that the test environment does not suffer from redundant image pulls or conflicting registry references.

Jacob also mapped the glibc-dns-testing image to support payload testing for Kubernetes 1.36. This is a common pattern in OpenShift Origin, where upstream Kubernetes features are validated against the OpenShift payload before the final integration PRs are fully merged.

On the governance side, the project updated the OWNERS file for image utilities. Jubitta John added Jacob and Jubitta as approvers while removing outgoing members. This rotation ensures that the people actively working on image mapping and registry integration have the necessary permissions to maintain the test infrastructure.

The node team continues to modernize the test suite by migrating legacy tests to the extended framework. Chandan Maurya migrated several ContainerRuntimeConfig tests in test/extended/node/node_e2e/container_runtime_config.go as part of the OCPNODE-4506 and OCPNODE-4539 initiatives. Moving these tests to the extended suite provides better observability and allows them to run as part of the standard E2E workflows.

In the storage and events layer, Matthew Booth updated the source of Cluster API ImageVolumeAlreadyPresent events. This adjustment ensures that pathological event detection logic correctly identifies duplicate volume events. By refining the event source, the monitoring system can better distinguish between expected retry behavior and actual volume provisioning failures in large-scale clusters.

The activity this week suggests a few areas for operators to monitor:

  1. RHEL 10 Migration: The skipping of OS version tests indicates that the underlying build and test infrastructure is in flux. Watch for these tests to be re-enabled once the RHEL 10 switchover is complete.
  2. TLS Validation Consistency: With the refactored TLS checks becoming more portable, expect these validations to appear in more component-level tests. This should lead to more predictable TLS behavior across the platform.
  3. Payload 1.36 Readiness: The mapping of DNS testing images for Kubernetes 1.36 signals that the project is deep into validation for the next Kubernetes minor release.