47. Stabilizing suites, reporting, and best practices
Goal
- Keep Appium-based Avalonia suites reliable on developer machines and CI by isolating flakiness causes.
- Capture meaningful diagnostics (logs, videos, artifacts) that accelerate investigation when tests fail.
- Scale coverage with retry, quarantine, and reporting strategies that protect signal quality.
Why this matters
- Cross-platform automation is sensitive to timing, focus, and OS updates—without discipline the suite becomes noisy.
- Fast feedback requires structured artifacts; otherwise failures devolve into manual repro marathons.
- Stakeholders need trend visibility: which areas flake, which platforms lag, and where to invest engineering effort.
Prerequisites
- Chapters 43–46 for harness setup, selectors, and advanced scenarios.
- Chapter 42 for CI pipeline integration basics.
1. Triage flakiness with classification
Begin every investigation by tagging failures:
- Timing (animations, virtualization) – resolved with better waits (`WebDriverWait`, dispatcher polling).
- Environment (permissions, display scaling) – addressed by setup scripts or platform skips.
- Driver quirks (WinAppDriver Ctrl-click) – documented with `[Fact(Skip = "...")]`, as in `ListBoxTests.Can_Select_Items_By_Ctrl_Clicking` (external/Avalonia/tests/Avalonia.IntegrationTests.Appium/ListBoxTests.cs:36).
- App bugs – file issues with automation evidence attached.
Maintain a living flake log referencing test name, platform, root cause, and remediation. Automate updates by pushing annotations into test reporters (Azure Pipelines, GitHub Actions).
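One lightweight way to make the classification machine-readable is to attach it as an xUnit trait, so filters and reports can group on it. A minimal sketch (the `FlakeCause` trait name and the test are illustrative, not an Avalonia convention):

```csharp
using Xunit;

public class ListBoxScrollTests
{
    // Hypothetical trait name "FlakeCause" tags the failure class so CI can
    // group or filter on it, e.g.:
    //   dotnet test --filter "FlakeCause=Timing"
    [Fact]
    [Trait("FlakeCause", "Timing")]
    public void Scrolls_To_Selected_Item()
    {
        // ... arrange session, scroll, assert ...
    }
}
```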
2. Quarantine and retries without hiding real bugs
Retries buy time but can mask regressions. Strategies:
- Implement targeted retries via xUnit ordering or `[RetryFact]` equivalents. Avalonia currently handles retries manually by skipping unstable tests with reason strings (e.g., `TrayIconTests.Should_Handle_Left_Click` is marked `[PlatformFact(..., Skip = "Flaky test")]`, external/Avalonia/tests/Avalonia.IntegrationTests.Appium/TrayIconTests.cs:29).
- Prefer automatic quarantine: tag flaky tests and run them in a separate lane, keeping main suites failure-free. Example: use xUnit traits or custom attributes to filter (`dotnet test --filter "Category!=Quarantine"`).
- Combine retries with diagnostics: on the last retry failure, dump Appium logs and take screenshots before failing.
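xUnit has no built-in retry, so targeted retries usually mean a small wrapper. A sketch of the retry-plus-diagnostics pattern described above (the `captureDiagnostics` callback is a hypothetical hook):

```csharp
using System;
using System.Threading;

public static class RetryHelper
{
    // Runs a test body up to maxAttempts times. Intermediate failures are
    // swallowed; the final failure captures diagnostics and rethrows so the
    // test still fails loudly.
    public static void RunWithRetries(Action testBody, int maxAttempts,
        Action<Exception> captureDiagnostics)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                testBody();
                return;
            }
            catch when (attempt < maxAttempts)
            {
                // Short backoff between attempts; this is retry pacing,
                // not a synchronization wait.
                Thread.Sleep(500);
            }
            catch (Exception ex)
            {
                captureDiagnostics(ex); // dump Appium logs, screenshots, etc.
                throw;
            }
        }
    }
}
```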
3. Capture rich diagnostics
For every critical failure, collect:
- Appium server logs (`appium.out` in the macOS script); publish them via CI artifacts (external/Avalonia/azure-pipelines-integrationtests.yml:27).
- Driver logs: call `Session.Manage().Logs.GetLog("driver")` in catch blocks to capture protocol exchanges.
- Screenshots: call `Session.GetScreenshot().SaveAsFile(...)` on failure; stash the path in test output.
- Videos: on Windows, the `record-video.runsettings` VSTest runsettings file records screen output (external/Avalonia/tests/Avalonia.IntegrationTests.Appium/record-video.runsettings).
- Headless imagery: pair Appium runs with headless captures (Chapter 40) to highlight visual state at failure.
Build helper methods so tests simply call `ArtifactCollector.Capture(context)`. Ensure cleanup occurs even when assertions throw (use try/finally).
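A minimal sketch of such a collector, assuming the driver exposes a "driver" log type (availability varies by driver):

```csharp
using System;
using System.IO;
using System.Linq;
using OpenQA.Selenium;

public static class ArtifactCollector
{
    // Saves a screenshot and driver log for a failed test. Each capture is
    // wrapped so diagnostics collection never masks the original failure.
    public static void Capture(IWebDriver session, string testName, string outputDir)
    {
        Directory.CreateDirectory(outputDir);

        try
        {
            ((ITakesScreenshot)session).GetScreenshot()
                .SaveAsFile(Path.Combine(outputDir, $"{testName}.png"));
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Screenshot capture failed: {ex.Message}");
        }

        try
        {
            // Not every driver exposes this log type; treat absence as non-fatal.
            var entries = session.Manage().Logs.GetLog("driver");
            File.WriteAllLines(
                Path.Combine(outputDir, $"{testName}.driver.log"),
                entries.Select(e => $"{e.Timestamp:O} [{e.Level}] {e.Message}"));
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Driver log capture failed: {ex.Message}");
        }
    }
}
```

Calling it from a catch block or a fixture's failure path keeps individual tests free of capture plumbing.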
4. Standardize waiting and polling policies
Enforce consistent defaults:
- Set a global implicit wait (short, e.g., 1s) and rely on explicit waits for complex states. Too-long implicit waits slow down failure discovery.
- Provide `WaitForElement` and `WaitForCondition` helpers with logging (sketched after this list). Use them instead of ad-hoc `Thread.Sleep`.
- For dispatcher-driven state, expose instrumentation in the app (text fields reporting counters like `GetMoveCount` in `PointerTests_MacOS`, external/Avalonia/tests/Avalonia.IntegrationTests.Appium/PointerTests_MacOS.cs:86). Poll those values to assert behavior deterministically.
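A sketch of both helpers built on `WebDriverWait`, with a log line per call so timeout stack traces show what the test was waiting for (names and defaults are illustrative):

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;
using Xunit.Abstractions;

public static class WaitHelpers
{
    // Explicit wait for an element, logged so failures are traceable.
    public static IWebElement WaitForElement(IWebDriver session, By locator,
        ITestOutputHelper output, TimeSpan? timeout = null)
    {
        output.WriteLine($"Waiting for element: {locator}");
        return new WebDriverWait(session, timeout ?? TimeSpan.FromSeconds(10))
            .Until(d => d.FindElement(locator));
    }

    // Polls an arbitrary condition, e.g. an instrumentation counter such as
    // a move-count text field exposed by the app under test.
    public static void WaitForCondition(IWebDriver session,
        Func<IWebDriver, bool> condition, string description,
        ITestOutputHelper output, TimeSpan? timeout = null)
    {
        output.WriteLine($"Waiting for condition: {description}");
        new WebDriverWait(session, timeout ?? TimeSpan.FromSeconds(10))
            .Until(condition);
    }
}
```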
Document wait policies in CONTRIBUTING guidelines to onboard new contributors.
5. Structure reports for quick scanning
Azure Pipelines / GitHub Actions
- Publish TRX results with names that encode platform, driver, and suite (e.g., `Appium-macOS-Appium2.trx`).
- Upload log bundles (`logs/appium.log`, `screenshots/*.png`). Provide clickable links in summary markdown.
- Add summary steps that print failing test names grouped by category (flaky, new regression, quarantined).
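A summary step can be a small script that parses the TRX output; grouping by category would additionally need a lookup from test name to trait, omitted here. A sketch against the standard TRX schema:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class TrxSummary
{
    static void Main(string[] args)
    {
        // TRX files use the VisualStudio TeamTest XML namespace.
        XNamespace ns = "http://microsoft.com/schemas/VisualStudio/TeamTest/2010";
        var doc = XDocument.Load(args[0]);

        var failures = doc.Descendants(ns + "UnitTestResult")
            .Where(r => (string)r.Attribute("outcome") == "Failed")
            .Select(r => (string)r.Attribute("testName"))
            .OrderBy(name => name);

        Console.WriteLine("Failing tests:");
        foreach (var name in failures)
            Console.WriteLine($"  {name}");
    }
}
```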
Local development
- Provide a script (Chapter 44) that mirrors CI output directories so developers can inspect logs locally.
- Encourage use of `dotnet test --logger "trx;LogFileName=local.trx"` plus ReportGenerator for HTML summaries.
6. Enforce coding standards in tests
- Selectors: centralize in PageObjects. No raw XPath in tests.
- Waits: ban `Thread.Sleep` in code review; insist on helper usage.
- Cleanup: always dispose windows/sessions (`using` pattern with `OpenWindowWithClick`). Review tests that skip cleanup (they often cause downstream failures).
- Platform gating: pair every platform-specific assertion with `[PlatformFact]`/`[PlatformTheory]` to avoid accidental runs on unsupported OSes.
Add lint tooling (Roslyn analyzers or custom scripts) to scan for banned patterns (e.g., `Thread.Sleep()`) in test projects.
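Short of writing a full Roslyn analyzer, a quick scan script wired into CI can enforce the ban. A sketch (paths and the single banned pattern are illustrative):

```csharp
using System;
using System.IO;
using System.Linq;

class BannedPatternScan
{
    static int Main(string[] args)
    {
        var root = args.Length > 0 ? args[0] : "tests";
        var offenders = Directory
            .EnumerateFiles(root, "*.cs", SearchOption.AllDirectories)
            .SelectMany(file => File.ReadLines(file)
                .Select((line, i) => (file, line, lineNumber: i + 1)))
            .Where(x => x.line.Contains("Thread.Sleep"))
            .ToList();

        foreach (var (file, line, lineNumber) in offenders)
            Console.WriteLine($"{file}:{lineNumber}: banned pattern: {line.Trim()}");

        // Non-zero exit code fails the CI step when offenders exist.
        return offenders.Count == 0 ? 0 : 1;
    }
}
```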
7. Monitor and alert on trends
- Track success rate per platform, per suite. Configure dashboards (Azure Analytics, GitHub Insights) to display pass percentages over time.
- Emit custom metrics (e.g., number of retries) to a time-series store. If retries spike, alert engineers before builds start failing.
- Rotate flake triage duty; publish weekly summaries identifying top offenders and assigned owners.
8. Troubleshooting checklist
- Frequent timeouts – confirm Appium server stability, check CPU usage on agents, review wait durations.
- Intermittent focus issues – ensure tests bring windows to the foreground (`SetForegroundWindow` on Windows; see the sketch after this checklist) or click an empty region of the window before interacting.
- Driver crashes – update Appium/WinAppDriver, capture crash dumps, and reference known issues (e.g., the mac2 driver close-session crash handled in `DefaultAppFixture.Dispose`).
- Artifacts missing – verify CI scripts always run artifact upload steps with `condition: always()`.
- Quarantine drift – review quarantined tests periodically and reinstate fixed ones; failing to do so erodes coverage.
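For the focus issue above, a Windows-only helper can bring the app to the foreground before interacting. A sketch assuming WinAppDriver's convention of reporting top-level window handles as hex strings (verify against your driver):

```csharp
using System;
using System.Runtime.InteropServices;
using OpenQA.Selenium;

internal static class WindowFocus
{
    [DllImport("user32.dll")]
    private static extern bool SetForegroundWindow(IntPtr hWnd);

    // WinAppDriver reports top-level window handles as hex strings
    // (e.g. "0x00088E42"); parse and foreground before interacting.
    public static void EnsureForeground(IWebDriver session)
    {
        var handle = new IntPtr(Convert.ToInt64(session.CurrentWindowHandle, 16));
        SetForegroundWindow(handle);
    }
}
```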
Practice lab
- Artifact collector – Implement a helper that captures Appium logs, driver logs, screenshots, and optional videos when a test fails. Wire it into an xUnit `IAsyncLifetime` fixture so it runs automatically.
- Wait audit – Write an analyzer or script that flags `Thread.Sleep` usages in the Appium test project. Replace them with explicit waits and document the change.
- Quarantine lane – Configure your CI pipeline with two jobs: stable and quarantine (`dotnet test --filter "Category!=Quarantine"` vs. `--filter "Category=Quarantine"`). Move a flaky test into the quarantine lane and verify reporting highlights it separately.
- Trend dashboard – Export TRX results for the past week and build a simple dashboard (Power BI, Grafana) showing pass/fail counts per platform. Identify top flaky tests.
- Regression template – Create an issue template that captures test name, platform, driver version, app commit, and links to artifacts. Use it when logging Appium regressions to standardize triage information.
What's next
- Return to Index for appendices, publishing checklists, or future updates.