---
id: "c9e5a3f6-0d4b-4a8c-b13e-5f7d9c2a4e6b"
name: "Ship to Production Workflow"
type: workflow
category: sysadmin
version: "1.0.0"
author: "markeddown"
license: MIT
min_context_tokens: 4096
target_frameworks:
- markeddown
- cursor
- claude
- generic
tags:
- deployment
- production
- ci-cd
- ops
- checklist
---
# Ship to Production Workflow
A deployment playbook with decision gates at each stage. Every step has a failure path — never skip a gate.
## Step 1 — Pre-Flight Checks
Before anything runs, verify:
- [ ] All PRs for this release are merged to the deploy branch.
- [ ] No open "blocker" or "critical" issues tagged for this release.
- [ ] CHANGELOG or release notes are updated.
- [ ] Environment variables and secrets required by new code are set in the target environment.
- [ ] Database migrations (if any) are reviewed and tested against a copy of production data.
> **Gate:** If any box is unchecked, stop. Resolve the gap before proceeding.
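
The environment-variable item on this checklist can be checked mechanically. The following is a minimal sketch, assuming the release's required variable names are kept in a list; `missing_env_vars` and the names in `REQUIRED` are hypothetical and not part of this workflow.

```python
import os
from typing import Iterable, Mapping


def missing_env_vars(
    required: Iterable[str],
    env: Mapping[str, str] = os.environ,
) -> list[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


# Hypothetical variable names, for illustration only.
REQUIRED = ["DATABASE_URL", "PAYMENT_API_KEY"]
```

Run against the target environment, a non-empty result from `missing_env_vars(REQUIRED)` means the gate is closed. An empty value counts as missing here, which is a deliberate (and debatable) choice.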
## Step 2 — Run the Test Suite
- Run the full test suite (unit, integration, e2e) against the deploy branch.
- Tests must pass with zero failures. Flaky test failures must be investigated, not skipped.
> **If tests fail:**
> 1. Identify whether the failure is a real regression or a flaky test.
> 2. Real regression: fix it and restart from Step 1.
> 3. Flaky test: fix the flake, re-run, and document the flaky test for follow-up.
> 4. Never deploy with known test failures.
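
One rough way to separate a flaky test from a real regression is to re-run the failing test several times. The sketch below assumes a hypothetical `run_test` callable that returns `True` on a pass; it is a heuristic, not a substitute for investigating the failure.

```python
from typing import Callable


def classify_failure(run_test: Callable[[], bool], reruns: int = 5) -> str:
    """Re-run a failed test and label it 'regression' or 'flaky'.

    run_test is a hypothetical callable that returns True when the test passes.
    A test that fails on every re-run is treated as a real regression; one that
    passes at least once is treated as flaky.
    """
    results = [run_test() for _ in range(reruns)]
    return "flaky" if any(results) else "regression"
```

Either label still blocks the deploy. An intermittent failure can also be a genuine race condition in the product code, so a "flaky" result should prompt a closer look rather than a skip.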
## Step 3 — Build the Artifact
- Run the production build (compile, bundle, Docker image, etc.).
- Tag the artifact with the git SHA and a human-readable version.
- Verify the artifact size and contents are reasonable (no dev dependencies, no test fixtures).
> **If build fails:** Check for missing dependencies, environment mismatches, or breaking changes. Fix and restart from Step 2.
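
The tagging and contents checks might look like the sketch below. The tag format (`version-shortsha`) and the `FORBIDDEN` path fragments are assumptions chosen for illustration; adapt both to the project's own layout.

```python
import re


def artifact_tag(version: str, git_sha: str) -> str:
    """Combine a human-readable version with a 7-character short git SHA."""
    if not re.fullmatch(r"[0-9a-f]{7,40}", git_sha):
        raise ValueError(f"not a git SHA: {git_sha!r}")
    return f"{version}-{git_sha[:7]}"


# Hypothetical path fragments that should never appear in a production artifact.
FORBIDDEN = ("tests/", "fixtures/", "__pycache__/")


def suspicious_paths(paths: list[str]) -> list[str]:
    """Return artifact paths that look like test or development leftovers."""
    return [p for p in paths if any(part in p for part in FORBIDDEN)]
```

Feeding `suspicious_paths` the file listing of the built artifact gives a quick signal that test fixtures or caches leaked into the build.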
## Step 4 — Deploy to Staging
- Deploy the tagged artifact to the staging environment.
- Staging must mirror production: same runtime, same env var shape, same backing services (or faithful stubs).
- Wait for the deploy to complete and all health checks to pass.
> **If staging deploy fails:** Check deploy logs, infrastructure state, and resource limits. Fix and re-deploy to staging. Do not proceed to production.
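
Waiting for health checks usually means polling with a deadline. Here is a minimal sketch, assuming a hypothetical `is_healthy` callable that wraps whatever health probe the platform provides; the timeout and interval defaults are placeholders.

```python
import time
from typing import Callable


def wait_until_healthy(
    is_healthy: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_healthy until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if is_healthy():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A `False` result is the failure path for this step. The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting.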
## Step 5 — Smoke Test on Staging
Run a targeted smoke test covering:
- [ ] Core user flows (login, primary feature, payment if applicable).
- [ ] Any feature or fix included in this specific release.
- [ ] API health endpoints return 200.
- [ ] No new errors in the staging error tracker.
> **Gate:** All smoke checks must pass. If any fail, diagnose on staging — do not push the problem to production.
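
The health-endpoint item can be scripted with the standard library. This sketch assumes the endpoints answer plain GET requests; `http_status` and `failing_endpoints` are hypothetical helper names, and the URLs passed in are up to the project.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout_s: float = 5.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500 or 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return 0


def failing_endpoints(
    urls: list[str],
    fetch: Callable[[str], int] = http_status,
) -> list[str]:
    """Return the URLs that did not answer with HTTP 200."""
    return [url for url in urls if fetch(url) != 200]
```

An empty list from `failing_endpoints` satisfies only the health-endpoint box; the user-flow and error-tracker checks still need their own verification.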
## Step 6 — Deploy to Production
- Deploy the exact same artifact that passed staging (never rebuild).
- Use a rolling deploy or blue-green strategy to avoid downtime.
- If the deployment system supports canary releases, route a small percentage of traffic first.
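> **If production deploy fails:** Immediately roll back to the previous known-good artifact. Diagnose using production logs. If the fix changes code, restart from Step 1 so the new artifact is tested and rebuilt; if it only changes infrastructure or deploy configuration, restart from Step 4.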
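
Two ideas from this step lend themselves to a short sketch: proving that production receives the same bytes that passed staging, and picking a stable slice of traffic for a canary. The helpers below are hypothetical. A real pipeline would more likely compare digests reported by its artifact registry and let the load balancer handle traffic splitting.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of an artifact's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_same_artifact(staging_digest: str, production_digest: str) -> bool:
    """True only if production would receive the exact artifact tested on staging."""
    return staging_digest == production_digest


def in_canary(request_id: str, percent: int) -> bool:
    """Deterministically route roughly `percent` of request IDs to the canary."""
    bucket = int(sha256_hex(request_id.encode("utf-8")), 16) % 100
    return bucket < percent
```

Hashing the request or user ID keeps each caller on the same side of the split for the whole canary period, which makes before-and-after comparisons easier to read.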
## Step 7 — Post-Deploy Monitoring
For the first 15-30 minutes after deploy:
- Watch error rates, latency percentiles (p50, p95, p99), and throughput.
- Compare key metrics against the pre-deploy baseline.
- Check for new exceptions in error tracking (Sentry, Datadog, etc.).
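
If the monitoring stack does not already report percentiles, they can be approximated from raw latency samples. This is a minimal sketch using Python's `statistics.quantiles`; the function name `latency_percentiles` is hypothetical, and the interpolation method is one reasonable choice among several.

```python
import statistics


def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Return p50, p95 and p99 for a list of latency samples (needs >= 2 samples)."""
    # n=100 yields 99 cut points; index k-1 holds the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

Computing the same three numbers for a window before the deploy gives the baseline to compare against.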
## Step 8 — Rollback Criteria
> **Trigger an immediate rollback if any of these are true:**
>
> - Error rate exceeds 2x the pre-deploy baseline.
> - p99 latency exceeds 3x the pre-deploy baseline.
> - Any critical user flow is broken (login, checkout, data access).
> - A new unhandled exception is firing at a rate above 1% of requests.
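
The four triggers above translate directly into a check that can run during the monitoring window. The sketch below assumes metrics are already collected into a simple record; the `Metrics` fields and `rollback_reasons` are hypothetical names, while the thresholds are the ones listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    error_rate: float          # fraction of requests that fail, 0.0 to 1.0
    p99_latency_ms: float      # 99th percentile latency in milliseconds
    new_exception_rate: float  # fraction of requests hitting a new unhandled exception
    critical_flows_ok: bool    # False if login, checkout, or data access is broken


def rollback_reasons(baseline: Metrics, current: Metrics) -> list[str]:
    """Return every rollback trigger that currently fires (empty list means none)."""
    reasons = []
    if current.error_rate > 2 * baseline.error_rate:
        reasons.append("error rate above 2x baseline")
    if current.p99_latency_ms > 3 * baseline.p99_latency_ms:
        reasons.append("p99 latency above 3x baseline")
    if not current.critical_flows_ok:
        reasons.append("critical user flow broken")
    if current.new_exception_rate > 0.01:
        reasons.append("new unhandled exception above 1% of requests")
    return reasons
```

Any non-empty result means roll back now. Note that with a baseline error rate of exactly zero, any error at all trips the first trigger, so a small floor on the baseline may be worth adding.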
**Rollback procedure:**
1. Revert to the previous artifact (not a code revert — a deploy revert).
2. Verify health checks pass on the rolled-back version.
3. Notify the team with a summary: what was deployed, what broke, what was rolled back.
4. Treat the failed deploy as a bug — enter the Bug Triage Workflow.
## Constraints
- Never deploy on a Friday afternoon or before a holiday unless it is an emergency hotfix.
- Never skip staging. "It worked locally" is not a deployment gate.
- Never deploy an artifact that was not the exact one tested on staging.
- Keep the rollback path tested — run a rollback drill at least once per quarter.
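
The deploy-freeze rule can be enforced in a pipeline guard. This is a sketch under stated assumptions: "Friday afternoon" is taken to mean Friday from 12:00 local time, "before a holiday" is taken to mean the calendar day before, and the holiday set is supplied by the caller. `deploy_allowed` is a hypothetical name.

```python
from datetime import date, datetime, timedelta


def deploy_allowed(now: datetime, holidays: set[date], emergency: bool = False) -> bool:
    """Apply the Friday-afternoon and pre-holiday freeze unless this is an emergency hotfix."""
    if emergency:
        return True
    # Monday is 0, so Friday is 4. The noon cutoff is an assumption.
    friday_afternoon = now.weekday() == 4 and now.hour >= 12
    day_before_holiday = (now.date() + timedelta(days=1)) in holidays
    return not (friday_afternoon or day_before_holiday)
```

Teams with longer holiday freezes can widen the `timedelta` check to cover several days.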
## Compatibility
- gpt-4o-mini: 100% (sanity-v1)
- claude-haiku-4-5: 100% (sanity-v1)