workflow sysadmin v1.0.0

Ship to Production Workflow

Author markeddown

License MIT

Min Context 4,096 tokens

deployment production ci-cd ops checklist

Targets

---
id: "c9e5a3f6-0d4b-4a8c-b13e-5f7d9c2a4e6b"
name: "Ship to Production Workflow"
type: workflow
category: sysadmin
version: "1.0.0"
author: "markeddown"
license: MIT
min_context_tokens: 4096
target_frameworks:
  - markeddown
  - cursor
  - claude
  - generic
tags:
  - deployment
  - production
  - ci-cd
  - ops
  - checklist
---

# Ship to Production Workflow

A deployment playbook with decision gates at each stage. Every step has a failure path — never skip a gate.

## Step 1 — Pre-Flight Checks

Before anything runs, verify:

- [ ] All PRs for this release are merged to the deploy branch.
- [ ] No open "blocker" or "critical" issues tagged for this release.
- [ ] CHANGELOG or release notes are updated.
- [ ] Environment variables and secrets required by new code are set in the target environment.
- [ ] Database migrations (if any) are reviewed and tested against a copy of production data.

> **Gate:** If any box is unchecked, stop. Resolve the gap before proceeding.

## Step 2 — Run the Test Suite

- Run the full test suite (unit, integration, e2e) against the deploy branch.
- Tests must pass with zero failures. Flaky test failures must be investigated, not skipped.

> **If tests fail:**
> 1. Identify whether the failure is a real regression or a flaky test.
> 2. Real regression: fix it and restart from Step 1.
> 3. Flaky test: fix the flake, re-run, and document the flaky test for follow-up.
> 4. Never deploy with known test failures.

## Step 3 — Build the Artifact

- Run the production build (compile, bundle, Docker image, etc.).
- Tag the artifact with the git SHA and a human-readable version.
- Verify the artifact size and contents are reasonable (no dev dependencies, no test fixtures).

> **If build fails:** Check for missing dependencies, environment mismatches, or breaking changes. Fix and restart from Step 2.

## Step 4 — Deploy to Staging

- Deploy the tagged artifact to the staging environment.
- Staging must mirror production: same runtime, same env var shape, same backing services (or faithful stubs).
- Wait for the deploy to complete and all health checks to pass.

> **If staging deploy fails:** Check deploy logs, infrastructure state, and resource limits. Fix and re-deploy to staging. Do not proceed to production.

## Step 5 — Smoke Test on Staging

Run a targeted smoke test covering:

- [ ] Core user flows (login, primary feature, payment if applicable).
- [ ] Any feature or fix included in this specific release.
- [ ] API health endpoints return 200.
- [ ] No new errors in the staging error tracker.

> **Gate:** All smoke checks must pass. If any fail, diagnose on staging — do not push the problem to production.

## Step 6 — Deploy to Production

- Deploy the exact same artifact that passed staging (never rebuild).
- Use a rolling deploy or blue-green strategy to avoid downtime.
- If the deployment system supports canary releases, route a small percentage of traffic first.

> **If production deploy fails:** Immediately roll back to the previous known-good artifact. Diagnose using production logs and restart from Step 4 with the fix.

## Step 7 — Post-Deploy Monitoring

For the first 15-30 minutes after deploy:

- Watch error rates, latency percentiles (p50, p95, p99), and throughput.
- Compare key metrics against the pre-deploy baseline.
- Check for new exceptions in error tracking (Sentry, Datadog, etc.).

## Step 8 — Rollback Criteria

> **Trigger an immediate rollback if any of these are true:**
>
> - Error rate exceeds 2x the pre-deploy baseline.
> - p99 latency exceeds 3x the pre-deploy baseline.
> - Any critical user flow is broken (login, checkout, data access).
> - A new unhandled exception is firing at a rate above 1% of requests.

**Rollback procedure:**
1. Revert to the previous artifact (not a code revert — a deploy revert).
2. Verify health checks pass on the rolled-back version.
3. Notify the team with a summary: what was deployed, what broke, what was rolled back.
4. Treat the failed deploy as a bug — enter the Bug Triage Workflow.

## Constraints

- Never deploy on a Friday afternoon or before a holiday unless it is an emergency hotfix.
- Never skip staging. "It worked locally" is not a deployment gate.
- Never deploy an artifact that was not the exact one tested on staging.
- Keep the rollback path tested — run a rollback drill at least once per quarter.

# Ship to Production Workflow (v1.0.0)
# Generated by MarkedDown — markeddown.ai
# Ship to Production Workflow

A deployment playbook with decision gates at each stage. Every step has a failure path — never skip a gate.

## Step 1 — Pre-Flight Checks

Before anything runs, verify:

- [ ] All PRs for this release are merged to the deploy branch.
- [ ] No open "blocker" or "critical" issues tagged for this release.
- [ ] CHANGELOG or release notes are updated.
- [ ] Environment variables and secrets required by new code are set in the target environment.
- [ ] Database migrations (if any) are reviewed and tested against a copy of production data.

> **Gate:** If any box is unchecked, stop. Resolve the gap before proceeding.

## Step 2 — Run the Test Suite

- Run the full test suite (unit, integration, e2e) against the deploy branch.
- Tests must pass with zero failures. Flaky test failures must be investigated, not skipped.

> **If tests fail:**
> 1. Identify whether the failure is a real regression or a flaky test.
> 2. Real regression: fix it and restart from Step 1.
> 3. Flaky test: fix the flake, re-run, and document the flaky test for follow-up.
> 4. Never deploy with known test failures.

## Step 3 — Build the Artifact

- Run the production build (compile, bundle, Docker image, etc.).
- Tag the artifact with the git SHA and a human-readable version.
- Verify the artifact size and contents are reasonable (no dev dependencies, no test fixtures).

> **If build fails:** Check for missing dependencies, environment mismatches, or breaking changes. Fix and restart from Step 2.

## Step 4 — Deploy to Staging

- Deploy the tagged artifact to the staging environment.
- Staging must mirror production: same runtime, same env var shape, same backing services (or faithful stubs).
- Wait for the deploy to complete and all health checks to pass.

> **If staging deploy fails:** Check deploy logs, infrastructure state, and resource limits. Fix and re-deploy to staging. Do not proceed to production.

## Step 5 — Smoke Test on Staging

Run a targeted smoke test covering:

- [ ] Core user flows (login, primary feature, payment if applicable).
- [ ] Any feature or fix included in this specific release.
- [ ] API health endpoints return 200.
- [ ] No new errors in the staging error tracker.

> **Gate:** All smoke checks must pass. If any fail, diagnose on staging — do not push the problem to production.

## Step 6 — Deploy to Production

- Deploy the exact same artifact that passed staging (never rebuild).
- Use a rolling deploy or blue-green strategy to avoid downtime.
- If the deployment system supports canary releases, route a small percentage of traffic first.

> **If production deploy fails:** Immediately roll back to the previous known-good artifact. Diagnose using production logs and restart from Step 4 with the fix.

## Step 7 — Post-Deploy Monitoring

For the first 15-30 minutes after deploy:

- Watch error rates, latency percentiles (p50, p95, p99), and throughput.
- Compare key metrics against the pre-deploy baseline.
- Check for new exceptions in error tracking (Sentry, Datadog, etc.).

## Step 8 — Rollback Criteria

> **Trigger an immediate rollback if any of these are true:**
>
> - Error rate exceeds 2x the pre-deploy baseline.
> - p99 latency exceeds 3x the pre-deploy baseline.
> - Any critical user flow is broken (login, checkout, data access).
> - A new unhandled exception is firing at a rate above 1% of requests.

**Rollback procedure:**
1. Revert to the previous artifact (not a code revert — a deploy revert).
2. Verify health checks pass on the rolled-back version.
3. Notify the team with a summary: what was deployed, what broke, what was rolled back.
4. Treat the failed deploy as a bug — enter the Bug Triage Workflow.

## Constraints

- Never deploy on a Friday afternoon or before a holiday unless it is an emergency hotfix.
- Never skip staging. "It worked locally" is not a deployment gate.
- Never deploy an artifact that was not the exact one tested on staging.
- Keep the rollback path tested — run a rollback drill at least once per quarter.

Download

Cursor .cursorrules

↓

Windsurf .windsurfrules

↓

Claude Project CLAUDE.md

↓

OpenAI Assistants system-prompt.txt

↓

MarkedDown .md (raw)

↓

Compatibility

Compare

gpt-4o-mini 100% sanity-v1

claude-haiku-4-5 100% sanity-v1

Run the adversarial test suite using your own API key. Results are contributed back to the community by default.

BYOK — your key is sent directly to the provider and never stored.

Model

API Key

Your key is sent directly to the model provider and never stored on our servers.

Remember key in this tab session

Test Tier

Sanity Check 5 cases · ~5 seconds · format compliance only Tier 1 — Adversarial 20 cases · ~30–60 seconds · all adversarial patterns Tier 2 — Deep New 10–13 cases + difficulty ratchet · ~2–3 min · category-specific + LLM judge · uses 2× API credits

Share results to help others

Caches your results publicly so others don't need to re-run the same test.