Kill Switch
A global kill switch is your last line of defense — a single flag that can disable an entire subsystem, feature area, or even your whole application in an emergency. This guide covers creating, wiring, testing, and integrating a global kill switch into your incident response workflow.
This is an emergency control
Global Kill Switch Architecture
A global kill switch works by placing a flag check at the highest level of your application — before any business logic executes. When toggled OFF, all requests are short-circuited with a controlled degradation response.
┌─────────────────────────────────────────┐
│            Incoming Request             │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│     Kill Switch Middleware (FIRST)      │
│ ┌─────────────────────────────────────┐ │
│ │ boolVariation("global-killswitch")  │ │
│ └────────────────┬────────────────────┘ │
│                  │                      │
│         ┌────────┴────────┐             │
│         ▼                 ▼             │
│   [ON: Continue]     [OFF: 503 +        │
│         ▼             Retry-After]      │
│   Normal Request                        │
│     Processing                          │
└─────────────────────────────────────────┘

1. Create the global kill switch flag
Create an ops-category flag that will serve as your global circuit breaker. Default to true (ON = application normal; OFF = kill switch engaged).

Create global kill switch

```bash
curl -X POST https://api.featuresignals.com/v1/projects/{projectID}/flags \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "key": "global-killswitch",
    "name": "Global Kill Switch",
    "type": "boolean",
    "defaultValue": true,
    "toggleCategory": "ops",
    "description": "EMERGENCY: Global circuit breaker. Flip OFF to immediately degrade all traffic. Toggles are audited and trigger PagerDuty alerts."
  }'
```

2. Wire it at the highest level of your app
The kill switch must execute before any business logic — in your HTTP middleware stack, API gateway, or service mesh. Here's how to implement it in various architectures:
Express/Node.js: top-level middleware

```typescript
import express from 'express';
import { FeatureSignalsClient } from '@featuresignals/node';

const client = new FeatureSignalsClient(process.env.FS_API_KEY!, {
  envKey: 'production',
});

await client.waitForReady();

const app = express();

// ⚠️ Global kill switch: MUST be the first middleware
app.use(async (req, res, next) => {
  const appActive = client.boolVariation(
    'global-killswitch',
    { key: 'global' }, // Global flag, no user context needed
    true, // Default ON: keep serving if SDK unreachable
  );

  if (!appActive) {
    // Kill switch is OFF: degrade immediately
    res.setHeader('Retry-After', '120');
    res.status(503).json({
      error: 'Service temporarily unavailable',
      message: 'The application is undergoing emergency maintenance.',
      incident_id: req.headers['x-incident-id'] || 'unknown',
    });
    return;
  }

  next();
});

// ... rest of your middleware and routes
```

Go: Chi middleware

```go
package middleware

import (
	"net/http"

	fs "github.com/featuresignals/sdk-go"
)

// GlobalKillSwitch is the first middleware in the chain.
// When the kill switch flag is OFF, all requests return 503 immediately.
func GlobalKillSwitch(client *fs.Client) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			// Check global kill switch with no user context
			active := client.BoolVariation(
				"global-killswitch",
				fs.NewContext("global"),
				true, // Default ON
			)

			if !active {
				w.Header().Set("Content-Type", "application/json")
				w.Header().Set("Retry-After", "120")
				w.WriteHeader(http.StatusServiceUnavailable)
				w.Write([]byte(`{
  "error": "Service temporarily unavailable",
  "message": "The application is undergoing emergency maintenance."
}`))
				return
			}

			next.ServeHTTP(w, r)
		})
	}
}
```

API Gateway: Kong / NGINX

```yaml
# Kong declarative config: global kill switch via FeatureSignals
# Requires a custom plugin or sidecar that checks the flag
_format_version: "3.0"

services:
  - name: my-api
    url: http://my-api-service:8080
    routes:
      - name: api-route
        paths:
          - /api
    plugins:
      - name: featuresignals-killswitch
        config:
          flag_key: global-killswitch
          environment_key: production
          api_key: $FS_API_KEY
          fallback_status: 503
          retry_after: 120
          degradation_message: |
            The application is undergoing emergency maintenance.
            Please try again in 2 minutes.
```

3. Create the emergency procedure
Document the exact steps for engaging and disengaging the kill switch. This procedure should be in your incident runbook and practiced during fire drills.
Emergency Kill Switch Procedure
To ENGAGE (disable traffic):
- Declare an incident in your incident management tool
- Navigate to the global kill switch flag in FeatureSignals
- Toggle the flag OFF for the production environment
- Verify in your monitoring that requests are now receiving 503 responses
- Post in the #incidents Slack channel with the incident ID
To DISENGAGE (restore traffic):
- Confirm the underlying issue is resolved
- Toggle the flag ON for the production environment
- Monitor error rates and latency for 5 minutes
- If stable, resolve the incident
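
For teams that want a scripted step that a human runs during the procedure above, the toggle call can be sketched in TypeScript. This is a minimal illustration, not an official client: the endpoint path and the `enabled` body field are assumed from the staging test script in this guide, and `buildToggleRequest` and `ToggleRequest` are hypothetical names. The sketch only builds the request, so sending it stays a deliberate operator action.

```typescript
// Hypothetical helper: describes the HTTP request that engages or
// disengages the kill switch. The endpoint shape is an assumption taken
// from the test script in this guide; confirm it against your API docs.
interface ToggleRequest {
  url: string;
  method: 'PATCH';
  headers: Record<string, string>;
  body: string;
}

function buildToggleRequest(
  environment: string,
  enabled: boolean,
  apiKey: string,
  flagKey = 'global-killswitch',
): ToggleRequest {
  const base = 'https://api.featuresignals.com/v1';
  return {
    url: `${base}/flags/by-key/${encodeURIComponent(flagKey)}/environments/${encodeURIComponent(environment)}`,
    method: 'PATCH',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    // enabled=false engages the kill switch; enabled=true restores traffic.
    body: JSON.stringify({ enabled }),
  };
}

// Engage in production (run by a human after declaring the incident).
const engageRequest = buildToggleRequest('production', false, 'example-api-key');
console.log(engageRequest.method, engageRequest.url);
```

On runtimes with a global `fetch`, the result can be sent with `fetch(engageRequest.url, engageRequest)`. Keeping construction separate from sending makes the request easy to review before anyone presses the button.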
4. Set up audit alerts
Every toggle of a global kill switch must be audited and alerted. Configure webhooks to notify your incident management tools:
Create webhook for kill switch toggles

```bash
curl -X POST https://api.featuresignals.com/v1/webhooks \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK",
    "events": ["flag.environment.updated"],
    "filter": {
      "flag_keys": ["global-killswitch"],
      "environments": ["production"]
    },
    "description": "Alert #incidents when global kill switch is toggled"
  }'
```

5. Test the kill switch
Test the kill switch in staging at least once per sprint. A kill switch that hasn't been tested is a kill switch that won't work.
Kill switch test script

```bash
#!/bin/bash
# test-global-killswitch.sh: automated kill switch test

echo "=== Global Kill Switch Test ==="
RESULT=0

# 1. Verify normal traffic
echo "[1/3] Testing normal traffic..."
STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://api.staging.example.com/health)
if [ "$STATUS" != "200" ]; then
  echo "FAIL: Health check returned $STATUS before kill switch"
  exit 1
fi
echo "  ✓ Normal traffic OK"

# 2. Engage kill switch
echo "[2/3] Engaging kill switch..."
curl -s -X PATCH \
  "https://api.featuresignals.com/v1/flags/by-key/global-killswitch/environments/staging" \
  -H "Authorization: Bearer $FS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}' > /dev/null

sleep 30 # Wait for propagation

# 3. Verify degradation
echo "[3/3] Verifying kill switch degradation..."
STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://api.staging.example.com/health)
RETRY=$(curl -s -I https://api.staging.example.com/health | grep -i "retry-after" || echo "")

if [ "$STATUS" = "503" ] && [ -n "$RETRY" ]; then
  echo "  ✓ Kill switch working (HTTP $STATUS, Retry-After present)"
else
  echo "  ✗ Kill switch NOT working (HTTP $STATUS, Retry-After: ${RETRY:-none})"
  RESULT=1 # Record the failure, but still restore the flag below
fi

# Restore
echo "Restoring kill switch..."
curl -s -X PATCH \
  "https://api.featuresignals.com/v1/flags/by-key/global-killswitch/environments/staging" \
  -H "Authorization: Bearer $FS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"enabled": true}' > /dev/null

echo "=== Test complete ==="
exit $RESULT # Non-zero when the degradation check failed, so CI can catch it
```
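
The staging script exercises the full path, including propagation. As a complement, the degradation response itself can be checked in-process without any network. The sketch below is an illustration under assumptions: `killSwitchDecision` is a hypothetical pure function that mirrors the branch in the Express middleware above (503 plus `Retry-After` when the flag is OFF), and it is not part of any SDK.

```typescript
// Hypothetical pure function mirroring the middleware's branch logic.
type Decision =
  | { action: 'continue' }
  | {
      action: 'reject';
      status: number;
      headers: { 'Retry-After': string };
      body: { error: string };
    };

function killSwitchDecision(appActive: boolean, retryAfterSeconds = 120): Decision {
  if (appActive) {
    // Flag ON: hand the request to the rest of the stack.
    return { action: 'continue' };
  }
  // Flag OFF: short-circuit with a controlled degradation response.
  return {
    action: 'reject',
    status: 503,
    headers: { 'Retry-After': String(retryAfterSeconds) },
    body: { error: 'Service temporarily unavailable' },
  };
}

// Minimal checks; swap in your test runner's assertions.
const whenOn = killSwitchDecision(true);
const whenOff = killSwitchDecision(false);
console.log(whenOn.action); // continue
console.log(whenOff.action === 'reject' ? whenOff.status : 'unexpected'); // 503
```

Extracting the decision into a pure function keeps the middleware thin and lets the 503 contract be verified on every commit, while the staging script continues to cover propagation.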
Best Practices
Default ON, not OFF
The kill switch flag should default to true (ON). If your SDK can't reach FeatureSignals, the application should continue serving traffic — not degrade. The kill switch is a deliberate action, not an accidental state.
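
One way to make the fail-open default explicit is to wrap flag evaluation so that an SDK error also resolves to ON. This is a sketch under assumptions: `FlagClient` is a hypothetical minimal interface shaped like the `boolVariation` call used earlier in this guide, and the real SDK may already handle such errors internally.

```typescript
// Hypothetical minimal interface; the real SDK client may differ.
interface FlagClient {
  boolVariation(key: string, context: { key: string }, defaultValue: boolean): boolean;
}

// Returns true (keep serving) unless the flag explicitly evaluates to false.
function isAppActive(client: FlagClient): boolean {
  try {
    return client.boolVariation('global-killswitch', { key: 'global' }, true);
  } catch {
    // Fail open: an SDK failure must not take the application down.
    return true;
  }
}

// Simulated clients for illustration only.
const unreachableClient: FlagClient = {
  boolVariation: () => {
    throw new Error('SDK not initialized');
  },
};
const engagedClient: FlagClient = { boolVariation: () => false };

console.log(isAppActive(unreachableClient)); // true
console.log(isAppActive(engagedClient)); // false
```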
Minimize propagation delay
Configure your SDK's polling interval to 15–30 seconds for kill switch flags. In an emergency, every second counts. Consider using the streaming/SSE update mode if your SDK supports it.
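
To see why the interval matters, it helps to bound the delay between toggling the flag and an instance acting on it. The sketch below assumes a simple polling model in which each instance fetches flags every `pollIntervalSeconds` and a fetch takes `fetchLatencySeconds`. Both names and the model itself are illustrative assumptions, not SDK parameters.

```typescript
// Simple polling model (an assumption, not documented SDK behavior):
// a toggle made just after a poll waits a full interval for the next one.
function worstCasePropagationSeconds(
  pollIntervalSeconds: number,
  fetchLatencySeconds: number,
): number {
  return pollIntervalSeconds + fetchLatencySeconds;
}

// On average the toggle lands halfway through an interval.
function averagePropagationSeconds(
  pollIntervalSeconds: number,
  fetchLatencySeconds: number,
): number {
  return pollIntervalSeconds / 2 + fetchLatencySeconds;
}

console.log(worstCasePropagationSeconds(30, 1)); // 31
console.log(averagePropagationSeconds(30, 1)); // 16
console.log(worstCasePropagationSeconds(15, 1)); // 16
```

Under this model, a 30-second interval can exceed the `sleep 30` used in the test script, so either a shorter interval or a longer wait helps avoid false failures.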
Never automate kill switch toggles
Kill switches should only be toggled by humans. Automated toggling (e.g., based on error rate thresholds) can create a feedback loop: the kill switch engages and removes load, the error rate drops, the kill switch disengages, load returns, and the cycle repeats.
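
The flapping can be reproduced with a toy simulation. Everything here is an illustrative assumption: the thresholds, the error rates, and the premise that the underlying fault persists for the whole run. `simulateAutomatedToggle` is a hypothetical name and does not call any real API.

```typescript
// Toy model: while the underlying fault persists, serving traffic
// produces errors, and serving no traffic produces none.
function simulateAutomatedToggle(ticks: number): boolean[] {
  const engageAbove = 0.2; // automation turns traffic OFF above this error rate
  const releaseBelow = 0.05; // automation turns traffic back ON below this
  let serving = true; // flag ON = serving traffic
  const states: boolean[] = [];

  for (let t = 0; t < ticks; t++) {
    const errorRate = serving ? 0.5 : 0.0; // the fault is never actually fixed
    if (serving && errorRate > engageAbove) {
      serving = false; // kill switch engages
    } else if (!serving && errorRate < releaseBelow) {
      serving = true; // errors vanished with the traffic, so it disengages
    }
    states.push(serving);
  }
  return states;
}

const toggleStates = simulateAutomatedToggle(6);
console.log(toggleStates.map((s) => (s ? 'ON' : 'OFF')).join(' '));
// OFF ON OFF ON OFF ON
```

The automation never converges because the signal it watches is produced by the traffic it controls. A human can instead confirm the root cause is resolved before restoring load.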