FeatureSignals

Kill Switch

A global kill switch is your last line of defense — a single flag that can disable an entire subsystem, feature area, or even your whole application in an emergency. This guide covers creating, wiring, testing, and integrating a global kill switch into your incident response workflow.

This is an emergency control

A global kill switch is not an ordinary feature flag: it is a manually operated circuit breaker. It should only be toggled during incidents by authorized personnel. Every toggle should trigger alerts, create an audit trail, and initiate your incident response process.

Global Kill Switch Architecture

A global kill switch works by placing a flag check at the highest level of your application — before any business logic executes. When toggled OFF, all requests are short-circuited with a controlled degradation response.

┌─────────────────────────────────────────┐
│            Incoming Request             │
└────────────────────┬────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│     Kill Switch Middleware (FIRST)      │
│ ┌─────────────────────────────────────┐ │
│ │ boolVariation("global-killswitch")  │ │
│ └──────────────────┬──────────────────┘ │
│                    │                    │
│          ┌─────────┴─────────┐          │
│          ▼                   ▼          │
│   [ON: Continue]        [OFF: 503 +     │
│          ▼               Retry-After]   │
│   Normal Request                        │
│     Processing                          │
└─────────────────────────────────────────┘
  1. Create the global kill switch flag

    Create an ops-category flag that will serve as your global circuit breaker. Default to true (ON = application normal; OFF = kill switch engaged).

    Create global kill switch (Bash)
    curl -X POST https://api.featuresignals.com/v1/projects/{projectID}/flags \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "key": "global-killswitch",
        "name": "Global Kill Switch",
        "type": "boolean",
        "defaultValue": true,
        "toggleCategory": "ops",
        "description": "EMERGENCY: Global circuit breaker. Flip OFF to immediately degrade all traffic. Toggles are audited and trigger PagerDuty alerts."
      }'
    
  2. Wire it at the highest level of your app

    The kill switch must execute before any business logic — in your HTTP middleware stack, API gateway, or service mesh. Here's how to implement it in various architectures:

    Express/Node.js: top-level middleware (TypeScript)
    import express from 'express';
    import { FeatureSignalsClient } from '@featuresignals/node';

    const client = new FeatureSignalsClient(process.env.FS_API_KEY!, {
      envKey: 'production',
    });
    await client.waitForReady();

    const app = express();

    // ⚠️ Global kill switch: MUST be the first middleware
    app.use((req, res, next) => {
      const appActive = client.boolVariation(
        'global-killswitch',
        { key: 'global' }, // Global flag: no user context needed
        true, // Default ON: keep serving if the SDK is unreachable
      );

      if (!appActive) {
        // Kill switch is OFF: degrade immediately
        res.setHeader('Retry-After', '120');
        res.status(503).json({
          error: 'Service temporarily unavailable',
          message: 'The application is undergoing emergency maintenance.',
          incident_id: req.headers['x-incident-id'] || 'unknown',
        });
        return;
      }

      next();
    });

    // ... rest of your middleware and routes
    
    Go: Chi middleware (Go)
    package middleware

    import (
        "net/http"

        fs "github.com/featuresignals/sdk-go"
    )

    // degradedBody is the JSON payload returned while the kill switch is engaged.
    const degradedBody = `{"error":"Service temporarily unavailable","message":"The application is undergoing emergency maintenance."}`

    // GlobalKillSwitch is the first middleware in the chain.
    // When the kill switch flag is OFF, all requests return 503 immediately.
    func GlobalKillSwitch(client *fs.Client) func(http.Handler) http.Handler {
        return func(next http.Handler) http.Handler {
            return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
                // Check global kill switch with no user context
                active := client.BoolVariation(
                    "global-killswitch",
                    fs.NewContext("global"),
                    true, // Default ON
                )

                if !active {
                    w.Header().Set("Content-Type", "application/json")
                    w.Header().Set("Retry-After", "120")
                    w.WriteHeader(http.StatusServiceUnavailable)
                    _, _ = w.Write([]byte(degradedBody))
                    return
                }

                next.ServeHTTP(w, r)
            })
        }
    }
    
    API Gateway: Kong / NGINX (YAML)
    # Kong declarative config: global kill switch via FeatureSignals
    # Requires a custom plugin or sidecar that checks the flag

    _format_version: "3.0"
    services:
      - name: my-api
        url: http://my-api-service:8080
        routes:
          - name: api-route
            paths:
              - /api
            plugins:
              - name: featuresignals-killswitch
                config:
                  flag_key: global-killswitch
                  environment_key: production
                  api_key: $FS_API_KEY
                  fallback_status: 503
                  retry_after: 120
                  degradation_message: |
                    The application is undergoing emergency maintenance.
                    Please try again in 2 minutes.
    
  3. Create the emergency procedure

    Document the exact steps for engaging and disengaging the kill switch. This procedure should be in your incident runbook and practiced during fire drills.

    Emergency Kill Switch Procedure

    To ENGAGE (disable traffic):

    1. Declare an incident in your incident management tool
    2. Navigate to the global kill switch flag in FeatureSignals
    3. Toggle the flag OFF for the production environment
    4. Verify in your monitoring that requests are now receiving the 503 degradation response
    5. Post in the #incidents Slack channel with the incident ID

    To DISENGAGE (restore traffic):

    1. Confirm the underlying issue is resolved
    2. Toggle the flag ON for the production environment
    3. Monitor error rates and latency for 5 minutes
    4. If stable, resolve the incident
  4. Set up audit alerts

    Every toggle of a global kill switch must be audited and alerted. Configure webhooks to notify your incident management tools:

    Create webhook for kill switch toggles (Bash)
    curl -X POST https://api.featuresignals.com/v1/webhooks \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "url": "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK",
        "events": ["flag.environment.updated"],
        "filter": {
          "flag_keys": ["global-killswitch"],
          "environments": ["production"]
        },
        "description": "Alert #incidents when global kill switch is toggled"
      }'
    
  5. Test the kill switch

    Test the kill switch in staging at least once per sprint. A kill switch that hasn't been tested is a kill switch that won't work.

    Kill switch test script (Bash)
    #!/bin/bash
    # test-global-killswitch.sh: automated kill switch test (staging only)
    set -u  # treat unset variables (e.g. FS_API_KEY) as errors

    FLAG_URL="https://api.featuresignals.com/v1/flags/by-key/global-killswitch/environments/staging"
    HEALTH_URL="https://api.staging.example.com/health"
    RESULT=0

    # Set the flag state in staging; $1 is "true" or "false".
    set_flag() {
      curl -sf -X PATCH "$FLAG_URL" \
        -H "Authorization: Bearer $FS_API_KEY" \
        -H "Content-Type: application/json" \
        -d "{\"enabled\": $1}" > /dev/null
    }

    echo "=== Global Kill Switch Test ==="

    # 1. Verify normal traffic
    echo "[1/3] Testing normal traffic..."
    STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$HEALTH_URL")
    if [ "$STATUS" != "200" ]; then
      echo "FAIL: Health check returned $STATUS before kill switch"
      exit 1
    fi
    echo "  ✓ Normal traffic OK"

    # 2. Engage kill switch
    echo "[2/3] Engaging kill switch..."
    if ! set_flag false; then
      echo "FAIL: Could not engage the kill switch via the API"
      exit 1
    fi

    sleep 30  # Wait for propagation

    # 3. Verify degradation
    echo "[3/3] Verifying kill switch degradation..."
    STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$HEALTH_URL")
    RETRY=$(curl -s -I "$HEALTH_URL" | grep -i "retry-after" || echo "")

    if [ "$STATUS" = "503" ] && [ -n "$RETRY" ]; then
      echo "  ✓ Kill switch working (HTTP $STATUS, Retry-After present)"
    else
      echo "  ✗ Kill switch NOT working (HTTP $STATUS, Retry-After: ${RETRY:-none})"
      RESULT=1
    fi

    # Restore the flag even if verification failed
    echo "Restoring kill switch..."
    if ! set_flag true; then
      echo "FAIL: Could not restore the flag; turn it back ON manually"
      RESULT=1
    fi

    echo "=== Test complete ==="
    exit "$RESULT"
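
The verification in step 3 of the test script (HTTP 503 plus a Retry-After header) can also be captured as a small predicate for an integration test suite. This is only a sketch: `isControlledDegradation` and the `ObservedResponse` type are hypothetical names, not part of any SDK, and it assumes header names are lowercased and that Retry-After carries a number of seconds, as in the middleware examples in this guide.

```typescript
// Minimal view of an HTTP response; header names are assumed to be lowercased.
interface ObservedResponse {
  statusCode: number;
  headers: Record<string, string | undefined>;
}

// True when the response matches the degradation contract used in this guide:
// HTTP 503 plus a Retry-After header holding a positive number of seconds.
function isControlledDegradation(res: ObservedResponse): boolean {
  const retryAfter = Number(res.headers['retry-after']); // NaN when the header is missing
  return res.statusCode === 503 && Number.isFinite(retryAfter) && retryAfter > 0;
}

// Usage: three representative responses.
const degraded = isControlledDegradation({ statusCode: 503, headers: { 'retry-after': '120' } });
const bare503 = isControlledDegradation({ statusCode: 503, headers: {} });
const normalTraffic = isControlledDegradation({ statusCode: 200, headers: {} });
console.log({ degraded, bare503, normalTraffic });
```

A bare 503 without Retry-After is deliberately rejected, since it is more likely an unrelated outage than the kill switch doing its job.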
    
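
For the webhook configured in step 4, the receiving side will usually double-check an event before alerting anyone. The sketch below assumes a payload shape (`type`, `flagKey`, `environment`, `enabled`, `actor`) that is purely hypothetical; check the real webhook body before relying on any field name. Only the event name, flag key, and environment are taken from this guide.

```typescript
// Hypothetical payload shape; the real webhook body may differ.
interface FlagWebhookPayload {
  type: string;        // e.g. "flag.environment.updated"
  flagKey: string;
  environment: string;
  enabled: boolean;    // false means the flag is OFF, i.e. the kill switch is engaged
  actor: string;       // who toggled the flag
}

// Only production toggles of the global kill switch should raise an alert.
function isKillSwitchToggle(p: FlagWebhookPayload): boolean {
  return p.type === 'flag.environment.updated'
    && p.flagKey === 'global-killswitch'
    && p.environment === 'production';
}

// Build a human-readable message for the incident channel.
function formatAlert(p: FlagWebhookPayload): string {
  const state = p.enabled ? 'DISENGAGED (traffic restored)' : 'ENGAGED (traffic degraded)';
  return `Global kill switch ${state} in ${p.environment} by ${p.actor}`;
}

// Usage with a made-up payload.
const samplePayload: FlagWebhookPayload = {
  type: 'flag.environment.updated',
  flagKey: 'global-killswitch',
  environment: 'production',
  enabled: false,
  actor: 'alice@example.com',
};
if (isKillSwitchToggle(samplePayload)) {
  console.log(formatAlert(samplePayload));
}
```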

Best Practices

Default ON, not OFF

The kill switch flag should default to true (ON). If your SDK can't reach FeatureSignals, the application should continue serving traffic — not degrade. The kill switch is a deliberate action, not an accidental state.
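
One way to make this fail-open behavior explicit is to wrap the flag evaluation so that any SDK error resolves to the default. The sketch below is illustrative only: `evaluateKillSwitch` and the `FlagEvaluator` type are hypothetical helpers, not part of the FeatureSignals SDK, and it assumes the evaluation call may throw when the SDK is unavailable.

```typescript
// Hypothetical evaluator type: any function that returns the flag value
// and may throw if the flag service or SDK is unavailable.
type FlagEvaluator = () => boolean;

// Returns true (keep serving) unless the flag was evaluated successfully
// and is explicitly false. Errors never engage the kill switch.
function evaluateKillSwitch(
  evaluate: FlagEvaluator,
  onError?: (err: unknown) => void,
): boolean {
  try {
    return evaluate();
  } catch (err) {
    onError?.(err); // surface the failure for logging or metrics
    return true;    // fail open: default ON
  }
}

// Usage: a healthy ON flag, an engaged kill switch, and an SDK failure.
const reportedErrors: unknown[] = [];
const healthy = evaluateKillSwitch(() => true);
const engaged = evaluateKillSwitch(() => false);
const sdkDown = evaluateKillSwitch(
  () => { throw new Error('SDK unreachable'); },
  (err) => reportedErrors.push(err),
);
console.log({ healthy, engaged, sdkDown, errorCount: reportedErrors.length });
```

Reporting the error separately keeps SDK outages visible without letting them take the application down.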

Minimize propagation delay

Configure your SDK's polling interval to 15–30 seconds for kill switch flags. In an emergency, every second counts. Consider using the streaming/SSE update mode if your SDK supports it.
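
A rough estimate shows why the interval matters. The sketch below assumes each instance polls at a fixed interval and picks up a change at most one interval after the toggle, ignoring network latency; `worstCaseRequestsAfterToggle` is a hypothetical helper and the traffic figure is made up for illustration.

```typescript
// Upper bound on requests still served normally after the flag is flipped OFF,
// assuming an instance sees the change at most one polling interval later.
function worstCaseRequestsAfterToggle(
  pollIntervalSeconds: number,
  requestsPerSecond: number,
): number {
  return pollIntervalSeconds * requestsPerSecond;
}

// Usage: compare both ends of the recommended range at 200 requests/second.
const exposureAt30s = worstCaseRequestsAfterToggle(30, 200);
const exposureAt15s = worstCaseRequestsAfterToggle(15, 200);
console.log(`30s polling: up to ${exposureAt30s} requests; 15s polling: up to ${exposureAt15s}`);
```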

Never automate kill switch toggles

In production, kill switches should only be toggled by humans. Automated toggling (e.g., based on error-rate thresholds) can create a feedback loop: the kill switch engages and sheds load, the error rate drops, the kill switch disengages, load returns, and the errors resume.
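
The loop is easy to reproduce in a toy model. The simulation below is a deliberately simplified sketch, not a model of any real system: it assumes the underlying fault persists, that serving traffic yields a 50% error rate, and that an engaged kill switch yields a measured error rate of zero. All names and numbers are hypothetical.

```typescript
// Toy model: while the fault persists, serving traffic produces a 50% error rate;
// with the kill switch engaged, no requests reach the app, so the rate reads 0.
function observedErrorRate(serving: boolean): number {
  return serving ? 0.5 : 0.0;
}

// Naive automation: engage above the threshold, disengage at or below it.
// Returns the serving state (true = flag ON) at each tick.
function simulateAutoToggle(ticks: number, threshold: number): boolean[] {
  const states: boolean[] = [];
  let serving = true;
  for (let i = 0; i < ticks; i++) {
    states.push(serving);
    const rate = observedErrorRate(serving);
    serving = rate <= threshold; // the rule reacts to the latest reading only
  }
  return states;
}

// Count how many times the state changed between consecutive ticks.
function countToggles(states: boolean[]): number {
  let toggles = 0;
  for (let i = 1; i < states.length; i++) {
    if (states[i] !== states[i - 1]) toggles++;
  }
  return toggles;
}

const timeline = simulateAutoToggle(6, 0.2);
console.log(timeline.map((s) => (s ? 'ON' : 'OFF')).join(' -> '));
console.log(`toggles: ${countToggles(timeline)}`);
```

Under these assumptions the switch flips on every tick and never settles, because engaging it removes the very signal that triggered it.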

Next Steps