Module 1.9 - Reset & Supervisory

Ensuring reliable system startup, fault detection, and watchdog protection

1. Power-On Reset Timing (Critical)

What It Is

Power-On Reset (POR) ensures all digital ICs begin operation from a known state after power is applied. The reset signal must be held active (typically LOW) long enough for all power rails to stabilize, oscillators to start, and internal IC initialization to complete. POR timing must account for the slowest rail to reach regulation, the longest oscillator startup, and any IC-specific initialization requirements.

POR is generated by dedicated supervisor ICs, internal MCU POR circuits, or external RC delay networks.

Why It Matters

If reset is released before power is stable, the MCU may start executing code while voltage is still rising, leading to corrupted flash reads, random register states, and undefined behavior. In systems with multiple ICs, releasing reset at different times can cause bus contention (one IC operating while another is still in reset with undefined output states). Unreliable POR is a common cause of "device sometimes doesn't boot" field failures, which are very hard to reproduce in the lab because they depend on ramp rate, temperature, and input voltage.

How to Check - Step by Step

  1. Identify the POR source: internal MCU POR circuit, external supervisor IC (TPS3839, MAX809), or RC network.
  2. Determine the slowest power rail startup time (from regulator datasheet soft-start specification).
  3. Determine oscillator startup time (crystal worst case at cold temperature).
  4. Verify POR hold time exceeds: max(rail_startup_time, oscillator_startup_time) + safety margin (typically 10ms extra).
  5. For external supervisor: verify threshold voltage is set correctly (just below the minimum operating voltage of the supervised IC).
  6. Check POR behavior during slow power ramp: if power rises very slowly (> 100ms), some simple RC POR circuits release too early.
  7. Verify POR release is monotonic (no glitches that could cause partial reset of some ICs).

Robust POR with supervisor IC:

Supervisor: TPS3839K33 (3.3V rail monitoring)
  Threshold: 3.08V (releases reset when VCC > 3.08V, ~93% of 3.3V)
  Reset pulse width: 200ms (adjustable with external cap, CT pin)
  Output: Active-low open-drain (can wire-OR with other reset sources)

Timing analysis:
  TPS62130 soft-start: 3ms to reach 3.3V regulation
  Crystal startup: 5ms worst case at -40C
  TPS3839 detects VCC > 3.08V at t = 2.8ms (just before regulation complete)
  Holds reset LOW for 200ms additional
  Reset released at t = 202.8ms
  By then both the regulator (in regulation since 3ms) and the crystal (stable by 8ms, i.e. 5ms after the rail settled) have been stable for over 190ms.
  MCU begins execution with all supplies stable and clock valid. Reliable boot.

RC POR only (unreliable): 10k resistor + 100nF capacitor creates POR delay of RC = 1ms. Power rail takes 5ms to stabilize (regulator soft-start). Reset releases at 1ms while power is still at 2.1V (below 2.7V minimum operating voltage for MCU). MCU reads garbage from flash, jumps to random address, and either hangs or executes random code. Sometimes it works (fast power supply), sometimes it doesn't (slow supply or low input voltage). "Intermittent boot failure" bug filed. Takes 6 months to diagnose.
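Both examples above can be checked against the rule in step 4. The following is a minimal Python sketch, not a vendor tool: the function name `por_slack_ms` is illustrative, the 10ms margin is the one suggested in step 4, and the 8ms clock-ready time in the RC-only case is an assumption carried over from the supervisor example.

```python
def por_slack_ms(reset_release_ms, rail_stable_ms, clock_stable_ms,
                 safety_margin_ms=10.0):
    """Slack between the actual reset release and the required hold time.

    Required hold = max(rail, clock) + safety margin (step 4).
    A negative result means reset is released too early.
    """
    required_ms = max(rail_stable_ms, clock_stable_ms) + safety_margin_ms
    return reset_release_ms - required_ms


# Supervisor example: threshold crossed at 2.8 ms, then a 200 ms hold.
supervisor_slack = por_slack_ms(2.8 + 200.0,
                                rail_stable_ms=3.0, clock_stable_ms=8.0)

# RC-only example: release at about 1 ms, rail needs 5 ms
# (clock-ready time of 8 ms is an assumption).
rc_slack = por_slack_ms(1.0, rail_stable_ms=5.0, clock_stable_ms=8.0)

print(f"supervisor slack: {supervisor_slack:.1f} ms")
print(f"RC-only slack:    {rc_slack:.1f} ms")
```

A positive slack means the hold time covers the slowest startup event plus margin; the RC-only case comes out negative, which matches the failure described above.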

KiCad: Include supervisor IC in schematic with clear connection to all reset pins. Document timing requirements. Use text notes showing timing budget.

Altium: Simulate POR timing with transient analysis including power supply startup ramp. Verify reset timing against all IC requirements simultaneously.

OrCAD/PSpice: Simulate power-up sequence with all regulators, supervisor, and clock startup. Plot reset signal vs. VCC and clock amplitude to verify sequencing.

  • Internal POR reliance: Many MCU internal POR circuits have fixed thresholds and no adjustable hold time. They may not account for external peripherals needing longer initialization.
  • Slow power ramp: If power takes 500ms to ramp (large input capacitance, weak supply), a simple RC POR releases too early. Use a supervisor IC with voltage-level detection instead of time-delay.
  • Power cycling too fast: If power drops and rises again quickly (< POR delay), the MCU may not reset properly. Supervisor ICs with power-fail detection handle this correctly.

2. Brownout Detection Threshold (Critical)

What It Is

Brownout detection (BOD/BOR - Brown-Out Reset) monitors the power supply voltage during operation and triggers a reset if it drops below a critical threshold. This prevents the MCU from continuing to execute code at a voltage where the flash memory, SRAM, and logic cannot function reliably. The threshold must be set above the minimum operating voltage of all components on that rail but below the normal regulated voltage (accounting for transient dips).

BOD provides runtime protection against supply droop - complementing POR which only handles startup.

Why It Matters

Without brownout detection, a brief power supply dip (caused by motor starting, RF transmission, or poor supply) can cause the MCU to execute corrupted instructions without resetting. This can corrupt EEPROM/flash data (partial write operations), send invalid commands to actuators (safety risk in motor control or medical devices), or enter unrecoverable states requiring manual power cycle. Brownout detection is mandatory for safety-critical applications and highly recommended for all designs.

How to Check - Step by Step

  1. Identify the minimum reliable operating voltage for the MCU (from datasheet "Operating Conditions" table, e.g., VDD_min = 2.7V for STM32F4 at full speed).
  2. Verify BOD threshold is set ABOVE this minimum (e.g., BOR Level 3 on STM32 = 2.82V > 2.7V minimum).
  3. Check that BOD threshold is BELOW normal operating voltage minus expected transient dips: threshold < 3.3V - 200mV(transient) = 3.1V.
  4. Verify BOD has hysteresis to prevent oscillation: typical 100-200mV between trip and release points.
  5. Check BOD response time: should be fast enough to reset BEFORE the MCU can execute corrupted code (typically < 10us).
  6. Verify behavior after BOD reset: either a clean restart (same as POR) or a restart that retains SRAM contents for crash recovery, whichever the firmware is designed to expect.
  7. For external supervisor: verify supervisor threshold matches MCU minimum operating voltage (not just the rail's nominal voltage).

STM32F407 BOR configuration:

STM32F407 at 168MHz: Minimum VDD = 2.7V (from datasheet, for full-speed operation)
Normal operating voltage: 3.3V +/- 5% = 3.135V to 3.465V
Expected transient dips: up to 200mV during WiFi TX bursts

BOR threshold options (STM32F4 Option Bytes):
  BOR Level 1: Reset below 2.1V (too low - MCU unreliable below 2.7V)
  BOR Level 2: Reset below 2.4V (still too low for 168MHz operation)
  BOR Level 3: Reset below about 2.82V (just above the 2.7V minimum operating voltage)

Selected: BOR Level 3 (about 2.82V threshold, as in step 2)
  Hysteresis: 100mV (resets at 2.82V falling, releases at 2.92V rising)
  Normal dips: 3.3V - 0.2V = 3.1V nominal, 3.135V - 0.2V = 2.935V at the low end of the rail tolerance (both above 2.82V - no false trigger)
  BOD only triggers on genuine brownout events. CORRECT.

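External backup: TPS3839K33 supervisor also monitors (threshold 3.08V)
  Provides clean hardware reset even if MCU BOR is misconfigured.
  Caution: 3.08V is below the nominal dip level (3.1V) but above the worst-case dip level (3.135V - 0.2V = 2.935V). On a rail at the low end of its tolerance, a WiFi TX burst could trip the supervisor. Confirm the real rail tolerance and dip amplitude before relying on this threshold.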
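The window from steps 2 and 3 can be evaluated for several candidate thresholds at once. This is a rough Python sketch; the function name `classify_bod_threshold` is hypothetical, and the upper bound is computed from the low end of the rail tolerance (3.135V) rather than the nominal 3.3V used in step 3, which is a deliberately stricter assumption.

```python
def classify_bod_threshold(threshold_v, mcu_min_v, rail_nominal_v,
                           rail_tol, max_dip_v):
    """Classify a brownout threshold against the window from steps 2 and 3.

    Lower bound: MCU minimum operating voltage.
    Upper bound: worst-case low rail minus the largest expected dip.
    """
    upper_v = rail_nominal_v * (1.0 - rail_tol) - max_dip_v
    if threshold_v <= mcu_min_v:
        return "too low"
    if threshold_v >= upper_v:
        return "too high"
    return "ok"


# Candidate thresholds taken from this section (2.1, 2.4, 2.82, 3.08, 3.2 V).
candidates_v = [2.1, 2.4, 2.82, 3.08, 3.2]
results = {
    v: classify_bod_threshold(v, mcu_min_v=2.7, rail_nominal_v=3.3,
                              rail_tol=0.05, max_dip_v=0.2)
    for v in candidates_v
}

for v, verdict in results.items():
    print(f"{v:.2f} V -> {verdict}")
```

With these inputs the usable window is 2.7V to 2.935V, so only the 2.82V candidate passes. A 3.08V threshold is flagged under the worst-case assumption even though it sits below the nominal 3.1V figure.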

BOD disabled: Engineer disables brownout detection "because it was causing resets during WiFi transmission" (power supply inadequate, not a BOD problem). During a power dip to 2.2V lasting 50ms (weak USB cable at maximum current), MCU continues executing at unreliable voltage. Flash read returns corrupted data. MCU writes 0xFF to configuration EEPROM (corruption). After voltage recovers, system starts with default configuration - customer loses all settings. Worse: in motor control application, invalid PWM duty cycle command at low voltage could cause motor runaway.

KiCad: Document BOD threshold on schematic near MCU. Show relationship between minimum operating voltage, BOD threshold, and normal rail voltage.

Altium: Include BOD configuration in MCU setup documentation on schematic. Cross-reference with power supply worst-case analysis.

OrCAD: Annotate BOD thresholds on schematic. Include power supply droop simulation results showing BOD triggers before MCU minimum operating point.

  • Disabling BOD to "fix" resets: If BOD triggers during operation, the ROOT CAUSE is inadequate power supply - fix the supply, don't disable the protection.
  • Threshold too close to operating voltage: BOD at 3.2V on a 3.3V rail triggers from normal noise. Need 200mV+ gap between normal minimum and BOD threshold.
  • Speed-dependent minimum: MCU minimum voltage depends on clock speed. At 168MHz: 2.7V min. At 84MHz: 2.4V min. If firmware dynamically changes clock speed, BOD threshold may need adjustment.

3. Watchdog Timer Configuration (Critical)

What It Is

A watchdog timer (WDT) is a hardware countdown timer that resets the system if firmware fails to periodically "kick" (reload) the timer before it expires. It detects firmware lockups, infinite loops, deadlocks, and corruption that prevent normal execution. The timeout period must be long enough for normal operation but short enough to detect faults quickly. An external watchdog IC is more robust than a watchdog implemented in firmware or inside the MCU, because it keeps running when the MCU itself fails (latch-up, clock loss) and cannot be disabled or left unconfigured by corrupted firmware.

Watchdogs are the last line of defense against firmware failures that would otherwise require manual power cycling.

Why It Matters

Without a watchdog, a firmware bug (null pointer, stack overflow, infinite loop, deadlock) causes the system to hang permanently until someone manually power cycles it. For remote or unattended systems (IoT sensors, industrial controllers, satellites), there is nobody to press the reset button - the system stays dead until maintenance arrives. A watchdog automatically recovers the system within seconds of a firmware failure, making the difference between a momentary glitch and a permanent outage that requires a truck roll ($500-$5000 per visit).

How to Check - Step by Step

  1. Verify a watchdog is present: internal MCU watchdog (IWDG on STM32) OR external watchdog IC (TPS3813, MAX6369).
  2. For critical systems: verify BOTH internal AND external watchdog (defense in depth - external watches for MCU failure).
  3. Check timeout period: long enough for worst-case normal operation loop time, short enough for acceptable recovery time.
  4. Verify watchdog cannot be disabled by firmware once started (STM32 IWDG cannot be stopped once enabled - good).
  5. Check window watchdog if available: must kick within a window (not too early, not too late) - detects runaway fast loops too.
  6. For external watchdog: verify the kick signal is an edge or pulse pattern (toggle, pulse), not just a level. A level-sensitive input stuck HIGH, for example by a short circuit to VCC, would keep the watchdog from ever timing out.
  7. Verify reset output from watchdog is connected to all ICs that need reset (not just MCU if system has other processors).

Dual watchdog system:

Internal watchdog: STM32 IWDG
  Clock: LSI (32kHz)
  Prescaler: /128, Reload: 4095
  Timeout: 128/32000 * 4095 = 16.38 seconds (longest at /128; a /256 prescaler roughly doubles this)
  Configured: Prescaler /32, Reload: 1000 --> Timeout = 1 second
  Firmware kicks every 500ms in main loop (50% margin)
  Cannot be disabled once started (hardware lock)

External watchdog: TPS3813K33 (for MCU hang detection)
  Timeout: 1.6 seconds (fixed, selected by part number suffix)
  Kick input: must see rising edge within timeout window
  MCU GPIO toggles watchdog kick pin every 500ms (one rising edge per 1.0s, inside the 1.6s timeout)
  If MCU firmware hangs OR MCU itself fails (latch-up, clock loss):
    External WDT triggers hardware reset on nRESET pin
    Resets entire system including MCU, peripherals, and power sequencing

Both watchdogs must be satisfied - single point of failure eliminated.
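The timeout and margin figures in this example can be reproduced with a few lines of Python. This sketch uses the same formula as the example (prescaler / f_LSI * reload) and the nominal 32kHz LSI; the helper names are illustrative. Because LSI is an RC oscillator, the real timeout can differ noticeably from the nominal value, and the reference manual should be checked for whether the counter runs for reload or reload + 1 ticks.

```python
LSI_HZ = 32_000  # nominal LSI frequency used in the example


def iwdg_timeout_s(prescaler, reload, lsi_hz=LSI_HZ):
    """Nominal IWDG timeout in seconds: prescaler / f_LSI * reload."""
    return prescaler * reload / lsi_hz


def kick_margin(kick_period_s, timeout_s):
    """Fraction of the timeout still unused when kicks arrive on schedule."""
    return 1.0 - kick_period_s / timeout_s


internal_timeout_s = iwdg_timeout_s(32, 1000)     # configured setting
longest_at_div128_s = iwdg_timeout_s(128, 4095)   # /128 with full reload

internal_margin = kick_margin(0.5, internal_timeout_s)

# A pin toggled every 500 ms produces one rising edge per 1.0 s.
external_margin = kick_margin(2 * 0.5, 1.6)

print(f"internal timeout: {internal_timeout_s:.3f} s")
print(f"/128 full reload: {longest_at_div128_s:.2f} s")
print(f"internal margin:  {internal_margin:.1%}")
print(f"external margin:  {external_margin:.1%}")
```

The external margin is smaller than the internal one because only rising edges count, so a 500ms toggle rate gives a 1.0s kick period against the 1.6s timeout.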

No watchdog or software-only: IoT sensor deployed in remote agricultural field (nearest technician: 2-hour drive). No hardware watchdog. Software watchdog in firmware ISR - but a stack overflow corrupts the ISR vector table, so the watchdog ISR never fires. System hangs permanently. Customer doesn't notice for 3 days (missing data). Technician dispatched for $500 service call to press reset button. Happens again 2 weeks later. Product returned. Reputation damaged.

KiCad: Include external watchdog IC on schematic with clear connection to MCU GPIO (kick) and system nRESET (output). Document timeout period on schematic.

Altium: Add watchdog configuration parameters to schematic notes. Include timing diagram showing kick interval vs. timeout window.

OrCAD: Place watchdog in reset/supervisory schematic section. Document all connections and timing requirements in schematic text annotations.

  • Kicking in ISR: If firmware kicks the watchdog in a timer ISR (interrupt), the ISR fires even if the main loop is hung. Watchdog never triggers. Must kick ONLY from main loop or task-level code.
  • Too-long timeout: A 60-second watchdog timeout means the system can be dead for a full minute before recovery. For user-facing products, 1-5 seconds is appropriate.
  • Debug interference: During debugging (JTAG/SWD halt), the watchdog keeps counting, fires, and resets the MCU. Configure the watchdog counter to freeze while the core is halted (debug freeze register), and make sure this applies in debug mode ONLY.
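The "kicking in ISR" pitfall is easy to demonstrate with a toy simulation. The Python sketch below is a simplified model, not firmware: `SimWatchdog` and `run` are hypothetical names, time advances in 100ms steps, and the kick period and timeout (500ms and 1s) follow the internal watchdog example above.

```python
class SimWatchdog:
    """Toy countdown watchdog: fires once elapsed time reaches the timeout."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self.elapsed_ms = 0
        self.reset_fired = False

    def kick(self):
        self.elapsed_ms = 0

    def advance(self, ms):
        self.elapsed_ms += ms
        if self.elapsed_ms >= self.timeout_ms:
            self.reset_fired = True


def run(kick_from_isr, main_loop_hangs_at_ms, total_ms=5000,
        step_ms=100, kick_period_ms=500, timeout_ms=1000):
    """Return the time (ms) at which the watchdog resets, or None if never."""
    wdt = SimWatchdog(timeout_ms)
    for t in range(step_ms, total_ms + step_ms, step_ms):
        wdt.advance(step_ms)
        if wdt.reset_fired:
            return t
        main_alive = t < main_loop_hangs_at_ms
        kick_due = t % kick_period_ms == 0
        # A timer ISR keeps running after the main loop hangs;
        # a main-loop kick stops as soon as the loop stops.
        if kick_due and (kick_from_isr or main_alive):
            wdt.kick()
    return None


reset_with_main_loop_kick = run(kick_from_isr=False, main_loop_hangs_at_ms=2000)
reset_with_isr_kick = run(kick_from_isr=True, main_loop_hangs_at_ms=2000)

print("main-loop kick, reset at:", reset_with_main_loop_kick)
print("ISR kick, reset at:      ", reset_with_isr_kick)
```

In this model the main-loop variant recovers one timeout after the last successful kick, while the ISR variant never resets even though the main loop has been dead since 2000ms.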

4. Reset Distribution to All ICs (Major)

What It Is

Reset distribution ensures that ALL ICs in the system that have reset inputs receive a synchronized reset signal. This includes the MCU, Ethernet PHY, USB hubs, FPGAs, display controllers, wireless modules, external ADCs, and any other IC with a reset pin. All devices must enter and exit reset together (or in a defined sequence) to prevent bus conflicts and ensure clean initialization.

A common nRESET bus, driven by the supervisor IC, ensures system-wide coordinated startup.

Why It Matters

If the MCU resets but the Ethernet PHY does not, the PHY may continue transmitting stale data while the MCU reinitializes. If an FPGA holds its outputs active while the MCU resets, bus contention occurs. If a USB hub doesn't reset with the system, it may hold the bus in an invalid state, preventing re-enumeration. Incomplete reset distribution causes: initialization failures, communication lockups, and states that only clear with a full power cycle (defeating the purpose of having a reset at all).

How to Check - Step by Step

  1. List ALL ICs in the design that have a reset pin (nRESET, RST, RESET_N, etc.).
  2. Trace the reset signal from the supervisor/reset source to verify it reaches every IC's reset pin.
  3. Check that reset polarity is correct for each IC (some are active-LOW, some active-HIGH).
  4. Verify reset line drive strength: supervisor IC can sink/source enough current for all connected reset inputs.
  5. For devices needing longer reset pulses: verify reset pulse width meets ALL devices' minimum requirements (some need 1us, others 10ms).
  6. Check for reset sequence requirements: some devices must exit reset before others (e.g., clock generator before MCU).
  7. Verify manual reset button (if present) also resets all system ICs, not just the MCU.

Complete reset distribution:

Reset source: TPS3839K33 (open-drain output, can sink 10mA)
Reset bus: nSYS_RESET (active-low, 10k pull-up to 3.3V)

Connected devices:
  - STM32F407 nRST pin (requires > 20us minimum reset pulse)
  - KSZ8081 Ethernet PHY nRESET (requires > 10ms pulse, 25MHz clock must be valid)
  - W25Q128 Flash (no reset pin - held inactive by nCS during MCU reset)
  - SX1276 LoRa nRESET (requires > 100us pulse)
  - ILI9341 LCD nRESET (requires > 10us pulse)
  - Manual reset button (pulls nSYS_RESET LOW through 100R + debounce cap)

Supervisor output pulse: 200ms (exceeds ALL requirements)
Pull-up ensures clean HIGH when supervisor releases reset.
All devices reset and release simultaneously. System initializes cleanly.
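The pulse-width comparison in this example is simple enough to automate as part of a design review. A minimal Python sketch follows, using the minimum pulse widths listed above; the dictionary and function names are illustrative, and the 1ms case is a hypothetical shorter supervisor pulse for contrast.

```python
# Minimum reset pulse widths from the connected-devices list, in seconds.
MIN_RESET_PULSE_S = {
    "STM32F407 nRST": 20e-6,
    "KSZ8081 nRESET": 10e-3,
    "SX1276 nRESET": 100e-6,
    "ILI9341 nRESET": 10e-6,
}


def devices_with_short_reset(pulse_s, requirements=MIN_RESET_PULSE_S):
    """Return devices whose minimum reset pulse is not exceeded by pulse_s."""
    return sorted(name for name, min_s in requirements.items()
                  if pulse_s <= min_s)


violations_200ms = devices_with_short_reset(200e-3)  # supervisor in this example
violations_1ms = devices_with_short_reset(1e-3)      # hypothetical short pulse

print("200 ms pulse violations:", violations_200ms)
print("1 ms pulse violations:  ", violations_1ms)
```

An empty list means the pulse exceeds every device's minimum. The check covers pulse width only; requirements such as a valid PHY clock during reset still need to be verified separately.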

Partial reset: Supervisor only connected to MCU nRESET. Ethernet PHY nRESET tied directly to VCC (permanently out of reset). After watchdog timeout, MCU resets and re-initializes. Ethernet PHY retains old state - its TX is still active from previous operation. MCU tries to initialize PHY via MDIO but PHY is in unknown state. MDIO communication fails. Ethernet non-functional until full power cycle. Watchdog recovery is ineffective for network-related hangs.

KiCad: Use a single net label (nSYS_RESET) for the common reset bus. Trace all connections in net inspector. Verify all reset pins are connected to this net.

Altium: Create "Reset" net class. Use Net List to verify all reset pins are connected. Generate connectivity report for the reset net.

OrCAD: Use cross-reference report to find all instances of the reset net. Verify complete connectivity. Check net has correct driver (supervisor output) and all loads (IC reset pins).

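  • Active-high vs active-low: Most ICs use active-LOW reset. But some (like certain WiFi modules) use active-HIGH. Connecting an active-HIGH reset input directly to an active-LOW bus inverts its behavior: the device is held in reset during normal operation and released only while the rest of the system is in reset. Add an inverter for such devices.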
  • Reset pulse too short: Supervisor provides 1ms pulse. Ethernet PHY datasheet requires minimum 10ms. PHY exits reset before internal initialization completes, enters undefined state.
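  • Open-drain loading: Every device on the reset bus adds input capacitance, which the pull-up resistor must charge when reset is released. With too many loads the rise time becomes slow. If the rise time exceeds a reset input's maximum transition time spec, the device may not recognize the reset release cleanly.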
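The open-drain loading pitfall can be estimated with the standard RC rise-time relation (10% to 90% rise time = ln(9) * R * C, about 2.2 * R * C). The Python sketch below uses the 10k pull-up from this section; the 10pF per input and 20pF of trace capacitance are assumed placeholder values, not datasheet figures.

```python
import math


def rise_time_10_90_s(pullup_ohm, total_capacitance_f):
    """10%-90% rise time of an RC node: ln(9) * R * C (about 2.2 * R * C)."""
    return math.log(9.0) * pullup_ohm * total_capacitance_f


def bus_capacitance_f(n_inputs, input_cap_f, trace_cap_f=0.0, filter_cap_f=0.0):
    """Total capacitance the pull-up has to charge on release."""
    return n_inputs * input_cap_f + trace_cap_f + filter_cap_f


# Assumed values: 4 reset inputs at 10 pF each plus 20 pF of trace capacitance.
light_bus_f = bus_capacitance_f(4, 10e-12, trace_cap_f=20e-12)
# Same bus with a 100 nF filter capacitor on the reset line.
filtered_bus_f = bus_capacitance_f(4, 10e-12, trace_cap_f=20e-12,
                                   filter_cap_f=100e-9)

t_light_s = rise_time_10_90_s(10e3, light_bus_f)
t_filtered_s = rise_time_10_90_s(10e3, filtered_bus_f)

print(f"inputs only:      {t_light_s * 1e6:.2f} us")
print(f"with 100 nF cap:  {t_filtered_s * 1e3:.2f} ms")
```

Under these assumptions the input capacitance alone gives a rise time on the order of a microsecond, while a 100nF filter capacitor dominates and stretches it to milliseconds. Compare the result with each device's maximum reset transition time.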

5. Manual Reset Button Debouncing (Minor)

What It Is

A manual reset button allows engineers and users to force a system reset without power cycling. The button's mechanical contacts bounce when pressed and released, generating multiple rapid transitions (typically 1-10ms of bouncing) that can cause multiple reset pulses. Debouncing ensures a clean, single reset event. Additionally, the button circuit must not inject noise onto the reset line during normal operation.

Debouncing is implemented with an RC filter (simple) or a dedicated debounce IC (complex but more reliable).

Why It Matters

Without debouncing, a single button press can generate 5-50 rapid reset pulses in 10ms. Most MCUs tolerate this (they simply stay in reset), but some devices with reset counters or reset state machines may enter unexpected states. More importantly, a poorly filtered reset button can couple noise from button press EMI onto the reset line, causing spurious resets during vibration (automotive, industrial) or when the button is pressed firmly (capacitive coupling from finger through button housing).

How to Check - Step by Step

  1. Verify a manual reset button exists in the design (essential for development, useful for field service).
  2. Check that a debounce capacitor (100nF - 1uF) is present from the reset line to ground near the button.
  3. Verify a series resistor (100-1k ohms) limits current from the reset bus through the button (prevents short-circuiting supervisor output).
  4. Calculate the RC time constant and verify it provides sufficient debounce time (> 10ms for most buttons).
  5. Check that the button does not conflict with the supervisor IC output (open-drain outputs allow wire-OR with button; push-pull outputs require isolation).
  6. Verify the button is accessible (debug) or not accessible (product enclosure) as appropriate for the design phase.

Properly debounced reset button:

Circuit (everything below connects to the nRESET node):
  nRESET --[10k pull-up]-- VCC
  nRESET -- Supervisor output (open-drain)
  nRESET --[100nF cap]-- GND
  nRESET --[100R series]--[Button]-- GND

Operation:
  Normal: nRESET held HIGH by 10k pull-up. Supervisor open-drain is high-Z.
  Supervisor reset: Supervisor pulls nRESET LOW (overcomes 10k pull-up).
  Manual reset: Button pulls nRESET LOW through 100R.
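    100nF cap provides debouncing: on release, nRESET recharges through the pull-up with RC = 10k * 100nF = 1ms; on press, the cap discharges through 100R in about 10us, so each bounce closure pulls nRESET LOW again
    For the > 10ms debounce time called for in step 4, increase the cap to 1uF (10k * 1uF = 10ms)
    100R limits the peak capacitor discharge current through the button contacts (3.3V / 100R = 33mA)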

Both reset sources can independently assert reset without conflict.
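The component values in this circuit translate into time constants and a peak current that are worth checking numerically. This is a small Python sketch under stated assumptions: VCC is taken as 3.3V, and the logic-high threshold of 0.7 * VCC is a placeholder that should be replaced by the V_IH of the actual reset inputs.

```python
import math


def rc_tau_s(r_ohm, c_f):
    """RC time constant in seconds."""
    return r_ohm * c_f


def time_to_threshold_s(tau_s, threshold_fraction):
    """Time for an RC node charging from 0 V to reach a fraction of VCC."""
    return -tau_s * math.log(1.0 - threshold_fraction)


VCC_V = 3.3            # assumed rail voltage
R_PULLUP_OHM = 10e3
R_SERIES_OHM = 100.0
C_DEBOUNCE_F = 100e-9

tau_release_s = rc_tau_s(R_PULLUP_OHM, C_DEBOUNCE_F)  # recharge after release
tau_press_s = rc_tau_s(R_SERIES_OHM, C_DEBOUNCE_F)    # discharge through button
peak_button_current_a = VCC_V / R_SERIES_OHM

# Assumed logic-high threshold of 0.7 * VCC; check V_IH in the datasheet.
release_delay_s = time_to_threshold_s(tau_release_s, 0.7)

print(f"release tau:   {tau_release_s * 1e3:.2f} ms")
print(f"press tau:     {tau_press_s * 1e6:.1f} us")
print(f"peak current:  {peak_button_current_a * 1e3:.0f} mA")
print(f"release delay: {release_delay_s * 1e3:.2f} ms")
```

The asymmetry is the point of the circuit: the line falls quickly when the button closes and rises slowly when it opens, so gaps between bounces shorter than the release delay do not let nRESET return HIGH.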

Undebounced and conflicting: Button connects directly to nRESET with no series resistor and no filter cap. Supervisor IC has push-pull output driving nRESET HIGH. When button is pressed: short circuit between supervisor's HIGH output and ground through button. Supervisor output fights button at 20mA+ current. Supervisor IC damaged over time. No debouncing means 20 glitches per button press. Also: button has long wire (antenna) picking up EMI from nearby motor, causing random resets during motor operation.

KiCad: Place reset button circuit near the supervisor IC on schematic. Include RC components clearly. Note "DEBOUNCE" in component grouping.

Altium: Include debounce components as part of the reset subsection. Verify no conflicts between push-pull outputs and button with DRC rules.

OrCAD: Document the reset button circuit with all debounce components. Verify supervisor output type (OD vs PP) compatibility with button in design notes.

  • Output type conflict: If supervisor has push-pull output driving HIGH, pressing the reset button shorts the output to ground. Must use open-drain supervisor or add isolation (diode or series resistor).
  • ESD on button: External reset buttons are user-accessible. They need ESD protection just like any other external port. A TVS on the reset line protects against static discharge when button is pressed.
  • Long button wires: In products where the reset button is on a front panel connected by a wire to the main board, the wire acts as an antenna. Must include filtering at the board end (RC or ferrite + cap).

6. Reset Sequence Documentation (Minor)

What It Is

Reset sequence documentation formally records the complete reset behavior of the system: which events trigger reset, the order of reset assertion and deassertion, timing relationships between resets of different subsystems, and the expected initialization sequence after reset release. This document serves as the reference for firmware development, debugging, and verification that hardware and firmware agree on system startup behavior.

Documentation should include timing diagrams showing all reset-related signals vs. time during power-up, watchdog reset, and manual reset scenarios.

Why It Matters

Without reset sequence documentation, firmware engineers make assumptions about hardware reset behavior that may be wrong. Hardware engineers make assumptions about what firmware will initialize. The gap between these assumptions causes integration bugs: firmware tries to access a peripheral before its reset is released, or hardware releases reset before firmware expects, or debug tools interfere with reset sequencing. Clear documentation eliminates these gaps and speeds up board bring-up significantly.

How to Check - Step by Step

  1. Verify a reset timing diagram exists (either on the schematic or in a separate design document).
  2. Check that the diagram shows: VCC rise, oscillator start, supervisor threshold, reset release, and PLL lock timing.
  3. Verify all reset sources are documented: POR, brownout, watchdog, manual button, software reset, external signal.
  4. Check that the diagram shows relative timing of multiple device resets (if sequential reset is used).
  5. Verify boot mode pin states are documented in relation to reset timing (what state are BOOT0/BOOT1 during reset?).
  6. Check that the document specifies expected firmware behavior after each type of reset (cold boot vs. watchdog recovery vs. soft reset).

Reset sequence timing documentation:

POWER-UP SEQUENCE:
  t=0ms:     VCC_3V3 starts rising (TPS62130 soft-start begins)
  t=2.8ms:   VCC_3V3 reaches 3.08V -- supervisor TPS3839 detects power-good
  t=2.8ms:   nSYS_RESET held LOW by supervisor (200ms timer starts)
  t=3ms:     VCC_3V3 reaches 3.3V regulation
  t=8ms:     HSE crystal oscillation stable (worst-case cold startup)
  t=202.8ms: Supervisor releases nSYS_RESET (goes HIGH)
  t=202.8ms: STM32, Ethernet PHY, LoRa module all exit reset simultaneously
  t=204ms:   STM32 begins execution (internal HSI clock initially)
  t=210ms:   STM32 firmware switches to HSE/PLL clock (48MHz USB ready)
  t=250ms:   Full system initialization complete, peripherals configured

RESET SOURCES:
  1. POR: Supervisor triggers on VCC < 3.08V (200ms hold)
  2. BOR: STM32 internal, BOR Level 3, threshold about 2.82V (immediate reset)
  3. WDT: External TPS3813 timeout 1.6s (drives nSYS_RESET LOW)
  4. Manual: Button press (immediate, held as long as pressed + RC debounce)
  5. Software: STM32 NVIC_SystemReset() (internal only, does NOT reset PHY/radio)

No documentation: No timing diagram, no reset source list, no description of expected boot behavior. Firmware engineer assumes "reset happens, then my code runs." The firmware does not wait for the Ethernet PHY to complete its internal 100ms initialization after reset release. It tries to configure the PHY via MDIO immediately, reads all zeros, and configures the wrong link speed. The system only works when the MCU startup code (flash wait states, PLL config) happens to take long enough for the PHY to be ready, and fails seemingly at random when a change in compiler optimization flags shortens that startup time.

KiCad: Add timing diagram as a graphical annotation on the reset/power schematic sheet. Use text frames for detailed timing description.

Altium: Create a dedicated documentation sheet in the schematic project showing reset timing. Use drawing primitives for timing diagrams.

OrCAD: Add reset sequence as a schematic page with timing diagram drawn using graphical primitives. Reference from other sheets with text notes.

  • Software reset is different: Software reset (NVIC_SystemReset) resets only the MCU itself (core and on-chip peripherals). External ICs keep their state, so firmware must re-initialize them explicitly instead of assuming power-on defaults.
  • Document staleness: Reset documentation created during design but never updated when reset supervisor was changed or timeout modified. Current schematic doesn't match document.
  • Debug mode differences: During JTAG debugging, reset behavior changes (some peripherals don't reset, watchdog may be frozen). Document this to avoid confusion during development.
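One way to make the documented reset sources usable by firmware is to record, per source, whether external ICs were reset too. The Python sketch below only models that decision; it is not STM32 code. The enum and function names are hypothetical, the classification of BOR as MCU-only follows the "STM32 internal" wording in the reset source list and is an assumption, and the step strings are placeholders for real initialization routines.

```python
from enum import Enum


class ResetCause(Enum):
    POWER_ON = "POR"
    BROWNOUT = "BOR"
    WATCHDOG_EXTERNAL = "WDT"
    MANUAL_BUTTON = "Manual"
    SOFTWARE = "Software"


# Causes that drive nSYS_RESET and therefore reset every IC on the bus.
# Assumption: internal sources (BOR, software) do not propagate to the bus.
SYSTEM_WIDE = {
    ResetCause.POWER_ON,
    ResetCause.WATCHDOG_EXTERNAL,
    ResetCause.MANUAL_BUTTON,
}


def external_ics_need_reinit(cause):
    """True when external ICs kept their old state across this reset."""
    return cause not in SYSTEM_WIDE


def boot_plan(cause):
    """Ordered list of placeholder boot steps for a given reset cause."""
    steps = []
    if external_ics_need_reinit(cause):
        # Hypothetical step: bring PHY/radio back to a known state first.
        steps.append("force external ICs to a known state")
    steps.append("wait for PHY internal initialization")
    steps.append("configure peripherals")
    return steps


for cause in ResetCause:
    print(f"{cause.value:8s} -> {boot_plan(cause)}")
```

Keeping this table next to the timing diagram gives hardware and firmware engineers a single place to agree on what each reset type actually resets.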