Ensuring reliable system startup, fault detection, and watchdog protection
Power-On Reset (POR) ensures all digital ICs begin operation from a known state after power is applied. The reset signal must be held active (typically LOW) long enough for all power rails to stabilize, oscillators to start, and internal IC initialization to complete. POR timing must account for the slowest rail to reach regulation, the longest oscillator startup, and any IC-specific initialization requirements.
POR is generated by dedicated supervisor ICs, internal MCU POR circuits, or external RC delay networks.
If reset is released before power is stable, the MCU may start executing code while voltage is still rising, leading to corrupted flash reads, random register states, and undefined behavior. In systems with multiple ICs, releasing reset at different times can cause bus contention (one IC operating while another is still in reset with undefined output states). Unreliable POR is a common cause of "device sometimes doesn't boot" field failures, which are hard to reproduce in the lab because they depend on supply ramp rate and input voltage.
Robust POR with supervisor IC:
Supervisor: TPS3839K33 (3.3V rail monitoring)
Threshold: 3.08V (reset delay starts when VCC > 3.08V, ~93% of 3.3V)
Reset pulse width: 200ms (adjustable with external cap, CT pin)
Output: Active-low open-drain (can wire-OR with other reset sources)
Timing analysis:
TPS62130 soft-start: 3ms to reach 3.3V regulation
Crystal startup: 5ms worst case at -40C
TPS3839 detects VCC > 3.08V at t = 2.8ms (just before regulation complete)
Holds reset LOW for 200ms additional
Reset released at t = 202.8ms
Both regulator (stable since 3ms) and crystal (stable since 8ms) are solid.
MCU begins execution with all supplies stable and clock valid. Reliable boot.
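The timing analysis above is simple arithmetic, so it can be kept as a small script next to the schematic and rerun when a part changes. The following is a minimal sketch in Python; the `PorBudget` class and its field names are hypothetical, and the numbers are the ones used in this example.

```python
from dataclasses import dataclass


@dataclass
class PorBudget:
    """Power-on reset timing budget; all times in milliseconds from power-on."""
    threshold_cross_ms: float   # rail crosses the supervisor threshold
    reset_delay_ms: float       # supervisor hold time after the threshold
    rail_stable_ms: float       # rail reaches regulation
    clock_start_ms: float       # oscillator startup time once the rail is up

    @property
    def reset_release_ms(self) -> float:
        return self.threshold_cross_ms + self.reset_delay_ms

    @property
    def clock_stable_ms(self) -> float:
        return self.rail_stable_ms + self.clock_start_ms

    def margin_ms(self) -> float:
        """Time between the slowest prerequisite settling and reset release."""
        slowest = max(self.rail_stable_ms, self.clock_stable_ms)
        return self.reset_release_ms - slowest


budget = PorBudget(threshold_cross_ms=2.8, reset_delay_ms=200.0,
                   rail_stable_ms=3.0, clock_start_ms=5.0)
print(f"reset released at {budget.reset_release_ms:.1f} ms")
print(f"margin after slowest prerequisite: {budget.margin_ms():.1f} ms")
```

A negative margin would mean reset is released before the rail or the clock is ready, which is the failure mode described for the RC-only circuit.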
RC POR only (unreliable): 10k resistor + 100nF capacitor creates POR delay of RC = 1ms. Power rail takes 5ms to stabilize (regulator soft-start). Reset releases at 1ms while power is still at 2.1V (below 2.7V minimum operating voltage for MCU). MCU reads garbage from flash, jumps to random address, and either hangs or executes random code. Sometimes it works (fast power supply), sometimes it doesn't (slow supply or low input voltage). "Intermittent boot failure" bug filed. Takes 6 months to diagnose.
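To see why the RC delay is too short, it helps to estimate when the RC node crosses the reset pin's input threshold. The sketch below assumes an ideal voltage step on VCC and a threshold of 70% of VCC; both are illustrative assumptions, not values from a datasheet, and `rc_release_time_s` is a hypothetical helper name. With a slowly ramping rail the real behavior is worse, because the threshold is relative to a supply that is itself still low.

```python
import math


def rc_release_time_s(r_ohm: float, c_farad: float, threshold_fraction: float) -> float:
    """Time for an RC node charged from an ideal step to reach a fraction of VCC."""
    if not 0.0 < threshold_fraction < 1.0:
        raise ValueError("threshold_fraction must be between 0 and 1")
    # V(t) = VCC * (1 - exp(-t / RC))  ->  t = -RC * ln(1 - fraction)
    return -r_ohm * c_farad * math.log(1.0 - threshold_fraction)


t_release = rc_release_time_s(10e3, 100e-9, threshold_fraction=0.7)  # 0.7 is assumed
rail_stable_s = 5e-3  # regulator soft-start time from the example

print(f"RC reset releases after about {t_release * 1e3:.2f} ms")
print("releases before rail is stable:", t_release < rail_stable_s)
```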
KiCad: Include supervisor IC in schematic with clear connection to all reset pins. Document timing requirements. Use text notes showing timing budget.
Altium: Simulate POR timing with transient analysis including power supply startup ramp. Verify reset timing against all IC requirements simultaneously.
OrCAD/PSpice: Simulate power-up sequence with all regulators, supervisor, and clock startup. Plot reset signal vs. VCC and clock amplitude to verify sequencing.
Brownout detection (BOD/BOR - Brown-Out Reset) monitors the power supply voltage during operation and triggers a reset if it drops below a critical threshold. This prevents the MCU from continuing to execute code at a voltage where the flash memory, SRAM, and logic cannot function reliably. The threshold must be set above the minimum operating voltage of all components on that rail but below the normal regulated voltage (accounting for transient dips).
BOD provides runtime protection against supply droop - complementing POR which only handles startup.
Without brownout detection, a brief power supply dip (caused by motor starting, RF transmission, or poor supply) can cause the MCU to execute corrupted instructions without resetting. This can corrupt EEPROM/flash data (partial write operations), send invalid commands to actuators (safety risk in motor control or medical devices), or enter unrecoverable states requiring manual power cycle. Brownout detection is mandatory for safety-critical applications and highly recommended for all designs.
STM32F407 BOR configuration:
STM32F407 at 168MHz:
Minimum VDD = 2.7V (from datasheet, for full-speed operation)
Normal operating voltage: 3.3V +/- 5% = 3.135V to 3.465V
Expected transient dips: up to 200mV during WiFi TX bursts
BOR threshold options (STM32F4 Option Bytes):
BOR Level 1: Reset below 2.1V (too low - MCU unreliable below 2.7V)
BOR Level 2: Reset below 2.4V (still too low for 168MHz operation)
BOR Level 3: Reset below 2.7V (matches minimum operating voltage)
Selected: BOR Level 3 (2.7V threshold)
Hysteresis: 100mV (resets at 2.7V falling, releases at 2.8V rising)
Normal dips: 3.3V - 0.2V = 3.1V nominal, 3.135V - 0.2V = 2.935V at the low end of tolerance (both above 2.7V - no false trigger)
BOD only triggers on genuine brownout events. CORRECT.
External backup: TPS3839K33 supervisor also monitors (threshold 3.08V)
Provides clean hardware reset even if MCU BOR is misconfigured.
Note: the 3.08V supervisor threshold sits only 20mV below the nominal 3.1V dip level, so the rail must hold its dips above 3.08V or the supervisor will reset the system during WiFi TX.
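The selection rule used here (threshold at or above the minimum operating voltage, but below the lowest voltage the rail reaches in normal operation) can be written down as a small check. This is a sketch only: `pick_bor_level` is a hypothetical helper, and the level values are the rounded figures quoted in this example rather than datasheet limits.

```python
def pick_bor_level(levels_v, v_min_operating, v_rail_min, dip_v):
    """Return (name, threshold) of the lowest BOR level that is at or above the
    minimum operating voltage and still below the worst-case dipped rail."""
    v_dip_floor = v_rail_min - dip_v
    candidates = [(name, v) for name, v in levels_v.items()
                  if v_min_operating <= v < v_dip_floor]
    if not candidates:
        raise ValueError("no BOR level fits between V_min and the dipped rail")
    return min(candidates, key=lambda item: item[1])


levels = {"BOR Level 1": 2.1, "BOR Level 2": 2.4, "BOR Level 3": 2.7}
name, threshold = pick_bor_level(levels, v_min_operating=2.7,
                                 v_rail_min=3.135, dip_v=0.2)
print(name, threshold)
```

If the function raises, the supply is the problem: no threshold can both protect the MCU and ignore normal dips, which is the situation behind the "BOD was causing resets" complaint.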
BOD disabled: Engineer disables brownout detection "because it was causing resets during WiFi transmission" (power supply inadequate, not a BOD problem). During a power dip to 2.2V lasting 50ms (weak USB cable at maximum current), MCU continues executing at unreliable voltage. Flash read returns corrupted data. MCU writes 0xFF to configuration EEPROM (corruption). After voltage recovers, system starts with default configuration - customer loses all settings. Worse: in motor control application, invalid PWM duty cycle command at low voltage could cause motor runaway.
KiCad: Document BOD threshold on schematic near MCU. Show relationship between minimum operating voltage, BOD threshold, and normal rail voltage.
Altium: Include BOD configuration in MCU setup documentation on schematic. Cross-reference with power supply worst-case analysis.
OrCAD: Annotate BOD thresholds on schematic. Include power supply droop simulation results showing BOD triggers before MCU minimum operating point.
A watchdog timer (WDT) is a hardware countdown timer that resets the system if firmware fails to periodically "kick" (reload) the timer before it expires. It detects firmware lockups, infinite loops, deadlocks, and corruption that prevent normal execution. The timeout period must be long enough for normal operation but short enough to detect faults quickly. An external watchdog IC is more robust than an internal MCU watchdog peripheral because it keeps running when the MCU itself fails (latch-up, clock loss) and corrupted firmware cannot reconfigure it.
Watchdogs are the last line of defense against firmware failures that would otherwise require manual power cycling.
Without a watchdog, a firmware bug (null pointer, stack overflow, infinite loop, deadlock) causes the system to hang permanently until someone manually power cycles it. For remote/unattended systems (IoT sensors, industrial controllers, satellite), there is nobody to press the reset button - the system stays dead until maintenance arrives. A watchdog automatically recovers the system within seconds of any firmware failure, making the difference between a momentary glitch and a permanent system outage requiring truck roll ($500-$5000 per visit).
Dual watchdog system:
Internal watchdog: STM32 IWDG
Clock: LSI (32kHz)
Prescaler: /256, Reload: 4095
Timeout: 256/32000 * 4095 = 32.76 seconds (maximum)
Configured: Prescaler /32, Reload: 1000 --> Timeout = 1 second
Firmware kicks every 500ms in main loop (50% margin)
Cannot be disabled once started (hardware lock)
External watchdog: TPS3813K33 (for MCU hang detection)
Timeout: 1.6 seconds (fixed, selected by part number suffix)
Kick input: must see rising edge within timeout window
MCU GPIO toggles watchdog kick pin every 500ms
If MCU firmware hangs OR MCU itself fails (latch-up, clock loss):
External WDT triggers hardware reset on nRESET pin
Resets entire system including MCU, peripherals, and power sequencing
Both watchdogs must be satisfied - single point of failure eliminated.
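The timeout and margin figures above follow from two short formulas. The sketch below uses the same simplified timeout expression as the text (prescaler / clock * reload) and the nominal 32kHz LSI value; the LSI is an uncalibrated RC oscillator, so the real timeout should be treated as approximate. The function names are hypothetical.

```python
def wdt_timeout_s(prescaler: int, reload: int, clock_hz: float = 32_000.0) -> float:
    """Timeout using the simplified formula from the text: prescaler / f_clk * reload."""
    return prescaler / clock_hz * reload


def kick_margin(kick_interval_s: float, timeout_s: float) -> float:
    """Fraction of the timeout left unused when firmware kicks on schedule."""
    return 1.0 - kick_interval_s / timeout_s


internal = wdt_timeout_s(prescaler=32, reload=1000)
print(f"internal timeout: {internal:.2f} s, margin {kick_margin(0.5, internal):.0%}")
print(f"external margin:  {kick_margin(0.5, 1.6):.0%}")
```

A margin near zero means a slow main-loop iteration can trigger a false reset; a margin near 100% means the timeout is probably longer than it needs to be.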
No watchdog or software-only: IoT sensor deployed in remote agricultural field (nearest technician: 2-hour drive). No hardware watchdog. Software watchdog in firmware ISR - but a stack overflow corrupts the ISR vector table, so the watchdog ISR never fires. System hangs permanently. Customer doesn't notice for 3 days (missing data). Technician dispatched for $500 service call to press reset button. Happens again 2 weeks later. Product returned. Reputation damaged.
KiCad: Include external watchdog IC on schematic with clear connection to MCU GPIO (kick) and system nRESET (output). Document timeout period on schematic.
Altium: Add watchdog configuration parameters to schematic notes. Include timing diagram showing kick interval vs. timeout window.
OrCAD: Place watchdog in reset/supervisory schematic section. Document all connections and timing requirements in schematic text annotations.
Reset distribution ensures that ALL ICs in the system that have reset inputs receive a synchronized reset signal. This includes the MCU, Ethernet PHY, USB hubs, FPGAs, display controllers, wireless modules, external ADCs, and any other IC with a reset pin. All devices must enter and exit reset together (or in a defined sequence) to prevent bus conflicts and ensure clean initialization.
A common nRESET bus, driven by the supervisor IC, ensures system-wide coordinated startup.
If the MCU resets but the Ethernet PHY does not, the PHY may continue transmitting stale data while the MCU reinitializes. If an FPGA holds its outputs active while the MCU resets, bus contention occurs. If a USB hub doesn't reset with the system, it may hold the bus in an invalid state, preventing re-enumeration. Incomplete reset distribution causes: initialization failures, communication lockups, and states that only clear with a full power cycle (defeating the purpose of having a reset at all).
Complete reset distribution:
Reset source: TPS3839K33 (open-drain output, can sink 10mA)
Reset bus: nSYS_RESET (active-low, 10k pull-up to 3.3V)
Connected devices:
- STM32F407 nRST pin (requires > 20us minimum reset pulse)
- KSZ8081 Ethernet PHY nRESET (requires > 10ms pulse, 25MHz clock must be valid)
- W25Q128 Flash (no reset pin - held inactive by nCS during MCU reset)
- SX1276 LoRa nRESET (requires > 100us pulse)
- ILI9341 LCD nRESET (requires > 10us pulse)
- Manual reset button (pulls nSYS_RESET LOW through 100R + debounce cap)
Supervisor output pulse: 200ms (exceeds ALL requirements)
Pull-up ensures clean HIGH when supervisor releases reset.
All devices reset and release simultaneously. System initializes cleanly.
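The claim that the supervisor pulse exceeds every requirement is easy to verify mechanically, which helps when a device is added to the reset bus later. A minimal sketch, using the minimum pulse widths quoted above; `check_reset_pulse` is a hypothetical helper name.

```python
# Minimum reset pulse widths in seconds, taken from the text above.
MIN_RESET_PULSE_S = {
    "STM32F407 nRST": 20e-6,
    "KSZ8081 nRESET": 10e-3,
    "SX1276 nRESET": 100e-6,
    "ILI9341 nRESET": 10e-6,
}


def check_reset_pulse(pulse_s, requirements):
    """Return devices whose minimum reset pulse is not met, longest requirement first."""
    failing = {dev: t for dev, t in requirements.items() if pulse_s <= t}
    return sorted(failing, key=failing.get, reverse=True)


print(check_reset_pulse(200e-3, MIN_RESET_PULSE_S))  # supervisor pulse
print(check_reset_pulse(50e-6, MIN_RESET_PULSE_S))   # hypothetical short glitch
```

An empty list means the pulse satisfies every device. The second call shows how a short pulse could reset the MCU while leaving the PHY and radio untouched.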
Partial reset: Supervisor only connected to MCU nRESET. Ethernet PHY nRESET tied directly to VCC (permanently out of reset). After watchdog timeout, MCU resets and re-initializes. Ethernet PHY retains old state - its TX is still active from previous operation. MCU tries to initialize PHY via MDIO but PHY is in unknown state. MDIO communication fails. Ethernet non-functional until full power cycle. Watchdog recovery is ineffective for network-related hangs.
KiCad: Use a single net label (nSYS_RESET) for the common reset bus. Trace all connections in net inspector. Verify all reset pins are connected to this net.
Altium: Create "Reset" net class. Use Net List to verify all reset pins are connected. Generate connectivity report for the reset net.
OrCAD: Use cross-reference report to find all instances of the reset net. Verify complete connectivity. Check net has correct driver (supervisor output) and all loads (IC reset pins).
A manual reset button allows engineers and users to force a system reset without power cycling. The button's mechanical contacts bounce when pressed and released, generating multiple rapid transitions (typically 1-10ms of bouncing) that can cause multiple reset pulses. Debouncing ensures a clean, single reset event. Additionally, the button circuit must not inject noise onto the reset line during normal operation.
Debouncing is implemented with an RC filter (simple) or a dedicated debounce IC (complex but more reliable).
Without debouncing, a single button press can generate 5-50 rapid reset pulses in 10ms. Most MCUs tolerate this (they simply stay in reset), but some devices with reset counters or reset state machines may enter unexpected states. More importantly, a poorly filtered reset button can couple noise from button press EMI onto the reset line, causing spurious resets during vibration (automotive, industrial) or when the button is pressed firmly (capacitive coupling from finger through button housing).
Properly debounced reset button:
Circuit (all four branches connect to the nRESET line):
10k pull-up from nRESET to VCC
Supervisor open-drain output to nRESET
100nF cap from nRESET to GND
100R series resistor + button from nRESET to GND
Operation:
Normal: nRESET held HIGH by 10k pull-up. Supervisor open-drain is high-Z.
Supervisor reset: Supervisor pulls nRESET LOW (overcomes 10k pull-up).
Manual reset: Button pulls nRESET LOW through 100R.
100nF cap provides debouncing: RC = 10k * 100nF = 1ms (fast enough for clean reset)
100R limits the peak discharge current of the 100nF cap through the button contacts when the button is pressed (3.3V / 100R = 33mA)
Both reset sources can independently assert reset without conflict.
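The component values in this circuit determine four numbers worth checking: how fast the line recovers after release, how fast it falls on a press, how much current the button contacts see, and how low the line sits while the button is held. The sketch below computes them for the values above, assuming a 3.3V VCC; `debounce_numbers` is a hypothetical helper name.

```python
def debounce_numbers(vcc, r_pullup, r_series, c):
    """Key figures for a pull-up + capacitor + series-resistor reset button."""
    return {
        "release_tau_s": r_pullup * c,             # recharge through the pull-up
        "press_tau_s": r_series * c,               # discharge through the button
        "peak_button_current_a": vcc / r_series,   # initial capacitor discharge
        "pressed_level_v": vcc * r_series / (r_pullup + r_series),  # divider while held
    }


n = debounce_numbers(vcc=3.3, r_pullup=10e3, r_series=100.0, c=100e-9)
for key, value in n.items():
    print(f"{key}: {value:.3g}")
```

The pressed level should stay well below the reset input's LOW threshold of every device on the bus; a larger series resistor reduces contact current but raises this level.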
Undebounced and conflicting: Button connects directly to nRESET with no series resistor and no filter cap. Supervisor IC has push-pull output driving nRESET HIGH. When button is pressed: short circuit between supervisor's HIGH output and ground through button. Supervisor output fights button at 20mA+ current. Supervisor IC damaged over time. No debouncing means 20 glitches per button press. Also: button has long wire (antenna) picking up EMI from nearby motor, causing random resets during motor operation.
KiCad: Place reset button circuit near the supervisor IC on schematic. Include RC components clearly. Note "DEBOUNCE" in component grouping.
Altium: Include debounce components as part of the reset subsection. Verify no conflicts between push-pull outputs and button with DRC rules.
OrCAD: Document the reset button circuit with all debounce components. Verify supervisor output type (OD vs PP) compatibility with button in design notes.
Reset sequence documentation formally records the complete reset behavior of the system: which events trigger reset, the order of reset assertion and deassertion, timing relationships between resets of different subsystems, and the expected initialization sequence after reset release. This document serves as the reference for firmware development, debugging, and verification that hardware and firmware agree on system startup behavior.
Documentation should include timing diagrams showing all reset-related signals vs. time during power-up, watchdog reset, and manual reset scenarios.
Without reset sequence documentation, firmware engineers make assumptions about hardware reset behavior that may be wrong. Hardware engineers make assumptions about what firmware will initialize. The gap between these assumptions causes integration bugs: firmware tries to access a peripheral before its reset is released, or hardware releases reset before firmware expects, or debug tools interfere with reset sequencing. Clear documentation eliminates these gaps and speeds up board bring-up significantly.
Reset sequence timing documentation:
POWER-UP SEQUENCE:
t=0ms: VCC_3V3 starts rising (TPS62130 soft-start begins)
t=2.8ms: VCC_3V3 reaches 3.08V -- supervisor TPS3839 detects power-good
t=2.8ms: nSYS_RESET held LOW by supervisor (200ms timer starts)
t=3ms: VCC_3V3 reaches 3.3V regulation
t=8ms: HSE crystal oscillation stable (worst-case cold startup)
t=202.8ms: Supervisor releases nSYS_RESET (goes HIGH)
t=202.8ms: STM32, Ethernet PHY, LoRa module all exit reset simultaneously
t=204ms: STM32 begins execution (internal HSI clock initially)
t=210ms: STM32 firmware switches to HSE/PLL clock (48MHz USB ready)
t=250ms: Full system initialization complete, peripherals configured
RESET SOURCES:
1. POR: Supervisor triggers on VCC < 3.08V (200ms hold)
2. BOR: STM32 internal, threshold 2.7V (immediate reset)
3. WDT: External TPS3813 timeout 1.6s (drives nSYS_RESET LOW)
4. Manual: Button press (immediate, held as long as pressed + RC debounce)
5. Software: STM32 NVIC_SystemReset() (internal only, does NOT reset PHY/radio)
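The reset source list matters to firmware mainly because not every source resets every device. One way to make that explicit is a small lookup table that answers "which devices kept their old state after this reset?". The sketch below is illustrative: the device set comes from this document's examples, the names `RESET_SCOPE` and `devices_not_reset` are hypothetical, and treating BOR as MCU-only is an assumption, since the list only says it is internal to the STM32.

```python
# Devices on the board that have a reset input.
ALL_DEVICES = {"STM32F407", "KSZ8081 PHY", "SX1276 LoRa", "ILI9341 LCD"}

# Which devices each reset source actually resets.
RESET_SCOPE = {
    "POR": ALL_DEVICES,          # supervisor drives nSYS_RESET
    "BOR": {"STM32F407"},        # assumed MCU-internal only
    "WDT": ALL_DEVICES,          # external TPS3813 drives nSYS_RESET
    "MANUAL": ALL_DEVICES,       # button pulls nSYS_RESET LOW
    "SOFTWARE": {"STM32F407"},   # NVIC_SystemReset(), internal only
}


def devices_not_reset(source):
    """Devices that keep their old state after the given reset source fires."""
    return sorted(ALL_DEVICES - RESET_SCOPE[source])


print(devices_not_reset("WDT"))
print(devices_not_reset("SOFTWARE"))
```

For any source that returns a non-empty list, firmware has to bring those devices to a known state itself, for example by re-initializing them or pulsing their reset from a GPIO.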
No documentation: No timing diagram, no reset source list, no description of expected boot behavior. Firmware engineer assumes "reset happens, then my code runs." Does not wait for Ethernet PHY to complete its internal 100ms initialization after reset release - tries to configure PHY via MDIO immediately, reads all zeros, configures wrong link speed. Only works when MCU startup code (flash wait states, PLL config) accidentally takes long enough for PHY to be ready. Fails randomly when optimization flags change compilation.
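The fix for this failure is to wait for the PHY explicitly instead of relying on how long startup code happens to take. A fixed delay after reset release works; polling an ID register with a timeout is a common alternative. The sketch below shows the polling pattern in Python for illustration only (real firmware would do the same in C over MDIO). `wait_for_phy` and `FakePhy` are hypothetical, and the ID value 0x1234 is a placeholder, not the KSZ8081's actual ID.

```python
import time


def wait_for_phy(read_id, expected_id, timeout_s=0.5, poll_s=0.01,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll a PHY ID register until it returns expected_id or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if read_id() == expected_id:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


class FakePhy:
    """Stand-in for MDIO access: reads zero until `ready_after` reads have happened."""

    def __init__(self, ready_after, phy_id):
        self.reads = 0
        self.ready_after = ready_after
        self.phy_id = phy_id

    def read_id(self):
        self.reads += 1
        return self.phy_id if self.reads > self.ready_after else 0


phy = FakePhy(ready_after=3, phy_id=0x1234)  # 0x1234 is a placeholder ID
print(wait_for_phy(phy.read_id, expected_id=0x1234, poll_s=0.001))
```

Returning False on timeout lets the caller report a hardware fault instead of configuring a PHY that never answered.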
KiCad: Add timing diagram as a graphical annotation on the reset/power schematic sheet. Use text frames for detailed timing description.
Altium: Create a dedicated documentation sheet in the schematic project showing reset timing. Use drawing primitives for timing diagrams.
OrCAD: Add reset sequence as a schematic page with timing diagram drawn using graphical primitives. Reference from other sheets with text notes.