# ERROR-MODEL.md ## Scope This document defines Kron’s error model. It specifies: * Error categories * Error propagation rules * Handling semantics * Logging requirements * Failure guarantees This applies to: * `kron-core` * `krond` * `kron-operator` --- ## Design Principles 1. Errors must be explicit. 2. Errors must never cause duplicate execution. 3. Errors must never cause unbounded replay. 4. Determinism must not be compromised by errors. 5. Invalid configuration must fail fast. 6. Runtime failures must degrade safely. --- ## Error Categories Errors are classified as: * `ConfigurationError` * `ValidationError` * `SchedulingError` * `ConstraintError` * `ExecutionError` * `PersistenceError` * `SystemError` * `IncompatibleStateError` Each category has defined behavior. --- ## ConfigurationError Raised when: * Syntax is invalid. * Required fields missing. * Unknown distribution. * Unknown seed strategy. * Invalid parameters. * Invalid timezone. * Negative durations. ### Behavior * In `krond`: daemon refuses to start. * In `kron-operator`: resource marked invalid; reconciliation stops for that resource. * In `kron-core`: error returned, no decision produced. No scheduling must occur for invalid configuration. --- ## ValidationError Raised when configuration is syntactically valid but semantically invalid. Examples: * Window duration too large. * Constraint clauses malformed. * Distribution parameters out of allowed range. ### Behavior Same as `ConfigurationError`. --- ## SchedulingError Raised during decision computation when engine cannot proceed. Examples: * Internal invariant violation. * PRNG initialization failure. * Hash computation failure. ### Behavior * `kron-core` returns error. * Adapter logs error. * No execution occurs. * Period is not marked handled. * Retry allowed on next reconciliation or loop iteration. Scheduling errors must not advance period state. --- ## ConstraintError Raised when: * Constraint evaluation fails due to malformed clause. * Timezone resolution fails during constraint evaluation. ### Behavior * Treated as `ValidationError` if static. * Treated as `SchedulingError` if dynamic. * No execution occurs. --- ## Unschedulable Condition Not an error. Occurs when: * Valid configuration * Valid scheduling * No candidate satisfies constraints within sampling budget ### Behavior * Period outcome is `unschedulable`. * Period is marked handled. * No retry. * Logged at `WARN`. --- ## ExecutionError Occurs when: * Fork fails. * Exec fails. * Permission drop fails. * Command not found. * Process exits with non-zero code. ### Behavior * If fork/exec fails before process starts: * Period outcome is `executed` only if process was created. * If process was not created, treat as `missed` only if deadline exceeded. * Otherwise log error and do not mark handled until explicit outcome determined. * If process exits non-zero: * Period outcome is `executed`. * Exit code recorded. * No automatic retry. Execution failures do not trigger retries unless explicitly implemented in future versions. --- ## PersistenceError Occurs when: * State file write fails. * fsync fails. * Rename fails. * State file unreadable. * Migration fails. ### Behavior Before execution: * If state cannot be read safely: * Daemon must refuse to start. After execution begins: * If state write fails after marking execution started: * Process must be terminated. * Fatal error. * Daemon exits. After terminal outcome: * If state write fails: * Fatal error. * Daemon exits. Persistence integrity is mandatory for idempotency. --- ## SystemError Occurs when: * PID verification fails. * OS-level resource exhaustion. * File descriptor exhaustion. * Lock acquisition fails. ### Behavior * If critical to correctness: * Fatal. * Daemon exits. * If transient: * Log error. * Retry with backoff. * Do not mark period handled. System errors must never silently skip execution. --- ## IncompatibleStateError Occurs when: * State version unsupported. * Migration fails. * Required fields missing in state file. ### Behavior * Daemon refuses to start. * No scheduling occurs. * Explicit log at `ERROR`. --- ## Deadline Interaction If evaluation occurs after deadline: * Period outcome is `missed`. * This is not an error. * Logged at `INFO`. Deadline expiration must not be treated as failure. --- ## Concurrency Conflicts If: * `forbid` and active execution exists: * Period outcome is `skipped`. * Not an error. If: * `replace` and termination fails: * Log `ERROR`. * Do not start new execution. * Period remains unhandled. * Retry permitted until deadline exceeded. --- ## Clock Anomalies If system clock jumps forward: * Evaluate current period. * Apply deadline rules. * No error. If system clock jumps backward: * Already handled periods must not re-execute. * No error. * Logged at `WARN`. Clock changes are not treated as failures. --- ## Partial Failure Handling If: * Decision computed successfully * Trigger attempted * State write fails Daemon must exit immediately to prevent duplicate execution. --- ## Retry Rules Retries are allowed only for: * Transient scheduling errors * Transient system errors before execution begins Retries must: * Not alter seed inputs * Not alter decision * Not generate a new chosen time Retries must not create new periods. --- ## Error Logging Contract Every error must log: * `identity` * `period_id` (if applicable) * `component` * `error_type` * `operation` * `message` Fatal errors must log at `ERROR` level before exit. --- ## Fatal Conditions Daemon must exit on: * State corruption without recovery * State write failure after execution start * Lock acquisition failure * Incompatible state version * Irrecoverable persistence error --- ## Non-Fatal Conditions Daemon must continue operation on: * Single job configuration error (if multi-job environment) * Execution non-zero exit code * Missed deadlines * Unschedulable period * Constraint rejection --- ## Invariants Under Error Kron guarantees: 1. No duplicate execution for same period. 2. No execution without a valid decision. 3. No execution when configuration invalid. 4. No execution beyond deadline. 5. Fatal persistence errors prevent further scheduling. 6. Unschedulable periods are terminal. 7. Execution failure does not trigger implicit retry. --- ## Adapter-Specific Notes ### krond * Fatal persistence errors require immediate shutdown. * State integrity is mandatory. ### kron-operator * Errors must surface as Kubernetes Events and Conditions. * Reconciliation must be idempotent. * Controller must not create duplicate Jobs. --- ## Summary Errors in Kron are: * Categorized. * Explicit. * Logged. * Deterministic in handling. * Never allowed to violate idempotency or determinism. Correctness and safety take precedence over availability.