nlsq.streaming.telemetry

Telemetry and monitoring for the defense layer strategy.

Added in version 1.2.0: Extracted from nlsq.streaming.adaptive_hybrid for modularity.

This module provides telemetry infrastructure for the 4-layer defense strategy used in adaptive hybrid streaming optimization. It tracks activation counts, timing, and effectiveness metrics for each defense layer.

Defense Layers

The telemetry system monitors four defense layers:

  1. Layer 1 - Warm Start: Detects when initial parameters are close to optimal

  2. Layer 2 - Adaptive Step Size: Monitors step size adjustments

  3. Layer 3 - Cost Guard: Tracks cost increase rejections

  4. Layer 4 - Step Clipping: Records step size limiting events

Classes

DefenseLayerTelemetry

Main telemetry class that collects and reports on defense layer activity.

Module Contents

Telemetry for monitoring defense layer activations.

This module provides telemetry tracking for the 4-layer defense strategy used during L-BFGS warmup in the adaptive hybrid streaming optimizer.

class nlsq.streaming.telemetry.DefenseLayerTelemetry

Bases: object

Telemetry for monitoring 4-layer defense strategy activations.

Tracks when each defense layer is triggered during warmup to help with production monitoring and tuning. This class maintains thread-safe statistics that can be queried or exported for monitoring dashboards.

The 4 layers tracked are:
  • Layer 1: Warm start detection (skips warmup)

  • Layer 2: Adaptive step size selection (refinement/careful/exploration)

  • Layer 3: Cost-increase guard (aborts warmup if loss increases)

  • Layer 4: Step clipping (limits update magnitude)
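The thread-safety claim above can be pictured with a minimal sketch. This is not the library's implementation: `SketchCounters` and `record_clip` are hypothetical names, and the example only shows one common way to guard a counter with a lock so that concurrent increments are not lost.

```python
import threading


class SketchCounters:
    """Hypothetical stand-in, not nlsq's DefenseLayerTelemetry."""

    def __init__(self):
        self._lock = threading.Lock()
        self.layer4_clip_triggers = 0

    def record_clip(self):
        # The lock makes the read-modify-write increment atomic across threads.
        with self._lock:
            self.layer4_clip_triggers += 1


def worker(counters, repeats):
    for _ in range(repeats):
        counters.record_clip()


counters = SketchCounters()
threads = [threading.Thread(target=worker, args=(counters, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counters.layer4_clip_triggers)
```

With the lock in place, the final count equals the number of threads times the number of increments per thread.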

layer1_warm_start_triggers

Count of warm start detection activations (warmup skipped)

Type:

int

layer2_lr_mode_counts

Counts per LR mode: {"refinement": n, "careful": m, "exploration": k}

Type:

dict[str, int]

layer3_cost_guard_triggers

Count of cost-increase guard aborts

Type:

int

layer4_clip_triggers

Count of step clipping activations

Type:

int

total_warmup_calls

Total number of warmup phase executions

Type:

int

__init__()

Initialize telemetry with zeroed counters.

reset()

Reset all telemetry counters to zero.

record_warmup_start()

Record start of a warmup phase.

record_layer1_trigger(relative_loss, threshold)

Record Layer 1 warm start detection trigger.

Parameters:
  • relative_loss (float) – Relative loss that triggered warm start

  • threshold (float) – Threshold value that was exceeded

record_layer2_lr_mode(mode, relative_loss)

Record Layer 2 adaptive LR mode selection.

Parameters:
  • mode (str) – Selected LR mode: "refinement", "careful", "exploration", or "fixed"

  • relative_loss (float) – Relative loss that determined the mode

record_layer3_trigger(cost_ratio, tolerance, iteration)

Record Layer 3 cost-increase guard trigger.

Parameters:
  • cost_ratio (float) – Cost increase ratio that triggered the guard

  • tolerance (float) – Tolerance threshold that was exceeded

  • iteration (int) – Iteration number when triggered
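As a rough illustration of what a cost-increase guard might check before this method is called, the sketch below assumes that `cost_ratio` is the current cost divided by the initial cost and that the guard trips when the ratio exceeds `1 + tolerance`. Both the definition and the comparison rule are assumptions; the source does not state them, and `cost_guard_tripped` is a hypothetical helper.

```python
def cost_guard_tripped(initial_cost, current_cost, tolerance):
    """Return (tripped, cost_ratio) under the assumed rule ratio > 1 + tolerance.

    Assumes initial_cost is strictly positive.
    """
    cost_ratio = current_cost / initial_cost
    return cost_ratio > 1.0 + tolerance, cost_ratio


# Cost rose from 2.0 to 2.5 (ratio 1.25), above the allowed 1.2.
tripped, ratio = cost_guard_tripped(initial_cost=2.0, current_cost=2.5, tolerance=0.2)

# Cost rose from 2.0 to 2.2 (ratio 1.1), within the allowed 1.2.
ok_tripped, ok_ratio = cost_guard_tripped(initial_cost=2.0, current_cost=2.2, tolerance=0.2)
```

Under these assumptions, the ratio and tolerance computed by such a check are the values that would be passed to record_layer3_trigger together with the current iteration number.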

record_layer4_clip(original_norm, max_norm)

Record Layer 4 step clipping activation.

Parameters:
  • original_norm (float) – Original update norm before clipping

  • max_norm (float) – Maximum allowed norm (clipping threshold)
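To make the two parameters concrete, here is a minimal sketch of norm-based step clipping. It assumes the Euclidean norm is used, which the source does not specify, and `clip_update` is a hypothetical helper rather than part of nlsq.

```python
import math


def clip_update(update, max_norm):
    """Scale `update` so its Euclidean norm does not exceed `max_norm`.

    Returns the (possibly scaled) update, the original norm, and a flag
    indicating whether clipping was applied.
    """
    original_norm = math.sqrt(sum(x * x for x in update))
    if original_norm <= max_norm:
        return list(update), original_norm, False
    scale = max_norm / original_norm
    return [x * scale for x in update], original_norm, True


clipped, original_norm, was_clipped = clip_update([3.0, 4.0], max_norm=1.0)
clipped_norm = math.sqrt(sum(x * x for x in clipped))
```

When `was_clipped` is true, `original_norm` and `max_norm` correspond to the arguments this method records.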

record_lbfgs_history_fill(iteration)

Record L-BFGS history buffer fill event.

Called when the L-BFGS history buffer becomes fully populated, signaling transition from cold start to full L-BFGS mode.

Parameters:

iteration (int) – Iteration number when history buffer filled
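One way to picture the fill event is a fixed-length history buffer that reports the first iteration at which it is full. The sketch below uses `collections.deque` with a hypothetical memory length of 3 and placeholder entries; it does not reflect how nlsq stores its L-BFGS history.

```python
from collections import deque

history_size = 3  # hypothetical L-BFGS memory length
history = deque(maxlen=history_size)
fill_iteration = None

for iteration in range(10):
    # Placeholder pair; a real optimizer would store step and gradient differences.
    history.append((iteration, iteration))
    if fill_iteration is None and len(history) == history_size:
        # This is the point at which a fill event would be recorded, once.
        fill_iteration = iteration
```

Because the deque discards its oldest entry once full, the length check succeeds on every later iteration too, so the `fill_iteration is None` test is what keeps the event from being reported more than once.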

record_lbfgs_line_search_failure(iteration, reason='')

Record L-BFGS line search failure event.

Called when the L-BFGS line search fails to find an acceptable step.

Parameters:
  • iteration (int) – Iteration number when line search failed

  • reason (str, optional) – Reason for line search failure

get_trigger_rates()

Get trigger rates as percentages of total warmup calls.

Returns:

Trigger rates for each layer as percentages (0-100)

Return type:

dict[str, float]
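The percentage calculation can be sketched as follows. The key names and the `trigger_rates` helper are hypothetical; the keys of the real return value are not documented here.

```python
def trigger_rates(counts, total_warmup_calls):
    """Convert raw trigger counts to percentages of total warmup calls."""
    if total_warmup_calls == 0:
        # Avoid division by zero before any warmup has been recorded.
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total_warmup_calls for name, n in counts.items()}


rates = trigger_rates(
    {"layer1_warm_start": 5, "layer3_cost_guard": 1, "layer4_clip": 10},
    total_warmup_calls=20,
)
empty_rates = trigger_rates({"layer1_warm_start": 0}, total_warmup_calls=0)
```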

get_summary()

Get summary statistics for all defense layers.

Returns:

Summary with counts and rates for each layer

Return type:

dict

get_recent_events(n=10)

Get the n most recent events.

Parameters:

n (int) – Number of recent events to return

Returns:

Most recent events

Return type:

list[dict]
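A bounded event log with a "last n" query could look like the sketch below. The event fields, the cap of 1000 retained events, and the function names are all assumptions made for illustration.

```python
from collections import deque

events = deque(maxlen=1000)  # hypothetical cap on retained events


def record_event(event_type, **data):
    """Append one event dict; the oldest entry is dropped once the cap is hit."""
    events.append({"type": event_type, **data})


def recent_events(n=10):
    """Return up to the last n events, oldest first."""
    if n <= 0:
        # Guard needed because a slice of [-0:] would return every event.
        return []
    return list(events)[-n:]


for i in range(25):
    record_event("layer4_clip", iteration=i)

latest = recent_events(3)
```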

export_metrics()

Export metrics in a format suitable for monitoring systems.

Returns:

Metrics with consistent naming for Prometheus/Grafana/etc.

Return type:

dict
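"Consistent naming" typically means flat, prefixed metric names. The sketch below flattens a nested summary into such names; the `nlsq_defense` prefix, the input layout, and `flatten_metrics` itself are hypothetical and may differ from what export_metrics actually returns.

```python
def flatten_metrics(summary, prefix="nlsq_defense"):
    """Flatten a nested dict into {metric_name: number} with underscore-joined keys."""
    flat = {}
    for key, value in summary.items():
        name = f"{prefix}_{key}"
        if isinstance(value, dict):
            # Recurse so nested counters become their own flat metrics.
            flat.update(flatten_metrics(value, prefix=name))
        elif isinstance(value, (int, float)):
            flat[name] = value
    return flat


metrics = flatten_metrics(
    {
        "total_warmup_calls": 20,
        "layer2_lr_mode_counts": {"refinement": 4, "careful": 10, "exploration": 6},
    }
)
```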

nlsq.streaming.telemetry.get_defense_telemetry()

Get global defense layer telemetry instance.

Returns:

Global telemetry instance (created on first call)

Return type:

DefenseLayerTelemetry
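The "created on first call" behaviour is a lazy singleton. A minimal sketch of that pattern follows; `SketchTelemetry` and `get_sketch_telemetry` are hypothetical stand-ins, and the lock is an assumption about how concurrent first calls might be handled.

```python
import threading


class SketchTelemetry:
    """Hypothetical stand-in for the real telemetry class."""

    def __init__(self):
        self.total_warmup_calls = 0


_state = {"instance": None}
_state_lock = threading.Lock()


def get_sketch_telemetry():
    """Return the shared instance, creating it on the first call."""
    with _state_lock:
        if _state["instance"] is None:
            _state["instance"] = SketchTelemetry()
        return _state["instance"]


first = get_sketch_telemetry()
second = get_sketch_telemetry()
```

Every caller receives the same object, so counters recorded in one part of a program are visible wherever the accessor is called.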

nlsq.streaming.telemetry.reset_defense_telemetry()

Reset global defense layer telemetry.

Usage Example

from nlsq.streaming.telemetry import DefenseLayerTelemetry

# Create telemetry instance
telemetry = DefenseLayerTelemetry()

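# Record layer activations during a warmup phase (argument values are illustrative)
telemetry.record_warmup_start()
telemetry.record_layer1_trigger(relative_loss=0.005, threshold=0.01)
telemetry.record_layer3_trigger(cost_ratio=1.25, tolerance=0.2, iteration=7)

# Get summary report; the keys of the summary dict are not documented on this page,
# so iterate over it rather than assuming specific key names
report = telemetry.get_summary()
for name, value in report.items():
    print(f"{name}: {value}")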

See Also