Drift Detection

What is Semantic Drift?

Over time, the inputs your AI functions receive may change — new topics, different phrasing patterns, or entirely new use cases. Semantic Drift occurs when new inputs move far from your validated "Golden Dataset."

This matters because:

Cached results may become irrelevant
Model performance may degrade on unfamiliar inputs
Errors may increase for edge cases you haven't tested

VectorWave's Drift Radar monitors for this in real time.

How Drift Detection Works

New Input → Embedding → KNN Distance Check
                              │
                    ┌─────────┴──────────┐
                    │                     │
           Avg Distance < 0.25    Avg Distance ≥ 0.25
              (Within bounds)       (DRIFT DETECTED)
                    │                     │
                    ▼                     ▼
              Normal execution      Webhook Alert
                                   (Discord/Slack)

When a new input arrives, VectorWave embeds it and queries the K nearest neighbors among successful past executions of the function.
It then calculates the average distance to these K neighbors (default K=5, configurable via DRIFT_NEIGHBOR_AMOUNT).
If the average distance exceeds the threshold (default: 0.25, configurable via DRIFT_DISTANCE_THRESHOLD), a drift alert is triggered.

This KNN-based approach is more robust than a single centroid comparison, as it accounts for the local density of the vector space.
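
The check described above can be sketched in a few lines of plain Python. This is an illustration only, not VectorWave's implementation: the function names are hypothetical, cosine distance is an assumed metric (the text does not say which distance is used), and the boundary case follows the diagram, which treats an average distance equal to the threshold as drift.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_drift_check(query, baseline, k=5, threshold=0.25):
    """Return (is_drift, avg_distance) for a query embedding.

    `baseline` holds embeddings of inputs from successful past executions.
    Assumption: cosine distance; VectorWave's actual metric may differ.
    """
    if not baseline:
        raise ValueError("baseline must contain at least one embedding")
    # Sort all distances and keep the k smallest (the k nearest neighbors).
    nearest = sorted(cosine_distance(query, v) for v in baseline)[:k]
    avg = sum(nearest) / len(nearest)
    return avg >= threshold, avg


baseline = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
near_drift, near_avg = knn_drift_check([1.0, 0.0], baseline, k=3)
far_drift, far_avg = knn_drift_check([0.0, 1.0], baseline, k=3)
```

Because only the K closest neighbors are averaged, an input that lands inside any dense region of past inputs stays within bounds, even if it is far from the overall center of the data.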

Configuration

Enable drift detection via environment variables:

# .env
DRIFT_DETECTION_ENABLED=True
DRIFT_DISTANCE_THRESHOLD=0.25    # Distance threshold (default: 0.25)
DRIFT_NEIGHBOR_AMOUNT=5          # Number of neighbors for KNN check (default: 5)
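
VectorWave reads these variables itself. If you want to sanity-check the values before deploying, a small validation helper along the following lines may be useful. The `DriftSettings` class and `load_drift_settings` function are hypothetical names, not part of VectorWave, and treating a missing DRIFT_DETECTION_ENABLED as disabled is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DriftSettings:
    enabled: bool
    distance_threshold: float
    neighbor_amount: int


def load_drift_settings(env: Optional[Mapping[str, str]] = None) -> DriftSettings:
    """Parse and validate the drift variables from an environment mapping."""
    env = os.environ if env is None else env
    # Assumption: an unset flag means drift detection is off.
    raw_enabled = env.get("DRIFT_DETECTION_ENABLED", "False")
    enabled = raw_enabled.strip().lower() in {"1", "true", "yes"}
    threshold = float(env.get("DRIFT_DISTANCE_THRESHOLD", "0.25"))
    neighbors = int(env.get("DRIFT_NEIGHBOR_AMOUNT", "5"))
    if threshold < 0:
        raise ValueError("DRIFT_DISTANCE_THRESHOLD must be non-negative")
    if neighbors < 1:
        raise ValueError("DRIFT_NEIGHBOR_AMOUNT must be at least 1")
    return DriftSettings(enabled, threshold, neighbors)


defaults = load_drift_settings({})
custom = load_drift_settings({
    "DRIFT_DETECTION_ENABLED": "True",
    "DRIFT_DISTANCE_THRESHOLD": "0.35",
    "DRIFT_NEIGHBOR_AMOUNT": "10",
})
```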

Webhook Alerts

When drift is detected, VectorWave sends a real-time alert via webhook. Configure alerts through environment variables:

# .env
ALERTER_STRATEGY=webhook
ALERTER_WEBHOOK_URL=https://discord.com/api/webhooks/...
ALERTER_MIN_LEVEL=ERROR

The alert payload includes:

Function name and module path
Error code (if applicable)
Trace ID for debugging
Captured attributes (inputs, tags)
Full stack trace (for error-triggered drifts)
Average distance to K-nearest neighbors

Alert Payload

{
  "type": "drift_detected",
  "function_name": "generate_response",
  "module_path": "app.ai.generator",
  "distance": 0.34,
  "threshold": 0.25,
  "input_preview": "How do I mine bitcoin...",
  "trace_id": "abc-123",
  "timestamp": "2025-01-15T10:30:00Z"
}
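
If you route the webhook to your own endpoint instead of Discord or Slack, the payload can be condensed into a one-line summary for logs. The sketch below assumes the body arrives as a JSON string with exactly the fields shown above; `summarize_drift_alert` is a hypothetical helper, not a VectorWave function.

```python
import json


def summarize_drift_alert(raw_body: str) -> str:
    """Turn a drift alert body into a single log-friendly line."""
    payload = json.loads(raw_body)
    if payload.get("type") != "drift_detected":
        raise ValueError(f"unexpected alert type: {payload.get('type')!r}")
    # How far the measured distance sits above the configured threshold.
    margin = payload["distance"] - payload["threshold"]
    return (
        f"[{payload['timestamp']}] "
        f"{payload['module_path']}.{payload['function_name']}: "
        f"distance {payload['distance']:.2f} is {margin:.2f} over threshold "
        f"{payload['threshold']:.2f} (trace {payload['trace_id']})"
    )


sample_body = json.dumps({
    "type": "drift_detected",
    "function_name": "generate_response",
    "module_path": "app.ai.generator",
    "distance": 0.34,
    "threshold": 0.25,
    "input_preview": "How do I mine bitcoin...",
    "trace_id": "abc-123",
    "timestamp": "2025-01-15T10:30:00Z",
})
summary = summarize_drift_alert(sample_body)
```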

Threshold Tuning

Threshold	Sensitivity	Use Case
`0.15`	Very sensitive	Safety-critical applications
`0.25`	Default	General production monitoring
`0.35`	Lenient	Exploratory / creative applications
`0.50`	Very lenient	Wide-scope functions

The threshold is set globally through the DRIFT_DISTANCE_THRESHOLD environment variable and applies to every monitored function.
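
One way to choose between these values is to collect average KNN distances for a sample of representative inputs and see what share of them each candidate threshold would flag. The sketch below is a hypothetical helper, and the distances in `observed` are made-up illustration data, not measurements from VectorWave.

```python
def alert_rate(distances, threshold):
    """Fraction of observed average distances that would count as drift."""
    if not distances:
        raise ValueError("distances must not be empty")
    flagged = sum(1 for d in distances if d >= threshold)
    return flagged / len(distances)


# Illustrative values only; replace with distances from your own traffic.
observed = [0.08, 0.12, 0.19, 0.22, 0.27, 0.31, 0.41, 0.55]

# Map each candidate threshold from the table to its alert rate.
rates = {t: alert_rate(observed, t) for t in (0.15, 0.25, 0.35, 0.50)}
```

A threshold that flags a large share of inputs you consider normal is probably too strict for that function, while one that flags nothing may be too lenient to be useful.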

Simulating Drift

Test whether an input would trigger a drift alert without actually executing the function:

from vectorwave import simulate_drift_check

result = simulate_drift_check(
    text="How do I mine bitcoin with my GPU?",
    function_name="generate_response",
    threshold=0.25,   # Optional, defaults to DRIFT_DISTANCE_THRESHOLD
    k=5,              # Optional, defaults to DRIFT_NEIGHBOR_AMOUNT
)

print(result)
# { "is_drift": True, "distance": 0.38, "threshold": 0.25, "neighbors": 5 }

This is useful for:

Testing threshold sensitivity before going to production
Pre-validating user inputs in a safety layer
Building custom drift dashboards
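
As a sketch of the safety-layer use case, the wrapper below rejects inputs that a drift check flags. `guard_input` is a hypothetical helper; the check function is passed in as a parameter, so in production you would pass `simulate_drift_check`, while the stub here keeps the example runnable without VectorWave. It assumes the result is a dict with the `is_drift`, `distance`, and `threshold` keys shown in the example output above.

```python
from typing import Any, Callable, Mapping, Tuple


def guard_input(
    text: str,
    function_name: str,
    check: Callable[..., Mapping[str, Any]],
) -> Tuple[bool, str]:
    """Return (allowed, reason) based on a drift check of `text`."""
    result = check(text=text, function_name=function_name)
    if result["is_drift"]:
        reason = (
            f"input is outside the validated range "
            f"(distance {result['distance']:.2f}, "
            f"threshold {result['threshold']:.2f})"
        )
        return False, reason
    return True, "ok"


def stub_check(text: str, function_name: str) -> Mapping[str, Any]:
    """Stand-in for simulate_drift_check; flags any text mentioning bitcoin."""
    distance = 0.38 if "bitcoin" in text.lower() else 0.10
    return {"is_drift": distance >= 0.25, "distance": distance,
            "threshold": 0.25, "neighbors": 5}


blocked = guard_input("How do I mine bitcoin with my GPU?",
                      "generate_response", stub_check)
allowed = guard_input("How do I reset my password?",
                      "generate_response", stub_check)
```

Injecting the check function also makes the guard easy to unit test, since no vector store is needed to exercise the accept and reject paths.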

VectorSurfer: The VectorSurfer dashboard visualizes drift monitoring with real-time drift distance charts, alert history, and per-function drift views.

Next Steps

Golden Dataset — Manage the baseline for drift detection
Replay Testing — Test against Golden Dataset
Advanced Configuration — Webhooks, tagging, archiving