VectorWave
Golden Dataset
Curate verified executions for cache priority, drift detection, and regression testing.
What is the Golden Dataset?
The Golden Dataset is a curated collection of verified, high-quality function executions. When you promote an execution to "Golden" status, it becomes a trusted reference used across multiple VectorWave features.
initialize_database() creates the VectorWaveGoldenDataset collection automatically.
Why Use Golden Data?
Golden data is used in three key areas:
1. Semantic Cache Priority
When semantic caching is enabled, VectorWave searches the Golden Dataset before falling back to standard execution logs. This ensures:
- Deterministic results for known input patterns
- Higher-quality cached responses, because Golden entries are manually verified
- Consistent behavior across deployments
See Semantic Caching for details on the 2-tier cache lookup.
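The lookup order described above can be sketched in plain Python. This is an illustration only, not VectorWave's implementation: the names `two_tier_lookup` and `nearest`, the entry layout, the cosine metric, and the `max_distance` threshold are all assumptions made for the example.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query, entries, max_distance):
    """Return the closest entry within max_distance, or None if there is none."""
    best, best_dist = None, max_distance
    for entry in entries:
        dist = cosine_distance(query, entry["vector"])
        if dist <= best_dist:
            best, best_dist = entry, dist
    return best

def two_tier_lookup(query, golden, logs, max_distance=0.1):
    """Tier 1: Golden entries. Tier 2: standard logs. Otherwise a cache miss."""
    hit = nearest(query, golden, max_distance)
    if hit is not None:
        return "golden", hit["output"]
    hit = nearest(query, logs, max_distance)
    if hit is not None:
        return "log", hit["output"]
    return "miss", None

# Hypothetical data: one verified entry and two ordinary log entries.
golden = [{"vector": [1.0, 0.0], "output": "verified answer"}]
logs = [
    {"vector": [0.99, 0.05], "output": "unverified answer"},
    {"vector": [0.0, 1.0], "output": "other answer"},
]

print(two_tier_lookup([1.0, 0.01], golden, logs))
print(two_tier_lookup([0.0, 1.0], golden, logs))
```

In this sketch a Golden hit wins even when a log entry is also within the threshold, which is what gives verified entries priority.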
2. Drift Detection Baseline
The Drift Radar uses a KNN-based approach: each new input is compared against its K nearest neighbors among past successful executions, and the average distance to those neighbors determines whether drift has occurred.
More historical data makes drift detection more reliable and produces fewer false positives.
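A minimal sketch of the KNN idea, assuming Euclidean distance and a fixed threshold. The source does not specify the metric, the value of K, or the threshold, so the function names and default values here are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_knn_distance(query, history, k=3):
    """Average distance from query to its k nearest vectors in history."""
    if not history:
        raise ValueError("history must contain at least one vector")
    distances = sorted(euclidean(query, v) for v in history)
    nearest = distances[:k]  # uses fewer than k if history is small
    return sum(nearest) / len(nearest)

def is_drift(query, history, k=3, threshold=0.5):
    """Flag drift when the mean KNN distance exceeds the (assumed) threshold."""
    return mean_knn_distance(query, history, k) > threshold

# Hypothetical embeddings of past successful executions.
history = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]

print(is_drift([0.05, 0.05], history))  # input close to the history
print(is_drift([2.0, 2.0], history))    # input far from the history
```

With only a handful of vectors in `history`, a single unusual neighbor shifts the average noticeably, which is one way to see why a larger history gives steadier results.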
3. Replay Testing Baseline
Replay testing uses Golden entries as expected outputs. When you run VectorWaveReplayer, Golden entries are replayed first, followed by standard execution logs.
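The replay order can be illustrated with a small stand-alone sketch. It does not use the VectorWaveReplayer API. The `replay` function, the entry fields (`uuid`, `inputs`, `expected`), and the exact-equality comparison are assumptions made for the example.

```python
def replay(func, golden, logs):
    """Re-run func on stored inputs, Golden entries first, then standard logs."""
    results = []
    for source, entries in (("golden", golden), ("log", logs)):
        for entry in entries:
            actual = func(**entry["inputs"])
            results.append({
                "source": source,
                "uuid": entry["uuid"],
                "passed": actual == entry["expected"],  # assumed exact match
            })
    return results

def add(a, b):
    """Function under test (hypothetical)."""
    return a + b

# Hypothetical stored executions; the log entry holds a stale expected value.
golden = [{"uuid": "g-1", "inputs": {"a": 1, "b": 2}, "expected": 3}]
logs = [{"uuid": "l-1", "inputs": {"a": 2, "b": 2}, "expected": 5}]

results = replay(add, golden, logs)
for r in results:
    print(r["source"], r["uuid"], "PASS" if r["passed"] else "FAIL")
```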
Managing Golden Data
Promoting Executions
from vectorwave import VectorWaveDatasetManager
dm = VectorWaveDatasetManager()
# Promote a verified execution to Golden status
dm.register_as_golden(
    log_uuid="abc-123",
    note="Verified by QA team",
    tags=["v2", "production"],
)
The original execution's vector and properties are copied to the Golden collection.
Auto-Recommending Candidates
VectorWave analyzes the vector distribution of your existing Golden data and recommends new candidates:
candidates = dm.recommend_candidates(
    function_name="generate_response",
    limit=5,
)
for c in candidates:
    print(f"UUID: {c['uuid']}, Type: {c['type']}, Distance: {c['distance']:.3f}")
Candidates are classified as:
| Type | Description |
|---|---|
| STEADY | Close to existing Golden centroid — reinforces known patterns |
| DISCOVERY | Farther from centroid — represents new valid patterns |
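One way to picture the two types is a distance cutoff around the Golden centroid. The sketch below assumes Euclidean distance and a hypothetical `steady_radius` parameter. How VectorWave actually separates STEADY from DISCOVERY is not stated in the source.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def classify_candidates(golden_vectors, candidates, steady_radius=0.3):
    """Label each candidate by its distance to the Golden centroid."""
    center = centroid(golden_vectors)
    labeled = []
    for uuid, vector in candidates.items():
        distance = math.dist(center, vector)  # Euclidean, Python 3.8+
        kind = "STEADY" if distance <= steady_radius else "DISCOVERY"
        labeled.append({"uuid": uuid, "type": kind, "distance": distance})
    # Closest candidates first
    return sorted(labeled, key=lambda c: c["distance"])

# Hypothetical Golden embeddings and candidate executions.
golden_vectors = [[0.0, 0.0], [0.2, 0.0]]
candidates = {"near": [0.1, 0.1], "far": [1.0, 1.0]}

for c in classify_candidates(golden_vectors, candidates):
    print(f"UUID: {c['uuid']}, Type: {c['type']}, Distance: {c['distance']:.3f}")
```

Distance alone cannot confirm that a far-away candidate is valid, so a DISCOVERY result is best treated as a prompt for review before promotion.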
VectorSurfer: Golden Dataset management is built into the VectorSurfer dashboard — browse, promote, and manage golden entries visually with AI-recommended candidates.
Next Steps
- Semantic Caching — How Golden data gets cache priority
- Drift Detection — Golden data as drift baseline
- Replay Testing — Golden data as test baseline