
Golden Dataset

Curate verified executions for cache priority, drift detection, and regression testing.

What is the Golden Dataset?

The Golden Dataset is a curated collection of verified, high-quality function executions. When you promote an execution to "Golden" status, it becomes a trusted reference used across multiple VectorWave features.

initialize_database() creates the VectorWaveGoldenDataset collection automatically.
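A minimal setup sketch (assuming initialize_database is importable from the top-level vectorwave package, which is an assumption here):

from vectorwave import initialize_database  # import path assumed

# Creates the VectorWaveGoldenDataset collection if it does not already exist
initialize_database()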

Why Use Golden Data?

Golden data is used in three key areas:

1. Semantic Cache Priority

When semantic caching is enabled, VectorWave searches the Golden Dataset first before checking standard execution logs. This ensures:

  • Deterministic results for known input patterns
  • Higher quality cached responses (manually verified)
  • Consistent behavior across deployments

See Semantic Caching for details on the 2-tier cache lookup.
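The sketch below only illustrates the priority order of that lookup; the names nearest, two_tier_lookup, and the distance threshold are hypothetical and are not VectorWave APIs.

import numpy as np

def nearest(entries, query_vec, threshold):
    """Return the cached output closest to query_vec within threshold, or None."""
    best, best_dist = None, threshold
    for vec, output in entries:
        dist = np.linalg.norm(np.asarray(vec) - np.asarray(query_vec))
        if dist <= best_dist:
            best, best_dist = output, dist
    return best

def two_tier_lookup(golden_entries, log_entries, query_vec, threshold=0.15):
    # Tier 1: manually verified Golden entries take priority
    hit = nearest(golden_entries, query_vec, threshold)
    if hit is not None:
        return hit
    # Tier 2: fall back to standard execution logs
    return nearest(log_entries, query_vec, threshold)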

2. Drift Detection Baseline

The Drift Radar uses a KNN-based approach — when a new input arrives, it's compared against the K-nearest neighbors from past successful executions. The average distance to these neighbors determines whether drift has occurred.

More historical data = more reliable drift detection = fewer false positives.
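For intuition, here is a self-contained sketch of that check (k and the distance threshold are illustrative values, not VectorWave defaults):

import numpy as np

def is_drifting(new_vec, baseline_vecs, k=5, max_avg_distance=0.3):
    """Flag drift when new_vec sits far from its K nearest baseline vectors."""
    new_vec = np.asarray(new_vec)
    distances = np.sort([np.linalg.norm(new_vec - np.asarray(v)) for v in baseline_vecs])
    return distances[:k].mean() > max_avg_distance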

3. Replay Testing Baseline

Replay testing uses Golden entries as expected outputs. When you run VectorWaveReplayer, Golden data is tested first (priority), then standard execution logs.
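The loop below sketches that idea; it assumes each recorded entry is an (input kwargs, expected output) pair, which may not match the actual VectorWaveReplayer interface.

def replay(fn, golden_entries, log_entries):
    """Re-run fn against recorded executions, Golden entries first."""
    failures = []
    for kwargs, expected in list(golden_entries) + list(log_entries):
        actual = fn(**kwargs)
        if actual != expected:
            failures.append({"input": kwargs, "expected": expected, "actual": actual})
    return failures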

Managing Golden Data

Promoting Executions

from vectorwave import VectorWaveDatasetManager

dm = VectorWaveDatasetManager()

# Promote a verified execution to Golden status
dm.register_as_golden(
    log_uuid="abc-123",
    note="Verified by QA team",
    tags=["v2", "production"],
)

The original execution's vector and properties are copied to the Golden collection.

Auto-Recommending Candidates

VectorWave analyzes the vector distribution of your existing Golden data and recommends new candidates:

candidates = dm.recommend_candidates(
    function_name="generate_response",
    limit=5,
)

for c in candidates:
    print(f"UUID: {c['uuid']}, Type: {c['type']}, Distance: {c['distance']:.3f}")

Candidates are classified as:

Type        Description
STEADY      Close to existing Golden centroid — reinforces known patterns
DISCOVERY   Farther from centroid — represents new valid patterns
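Accepted candidates can then be promoted with the same register_as_golden call shown above; filtering on DISCOVERY here is just one possible review policy, not a built-in rule:

for c in candidates:
    if c["type"] == "DISCOVERY":  # review manually before promoting
        dm.register_as_golden(
            log_uuid=c["uuid"],
            note="Promoted from DISCOVERY recommendation",
            tags=["auto-candidate"],
        )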

VectorSurfer: Golden Dataset management is built into the VectorSurfer dashboard — browse, promote, and manage golden entries visually with AI-recommended candidates.

Next Steps

  • Semantic Caching — How Golden data gets cache priority
  • Drift Detection — Golden data as drift baseline
  • Replay Testing — Golden data as test baseline