cozymori

Simpler, Easier, For Developers. Open-source frameworks for AI observability.


© 2026 cozymori. All rights reserved.

VectorCheck

CLI regression testing framework for AI/LLM applications.

What is VectorCheck?

VectorCheck is a CLI regression testing framework designed for AI/LLM applications. A traditional assert a == b check fails for generative AI because the same prompt can produce different valid outputs. VectorCheck addresses this by comparing outputs with vector similarity or with an LLM judge instead of requiring identical strings.

pip install vectorcheck

Why VectorCheck?

| Approach | How It Works | Strengths and Limitations |
|---|---|---|
| assert a == b | Exact string match | Fails for AI outputs: same meaning, different words |
| VectorCheck Exact | Character-by-character | For deterministic functions only |
| VectorCheck Semantic | Embedding cosine similarity | Handles paraphrasing and variation |
| VectorCheck LLM Judge | GPT-4 evaluates equivalence | Most flexible, handles complex outputs |

CLI Commands

Run Tests

# Test all tracked functions
vw test --target all

# Test a specific function
vw test --target app.generate_response

# Semantic comparison mode
vw test --target all --semantic --threshold 0.85

# LLM judge mode
vw test --target all --judge --model gpt-4-turbo

Export Data

# Export execution logs to JSONL
vw export --target app.generate_response --output data.jsonl

# Export with filters
vw export --target all --status success --output successes.jsonl

# Export Golden Dataset only
vw export --target all --golden --output golden.jsonl
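
Because the export is JSONL, each non-empty line should parse as one JSON object. The sketch below shows one way to load such a file in Python. The field names inside each record are not documented in this section, so the helpers only count records and collect the top-level keys they encounter; load_jsonl and summarize are hypothetical helper names, not part of VectorCheck.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def summarize(records):
    """Return the record count and the sorted top-level keys seen."""
    keys = set()
    for record in records:
        keys.update(record.keys())
    return len(records), sorted(keys)


# Example (assumes `vw export ... --output golden.jsonl` was run first):
#   count, keys = summarize(load_jsonl("golden.jsonl"))
#   print(f"{count} records, keys: {keys}")
```

Inspecting the keys first is a reasonable way to learn the record layout before writing any analysis that depends on specific fields.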

Inspect Functions

# List all tracked functions
vw list

# Show details for a specific function
vw inspect app.generate_response

# Show recent executions
vw history app.generate_response --limit 20

Testing Modes

Exact Match

Compares outputs character-by-character. Best for deterministic functions.

vw test --target app.calculate_total --exact

Pass criteria: outputs must be identical.

Semantic Comparison

Compares outputs using embedding cosine similarity. Best for AI/NLP outputs.

vw test --target app.generate_response --semantic --threshold 0.85

| Threshold | Strictness | Use Case |
|---|---|---|
| 0.95 | Very strict | Factual Q&A, summaries |
| 0.85 | Recommended | General LLM outputs |
| 0.75 | Lenient | Creative writing, open-ended |
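
The pass/fail rule behind a threshold can be sketched as a cosine similarity check. This is an illustration, not VectorCheck's implementation. The three-dimensional vectors are made up and stand in for real embeddings, and treating "similarity >= threshold" as a pass is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def passes(baseline_vec, candidate_vec, threshold=0.85):
    """True when the similarity meets or exceeds the threshold (assumed rule)."""
    return cosine_similarity(baseline_vec, candidate_vec) >= threshold


# Made-up "embeddings"; real ones have hundreds of dimensions.
baseline = [1.0, 0.0, 0.0]
paraphrase = [0.9, 0.3, 0.0]   # similarity to baseline is about 0.949
unrelated = [0.0, 1.0, 0.0]    # orthogonal to baseline, similarity 0.0

print(passes(baseline, paraphrase, threshold=0.85))  # True
print(passes(baseline, paraphrase, threshold=0.95))  # False
print(passes(baseline, unrelated, threshold=0.75))   # False
```

The same paraphrase passes at 0.85 and fails at 0.95, which is why stricter thresholds suit outputs where wording should barely change.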

LLM Judge

Uses a GPT-4 model, selected with --model, to evaluate whether two outputs are semantically equivalent.

vw test --target app.generate_response --judge --model gpt-4-turbo

The LLM judge considers:

  • Semantic meaning
  • Factual accuracy
  • Completeness
  • Tone and style (configurable)
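
To make the criteria above concrete, here is a hypothetical sketch of how a judge prompt could be assembled and how a one-word verdict could be parsed. The prompt wording, the PASS/FAIL reply format, and the function names are all assumptions for illustration. VectorCheck's actual prompt is not shown in this section, and no model is called here.

```python
CRITERIA = ["semantic meaning", "factual accuracy", "completeness"]


def build_judge_prompt(baseline, candidate, check_tone=False):
    """Assemble an illustrative judge prompt (not VectorCheck's real prompt)."""
    criteria = list(CRITERIA)
    if check_tone:
        # Tone and style is described as configurable, so it is opt-in here.
        criteria.append("tone and style")
    bullet_list = "\n".join(f"- {item}" for item in criteria)
    return (
        "Decide whether the candidate output is equivalent to the baseline.\n"
        f"Judge on:\n{bullet_list}\n\n"
        f"Baseline:\n{baseline}\n\n"
        f"Candidate:\n{candidate}\n\n"
        "Answer with exactly one word: PASS or FAIL."
    )


def parse_verdict(reply):
    """Replies starting with PASS (case-insensitive) pass; everything else fails."""
    return reply.strip().upper().startswith("PASS")


prompt = build_judge_prompt(
    "Paris is the capital of France.",
    "France's capital is Paris.",
    check_tone=True,
)
print(prompt)
```

Treating any reply that does not start with PASS as a failure keeps the check conservative when the model answers in an unexpected format.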

CI/CD Integration

GitHub Actions

name: AI Regression Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    services:
      weaviate:
        image: semitechnologies/weaviate:1.26.1
        ports:
          - 8080:8080
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install vectorwave vectorcheck
      - run: vw test --target all --semantic --threshold 0.85

Exit Codes

| Code | Meaning |
|---|---|
| 0 | All tests passed |
| 1 | One or more tests failed |
| 2 | Configuration error |
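
A wrapper script can use these exit codes to tell test failures apart from configuration problems. The sketch below assumes the vw executable is on PATH. The helper names are hypothetical, and only the three documented codes are mapped.

```python
import subprocess

EXIT_MEANINGS = {
    0: "all tests passed",
    1: "one or more tests failed",
    2: "configuration error",
}


def describe_exit_code(code):
    """Translate a vw exit code into its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_vw_test(extra_args):
    """Run `vw test` with the given arguments and return its exit code."""
    completed = subprocess.run(["vw", "test", *extra_args])
    return completed.returncode


# Example (requires vw to be installed):
#   code = run_vw_test(["--target", "all", "--semantic", "--threshold", "0.85"])
#   print(describe_exit_code(code))
#   raise SystemExit(code)  # propagate the code so CI still fails the job
```

Propagating the original exit code, rather than collapsing everything to 1, lets a CI job report a misconfiguration differently from a genuine regression.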

Configuration File

Create vectorcheck.yml in your project root for persistent configuration:

# vectorcheck.yml
default_mode: semantic
default_threshold: 0.85
targets:
  - name: app.generate_response
    mode: semantic
    threshold: 0.90
  - name: app.calculate_total
    mode: exact
  - name: app.creative_writer
    mode: judge
    model: gpt-4-turbo

Then simply run:

vw test  # Uses configuration from vectorcheck.yml
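
One plausible reading of this file is that each target's entry overrides the defaults, and targets that are not listed fall back to default_mode and default_threshold. The sketch below models that precedence with a plain dict that mirrors the YAML above. The merge rule, the handling of unlisted targets, and the resolve_settings name are assumptions, not documented VectorCheck behavior.

```python
# Mirrors the vectorcheck.yml example as a dict to avoid a YAML dependency.
CONFIG = {
    "default_mode": "semantic",
    "default_threshold": 0.85,
    "targets": [
        {"name": "app.generate_response", "mode": "semantic", "threshold": 0.90},
        {"name": "app.calculate_total", "mode": "exact"},
        {"name": "app.creative_writer", "mode": "judge", "model": "gpt-4-turbo"},
    ],
}


def resolve_settings(config, target_name):
    """Merge a target's entry over the defaults (assumed precedence)."""
    settings = {"mode": config["default_mode"]}
    for entry in config.get("targets", []):
        if entry["name"] == target_name:
            settings.update({k: v for k, v in entry.items() if k != "name"})
            break
    # Assumption: the default threshold only applies in semantic mode.
    if settings["mode"] == "semantic":
        settings.setdefault("threshold", config["default_threshold"])
    return settings


print(resolve_settings(CONFIG, "app.generate_response"))  # {'mode': 'semantic', 'threshold': 0.9}
print(resolve_settings(CONFIG, "app.calculate_total"))    # {'mode': 'exact'}
print(resolve_settings(CONFIG, "app.unlisted_function"))  # {'mode': 'semantic', 'threshold': 0.85}
```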

Relationship to VectorWave

VectorCheck reads from the same Weaviate instance as VectorWave:

Your App + @vectorize → Weaviate ← VectorCheck CLI

The @vectorize(replay=True) decorator stores inputs and outputs that VectorCheck uses as test cases. The Golden Dataset provides verified baselines.
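
The record-then-replay idea can be illustrated with a toy decorator. This is a stand-in for the concept only: record_calls, replay, and the in-memory RECORDED_CALLS list are hypothetical and do not reflect how @vectorize or Weaviate storage actually work.

```python
import functools

RECORDED_CALLS = []  # in-memory stand-in for the Weaviate store


def record_calls(func):
    """Toy decorator: log each call's inputs and output for later replay."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        RECORDED_CALLS.append(
            {"function": func.__name__, "args": args, "kwargs": kwargs, "output": result}
        )
        return result
    return wrapper


def replay(func, calls):
    """Re-run recorded inputs and return the calls whose output changed."""
    mismatches = []
    for call in calls:
        # __wrapped__ (set by functools.wraps) bypasses recording during replay.
        current = func.__wrapped__(*call["args"], **call["kwargs"])
        if current != call["output"]:
            mismatches.append(call)
    return mismatches


@record_calls
def calculate_total(prices, tax_rate=0.1):
    return round(sum(prices) * (1 + tax_rate), 2)


calculate_total([10.0, 5.0])
calculate_total([2.0], tax_rate=0.0)
print(len(RECORDED_CALLS))                      # 2
print(replay(calculate_total, RECORDED_CALLS))  # []
```

The replay here uses exact equality, which corresponds to the exact match mode. A semantic or judge comparison would replace the != check with a similarity or model-based test.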

Next Steps

  • VectorWave Replay Testing — Programmatic replay API
  • VectorSurfer Replay UI — Visual testing interface
  • Contributing — How to contribute