Advanced Configuration

Custom Properties (.weaviate_properties)

Extend VectorWave's default schema with custom properties by creating a .weaviate_properties JSON file in your project root:

{
  "team": {
    "data_type": "TEXT",
    "description": "Team name",
    "tokenization": "word"
  },
  "priority": {
    "data_type": "INT",
    "description": "Priority level"
  },
  "region": {
    "data_type": "TEXT",
    "description": "Geographic region",
    "tokenization": "field"
  }
}

These properties are added to the VectorWaveFunctions, VectorWaveExecutions, and VectorWaveGoldenDataset collections. Pass them as keyword arguments (**execution_tags) in the decorator:

@vectorize(auto=True, team="analytics", priority=1, region="US-WEST")  # custom tags
def process_order(order_id: str, amount: float):
    return {"status": "processed", "order_id": order_id}

After adding new properties, run update_database_schema() to apply them to existing collections.
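
Before running the schema update, it can help to sanity-check the properties file. The sketch below is not part of VectorWave: check_properties is a hypothetical helper, and it only assumes the file layout shown above (a JSON object mapping each property name to a spec with a data_type and an optional tokenization, where tokenization is meaningful only for TEXT properties).

```python
import json


def check_properties(raw_json: str) -> list[str]:
    """Return human-readable problems found in a properties document."""
    problems = []
    for name, spec in json.loads(raw_json).items():
        data_type = spec.get("data_type")
        if not data_type:
            problems.append(f"{name}: missing data_type")
        elif "tokenization" in spec and data_type != "TEXT":
            # Tokenization controls how text is indexed, so it has no
            # effect on numeric properties.
            problems.append(f"{name}: tokenization only applies to TEXT properties")
    return problems


sample = json.dumps({
    "team": {"data_type": "TEXT", "tokenization": "word"},
    "priority": {"data_type": "INT", "tokenization": "field"},
    "region": {"description": "Geographic region"},
})
problems = check_properties(sample)
print(problems)
```

In a project, the same function could be fed the contents of .weaviate_properties read from disk.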

Tokenization Options

Tokenization	Behavior	Use Case
`"word"`	Splits on non-alphanumeric characters and lowercases	Natural text fields
`"field"`	Treats entire value as one token	IDs, codes, exact matches
`"lowercase"`	Splits on whitespace and lowercases	Case-insensitive text that keeps punctuation

Tip: Use "field" tokenization for ID fields such as order_id and user_id so that exact-match filtering works correctly.
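
To see why "field" suits identifiers, the sketch below emulates the three modes in plain Python. It follows Weaviate's documented tokenization behavior and assumes VectorWave passes these values through unchanged. The tokenize function is a hypothetical illustration, not a VectorWave or Weaviate API, and the "word" branch is an ASCII-only approximation.

```python
import re


def tokenize(value: str, mode: str) -> list[str]:
    """Approximate how a TEXT value is split into index tokens."""
    if mode == "field":
        # The whole value is one token (surrounding whitespace trimmed).
        return [value.strip()]
    if mode == "lowercase":
        # Lowercase, then split on whitespace; punctuation is kept.
        return value.lower().split()
    if mode == "word":
        # Lowercase, then keep only runs of alphanumeric characters.
        return re.findall(r"[a-z0-9]+", value.lower())
    raise ValueError(f"unknown tokenization: {mode}")


for mode in ("word", "lowercase", "field"):
    print(mode, tokenize("ORD-2024 West", mode))
```

With "word", the identifier is broken into several tokens, so a filter can no longer treat it as a single exact value.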

Dynamic Execution Tagging

Add metadata tags to executions for filtering and monitoring.

Global Tags (Environment Variables)

Set tags that apply to all functions:

# .env
VECTORWAVE_TAGS_ENVIRONMENT=production
VECTORWAVE_TAGS_REGION=us-east-1
VECTORWAVE_TAGS_VERSION=v2.3.0
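
As a rough mental model of how these variables could become tags, the sketch below strips the VECTORWAVE_TAGS_ prefix and lowercases the remainder. The lowercasing is inferred from the merge example on this page, and collect_global_tags is a hypothetical helper rather than VectorWave's actual loader.

```python
from collections.abc import Mapping

PREFIX = "VECTORWAVE_TAGS_"


def collect_global_tags(environ: Mapping[str, str]) -> dict[str, str]:
    """Turn VECTORWAVE_TAGS_<NAME>=value entries into {name: value}."""
    return {
        key[len(PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(PREFIX) and len(key) > len(PREFIX)
    }


env = {
    "PATH": "/usr/bin",
    "VECTORWAVE_TAGS_ENVIRONMENT": "production",
    "VECTORWAVE_TAGS_REGION": "us-east-1",
    "VECTORWAVE_TAGS_VERSION": "v2.3.0",
}
global_tags = collect_global_tags(env)
print(global_tags)
```

Passing os.environ instead of the sample dictionary would read the tags of the current process.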

Function-Level Tags (Decorator)

@vectorize(
    auto=True,
    team="payments",
    tags={"priority": "high", "sla": "99.9%"},
)
def process_payment(amount: float):
    ...

Tag Merge Rules

When both global and function-level tags exist, function-level tags take priority:

Global:   { environment: "production", region: "us-east-1" }
Function: { environment: "staging", priority: "high" }
Result:   { environment: "staging", region: "us-east-1", priority: "high" }
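
The merge rule above behaves like a dictionary update in which function-level tags are applied last. A minimal sketch, where merge_tags is a hypothetical name used only for illustration:

```python
def merge_tags(global_tags: dict, function_tags: dict) -> dict:
    """Combine both tag sets; function-level values win on key conflicts."""
    return {**global_tags, **function_tags}


merged = merge_tags(
    {"environment": "production", "region": "us-east-1"},
    {"environment": "staging", "priority": "high"},
)
print(merged)
```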

Real-Time Webhook Alerting

Configure webhook notifications for errors and drift events through environment variables:

# .env
ALERTER_STRATEGY=webhook
ALERTER_WEBHOOK_URL=https://discord.com/api/webhooks/...
ALERTER_MIN_LEVEL=ERROR

Alert payloads include:

Error code and function name
Trace ID (clickable link to VectorSurfer)
Captured attributes and tags
Full stack trace
Drift distance (for drift events)

The payload format is chosen automatically from the webhook URL (Discord, Slack, or generic JSON).
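
One way URL-based format detection could work is sketched below. The hostname rules and the generic payload shape are assumptions made for illustration, and detect_format and build_payload are hypothetical names. The only externally grounded details are that Discord webhooks accept a "content" field and Slack incoming webhooks accept a "text" field.

```python
from urllib.parse import urlparse


def detect_format(url: str) -> str:
    """Guess the webhook flavor from the URL's hostname."""
    host = (urlparse(url).hostname or "").lower()
    if host == "discord.com" or host.endswith(".discord.com"):
        return "discord"
    if host == "hooks.slack.com":
        return "slack"
    return "generic"


def build_payload(url: str, message: str, details: dict) -> dict:
    """Shape the alert body for the detected webhook flavor."""
    flavor = detect_format(url)
    if flavor == "discord":
        return {"content": message}
    if flavor == "slack":
        return {"text": message}
    # Generic receivers get the message plus the structured details.
    return {"message": message, **details}


alert = build_payload(
    "https://alerts.example.com/hook",
    "process_payment failed",
    {"level": "ERROR"},
)
print(alert)
```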

Auto-Injection

Use the VectorWaveAutoInjector class methods to apply VectorWave to existing modules without modifying their source code:

from vectorwave import VectorWaveAutoInjector

# Set default configuration for all inject calls
VectorWaveAutoInjector.configure(
    auto=True,
    capture_return_value=True,
    team="ai-team",
)

# Inject @vectorize into all functions in a module
VectorWaveAutoInjector.inject(
    target_module_path="app.services.ai",
)

# Now all functions in app.services.ai are vectorized
# without changing a single line of their source code

Recursive Injection

# Inject into a module and all its submodules
VectorWaveAutoInjector.inject(
    target_module_path="app.services",
    recursive=True,
    auto=True,  # Override config per inject call
)

Note: Auto-injection works at import time. Call VectorWaveAutoInjector.inject() before importing the target module.
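
For intuition about what decorating every function in a module involves, the sketch below patches an already-loaded module in place. This is a simpler technique than the import-time injection described above and is not VectorWave's implementation. The helper wrap_module_functions and the stand-in decorator record_calls are hypothetical.

```python
import functools
import inspect
import types


def wrap_module_functions(module: types.ModuleType, decorator) -> list[str]:
    """Replace every function defined in `module` with a decorated version."""
    wrapped = []
    for name, obj in list(vars(module).items()):
        # Skip functions that were only imported into the module.
        if inspect.isfunction(obj) and obj.__module__ == module.__name__:
            setattr(module, name, decorator(obj))
            wrapped.append(name)
    return wrapped


def record_calls(func):
    """Stand-in decorator that counts calls to the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


# Build a throwaway module so the example is self-contained.
demo = types.ModuleType("demo_services")
exec("def summarize(text):\n    return text[:10]\n", demo.__dict__)

wrapped_names = wrap_module_functions(demo, record_calls)
result = demo.summarize("hello world, again")
print(wrapped_names, result, demo.summarize.calls)
```

Patching in place has a limitation: code that ran `from demo_services import summarize` earlier keeps a reference to the undecorated function. That ordering problem is a plausible reason for calling inject() before the target module is imported elsewhere.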

Schema Migration

The Weaviate schema may change between VectorWave versions. After upgrading, run the migration:

from vectorwave import update_database_schema

# Zero-downtime migration — existing data is preserved
update_database_schema()

update_database_schema() does the following:

Detects schema differences between current and required
Adds new properties without dropping existing ones
Migrates data if needed
Preserves all existing execution logs and Golden Dataset entries
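
The additive behavior listed above can be pictured as a diff between two property maps. The sketch below is an illustration under that assumption. The plan_migration helper and the property names are hypothetical, and the real migration talks to Weaviate rather than comparing dictionaries.

```python
def plan_migration(current: dict[str, str], required: dict[str, str]) -> dict[str, list[str]]:
    """Compare property maps of the form {property_name: data_type}."""
    to_add = sorted(name for name in required if name not in current)
    conflicts = sorted(
        name for name in required
        if name in current and current[name] != required[name]
    )
    # Properties that exist only in `current` are left alone: nothing is dropped.
    return {"add": to_add, "conflicts": conflicts}


current = {"function_name": "TEXT", "team": "TEXT", "legacy_flag": "BOOL"}
required = {"function_name": "TEXT", "team": "TEXT", "priority": "INT"}
plan = plan_migration(current, required)
print(plan)
```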

Sensitive Data Masking

VectorWave automatically masks sensitive fields in captured inputs and outputs. By default, fields named password, api_key, token, secret, and auth_token are replaced with ***MASKED***.

Customize the list via environment variable:

# .env
SENSITIVE_FIELD_NAMES=password,api_key,token,secret,auth_token,ssn,credit_card

Masking applies to all execution logs, whether they are recorded by @vectorize or by @trace_span.
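
A minimal sketch of field-name masking follows. The recursion into nested dictionaries and lists and the case-insensitive name matching are assumptions made for illustration, since this page does not specify either behavior. parse_field_names and mask are hypothetical helpers.

```python
MASK = "***MASKED***"
DEFAULT_SENSITIVE = "password,api_key,token,secret,auth_token"


def parse_field_names(raw: str) -> frozenset[str]:
    """Parse a comma-separated list such as SENSITIVE_FIELD_NAMES."""
    return frozenset(name.strip().lower() for name in raw.split(",") if name.strip())


def mask(value, sensitive: frozenset[str]):
    """Return a copy of `value` with sensitive dictionary fields replaced."""
    if isinstance(value, dict):
        return {
            key: MASK if str(key).lower() in sensitive else mask(item, sensitive)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [mask(item, sensitive) for item in value]
    return value


sensitive = parse_field_names(DEFAULT_SENSITIVE + ",ssn")
payload = {
    "user": "ada",
    "password": "hunter2",
    "profile": {"ssn": "000-00-0000", "items": [{"token": "t"}]},
}
masked = mask(payload, sensitive)
print(masked)
```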

Async Logging

For latency-sensitive applications, enable async database logging to avoid blocking function execution:

# .env
ASYNC_LOGGING=True

When enabled, execution logs are queued and written to Weaviate in a background thread. The function returns immediately without waiting for the DB write.

Note: Async logging may lose data if the process crashes before the queue is flushed. Pass force_sync=True on critical @trace_span calls to force a synchronous write.
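
The trade-off can be seen in a small queue-and-worker sketch. BackgroundLogger is a hypothetical class written for this page, not VectorWave's logger. It shows why queued records are at risk until they are flushed and how a force_sync flag can bypass the queue.

```python
import queue
import threading


class BackgroundLogger:
    """Queue records and write them from a daemon thread."""

    def __init__(self, write):
        self._write = write
        self._queue = queue.Queue()
        # A daemon thread does not keep the process alive, so records still
        # in the queue are lost if the process exits without flush().
        threading.Thread(target=self._run, daemon=True).start()

    def log(self, record, force_sync=False):
        if force_sync:
            self._write(record)      # the caller waits for the write
        else:
            self._queue.put(record)  # returns immediately

    def _run(self):
        while True:
            record = self._queue.get()
            try:
                self._write(record)
            finally:
                self._queue.task_done()

    def flush(self):
        """Block until every queued record has been written."""
        self._queue.join()


stored = []
logger = BackgroundLogger(stored.append)
logger.log({"fn": "process_order"})
logger.log({"fn": "process_payment"}, force_sync=True)
logger.flush()
print(len(stored))
```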

Batch Performance Tuning

VectorWave batches writes to Weaviate for efficiency. Tune the batch behavior with:

# .env
BATCH_THRESHOLD=20          # Flush after this many objects (default: 20)
FLUSH_INTERVAL_SECONDS=2.0  # Flush at least every N seconds (default: 2.0)

Higher BATCH_THRESHOLD: Fewer write calls but higher memory usage and latency
Lower BATCH_THRESHOLD: More frequent writes with lower latency
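
The interaction between the two settings can be sketched as a buffer that flushes when it is full or when too much time has passed. Batcher is a hypothetical class for illustration. Unlike a production batcher, it checks elapsed time only when an object is added, and it takes an injectable clock so the example is deterministic.

```python
import time


class Batcher:
    """Buffer objects and flush on size or age, whichever comes first."""

    def __init__(self, write_batch, threshold=20, interval=2.0, clock=time.monotonic):
        self._write_batch = write_batch
        self._threshold = threshold
        self._interval = interval
        self._clock = clock
        self._pending = []
        self._last_flush = clock()

    def add(self, obj):
        self._pending.append(obj)
        too_many = len(self._pending) >= self._threshold
        too_old = self._clock() - self._last_flush >= self._interval
        if too_many or too_old:
            self.flush()

    def flush(self):
        if self._pending:
            self._write_batch(list(self._pending))
            self._pending.clear()
        self._last_flush = self._clock()


now = [0.0]  # fake clock, advanced manually
batches = []
batcher = Batcher(batches.append, threshold=3, interval=2.0, clock=lambda: now[0])
for i in range(4):
    batcher.add(i)   # the third add reaches the threshold
now[0] = 2.5
batcher.add(4)       # the interval has elapsed, so this add flushes
print(batches)
```

A larger threshold means objects such as 3 and 4 above wait longer in memory before being written.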

Token Usage Tracking

VectorWave tracks LLM token usage in the VectorWaveTokenUsage collection. View aggregate stats:

from vectorwave import get_token_usage_stats

stats = get_token_usage_stats()
# { "total_tokens": 125000, "by_category": { "embedding": 80000, "completion": 45000 } }

VectorSurfer: Token usage is visualized as a donut chart widget in the VectorSurfer dashboard.
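
The shape of the stats dictionary can be reproduced from individual usage records. In the sketch below, the record layout (a "category" and a "tokens" field) and the summarize_usage helper are assumptions for illustration, not the VectorWaveTokenUsage schema.

```python
from collections import Counter


def summarize_usage(records: list[dict]) -> dict:
    """Aggregate per-call token counts into totals by category."""
    by_category = Counter()
    for record in records:
        by_category[record["category"]] += record["tokens"]
    return {
        "total_tokens": sum(by_category.values()),
        "by_category": dict(by_category),
    }


usage = summarize_usage([
    {"category": "embedding", "tokens": 50000},
    {"category": "completion", "tokens": 45000},
    {"category": "embedding", "tokens": 30000},
])
print(usage)
```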

Data Archiving (VectorWaveArchiver)

For large-scale deployments, archive old execution data to keep Weaviate performant. The archiver is not part of the public API, so import it from its internal module:

from vectorwave.database.archiver import VectorWaveArchiver

archiver = VectorWaveArchiver()

# Export old data to a snapshot file and remove from Weaviate
archiver.export_and_clear(
    older_than_days=30,
    mode="archive",   # "snapshot" (export only) | "archive" (export + delete) | "purge" (delete only)
    output_path="./archives/",
)
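
The three modes differ only in whether data is exported, deleted, or both. The sketch below spells that out and shows how older_than_days translates into a cutoff timestamp. plan_archive is a hypothetical helper, and the use of UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone

# mode -> (export to file, delete from the database)
MODE_ACTIONS = {
    "snapshot": (True, False),
    "archive": (True, True),
    "purge": (False, True),
}


def plan_archive(mode: str, older_than_days: int, now: datetime) -> dict:
    """Describe what a run would do, without touching any data."""
    export, delete = MODE_ACTIONS[mode]
    return {
        "export": export,
        "delete": delete,
        "cutoff": now - timedelta(days=older_than_days),
    }


now = datetime(2024, 3, 31, tzinfo=timezone.utc)
archive_plan = plan_archive("archive", 30, now)
print(archive_plan)
```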

Function Cache

VectorWave caches registered function metadata locally in .vectorwave_functions_cache.json. This avoids redundant Weaviate queries on startup when the function code hasn't changed.

The cache is automatically invalidated when:

Function source code changes
.weaviate_properties is modified
update_database_schema() is called

Delete the cache file manually to force re-registration.
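
Change detection of this kind is commonly built on content hashes. The sketch below assumes a fingerprint derived from the function source and the properties file contents. The helper names and the cache layout are hypothetical and do not describe the actual format of .vectorwave_functions_cache.json.

```python
import hashlib


def fingerprint(function_source: str, properties_text: str) -> str:
    """Hash everything that should trigger re-registration when it changes."""
    digest = hashlib.sha256()
    digest.update(function_source.encode("utf-8"))
    digest.update(b"\0")  # separator so the two inputs cannot blur together
    digest.update(properties_text.encode("utf-8"))
    return digest.hexdigest()


def needs_registration(cache: dict, name: str, function_source: str, properties_text: str) -> bool:
    """True when the cached fingerprint is missing or out of date."""
    return cache.get(name) != fingerprint(function_source, properties_text)


cache = {}
source_v1 = "def f(x):\n    return x\n"
source_v2 = "def f(x):\n    return x + 1\n"
props = '{"team": {"data_type": "TEXT"}}'

first_run = needs_registration(cache, "f", source_v1, props)
cache["f"] = fingerprint(source_v1, props)
unchanged = needs_registration(cache, "f", source_v1, props)
after_edit = needs_registration(cache, "f", source_v2, props)
print(first_run, unchanged, after_edit)
```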

Rust Core (Optional)

VectorWave includes an optional Rust-accelerated core via PyO3 for performance-critical operations:

Batch Manager — High-throughput batch writing to Weaviate
Data Masking — Fast field-level masking for large payloads

If the Rust extension is not available (e.g., unsupported platform), VectorWave automatically falls back to pure Python implementations. No configuration needed.

Tracer Performance Optimization

VectorWave uses inspect.signature internally for function introspection. The results are cached in an LRU cache so that high-throughput applications avoid repeated reflection overhead.

No configuration needed — this optimization is automatic.
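
The general technique can be reproduced with the standard library. The sketch below memoizes inspect.signature with functools.lru_cache. The helper names and the cache size are illustrative choices, not VectorWave internals.

```python
import functools
import inspect


@functools.lru_cache(maxsize=1024)
def cached_signature(func):
    """Compute a function's signature once and reuse it on later calls."""
    return inspect.signature(func)


def bind_arguments(func, *args, **kwargs) -> dict:
    """Map call arguments to parameter names, filling in defaults."""
    bound = cached_signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    return dict(bound.arguments)


def process_order(order_id: str, amount: float = 0.0):
    return {"status": "processed", "order_id": order_id}


first = bind_arguments(process_order, "A-1", amount=9.5)
second = bind_arguments(process_order, "A-2")
info = cached_signature.cache_info()
print(first, second, info.hits, info.misses)
```

Note that lru_cache keeps a strong reference to each cached function, which is usually acceptable for module-level functions that live for the whole process.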

Next Steps

API Reference — Complete parameter reference
VectorSurfer Dashboard — Visual monitoring
Contributing — How to contribute to VectorWave