VectorWave
Advanced Configuration
Custom properties, dynamic tagging, webhooks, auto-injection, and data archiving.
Custom Properties (.weaviate_properties)
Extend VectorWave's default schema with custom properties by creating a .weaviate_properties JSON file in your project root:
{
  "team": {
    "data_type": "TEXT",
    "description": "Team name",
    "tokenization": "word"
  },
  "priority": {
    "data_type": "INT",
    "description": "Priority level"
  },
  "region": {
    "data_type": "TEXT",
    "description": "Geographic region",
    "tokenization": "field"
  }
}
These properties are added to the VectorWaveFunctions, VectorWaveExecutions, and VectorWaveGoldenDataset collections. Pass them as keyword arguments (**execution_tags) to the decorator:
@vectorize(auto=True, team="analytics", priority=1, region="US-WEST")  # custom tags
def process_order(order_id: str, amount: float):
    return {"status": "processed", "order_id": order_id}
After adding new properties, run update_database_schema() to apply them to existing collections.
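A malformed properties file is easy to miss until the schema update runs, so a small pre-check can help. The sketch below is not part of VectorWave: `validate_properties` is a hypothetical helper, and it assumes that every entry needs a `data_type` string and that `tokenization`, when present, is one of the three values listed in this guide.

```python
import json

# Tokenization values listed in this guide; anything else is reported.
DOCUMENTED_TOKENIZATIONS = ("word", "field", "lowercase")


def validate_properties(raw_json: str) -> list:
    """Return a list of problems found in .weaviate_properties content."""
    properties = json.loads(raw_json)
    if not isinstance(properties, dict):
        return ["top level must be a JSON object"]
    problems = []
    for name, spec in properties.items():
        if not isinstance(spec, dict):
            problems.append(f"{name}: definition must be an object")
            continue
        if not isinstance(spec.get("data_type"), str):
            problems.append(f"{name}: missing data_type")
        tokenization = spec.get("tokenization")
        if tokenization is not None and tokenization not in DOCUMENTED_TOKENIZATIONS:
            problems.append(f"{name}: unknown tokenization {tokenization!r}")
    return problems


# Hypothetical sample content with one valid and one invalid entry.
sample = (
    '{"team": {"data_type": "TEXT", "tokenization": "word"},'
    ' "priority": {"tokenization": "exact"}}'
)
problems = validate_properties(sample)
print(problems)
```

Running a check like this before update_database_schema() keeps typos out of the live schema.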
Tokenization Options
| Tokenization | Behavior | Use Case |
|---|---|---|
| "word" | Splits on non-alphanumeric characters and lowercases tokens | Natural text fields |
| "field" | Treats the entire value as one token | IDs, codes, exact matches |
| "lowercase" | Splits on whitespace and lowercases tokens | Case-insensitive text that keeps punctuation |
Tip: Use "field" tokenization for ID fields like order_id, user_id, etc. This ensures exact-match filtering works correctly.
Dynamic Execution Tagging
Add metadata tags to executions for filtering and monitoring.
Global Tags (Environment Variables)
Set tags that apply to all functions:
# .env
VECTORWAVE_TAGS_ENVIRONMENT=production
VECTORWAVE_TAGS_REGION=us-east-1
VECTORWAVE_TAGS_VERSION=v2.3.0
Function-Level Tags (Decorator)
@vectorize(
    auto=True,
    team="payments",
    tags={"priority": "high", "sla": "99.9%"},
)
def process_payment(amount: float):
    ...
Tag Merge Rules
When both global and function-level tags exist, function-level tags take priority:
Global: { environment: "production", region: "us-east-1" }
Function: { environment: "staging", priority: "high" }
Result: { environment: "staging", region: "us-east-1", priority: "high" }
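The merge rule above behaves like a dictionary update in which function-level keys win. The following sketch reproduces it in plain Python. It is an illustration, not VectorWave's code: `global_tags_from_env` and `merge_tags` are hypothetical names, and lowercasing the part after `VECTORWAVE_TAGS_` is an assumption inferred from the example.

```python
import os

TAG_PREFIX = "VECTORWAVE_TAGS_"


def global_tags_from_env(environ=os.environ) -> dict:
    """Collect VECTORWAVE_TAGS_* variables; the tag name is lowercased (assumed)."""
    return {
        key[len(TAG_PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(TAG_PREFIX)
    }


def merge_tags(global_tags: dict, function_tags: dict) -> dict:
    """Later keys win, so function-level tags override global ones."""
    return {**global_tags, **function_tags}


# A plain dict stands in for the process environment.
env = {
    "VECTORWAVE_TAGS_ENVIRONMENT": "production",
    "VECTORWAVE_TAGS_REGION": "us-east-1",
    "PATH": "/usr/bin",  # ignored: no tag prefix
}
merged = merge_tags(
    global_tags_from_env(env),
    {"environment": "staging", "priority": "high"},
)
print(merged)
```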
Real-Time Webhook Alerting
Configure webhooks for real-time notifications on errors and drift via environment variables:
# .env
ALERTER_STRATEGY=webhook
ALERTER_WEBHOOK_URL=https://discord.com/api/webhooks/...
ALERTER_MIN_LEVEL=ERROR
Alert payloads include:
- Error code and function name
- Trace ID (clickable link to VectorSurfer)
- Captured attributes and tags
- Full stack trace
- Drift distance (for drift events)
The webhook format is automatically determined by the URL (Discord, Slack, or generic JSON).
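One plausible way to pick a format from the URL is to look at the hostname. The sketch below assumes that approach; the function names and the generic payload shape are hypothetical, and VectorWave's actual detection logic may differ. The `content` and `text` keys are the message fields that Discord webhooks and Slack incoming webhooks accept, respectively.

```python
from urllib.parse import urlparse


def detect_webhook_format(url: str) -> str:
    """Guess the payload format from the webhook hostname (assumed heuristic)."""
    host = (urlparse(url).hostname or "").lower()
    if host in ("discord.com", "discordapp.com") or host.endswith(".discord.com"):
        return "discord"
    if host == "hooks.slack.com":
        return "slack"
    return "generic"


def build_payload(url: str, message: str, details: dict) -> dict:
    """Build a JSON-serializable body for the detected format."""
    fmt = detect_webhook_format(url)
    if fmt == "discord":
        return {"content": message}  # Discord reads the "content" field
    if fmt == "slack":
        return {"text": message}  # Slack reads the "text" field
    return {"message": message, **details}  # generic shape is an assumption


# Hypothetical URLs used only to exercise the three branches.
discord_body = build_payload("https://discord.com/api/webhooks/1/x", "boom", {})
slack_body = build_payload("https://hooks.slack.com/services/T/B/x", "boom", {})
generic_body = build_payload("https://alerts.example.com/hook", "boom", {"level": "ERROR"})
```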
Auto-Injection
Use the VectorWaveAutoInjector class methods to inject VectorWave into existing modules without modifying their source code:
from vectorwave import VectorWaveAutoInjector

# Set default configuration for all inject calls
VectorWaveAutoInjector.configure(
    auto=True,
    capture_return_value=True,
    team="ai-team",
)

# Inject @vectorize into all functions in a module
VectorWaveAutoInjector.inject(
    target_module_path="app.services.ai",
)

# Now all functions in app.services.ai are vectorized
# without changing a single line of their source code
Recursive Injection
# Inject into a module and all its submodules
VectorWaveAutoInjector.inject(
    target_module_path="app.services",
    recursive=True,
    auto=True,  # Override config per inject call
)
Note: Auto-injection works at import time. Call VectorWaveAutoInjector.inject() before importing the target module.
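The general idea of decorating functions without editing their source can be shown with plain Python. This sketch patches an already-loaded module, which is a simpler technique than the import-time approach described above and is not VectorWave's implementation. `wrap_module_functions`, `record_calls`, and the `demo_services` module are hypothetical names.

```python
import functools
import inspect
import types


def wrap_module_functions(module, decorator):
    """Replace every function defined in `module` with decorator(function)."""
    wrapped = []
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if obj.__module__ != module.__name__:
            continue  # skip functions that were imported from elsewhere
        setattr(module, name, decorator(obj))
        wrapped.append(name)
    return wrapped


calls = []


def record_calls(fn):
    """Stand-in decorator that records each call by function name."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        calls.append(fn.__name__)
        return fn(*args, **kwargs)
    return wrapper


# Build a throwaway module so the example is self-contained.
demo = types.ModuleType("demo_services")
exec("def summarize(text):\n    return text[:10]\n", demo.__dict__)

wrapped_names = wrap_module_functions(demo, record_calls)
result = demo.summarize("hello world!!")
```

Callers that imported `summarize` by name before patching would still hold the undecorated function, which is one reason injection order matters.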
Schema Migration
The Weaviate schema may change between VectorWave versions. After upgrading, run the migration:
from vectorwave import update_database_schema
# Zero-downtime migration — existing data is preserved
update_database_schema()
This call:
- Detects differences between the current schema and the required one
- Adds new properties without dropping existing ones
- Migrates data if needed
- Preserves all existing execution logs and Golden Dataset entries
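The additive behavior listed above can be pictured as a one-way diff: anything required but missing is added, and nothing that already exists is removed. The sketch below illustrates that idea only. `plan_schema_update` and the property names `function_name` and `status` are hypothetical and do not describe VectorWave's internal schema.

```python
def plan_schema_update(current: dict, required: dict) -> dict:
    """Return the properties to add; existing properties are never dropped."""
    return {
        name: spec
        for name, spec in required.items()
        if name not in current
    }


# Hypothetical schemas, mapping property name to data type.
current = {"function_name": "TEXT", "status": "TEXT", "legacy_field": "TEXT"}
required = {
    "function_name": "TEXT",
    "status": "TEXT",
    "team": "TEXT",
    "priority": "INT",
}

to_add = plan_schema_update(current, required)
after_migration = {**current, **to_add}
print(to_add)
```

Note that `legacy_field` survives the update even though the required schema no longer lists it.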
Sensitive Data Masking
VectorWave automatically masks sensitive fields in captured inputs and outputs. By default, fields named password, api_key, token, secret, and auth_token are replaced with ***MASKED***.
Customize the list via environment variable:
# .env
SENSITIVE_FIELD_NAMES=password,api_key,token,secret,auth_token,ssn,credit_card
Masking applies to all execution logs — both @vectorize and @trace_span.
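To make the masking behavior concrete, here is a minimal sketch of name-based masking. It assumes exact, case-sensitive key matching and recursion into nested dictionaries and lists; VectorWave's actual matching rules are not specified here, and the helper names are hypothetical.

```python
MASK = "***MASKED***"
DEFAULT_SENSITIVE = "password,api_key,token,secret,auth_token"


def parse_sensitive_names(value: str = DEFAULT_SENSITIVE) -> set:
    """Split a comma-separated SENSITIVE_FIELD_NAMES value into a set."""
    return {name.strip() for name in value.split(",") if name.strip()}


def mask_sensitive(data, sensitive: set):
    """Return a copy of `data` with sensitive keys replaced by the mask."""
    if isinstance(data, dict):
        return {
            key: MASK if key in sensitive else mask_sensitive(value, sensitive)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [mask_sensitive(item, sensitive) for item in data]
    return data


names = parse_sensitive_names("password, api_key,token,ssn")
payload = {
    "user": "ada",
    "password": "hunter2",
    "nested": {"api_key": "k-123", "items": [{"token": "t", "qty": 2}]},
}
masked = mask_sensitive(payload, names)
print(masked)
```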
Async Logging
For latency-sensitive applications, enable async database logging to avoid blocking function execution:
# .env
ASYNC_LOGGING=True
When enabled, execution logs are queued and written to Weaviate in a background thread. The function returns immediately without waiting for the DB write.
Note: Async logging may lose data if the process crashes before the queue is flushed. Use force_sync=True on critical @trace_span calls to override.
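The queue-and-background-thread pattern can be sketched with the standard library. This is a generic illustration, not VectorWave's logger: `BackgroundLogWriter` is a hypothetical class, and its `force_sync` argument only mirrors the idea of bypassing the queue for critical records.

```python
import queue
import threading


class BackgroundLogWriter:
    """Queue records and write them from a single daemon thread."""

    def __init__(self, write):
        self._write = write
        self._queue = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self):
        while True:
            record = self._queue.get()
            try:
                self._write(record)
            except Exception:
                pass  # a real logger would report the failure
            finally:
                self._queue.task_done()

    def log(self, record, force_sync=False):
        if force_sync:
            self._write(record)  # blocks the caller, but cannot be lost in the queue
        else:
            self._queue.put(record)  # returns immediately

    def flush(self):
        """Block until every queued record has been written."""
        self._queue.join()


sink = []
writer = BackgroundLogWriter(sink.append)
writer.log({"id": 1})
writer.log({"id": 2})
writer.flush()
writer.log({"id": 3}, force_sync=True)
```

Records still in the queue when the process dies are lost, which is the trade-off the note above describes.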
Batch Performance Tuning
VectorWave batches writes to Weaviate for efficiency. Tune the batch behavior with:
# .env
BATCH_THRESHOLD=20 # Flush after this many objects (default: 20)
FLUSH_INTERVAL_SECONDS=2.0 # Flush at least every N seconds (default: 2.0)
- Higher BATCH_THRESHOLD: Fewer write calls but higher memory usage and latency
- Lower BATCH_THRESHOLD: More frequent writes with lower latency
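The interaction between the two settings can be sketched as a buffer that flushes when either limit is reached. The `Batcher` class below is hypothetical and simplified: it checks the interval only when an object is added, whereas a production batcher would typically also flush from a timer.

```python
import time


class Batcher:
    """Buffer objects and flush on size threshold or elapsed interval."""

    def __init__(self, flush, threshold=20, interval=2.0, clock=time.monotonic):
        self._flush_fn = flush
        self._threshold = threshold
        self._interval = interval
        self._clock = clock
        self._buffer = []
        self._last_flush = clock()

    def add(self, obj):
        self._buffer.append(obj)
        interval_elapsed = self._clock() - self._last_flush >= self._interval
        if len(self._buffer) >= self._threshold or interval_elapsed:
            self.flush()

    def flush(self):
        if self._buffer:
            self._flush_fn(list(self._buffer))
            self._buffer.clear()
        self._last_flush = self._clock()


# A fake clock makes the time-based flush deterministic.
now = [0.0]
batches = []
batcher = Batcher(batches.append, threshold=3, interval=2.0, clock=lambda: now[0])
for item in range(4):
    batcher.add(item)  # the third add reaches the threshold
now[0] = 2.5
batcher.add(4)  # the interval has elapsed, so this add flushes
print(batches)
```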
Token Usage Tracking
VectorWave tracks LLM token usage in the VectorWaveTokenUsage collection. View aggregate stats:
from vectorwave import get_token_usage_stats
stats = get_token_usage_stats()
# { "total_tokens": 125000, "by_category": { "embedding": 80000, "completion": 45000 } }
VectorSurfer: Token usage is visualized as a donut chart widget in the VectorSurfer dashboard.
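Given a result shaped like the example comment above, per-category percentages (the numbers a donut chart would display) follow directly. `category_percentages` is a hypothetical helper, and the `stats` dictionary is copied from the example rather than fetched from a live database.

```python
def category_percentages(stats: dict) -> dict:
    """Convert token counts per category into percentages of the total."""
    total = stats["total_tokens"]
    if total == 0:
        return {category: 0.0 for category in stats["by_category"]}
    return {
        category: round(100 * tokens / total, 1)
        for category, tokens in stats["by_category"].items()
    }


# Same shape as the example output of get_token_usage_stats().
stats = {
    "total_tokens": 125000,
    "by_category": {"embedding": 80000, "completion": 45000},
}
percentages = category_percentages(stats)
print(percentages)
```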
Data Archiving (VectorWaveArchiver)
For large-scale deployments, archive old execution data to keep Weaviate performant. The archiver is not part of the public API, so import it from its module path:
from vectorwave.database.archiver import VectorWaveArchiver

archiver = VectorWaveArchiver()

# Export old data to a snapshot file and remove from Weaviate
archiver.export_and_clear(
    older_than_days=30,
    mode="archive",  # "snapshot" (export only) | "archive" (export + delete) | "purge" (delete only)
    output_path="./archives/",
)
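The three modes differ only in whether old records are exported, deleted, or both. The sketch below models that on an in-memory list. It is an illustration of the semantics, not the archiver's code: `apply_mode` and the `timestamp` field are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# mode -> (export old records?, delete old records?)
MODE_ACTIONS = {
    "snapshot": (True, False),
    "archive": (True, True),
    "purge": (False, True),
}


def apply_mode(records, older_than_days, mode, now):
    """Return (exported, remaining) for the given mode."""
    export, delete = MODE_ACTIONS[mode]
    cutoff = now - timedelta(days=older_than_days)
    old = [r for r in records if r["timestamp"] < cutoff]
    exported = old if export else []
    remaining = [r for r in records if r not in old] if delete else list(records)
    return exported, remaining


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
records = [
    {"id": 1, "timestamp": datetime(2024, 5, 1, tzinfo=timezone.utc)},
    {"id": 2, "timestamp": datetime(2024, 6, 20, tzinfo=timezone.utc)},
]
exported, remaining = apply_mode(records, 30, "archive", now)
print([r["id"] for r in exported], [r["id"] for r in remaining])
```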
Function Cache
VectorWave caches registered function metadata locally in .vectorwave_functions_cache.json. This avoids redundant Weaviate queries on startup when the function code hasn't changed.
The cache is automatically invalidated when:
- Function source code changes
- .weaviate_properties is modified
- update_database_schema() is called
Delete the cache file manually to force re-registration.
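Change detection of this kind is commonly done by hashing the function source and comparing it with a stored value. The sketch below shows that pattern under stated assumptions: the cache is modeled as a dictionary from function name to SHA-256 digest, which is not necessarily the layout of .vectorwave_functions_cache.json, and the helper names are hypothetical.

```python
import hashlib


def source_fingerprint(source: str) -> str:
    """Stable digest of a function's source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def needs_registration(cache: dict, name: str, source: str) -> bool:
    """True when the function is new or its source has changed."""
    return cache.get(name) != source_fingerprint(source)


def remember(cache: dict, name: str, source: str) -> None:
    cache[name] = source_fingerprint(source)


cache = {}
v1 = "def process_order(order_id):\n    return order_id\n"
v2 = "def process_order(order_id):\n    return order_id.upper()\n"

first_seen = needs_registration(cache, "process_order", v1)
remember(cache, "process_order", v1)
unchanged = needs_registration(cache, "process_order", v1)
after_edit = needs_registration(cache, "process_order", v2)
print(first_seen, unchanged, after_edit)
```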
Rust Core (Optional)
VectorWave includes an optional Rust-accelerated core via PyO3 for performance-critical operations:
- Batch Manager — High-throughput batch writing to Weaviate
- Data Masking — Fast field-level masking for large payloads
If the Rust extension is not available (e.g., unsupported platform), VectorWave automatically falls back to pure Python implementations. No configuration needed.
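Optional native extensions are usually wired in with a guarded import. The sketch below shows the pattern only: `_example_rust_core` is a made-up module name chosen so that the import fails and the fallback path runs, and `mask_value` is a hypothetical function.

```python
try:
    # Hypothetical compiled module; it does not exist, so ImportError is raised.
    from _example_rust_core import mask_value
    BACKEND = "rust"
except ImportError:
    BACKEND = "python"

    def mask_value(value: str) -> str:
        """Pure Python fallback with the same signature as the native function."""
        return "***MASKED***"


print(BACKEND, mask_value("hunter2"))
```

Because both implementations expose the same name and signature, calling code does not need to know which backend was loaded.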
Tracer Performance Optimization
VectorWave uses inspect.signature internally for function introspection. To avoid repeated reflection overhead in high-throughput applications, the results are kept in an LRU cache.
No configuration needed — this optimization is automatic.
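The effect of caching signatures can be reproduced with `functools.lru_cache`. This is a minimal sketch of the technique, not VectorWave's tracer: `cached_signature` and `bind_arguments` are hypothetical names, and the cache size of 1024 is an arbitrary choice.

```python
import functools
import inspect


@functools.lru_cache(maxsize=1024)
def cached_signature(fn):
    """Compute inspect.signature once per function object."""
    return inspect.signature(fn)


def bind_arguments(fn, *args, **kwargs) -> dict:
    """Map call arguments to parameter names using the cached signature."""
    bound = cached_signature(fn).bind(*args, **kwargs)
    bound.apply_defaults()
    return dict(bound.arguments)


def process_order(order_id: str, amount: float = 0.0):
    return {"order_id": order_id, "amount": amount}


first = bind_arguments(process_order, "A-1", amount=9.5)
second = bind_arguments(process_order, "A-2")  # signature served from the cache
info = cached_signature.cache_info()
print(first, second, info.hits, info.misses)
```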
Next Steps
- API Reference — Complete parameter reference
- VectorSurfer Dashboard — Visual monitoring
- Contributing — How to contribute to VectorWave