Group 1 – Reproducibility & FREEZE Packages (Core Commitment)
Q1 : After obtaining the spec_hash and FREEZE package, how can I reproduce a byte-identical report locally? Does it require connecting to your server? What’s inside the FREEZE package?
A: The FREEZE package is an offline experimental snapshot that contains:
- Complete results (result.json, report.json, audit.json);
- The exact CONTRACT and engine_spec.yaml used;
- Five fingerprints (code_git_hash, data_version, spec_hash, random_seed, env_fingerprint);
- Data slice checksums (.sha256), logs, and configs.
Reproduction steps: unzip → verify dependencies via manifest.json → run python3 -m crypt.runners.replay_freeze manifest.json (offline) → run sha256sum -c manifest.sha256 to validate consistency.
It’s not a Docker image, but the metadata and hashes are sufficient for full determinism.
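The final validation step (sha256sum -c manifest.sha256) can be sketched in pure Python. This is an illustrative re-implementation, not the shipped tool; it assumes the manifest uses standard sha256sum-style lines of the form "<hash>  <file>":

```python
import hashlib
from pathlib import Path

def verify_manifest(manifest_path: str = "manifest.sha256") -> bool:
    """Re-check every '<hash>  <file>' line, as `sha256sum -c` would."""
    ok = True
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        digest = hashlib.sha256(Path(name.strip()).read_bytes()).hexdigest()
        if digest != expected:
            print(f"FAILED: {name.strip()}")
            ok = False
    return ok
```

Any byte-level divergence in a result file shows up here as a checksum mismatch, which is what makes the snapshot self-verifying offline.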
Q2 : What exactly composes spec_hash? Does it include code, parameters, data versions, fee/slippage models, rebalancer, and random seed?
A: spec_hash = SHA1(engine_spec.yaml + CONTRACT.json + fee_model.py + slippage_model.py + rebalancer.py + seed).
Any modification of those components generates a new spec_hash.
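Based on the composition above, the hash can be sketched as SHA-1 over the concatenated component bytes plus the seed. The function name and file layout are assumptions for illustration; only the component list and the SHA-1 choice come from the answer:

```python
import hashlib
from pathlib import Path

# Component list taken from the answer above; order is assumed fixed.
SPEC_COMPONENTS = [
    "engine_spec.yaml", "CONTRACT.json",
    "fee_model.py", "slippage_model.py", "rebalancer.py",
]

def compute_spec_hash(root: str, seed: int) -> str:
    """SHA-1 over the concatenated component files plus the seed."""
    h = hashlib.sha1()
    for name in SPEC_COMPONENTS:
        h.update(Path(root, name).read_bytes())
    h.update(str(seed).encode())
    return h.hexdigest()
```

Because every component is hashed byte-for-byte, editing any file or changing the seed necessarily yields a different spec_hash.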
Q3 : How is determinism ensured when randomness is involved? Can it be 100% identical across platforms?
A: We fix random_seed across numpy, random, and torch; record evolutionary paths; and log env_fingerprint.
As long as Python and dependencies match, results are hash-identical across Linux and Windows.
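The seeding and fingerprinting described above can be sketched as follows. This is a minimal illustration, not the engine's code: numpy and torch are seeded only if installed, and the fingerprint shown here is a hypothetical hash of interpreter and platform identifiers:

```python
import hashlib
import os
import platform
import random
import sys

def seed_everything(seed: int) -> None:
    """Pin every RNG the engine touches; numpy/torch are optional deps."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass

def env_fingerprint() -> str:
    """Illustrative env_fingerprint: hash of interpreter + platform."""
    blob = f"{sys.version}|{platform.platform()}".encode()
    return hashlib.sha256(blob).hexdigest()[:16]
```

Logging the fingerprint alongside the seed is what lets a replay detect that the Python or dependency versions no longer match the original run.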
Group 2 – Shadow Matching & Drift Metrics (Live Verification)
Q4 : How does shadow matching work? Does it use tick-level order books?
A: We compare live execution prices with simulated ones; the difference is execution_drift.
Default granularity is 4h candles; Level-2 order-book data is supported when exchanges permit.
Q5 : How is “Median Drift × bps” calculated? Are p90/p95/p99 visible? Will extreme markets amplify it?
A: bps = basis points relative to traded value:
drift_bps = (fill_shadow − fill_real)/fill_real × 10⁴.
Full distribution available; extreme periods widen tails and are flagged in reports.
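The drift formula and its tail percentiles can be computed in a few lines of numpy. The function names here are illustrative, not the product's API; only the formula and the p90/p95/p99 cut points come from the answers above:

```python
import numpy as np

def drift_bps(fill_shadow: np.ndarray, fill_real: np.ndarray) -> np.ndarray:
    """drift_bps = (fill_shadow - fill_real) / fill_real * 10^4."""
    return (fill_shadow - fill_real) / fill_real * 1e4

def drift_summary(d: np.ndarray) -> dict:
    """Median plus the tail percentiles surfaced in reports."""
    return {
        "median": float(np.median(d)),
        "p90": float(np.percentile(d, 90)),
        "p95": float(np.percentile(d, 95)),
        "p99": float(np.percentile(d, 99)),
    }
```

A shadow fill of 100.5 against a real fill of 100.0, for example, is a drift of +50 bps; extreme markets widen the gap between the median and the p99 tail.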
Q6 : How is the funding-fee alignment error computed?
A: Historical funding replay; funding_align_bps = (funding_shadow − funding_real)/funding_real × 10⁴.
No forecasting involved.
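The alignment metric is the same relative-error-in-bps shape as execution drift, applied to replayed funding payments. A minimal sketch (function name assumed):

```python
def funding_align_bps(funding_shadow: float, funding_real: float) -> float:
    """Replay-vs-real funding gap in basis points; real must be nonzero."""
    return (funding_shadow - funding_real) / funding_real * 1e4
```

For instance, a replayed funding rate of 0.0101 against a realized 0.0100 is an alignment error of +100 bps.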
Group 3 – System Architecture & Data (Infrastructure)
Q7 : Is it an offline engine or a centralized service? How is it delivered?
A: Self-deployable, delivered as a Docker image or as FREEZE + Runner toolkit. It runs entirely in-house with no API dependency.
Q8 : What’s the origin and cleaning process of historical data?
A: Directly from exchanges and the Binance Data Portal. Cleaning includes 5σ filtering, UTC normalization, gap filling, and cross-exchange reconciliation;
daily data_audit_stub ensures integrity.
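The 5σ filtering step can be sketched as a one-pass outlier mask; the actual pipeline's windowing and handling of removed points are not specified here, so this is only an assumed minimal form:

```python
import numpy as np

def five_sigma_filter(prices: np.ndarray) -> np.ndarray:
    """Drop points more than 5 standard deviations from the mean (one pass)."""
    mu, sigma = prices.mean(), prices.std()
    if sigma == 0:
        return prices
    return prices[np.abs(prices - mu) <= 5 * sigma]
```

A production cleaner would typically apply this on rolling windows and log removals for the audit trail rather than silently dropping rows.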