Principles

  • I built a verifiable backtesting and shadow-matching engine designed to reproduce identical results under the same specification and data.
  • The purpose is not to show “how much it earns,” but to demonstrate verifiable consistency between strategy and live trading.
  • Reproducibility: identical spec + identical data → byte-level identical output.
  • Chain of evidence: each release provides a FREEZE package and fingerprints — spec_hash_short, freeze_sha256_short, code_git_short.
  • Shadow matching: compares live fills with simulated fills to produce drift metrics (Drift bps, p50/p90) and funding-fee deviation.
  • Currently covers the cryptocurrency market, with pricing and funding rules consistent with exchange-level specifications.
  • Read-only delay: all data is displayed with a delay of at least 15 minutes; no trading commands are exposed.
  • I also built an autonomous strategy evolution engine. It performs deterministic evolution under locked specifications, with hybrid evolutionary schemes, drift-aware optimization, and shadow-based promotion. Anti-overfitting mechanisms include frozen datasets, dual-run consistency checks, complexity penalties, and out-of-sample validation. The evolution engine now forms a small closed loop with the backtester — but it’s not public yet, haha.

Reproducible · Auditable · Deterministic · Verified

Q&A (Selected)

Excerpts from discussions on “Reproducibility & FREEZE Packages,” “Shadow Matching & Drift,” “System & Data,” and “Advanced Challenges.”

Group 1 – Reproducibility & FREEZE Packages (Core Commitment)
Q1 : After obtaining the spec_hash and FREEZE package, how can I reproduce a byte-identical report locally? Does it require connecting to your server? What’s inside the FREEZE package?

A: The FREEZE package is an offline experimental snapshot that contains:

  • Complete results (result.json, report.json, audit.json);
  • The exact CONTRACT and engine_spec.yaml used;
  • Five fingerprints (code_git_hash, data_version, spec_hash, random_seed, env_fingerprint);
  • Data slice checksums (.sha256), logs, and configs.

Reproduction steps: unzip → verify dependencies via manifest.json → run python3 -m crypt.runners.replay_freeze manifest.json (offline) → sha256sum -c manifest.sha256 to validate consistency. It’s not a Docker image, but the metadata and hashes are sufficient for full determinism.

Q2 : What exactly composes spec_hash? Does it include code, parameters, data versions, fee/slippage models, rebalancer, and random seed?

A: spec_hash = SHA1(engine_spec.yaml + CONTRACT.json + fee_model.py + slippage_model.py + rebalancer.py + seed)

Any modification of those components generates a new spec_hash.

Q3 : How is determinism ensured when randomness is involved? Can it be 100% identical across platforms?

A: We fix random_seed across numpy, random, and torch; record evolutionary paths; and log the env_fingerprint. As long as the Python version and dependencies match, results are hash-identical across Linux and Windows.
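A sketch of that seed pinning, with numpy and torch guarded as optional imports so the snippet stays runnable without them installed:

```python
import os
import random

def pin_determinism(seed: int) -> None:
    """Pin the RNGs a run touches. A sketch, not the engine's own code."""
    # PYTHONHASHSEED only affects newly launched interpreters; set it in
    # the environment before starting the run itself.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # numpy not installed; nothing to pin
    try:
        import torch
        torch.manual_seed(seed)
        torch.use_deterministic_algorithms(True)
    except ImportError:
        pass  # torch not installed; nothing to pin
```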

Group 2 – Shadow Matching & Drift Metrics (Live Verification)
Q4 : How does shadow matching work? Does it use tick-level order books?

A: We compare live execution prices with simulated ones; the difference is execution_drift. The default granularity is 4h candles; Level-2 order-book data is supported where exchanges permit.

Q5 : How is “Median Drift × bps” calculated? Are p90/p95/p99 visible? Will extreme markets amplify it?

A: bps means basis points relative to traded value: drift_bps = (fill_shadow − fill_real) / fill_real × 10⁴. The full distribution (p90/p95/p99 included) is available; extreme periods widen the tails and are flagged in reports.
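The metric and its p50/p90 summary can be computed with the standard library alone; the per-fill pairing of shadow and real fills assumed below is for illustration:

```python
from statistics import median, quantiles

def drift_bps(fill_shadow: float, fill_real: float) -> float:
    """drift_bps = (fill_shadow - fill_real) / fill_real * 10^4"""
    return (fill_shadow - fill_real) / fill_real * 1e4

def drift_summary(shadow_fills, real_fills) -> dict:
    """Median (p50) and p90 of per-fill drift, in basis points."""
    drifts = [drift_bps(s, r) for s, r in zip(shadow_fills, real_fills)]
    deciles = quantiles(drifts, n=10, method="inclusive")
    return {"p50_bps": median(drifts), "p90_bps": deciles[8]}
```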

Q6 : How is the funding-fee alignment error computed?

A: Funding is replayed from historical records; funding_align_bps = (funding_shadow − funding_real) / funding_real × 10⁴. No forecasting is involved.

Group 3 – System Architecture & Data (Infrastructure)
Q7 : Is it an offline engine or a centralized service? How is it delivered?

A: It is self-deployable, delivered as a Docker image or as a FREEZE + Runner toolkit. It runs entirely in-house, with no API dependency.

Q8 : What’s the origin and cleaning process of historical data?

A: Directly from exchanges and the Binance Data Portal. Cleaning includes 5σ filtering, UTC normalization, gap filling, and cross-exchange reconciliation; a daily data_audit_stub run ensures integrity.
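As an illustration, a one-pass version of the 5σ rule in stdlib Python; the production pipeline presumably applies it per field over rolling windows, which is not shown here:

```python
from statistics import mean, stdev

def five_sigma_filter(values, k: float = 5.0) -> list:
    """Drop observations more than k standard deviations from the
    sample mean. A single pass; repeating it tightens the sample."""
    if len(values) < 2:
        return list(values)
    m, s = mean(values), stdev(values)
    if s == 0:
        return list(values)  # constant series, nothing to reject
    return [v for v in values if abs(v - m) <= k * s]
```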

Group 4 – Advanced Challenges (External Dependencies & Statistical Validity)
Q9 : How do strategies using external data lock dependencies for reproducibility?

A: By storing source digests and cached mirrors; when a source expires, playback falls back to the cached copy. Strategies that depend on dynamic APIs are excluded from “deterministic backtesting.”

Q10 : What does “mathematically proven” mean in your context?

A: It is not a formal proof; it refers to the provability of deterministic consistency, supported by statistical significance tests (KS test / t-test).
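The two-sample KS statistic (the largest gap between empirical CDFs) is simple enough to express in stdlib Python; in practice a library such as scipy would also supply the p-value, which is omitted here:

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical
    gap between the two empirical CDFs. 0 = identical, 1 = disjoint."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:  # the gap can only change at observed points
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap
```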

Industry Significance (Excerpt)
  • Research Gold Standard: combining spec_hash with FREEZE packages achieves byte-level reproducibility, putting an end to “alchemy-style” black boxes.
  • Audit & Compliance Automation: Five fingerprints form a verifiable evidence chain, reducing trust cost dramatically.
  • Value Shift: From “mystical alpha” to “verifiable robustness.”
  • Potential Risk: Convergent frameworks may cause systemic fragility — requires decentralization and monitoring.

Once it was alchemy; now it’s chemistry — standard process, clean reagents, repeatable experiments — genuine trust.
