This thesis investigates whether cryptocurrency dynamics contain information about the evolving dependence structure of conventional financial assets. Bitcoin serves as the base crypto asset; the conventional universe spans equity indices, precious-metal ETFs, and the U.S. dollar index. A reproducible walk-forward ML pipeline is developed and benchmarked against a leakage-safe DCC-GARCH(1,1) specification across 240 model evaluations (10 models × 6 pairs × 4 windows).
The central question is whether persistence in rolling correlation — and not nonlinear cross-asset signals — is the dominant source of forecastability. The empirical evidence confirms this, while also showing that all ML models significantly outperform the DCC-GARCH econometric benchmark.
Are time-varying intermarket dependencies between BTC and conventional assets forecastable one step ahead under strict out-of-sample evaluation?
Do ML models (ElasticNet, Ridge, GBM, XGBoost) outperform the DCC-GARCH(1,1) econometric benchmark under leakage-safe walk-forward evaluation?
Which predictors carry the most information — historical dependency persistence, volatility, or cross-asset momentum signals?
Can crypto-derived features predict investor stress days in traditional asset markets as a binary classification task?
Run the thesis models in real time — directly in your browser. Adjust market conditions and watch the AR1 dependency forecast and stress classifier respond instantly. Try the historical scenario presets to see how the signal would have fired during real market events.
BTC-USD (Bitcoin). Base asset. The world's largest cryptocurrency by market capitalisation; trades 24/7, including weekends. It is the primary source of crypto-derived features throughout the thesis.
ETH-USD (Ethereum). Crypto reference asset, included to contrast cross-crypto dependency (BTC↔ETH) with crypto-to-conventional dependency. The sample start date is set by ETH availability on Yahoo Finance.
^GSPC (S&P 500). Broad U.S. equity benchmark. Represents large-cap risk appetite and is most tightly linked to global funding conditions and sentiment, the same macro forces that drive Bitcoin.
^IXIC (Nasdaq Composite). Tech-heavy equity index with a higher beta to growth and speculative sentiment than the S&P 500, which makes it a natural comparator for crypto co-movement.
GLD (SPDR Gold Shares). Largest gold ETF. Captures safe-haven demand, inflation expectations, and real-interest-rate dynamics, structural drivers distinct from crypto.
SLV (iShares Silver Trust). Silver has both safe-haven and industrial demand components. It is more volatile than gold and provides a second precious-metal data point.
UUP (Invesco DB US Dollar Index Bullish Fund). Proxy for broad USD strength (DXY). Reflects global risk-off positioning and macro defensive flows, and responds to geopolitical risk premia not captured by crypto features.
Period: 9 Nov 2017 – 16 May 2026 · 3,053 daily observations per asset. The start date is set by the earliest available ETH-USD quote on Yahoo Finance; earlier dates contain no Ethereum observations and are excluded.
Alignment: Bitcoin and Ethereum trade 24/7 including weekends. The panel is aligned on exchange-calendar dates shared with equity markets. Weekend and holiday crypto quotes are forward-filled up to two consecutive calendar days, then dropped, so all seven tickers share an identical date index.
Returns: log-returns $r_t = \log P_t - \log P_{t-1}$ are used throughout. Log-returns are additive over time, approximately normal at moderate horizons, and standard in both econometric modelling and ML pipelines for financial time series.
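A minimal pandas sketch of the return definition on a toy price path (illustrative numbers, not thesis data). It also checks the additivity property: the summed daily log-returns equal the log of the total gross return. The helper name `log_returns` is hypothetical.

```python
import numpy as np
import pandas as pd


def log_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """Column-wise r_t = log P_t - log P_{t-1}; the first (undefined) row is dropped."""
    return np.log(prices).diff().iloc[1:]


# Toy price path (illustrative numbers, not thesis data).
prices = pd.DataFrame({"BTC-USD": [100.0, 110.0, 99.0, 120.0]})
r = log_returns(prices)

# Additivity: the sum of daily log-returns is the log of the total gross return.
total = r["BTC-USD"].sum()
print(np.isclose(total, np.log(120.0 / 100.0)))  # True
```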
| Ticker | N obs | Missing (%) | Skewness | Excess kurtosis | Ann. vol (%) |
|---|---|---|---|---|---|
| BTC-USD | 3,053 | 0.00 | −0.73 | 13.38 | 56.08 |
| ETH-USD | 3,053 | 0.00 | −0.74 | 10.77 | 72.68 |
| GLD | 3,053 | 0.00 | −0.89 | 13.96 | 13.82 |
| SLV | 3,053 | 0.00 | −2.77 | 51.61 | 28.20 |
| UUP | 3,053 | 0.00 | −0.03 | 8.58 | 5.85 |
| ^GSPC | 3,053 | 0.00 | −0.73 | 22.59 | 16.20 |
| ^IXIC | 3,053 | 0.00 | −0.46 | 12.65 | 19.66 |
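No missing values remain after alignment. Crypto assets have substantially higher annualised volatility (56–73% vs 6–28% for conventional assets). Fat tails, however, are common to almost every series: all tickers except UUP have excess kurtosis above 10, and the most extreme values belong to SLV (51.61) and ^GSPC (22.59), not to the crypto assets.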
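The same statistics as the table can be computed with standard pandas reductions. A sketch, assuming an annualisation factor of 252 periods per year (the factor is not stated in this section) and using synthetic Student-t and Gaussian returns to show how fat tails surface in the excess-kurtosis column. The helper name `describe_returns` is hypothetical.

```python
import numpy as np
import pandas as pd


def describe_returns(returns: pd.DataFrame, periods_per_year: int = 252) -> pd.DataFrame:
    """Per-ticker summary statistics of daily returns.

    periods_per_year=252 is an assumed annualisation factor.
    """
    stats = pd.DataFrame({
        "N obs": returns.count(),
        "Missing (%)": returns.isna().mean() * 100,
        "Skewness": returns.skew(),
        # pandas' .kurt() returns excess kurtosis (0 for a normal distribution).
        "Excess kurtosis": returns.kurt(),
        "Ann. vol (%)": returns.std() * np.sqrt(periods_per_year) * 100,
    })
    return stats.round(2)


rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "fat_tailed": rng.standard_t(df=4, size=5000) * 0.01,  # Student-t, df=4
    "gaussian": rng.normal(0.0, 0.01, size=5000),
})
print(describe_returns(demo))
```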
Layer 1 (Dependency Forecasting): predict the next value of the Fisher-z-transformed rolling correlation, $z_{t+1} = \operatorname{artanh}(\hat\rho_{t+1})$, between Bitcoin and each of the six other assets. Expanding-window walk-forward validation; min_train = 800 obs; refit every 20 trading days.
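A minimal scikit-learn sketch of the Layer 1 target and the expanding-window protocol. The Fisher transform is $z = \operatorname{artanh}(\rho)$; the clipping bound, the synthetic return series, and the helper names `fisher_z_target` and `walk_forward` are illustrative assumptions, not the thesis code. The AR(1) baseline is `LinearRegression` on the current value of the target, matching the "LinearRegression, 1 feature" entry in the model table.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def fisher_z_target(r_base: pd.Series, r_other: pd.Series, window: int = 30) -> pd.Series:
    """Rolling Pearson correlation mapped to z = artanh(rho)."""
    rho = r_base.rolling(window).corr(r_other)
    # Clip (assumed bound) so a perfectly correlated window cannot give z = +/-inf.
    return np.arctanh(rho.clip(-0.999, 0.999))


def walk_forward(X: np.ndarray, y: np.ndarray, make_model,
                 min_train: int = 800, refit_every: int = 20) -> np.ndarray:
    """Expanding-window, one-step-ahead forecasts.

    Row t of X holds information known at time t and y[t] is the value for t+1.
    The model is refit on rows [0, t) every `refit_every` steps, so it never
    sees the target it is about to predict.
    """
    preds = np.full(len(y), np.nan)
    model = None
    for t in range(min_train, len(y)):
        if (t - min_train) % refit_every == 0:
            model = make_model().fit(X[:t], y[:t])
        preds[t] = model.predict(X[t:t + 1])[0]
    return preds


# Synthetic returns sharing a common factor (not thesis data).
rng = np.random.default_rng(1)
n = 1200
common = rng.normal(size=n)
r_btc = pd.Series(common + rng.normal(size=n))
r_spx = pd.Series(0.5 * common + rng.normal(size=n))

z = fisher_z_target(r_btc, r_spx, window=30)
frame = pd.DataFrame({"dep_lag1": z, "target": z.shift(-1)}).dropna()
X, y = frame[["dep_lag1"]].to_numpy(), frame["target"].to_numpy()

preds = walk_forward(X, y, LinearRegression)  # AR(1)-style baseline
oos = ~np.isnan(preds)
print("OOS RMSE:", np.sqrt(np.mean((preds[oos] - y[oos]) ** 2)))
```

Refitting only every 20 steps keeps the cost of the expanding window manageable while still never training on data from after the forecast origin.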
Layer 2 (Investor Signal): use the Layer 1 forecast plus crypto features to classify whether tomorrow is a stress day for the conventional asset, i.e. a daily loss exceeding 0.75σ of the rolling 20-day negative-return distribution (~8–12% of days). Three classifiers: Logit, Random Forest, GBM.
Rolling Windows Evaluated
w=14: Lowest R². High variance makes one-step-ahead forecasting harder; the ML advantage is minimal.
w=30: Highest DM significance and the best signal-to-noise ratio; the preferred setting for all comparisons.
w=60: Smoother series with a strong XGBoost advantage. AR(1) is competitive, but ML adds value.
w=90: Very smooth target. AR(1) almost matches XGBoost; persistence accounts for nearly all of the variance.
| Group | Variables | Rationale |
|---|---|---|
| Dependency lags | dep_lag1/2/5/10 | Most important — target is persistent |
| Return lags | r_base/r_other lag1/2/5 | Recent directional info |
| Rolling volatility | vol_base, vol_other | Local regime conditions |
| Rolling means | mean_base, mean_other | Momentum over short windows |
| Spread features | spread_abs, spread_sign | Divergence between crypto and asset |
| Corr regime gap | corr_diff (short−long ρ) | Signals impending regime shifts |
Lagged correlation (dep_lag1) alone accounts for 42% of feature importance — confirming that persistence is the primary driver. The AR(1) baseline captures ~54% of XGBoost performance using only a single feature.
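The feature groups in the table can be built from two return series and the dependency series with shifted pandas operations. A minimal sketch: every window length and the exact `spread_*` and `corr_diff` definitions are assumptions for illustration, and the helper name `build_features` is hypothetical. Row t predicts `dep` at t using information up to t−1 only, which the final check verifies.

```python
import numpy as np
import pandas as pd


def build_features(r_base: pd.Series, r_other: pd.Series, dep: pd.Series,
                   vol_win: int = 20, mean_win: int = 5,
                   short_win: int = 14, long_win: int = 60) -> pd.DataFrame:
    """Row t: target dep[t]; features built from data up to t-1 only.

    Window lengths and the spread/corr_diff definitions are assumed,
    not taken from the thesis code.
    """
    f = pd.DataFrame({"target": dep})
    for k in (1, 2, 5, 10):                               # dependency lags
        f[f"dep_lag{k}"] = dep.shift(k)
    for k in (1, 2, 5):                                   # return lags
        f[f"r_base_lag{k}"] = r_base.shift(k)
        f[f"r_other_lag{k}"] = r_other.shift(k)
    # Rolling statistics end at t-1 because of the trailing shift(1).
    f["vol_base"] = r_base.rolling(vol_win).std().shift(1)
    f["vol_other"] = r_other.rolling(vol_win).std().shift(1)
    f["mean_base"] = r_base.rolling(mean_win).mean().shift(1)
    f["mean_other"] = r_other.rolling(mean_win).mean().shift(1)
    spread = (r_base - r_other).shift(1)                  # assumed spread definition
    f["spread_abs"] = spread.abs()
    f["spread_sign"] = np.sign(spread)
    corr_short = r_base.rolling(short_win).corr(r_other)
    corr_long = r_base.rolling(long_win).corr(r_other)
    f["corr_diff"] = (corr_short - corr_long).shift(1)    # short minus long rho
    return f.dropna()


rng = np.random.default_rng(2)
r_b = pd.Series(rng.normal(size=300))
r_o = pd.Series(rng.normal(size=300))
dep = r_b.rolling(30).corr(r_o)
feats = build_features(r_b, r_o, dep)

# No look-ahead: changing the last base return leaves the last feature row intact.
r_b2 = r_b.copy()
r_b2.iloc[-1] += 100.0
feats2 = build_features(r_b2, r_o, dep)
print(np.allclose(feats.iloc[-1].to_numpy(), feats2.iloc[-1].to_numpy()))  # True
```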
| Model | Type | Role | Key Parameters |
|---|---|---|---|
| Naive_Last | Baseline | Persistence: ŷt+1 = yt | None |
| AR1 | Baseline | 1-lag autoregression | LinearRegression, 1 feature |
| HAR | Baseline | Daily / weekly / monthly lags | OLS on dep_lag1, avg5, avg22 |
| Ridge | Linear ML | L2 regularisation — best avg RMSE | α=0.5 + StandardScaler |
| ElasticNet | Linear ML | L1+L2 regularisation | α=0.005, l1_ratio=0.5 |
| Ensemble | Adaptive | Inverse-RMSE-weighted blend | 60-step rolling window weights |
| RF | Tree ensemble | Bagging | 200 trees, max_depth=10, min_leaf=5 |
| GBM | Tree ensemble | Sequential boosting (HistGBM) | 300 iters, lr=0.02, l2_reg=0.1 |
| XGB_GPU | Tree ensemble | GPU-accelerated boosting | 500 trees, subsample=0.8, colsample=0.8 |
| DCC_GARCH | Econometric | Leakage-safe benchmark | Walk-forward re-estimated, Engle 2002 |
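Two of the less standard rows in the table, HAR and the inverse-RMSE Ensemble, reduce to a few lines each. A sketch under stated assumptions: HAR regresses the dependency on its previous value and its 5- and 22-day averages (as listed in the table), and the blend weights are proportional to each model's inverse RMSE over the trailing 60 realised errors. The helper names and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def har_design(dep: pd.Series) -> pd.DataFrame:
    """HAR regressors for dep[t], built only from values up to t-1."""
    lagged = dep.shift(1)
    return pd.DataFrame({
        "dep_lag1": lagged,                      # daily component
        "avg5": lagged.rolling(5).mean(),        # weekly component
        "avg22": lagged.rolling(22).mean(),      # monthly component
    })


def inverse_rmse_weights(errors: pd.DataFrame, window: int = 60) -> pd.DataFrame:
    """Blend weights proportional to 1 / trailing RMSE, one column per model.

    errors.loc[t, m] is model m's realised error at t; the shift(1) means the
    weights applied at t use errors up to t-1 only.
    """
    rmse = np.sqrt((errors ** 2).rolling(window).mean()).shift(1)
    inverse = 1.0 / rmse
    return inverse.div(inverse.sum(axis=1), axis=0)


rng = np.random.default_rng(3)
eps = rng.normal(scale=0.05, size=600)
z = np.zeros(600)
for t in range(1, 600):                          # persistent AR(1) toy series
    z[t] = 0.95 * z[t - 1] + eps[t]
dep = pd.Series(z)

data = har_design(dep).assign(target=dep).dropna()
har = LinearRegression().fit(data[["dep_lag1", "avg5", "avg22"]], data["target"])
print("HAR coefficients:", har.coef_)

errors = pd.DataFrame({"A": rng.normal(scale=0.05, size=200),
                       "B": rng.normal(scale=0.10, size=200)})
weights = inverse_rmse_weights(errors).dropna()
print(weights.mean())                            # model A, with smaller errors, gets more weight
```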
Key finding: the dependency target is forecastable out of sample in all 24 pair × window combinations. The best average RMSE belongs to Ridge (0.0656), which forms a near-indistinguishable top cluster with AR1 (0.0659) and HAR (0.0659): the three averages span only 0.0003, and no model in the cluster is statistically distinguishable from the others. This convergence indicates that the target has a nearly linear autoregressive structure. Every model significantly outperforms DCC-GARCH (avg RMSE 0.2136 vs 0.0656–0.1241 for all other models).
| Model | Avg RMSE | Avg MAE | Avg R² | Rank | vs DCC-GARCH |
|---|---|---|---|---|---|
| Ridge | 0.0656 | 0.0419 | 0.9432 | #1 Best | −69.3% |
| AR1 | 0.0659 | 0.0403 | 0.9424 | #2 ≈#1 | −69.2% |
| HAR | 0.0659 | 0.0405 | 0.9424 | #3 ≈#1 | −69.2% |
| Naive_Last | 0.0666 | 0.0395 | 0.9412 | #4 | −68.8% |
| ElasticNet | 0.0669 | 0.0434 | 0.9411 | #5 ML | −68.7% |
| Ensemble | 0.0694 | 0.0462 | 0.9365 | #6 ML | −67.5% |
| GBM | 0.0886 | 0.0625 | 0.8983 | #7 ML | −58.5% |
| RF | 0.0905 | 0.0636 | 0.8936 | #8 ML | −57.6% |
| XGB_GPU | 0.1241 | 0.0864 | 0.7923 | #9 ML | −41.9% |
| DCC_GARCH | 0.2136 | 0.1683 | 0.3715 | Benchmark | — |
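The table's columns follow from standard definitions. A sketch with hypothetical helper names, assuming the "vs DCC-GARCH" column is the relative RMSE change against the benchmark:

```python
import numpy as np


def oos_metrics(y_true, y_pred) -> dict:
    """RMSE, MAE and R² of one out-of-sample forecast series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "R2": float(1.0 - ss_res / ss_tot),
    }


def pct_vs_benchmark(rmse_model: float, rmse_benchmark: float) -> float:
    """Relative RMSE change in percent; negative means better than the benchmark."""
    return 100.0 * (rmse_model - rmse_benchmark) / rmse_benchmark


print(round(pct_vs_benchmark(0.0656, 0.2136), 1))   # -69.3, the Ridge row
```

Recomputing the column from the four-decimal averages can move the last digit (AR1 gives −69.1 rather than the tabulated −69.2), presumably because the table is computed from unrounded values.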
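The Diebold–Mariano (DM) test compares the out-of-sample loss series of two models, using a Newey–West (heteroskedasticity- and autocorrelation-consistent) estimate of the variance of the loss differential. In the convention used here, a positive DM statistic means the first model has lower forecast errors (is better) and a negative statistic means it is worse. For the comparisons against DCC-GARCH, the p-values underflow to zero in double precision (p < 10⁻³⁰⁸).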
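A minimal NumPy/SciPy sketch of the test under the sign convention stated above (positive means model 1 has the lower squared-error loss). The Bartlett-kernel Newey–West variance and the rule-of-thumb lag length floor(4(n/100)^(2/9)) are assumptions, since the thesis's exact lag choice is not given here; no small-sample correction is applied, and the name `dm_test` is hypothetical.

```python
import numpy as np
from scipy.stats import norm


def dm_test(e1, e2, lags=None):
    """Diebold–Mariano test on squared-error loss.

    Convention from the text: d_t = e2_t**2 - e1_t**2, so a positive statistic
    means model 1 has the lower loss. The long-run variance of d_t uses a
    Newey–West (Bartlett-kernel) estimator; the default lag rule is an assumption.
    """
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    d = e2 ** 2 - e1 ** 2
    n = d.size
    if lags is None:
        lags = int(np.floor(4 * (n / 100) ** (2 / 9)))
    dc = d - d.mean()
    long_run_var = dc @ dc / n
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1)                 # Bartlett kernel
        long_run_var += 2.0 * weight * (dc[k:] @ dc[:-k]) / n
    stat = d.mean() / np.sqrt(long_run_var / n)
    p_value = 2.0 * norm.sf(abs(stat))                # two-sided, asymptotic N(0, 1)
    return float(stat), float(p_value)


rng = np.random.default_rng(4)
e_small = rng.normal(scale=0.5, size=1000)            # errors of the better model
e_large = rng.normal(scale=1.0, size=1000)
stat, p = dm_test(e_small, e_large)
print(f"DM = {stat:.2f}, p = {p:.2e}")
```

Swapping the arguments flips the sign of the statistic. For very large |DM|, `norm.sf` underflows to 0.0 in double precision, which is presumably what a reported p < 10⁻³⁰⁸ reflects.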
Positive DM → Ridge is better. Negative → Naive wins. Result depends on window: Ridge beats Naive at w=14, ties at medium windows, falls behind only at w=90.
Positive DM → ML model is better than DCC-GARCH. Ridge wins in every single experiment. Margin grows with window length.
Cross-asset DM test (equity vs non-equity): DM = −9.016, p < 0.001. Equity dependency (^GSPC, ^IXIC) is significantly easier to forecast than precious-metal or dollar dependency. This is consistent with the information-gap hypothesis: crypto features are well specified for equity stress but incomplete for metals and FX.
A second-layer binary classifier predicts stress days on conventional asset markets using Bitcoin dynamics and the Layer 1 dependency forecast. A stress day is a daily loss exceeding 0.75σ of the rolling 20-day negative-return distribution (~8–12% of days). The output is a probability-guided warning — not a directional trade signal.
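The stress-day definition admits more than one reading. The sketch below assumes σ is the standard deviation of the negative returns observed in the 20 days before t, and flags day t when $r_t < 0$ and $|r_t| > 0.75\sigma$; other readings (for example σ of all returns) would change the base rate. The function name `stress_labels` and the synthetic returns are hypothetical.

```python
import numpy as np
import pandas as pd


def stress_labels(r: pd.Series, window: int = 20, k: float = 0.75) -> pd.Series:
    """1.0 for a stress day, 0.0 otherwise, NaN during warm-up.

    Assumed reading: sigma_neg is the std of the negative returns in the
    previous `window` days (day t excluded); day t is flagged when
    r_t < 0 and |r_t| > k * sigma_neg.
    """
    neg = r.where(r < 0)                                   # NaN on non-negative days
    sigma_neg = neg.rolling(window, min_periods=2).std().shift(1)
    label = ((r < 0) & (r.abs() > k * sigma_neg)).astype(float)
    return label.where(sigma_neg.notna())


rng = np.random.default_rng(5)
r = pd.Series(rng.normal(0.0, 0.01, size=1000))
labels = stress_labels(r)
# Classification target for "tomorrow": the label moved back one day.
target = labels.shift(-1)
print(f"stress share under this reading: {labels.mean():.3f}")
```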
Design principle: the signal is framed as a risk overlay, a supplementary input to the risk-management process. Its operational role is to raise the probability threshold for accepting risk, prompt reductions in gross exposure, or tighten dynamic risk limits when the predicted probability of a stress regime is elevated.
| Window | Classifier | Bal. Acc | F1 (down) | AUC | Exit Rate |
|---|---|---|---|---|---|
| w=14 | Logit | 0.517 | 0.223 | 0.512 | 31.3% |
| w=14 | GBM_Cls | 0.509 | 0.156 | 0.521 | 12.9% |
| w=30 | Logit | 0.531 | 0.243 | 0.532 | 34.6% |
| w=30 | GBM_Cls | 0.473 | 0.093 | 0.493 | 13.3% |
| w=60 | Logit | 0.521 | 0.228 | 0.513 | 31.9% |
| w=90 | Logit | 0.526 | 0.234 | 0.502 | 34.6% |
Best: Logit at w=30, with Balanced Accuracy 0.531, F1 (down) 0.243 and AUC 0.532. An uninformative classifier scores 0.500 on both Balanced Accuracy and AUC, so the results are moderate but non-trivial given the class imbalance.
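The four columns can be computed with scikit-learn. A sketch, assuming a fixed 0.5 probability threshold and reading Exit Rate as the share of days on which the signal fires; neither convention is stated in this section, and the name `signal_metrics` is hypothetical. The example shows that a constant, uninformative forecast lands exactly on the 0.5 baselines.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, f1_score, roc_auc_score


def signal_metrics(y_true, p_stress, threshold: float = 0.5) -> dict:
    """Classifier scores for the stress class (label 1)."""
    y_true = np.asarray(y_true, dtype=int)
    p_stress = np.asarray(p_stress, dtype=float)
    y_hat = (p_stress >= threshold).astype(int)        # assumed fixed threshold
    return {
        "Bal. Acc": balanced_accuracy_score(y_true, y_hat),
        "F1 (down)": f1_score(y_true, y_hat, pos_label=1, zero_division=0),
        "AUC": roc_auc_score(y_true, p_stress),         # threshold-free ranking score
        "Exit Rate": float(y_hat.mean()),              # assumed: share of days flagged
    }


y = np.array([0, 0, 0, 1, 0, 1, 0, 0, 0, 1])
print(signal_metrics(y, np.full(len(y), 0.1)))         # constant, uninformative forecast
```

Because AUC only uses the ranking of probabilities while F1 and Exit Rate depend on the threshold, a classifier whose probabilities rank days reasonably but rarely cross the threshold can show a respectable AUC alongside near-zero F1.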
| Classifier | Bal. Accuracy | F1 (down) | AUC |
|---|---|---|---|
| Logit | 0.511 | 0.214 | 0.513 |
| GBM_Cls | 0.508 | 0.158 | 0.517 |
| RF_Cls | 0.502 | 0.042 | 0.519 |
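Logit achieves the highest F1 (0.214) for the stress class, keeping its sensitivity without collapsing to a near-zero exit rate. GBM fires less often and attains a similar AUC (0.517). RF has the highest AUC (0.519) but near-zero F1 (0.042): it ranks days about as well as the other classifiers, yet its predicted probabilities rarely cross the decision threshold, most likely a calibration problem in this highly imbalanced setting.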
Equity (^GSPC, ^IXIC): Strongest signal. Broad equity markets are tightly linked to global risk appetite and funding liquidity — the same forces that drive Bitcoin. Crypto features capture a meaningful portion of equity stress episodes.
Precious metals (GLD, SLV): Attenuated signal. Gold and silver respond to real interest rate differentials, safe-haven demand, and geopolitical risk — factors not fully represented in the crypto-centred feature set.
Dollar index (UUP): Weakest signal. USD dynamics reflect macro and geopolitical forces that are structurally distinct from crypto-market drivers.
A risk-aware investor monitoring BTC dynamics could use the signal layer to anticipate equity stress one day before it materialises in conventional markets. Under loss functions that weight severe drawdowns disproportionately, even a moderate F1 can generate positive expected utility.
How does the Bitcoin–asset correlation behave when markets actually break? These eleven episodes from 2020 to 2025 put the thesis's central variable under stress, one event at a time, and a clear split emerges. When the whole market sells off, Bitcoin and equities fall together and the diversification case disappears; when the shock is crypto-specific, the equity correlation barely reacts. Every figure and statistic below is produced automatically by 08_Market_Events_Showcase.ipynb (Appendix F of the thesis).
| Event | Date | Type | BTC 7d | S&P 7d | ρ pre | ρ post | Δρ |
|---|---|---|---|---|---|---|---|
| COVID “Black Thursday” | 2020-03-12 | Crash | −34.7% | −26.2% | 0.17 | 0.57 | +0.40 |
| Institutional Adoption Wave | 2020-10-08 | Rally | +6.4% | +3.5% | 0.45 | 0.43 | −0.03 |
| China Mining Ban & Musk | 2021-05-19 | Crash | −36.7% | +1.1% | 0.16 | 0.42 | +0.26 |
| Bitcoin ATH $69,000 | 2021-11-10 | ATH | −4.6% | +1.2% | 0.29 | 0.37 | +0.08 |
| Terra/Luna Collapse | 2022-05-09 | Crash | −25.3% | −3.0% | 0.69 | 0.78 | +0.10 |
| FTX Collapse | 2022-11-08 | Crash | −19.4% | +3.0% | 0.55 | 0.46 | −0.09 |
| SVB Banking Crisis | 2023-03-10 | Divergence | +15.5% | −1.6% | 0.37 | 0.20 | −0.17 |
| SEC Approves Spot BTC ETFs | 2024-01-10 | Milestone | −5.1% | −0.1% | −0.04 | 0.19 | +0.22 |
| Bitcoin ATH $73,000 | 2024-03-05 | ATH | +27.1% | +2.1% | 0.17 | 0.12 | −0.05 |
| Trump Election Victory | 2024-11-06 | Rally | +22.0% | +2.6% | 0.42 | 0.70 | +0.28 |
| “Liberation Day” Tariff Shock | 2025-04-02 | Crash | −5.8% | −5.7% | 0.34 | 0.21 | −0.12 |
BTC 7d / S&P 7d: price change over the 7 days following the event. ρ pre / ρ post: 30-day average BTC–S&P 500 correlation before/after the event; Δρ = ρ post − ρ pre. Positive Δρ means the correlation rose (diversification weakened); negative Δρ means decoupling.
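A sketch of the Δρ computation under one reading of the definition: ρ pre and ρ post are averages of the 30-day rolling BTC–S&P 500 correlation over the 30 observations before, and from, the event date. The exact averaging in 08_Market_Events_Showcase.ipynb may differ; the function name `event_corr_shift` and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd


def event_corr_shift(r_btc: pd.Series, r_spx: pd.Series, event_date: str,
                     window: int = 30, span: int = 30):
    """Return (rho_pre, rho_post, delta) around an event date (assumed averaging)."""
    rho = r_btc.rolling(window).corr(r_spx)
    event = pd.Timestamp(event_date)
    rho_pre = rho[rho.index < event].tail(span).mean()
    rho_post = rho[rho.index >= event].head(span).mean()
    return rho_pre, rho_post, rho_post - rho_pre


# Synthetic example: returns are independent before the "event" and co-move after it.
rng = np.random.default_rng(6)
dates = pd.date_range("2020-01-01", periods=300, freq="D")
btc = pd.Series(rng.normal(size=300), index=dates)
noise = pd.Series(rng.normal(size=300), index=dates)
spx = noise.where(dates < "2020-05-30", btc + 0.3 * noise)

pre, post, delta = event_corr_shift(btc, spx, "2020-05-30")
print(f"pre={pre:.2f} post={post:.2f} delta={delta:+.2f}")
```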
Each four-panel figure: normalised price paths · daily log-returns · 30-day & 14-day rolling correlation · pre/post correlation by pair.
Time-varying BTC–asset dependence is forecastable one step ahead, out of sample and under strict temporal ordering. The dominant signal is the persistence of the correlation series itself rather than cross-asset momentum, which is a substantive scientific result and not a methodological failure.
Every model, including the simple baselines, beats DCC-GARCH(1,1) significantly (DM test, p < 10⁻³⁰⁸) across all 24 pair × window combinations. DCC-GARCH lags not because it is a bad model but most likely because it forecasts the conditional correlation of daily returns, a different object from the smoothed rolling-window correlation used as the target here.
Crypto-derived features provide a modest, above-chance early warning of stress days in equity markets. Logit averages F1 = 0.214 and AUC = 0.513 across all experiments, and the best configuration (BTC/S&P 500, w=30) reaches a Balanced Accuracy of 0.531. The signal is best read as a risk-management overlay rather than a standalone trading engine, and it fades for precious metals and FX.
Only Bitcoin is used as the source asset. Extending the analysis to ETH, SOL, and broader crypto baskets is needed to generalise the findings.
The one-step-ahead horizon structurally favours persistence-driven models. The comparative advantage of ML may be more pronounced at multi-day or multi-week horizons, where the autoregressive signal decays.
Tail dependence coefficients, copula-based measures, and dynamic partial correlations may reveal features not captured by Pearson correlation.
Transaction costs, position-sizing rules, and turnover constraints are not modelled; the signal is evaluated on statistical accuracy only.
Investigate 5-day and 20-day ahead horizons to determine whether ML comparative advantage is more pronounced beyond the one-step persistence regime.
Substitute vine-copula dependence structures and tail dependence coefficients for rolling Pearson correlation to capture joint extreme-return behaviour.
LSTM/Transformer architectures for sequential dependency modelling; HMM-conditioned regime-aware model selection.
Integrate the signal layer into a portfolio optimisation framework with explicit turnover constraints to evaluate drawdown reduction and Calmar/Sortino improvement.
The final model is the result of several months of iterative research. Each version taught something concrete about the structure of the forecasting problem.
First working prototype with 7 models (AR1, Ridge, ElasticNet, RF, GBM, XGBoost, Naive_Last). Minimal features: 4 AR lags, short-window volatilities, returns up to lag 5. No DCC-GARCH benchmark. No feature scaling for linear models. Ridge α=1.0, XGBoost lr=0.05.
Critical fix: DCC-GARCH added under a proper walk-forward protocol (a full-sample fit would have used future data and made the comparison invalid). The investor stress-signal layer was added as a binary classifier on top of the correlation forecasts. ML hyperparameters and features were unchanged.
Focused on robustness as a stabilisation checkpoint: XGBoost GPU→CPU fallback, consistent exception handling, clean metric exports. No metric improvements — this confirmed the Version 1–2 plateau was genuine, not an implementation bug.
Key insight: results identical to V2 → the performance ceiling is real, not a bug.
Targeted improvements based on lessons from V1–V3:
Expanding the feature set from 15 to 35 features did not help the tree-based models: XGBoost RMSE was unchanged or slightly worse. Ridge improved dramatically, but because of rescaling and regularisation adjustments, not because of the new features. The forecasting target has a nearly linear autoregressive structure, so further ML gains are more likely to come from multi-step horizons or higher-frequency data than from adding more features at daily frequency.
| File | Description |
|---|---|
| metrics.csv | OOS metrics, all 192 rows |
| dm_tests.csv | DM statistics, all pairs × windows |
| signal_metrics.csv | Classifier performance |
| metrics_with_ci.csv | Bootstrap 95% confidence intervals |
| refit_sensitivity.csv | Refit frequency sweep [5..63] |
| cross_asset_dm_test.csv | Equity vs non-equity DM test |
| *.tex (4 files) | LaTeX-ready tables for thesis |
| *.png (~75 figures) | All thesis figures |
Engle, R. F. (2002). Dynamic Conditional Correlation: A Simple Class of Multivariate Generalized Autoregressive Conditional Heteroskedasticity Models. Journal of Business & Economic Statistics, 20(3), 339–350. doi:10.1198/073500102288618487
Diebold, F. X., & Mariano, R. S. (1995). Comparing Predictive Accuracy. Journal of Business & Economic Statistics, 13(3), 253–263. doi:10.1080/07350015.1995.10524599
Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of KDD 2016, 785–794. doi:10.1145/2939672.2939785
I'm Bogdan Babaev — originally from Samara, Russia, currently finishing my M.Sc. in Artificial Intelligence at the University of Kragujevac, Serbia (2023–2026).
My research interests sit at the intersection of machine learning and quantitative finance — specifically time series forecasting, intermarket dependencies, and risk-aware modelling. This thesis is the result of about a year of work combining econometrics, walk-forward ML evaluation, and practical signal design.
Before the master's programme I studied and worked in Russia. Moving to Serbia was both a practical and academic decision — the programme in Kragujevac offered solid theoretical foundations and the freedom to pursue an applied quantitative thesis topic.
GPA: 8.0 / 10 · GitHub: github.com/b0gdaan · Email: babaev.bogdan.ru@gmail.com