Master Thesis · University of Kragujevac · 2026

Forecasting Time-Varying Intermarket Dependencies Between Cryptocurrencies and Conventional Assets Using Machine Learning

Bogdan Babaev · University of Kragujevac · M.Sc. Artificial Intelligence · Supervisor: Assoc. Prof. Dr Vladimir M. Milovanović · Defense: 2026
University of Kragujevac AI Programme

This thesis investigates whether cryptocurrency dynamics contain information about the evolving dependence structure of conventional financial assets. Bitcoin serves as the base crypto asset; the conventional universe spans equity indices, precious-metal ETFs, and the U.S. dollar index. A reproducible walk-forward ML pipeline is developed and benchmarked against a leakage-safe DCC-GARCH(1,1) specification across 240 model evaluations (10 models × 6 pairs × 4 windows).

The central question is whether persistence in rolling correlation — and not nonlinear cross-asset signals — is the dominant source of forecastability. The empirical evidence confirms this, while also showing that all ML models significantly outperform the DCC-GARCH econometric benchmark.

3,053 daily observations · 7 assets · 240 model evaluations · 4 rolling windows
Best OOS R² (Ridge avg): 0.943 · ML vs DCC-GARCH (DM): p<10⁻³⁰⁸

Research Questions

RQ1 · Forecastability

Are time-varying intermarket dependencies between BTC and conventional assets forecastable one step ahead under strict out-of-sample evaluation?

RQ2 · ML vs DCC-GARCH

Do ML models (ElasticNet, Ridge, GBM, XGBoost) outperform the DCC-GARCH(1,1) econometric benchmark under leakage-safe walk-forward evaluation?

RQ3 · Feature Importance

Which predictors carry the most information — historical dependency persistence, volatility, or cross-asset momentum signals?

RQ4 · Signal Layer

Can crypto-derived features predict investor stress days in traditional asset markets as a binary classification task?

▶ Live Investor Signal Demo

Run the thesis models in real time — directly in your browser. Adjust market conditions and watch the AR1 dependency forecast and stress classifier respond instantly. Try the historical scenario presets to see how the signal would have fired during real market events.

Forecasting model: AR1 · RMSE=0.0659 · R²=0.9424
Signal classifier: Logit · F1=0.243 · AUC=0.532
Historical scenario presets — click to load
Typical trading day — BTC showing moderate positive correlation with S&P 500, normal volatility, no strong directional momentum.

Adjust Market Conditions

−1 fully inverse · 0 uncorrelated · +1 co-moving
−20% crash · 0% flat · +20% rally
20% calm · 55% typical · 150% extreme
−40% downtrend · 0% flat · +40% uptrend
How it works
Step 1: AR1 forecasts tomorrow's BTC–S&P500 dependency using today's Fisher-z correlation.
Step 2: Logistic regression combines the forecast and crypto features to estimate stress probability.
Signal: STAY IN MARKET (low stress probability, dependency stable)
Current dep. (Fisher-z): 0.368 · Tomorrow's forecast: 0.364
Stress probability: 12.4% (scale: 0% · 30% alert · 60% exit · 100%)
Low stress probability. Maintain normal equity exposure. Monitor for BTC volatility increases.
Recommended action: Hold current equity positions. No risk reduction needed based on crypto signal.
Educational demo — not financial advice. Coefficients are approximate, derived from walk-forward OOS evaluation 2020–2026 on 3,053 daily observations.

Dataset & Asset Universe

Asset Universe — Detailed

BTC-USD · Bitcoin · Base crypto

Base asset. World's largest cryptocurrency by market cap. Trades 24/7 including weekends. Serves as the primary source of crypto-derived features throughout the thesis.

ETH-USD · Ethereum · Crypto ref.

Crypto reference asset. Included to contrast cross-crypto dependency (BTC↔ETH) with crypto-to-conventional dependency. The sample start date is determined by ETH availability on Yahoo Finance.

^GSPC · S&P 500 Index · Equity

Broad U.S. equity benchmark. Represents large-cap risk appetite; most tightly linked to global funding conditions and sentiment, the same macro forces that drive Bitcoin.

^IXIC · NASDAQ Composite · Equity

Tech-heavy equity index. Higher beta to growth and speculative sentiment than the S&P 500, making it a natural comparator for crypto co-movement.

GLD · SPDR Gold Shares ETF · Precious metal

Largest gold ETF. Captures safe-haven demand, inflation expectations, and real interest rate dynamics, all structural drivers distinct from crypto.

SLV · iShares Silver Trust · Precious metal

Silver has both safe-haven and industrial demand components. Higher volatility than gold; provides an additional precious-metal data point.

UUP · Invesco DB US Dollar Index Bullish Fund · FX / risk-off

Proxy for broad USD strength (DXY). Reflects global risk-off positioning and macro defensive flows. Responds to geopolitical risk premia not captured by crypto features.

Sample Construction & Alignment

Period: 9 Nov 2017 — 16 May 2026 · 3,053 trading days per asset. Start date is set by the earliest available ETH-USD quote on Yahoo Finance. Earlier dates contain no Ethereum observations and are excluded.

Alignment: Bitcoin and Ethereum trade 24/7 including weekends. The panel is aligned on exchange-calendar dates shared with equity markets. Weekend and holiday crypto quotes are forward-filled up to two consecutive calendar days, then dropped, so all seven tickers share an identical date index.
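One reading of the alignment rule can be sketched with pandas as follows. The helper name `align_panel` and the toy prices are illustrative assumptions, not the thesis code: quotes are placed on a daily calendar, gaps are carried forward for at most two consecutive days, and any date that is still incomplete is dropped so every ticker shares the same index.

```python
import pandas as pd

def align_panel(prices, max_fill_days=2):
    """Hypothetical helper: put all tickers on one shared date index."""
    panel = pd.DataFrame(prices).sort_index()           # union of all dates
    calendar = pd.date_range(panel.index.min(), panel.index.max(), freq="D")
    panel = panel.reindex(calendar)                      # expose every gap as NaN
    panel = panel.ffill(limit=max_fill_days)             # carry quotes forward, 2 days at most
    return panel.dropna(how="any")                       # drop dates still missing a ticker

# Toy data: BTC quotes every day, equity quotes skip a 3-day weekend (13-15 Jan 2024).
days = pd.date_range("2024-01-11", "2024-01-17", freq="D")
btc = pd.Series(range(100, 107), index=days, dtype=float)
spx = pd.Series(
    [10.0, 11.0, 12.0, 13.0],
    index=pd.to_datetime(["2024-01-11", "2024-01-12", "2024-01-16", "2024-01-17"]),
)
panel = align_panel({"BTC-USD": btc, "^GSPC": spx})
```

In this toy case Saturday and Sunday inherit Friday's equity quote, while the holiday Monday exceeds the two-day limit and is removed for all tickers.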

Returns: Log-returns $r_t = \log P_t - \log P_{t-1}$ are used throughout. Log-returns are additive over time, approximately normal at moderate horizons, and standard in both econometric modelling and ML pipelines for financial time series.
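The additivity property can be checked in a few lines. This is a minimal illustration on made-up prices, not thesis data.

```python
import numpy as np

prices = np.array([100.0, 102.0, 99.0, 105.0])    # illustrative closing prices
log_returns = np.diff(np.log(prices))              # r_t = log P_t - log P_{t-1}

# Summing daily log-returns gives the log-return over the whole period.
total_log_return = log_returns.sum()
direct_log_return = np.log(prices[-1] / prices[0])
```

Both quantities agree up to floating-point error, which is why multi-day log-returns can be built by simple summation.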

Data Quality Summary

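Ticker | N obs | % Miss. | Skew | Ex. Kurt | Ann. Vol %
BTC-USD | 3,053 | 0.00 | −0.73 | 13.38 | 56.08
ETH-USD | 3,053 | 0.00 | −0.74 | 10.77 | 72.68
GLD | 3,053 | 0.00 | −0.89 | 13.96 | 13.82
SLV | 3,053 | 0.00 | −2.77 | 51.61 | 28.20
UUP | 3,053 | 0.00 | −0.03 | 8.58 | 5.85
^GSPC | 3,053 | 0.00 | −0.73 | 22.59 | 16.20
^IXIC | 3,053 | 0.00 | −0.46 | 12.65 | 19.66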

Zero missing values across all tickers after alignment. Crypto assets show substantially higher annualised volatility (56–73% vs 6–28% for conventional assets). Fat tails are not confined to crypto: excess kurtosis exceeds 10 for every ticker except UUP (8.58).

Walk-Forward Split

2017–2021: Initial training window (min 800 obs)
2021–2026: Out-of-sample test (~5 years)
Every 20d: Model refit cadence (~1 calendar month)
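The expanding-window schedule can be sketched as a small generator. The function name `walk_forward_blocks` is an assumption for illustration, and `n_obs=3053` is used only as an example length (the usable sample is shorter once rolling windows and lags are applied).

```python
from typing import Iterator

def walk_forward_blocks(
    n_obs: int, min_train: int = 800, refit_every: int = 20
) -> Iterator[tuple[range, range]]:
    """Yield (train_rows, test_rows): fit on everything before the block, then predict it."""
    start = min_train
    while start < n_obs:
        stop = min(start + refit_every, n_obs)
        yield range(0, start), range(start, stop)   # training set grows, never looks ahead
        start = stop

blocks = list(walk_forward_blocks(n_obs=3053))
first_train, first_test = blocks[0]
last_train, last_test = blocks[-1]
```

Each model is refit once per block and then used for one-step-ahead predictions on the next 20 rows, so no test observation ever enters its own training set.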

Exploratory Figures

Fig 1 · Normalized Price Paths. All assets scaled to a common base of 1. Illustrates divergent performance: BTC/ETH multiples vs near-flat conventional assets.
Fig 2 · Rolling Volatility. Crypto assets exhibit episodically much higher volatility (~5× the S&P 500), motivating dynamic rather than static dependency modelling.
Fig 3 · Full-Sample Correlation Heatmap. BTC/ETH show low static correlations with conventional assets, motivating the time-varying analysis over a single full-sample estimate.
Fig 4 · Rolling Dependency Overview. Fisher-z transformed rolling correlations for all pairs across windows of 14/30/60/90 days. Strong time-variation and persistence are visible across all pairs.
Fig 5 · Observation Coverage. All 7 tickers share an identical date range after ETH-USD alignment. No gaps or missing bars.
Fig 6 · Fisher-z Transformation. Maps bounded ρ ∈ (−1,1) to unbounded ℝ, stabilising variance and improving normality of the regression target.
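The Fisher-z transform is z = atanh(ρ) = ½·ln((1+ρ)/(1−ρ)). A minimal sketch of the regression target follows; the function names and the clipping constant `eps` are assumptions added here to keep |ρ| = 1 from mapping to infinity.

```python
import numpy as np
import pandas as pd

def fisher_z(rho, eps=1e-6):
    """Map a correlation in (-1, 1) to the real line; clip to avoid +/-inf."""
    return np.arctanh(np.clip(rho, -1 + eps, 1 - eps))

def rolling_dependency(r_base: pd.Series, r_other: pd.Series, window: int = 30) -> pd.Series:
    """Fisher-z of the rolling Pearson correlation between two return series."""
    rho = r_base.rolling(window).corr(r_other)
    return fisher_z(rho)

# Illustrative random returns, not thesis data.
rng = np.random.default_rng(1)
a = pd.Series(rng.normal(size=100))
b = pd.Series(0.5 * a + rng.normal(size=100))
z = rolling_dependency(a, b, window=30)
```

Applying `np.tanh` to a forecast of z maps it back to a correlation in (−1, 1).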

Methodology

Two-Layer Pipeline

Layer 1 · Dependency Forecasting: Predict the next value of the Fisher-z transformed rolling correlation, $\hat{\rho}_{t+1}$, between Bitcoin and each conventional asset. Expanding-window walk-forward validation; min_train = 800 obs; refit every 20 trading days.

Layer 2 · Investor Signal: Use the Layer 1 forecast plus crypto features to classify whether tomorrow is a stress day for the conventional asset (a daily loss exceeding 0.75σ of the rolling 20-day negative-return distribution, ~8–12% of days). Three classifiers: Logit, Random Forest, GBM.

Rolling Windows Evaluated

w = 14 days · w = 30 days · w = 60 days · w = 90 days

Window Size Analysis

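14d · Noisy

Lowest R². High variance makes one-step-ahead forecasting harder. This is the one window where Ridge significantly beats Naive_Last (DM +5.14 for BTC/S&P 500).

30d · Best balance

Reference setting for most figures and comparisons. Ridge and Naive_Last are statistically tied (DM +0.80, p=0.43 for BTC/S&P 500).

60d · High persist.

Smoother series. Persistence baselines remain competitive: Ridge vs Naive_Last is not significant (DM −1.59, p=0.11 for BTC/S&P 500).

90d · Near-trivial

Very smooth target. Persistence accounts for nearly all variance and Naive_Last beats Ridge (DM −3.91), while the margin over DCC-GARCH is at its largest (DM +39.31).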

Feature Engineering

Group | Variables | Rationale
Dependency lags | dep_lag1/2/5/10 | Most important: the target is persistent
Return lags | r_base/r_other lag1/2/5 | Recent directional info
Rolling volatility | vol_base, vol_other | Local regime conditions
Rolling means | mean_base, mean_other | Momentum over short windows
Spread features | spread_abs, spread_sign | Divergence between crypto and asset
Corr regime gap | corr_diff (short−long ρ) | Signals impending regime shifts
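A minimal sketch of how such a feature matrix might be assembled is shown below. The column names follow the table, but the exact definitions (for example the spread as `r_base − r_other` and the 90-day long window in `corr_diff`) are assumptions made here for illustration. Every feature is lagged by at least one day so that the row for day t uses only information available at t−1.

```python
import numpy as np
import pandas as pd

def fisher_z_corr(a, b, window):
    rho = a.rolling(window).corr(b).clip(-0.999999, 0.999999)
    return np.arctanh(rho)

def build_features(r_base, r_other, window=30, long_window=90):
    dep = fisher_z_corr(r_base, r_other, window)            # target series
    dep_long = fisher_z_corr(r_base, r_other, long_window)
    X = pd.DataFrame(index=dep.index)
    for lag in (1, 2, 5, 10):
        X[f"dep_lag{lag}"] = dep.shift(lag)                  # dependency lags
    for lag in (1, 2, 5):
        X[f"r_base_lag{lag}"] = r_base.shift(lag)            # return lags
        X[f"r_other_lag{lag}"] = r_other.shift(lag)
    X["vol_base"] = r_base.rolling(window).std().shift(1)    # rolling volatility
    X["vol_other"] = r_other.rolling(window).std().shift(1)
    X["mean_base"] = r_base.rolling(window).mean().shift(1)  # rolling means
    X["mean_other"] = r_other.rolling(window).mean().shift(1)
    spread = (r_base - r_other).shift(1)                     # assumed spread definition
    X["spread_abs"] = spread.abs()
    X["spread_sign"] = np.sign(spread)
    X["corr_diff"] = (dep - dep_long).shift(1)               # short minus long dependency
    data = X.assign(target=dep).dropna()
    return data.drop(columns="target"), data["target"]

# Illustrative random returns, not thesis data.
rng = np.random.default_rng(0)
idx = pd.date_range("2021-01-01", periods=300, freq="D")
r_btc = pd.Series(rng.normal(0, 0.03, 300), index=idx)
r_spx = pd.Series(0.3 * r_btc + rng.normal(0, 0.01, 300), index=idx)
X, y = build_features(r_btc, r_spx)
```

Because the target on day t is paired only with values shifted by one day or more, fitting on rows up to t−1 and predicting row t gives a one-step-ahead forecast without look-ahead.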

Feature Importance (XGBoost avg. across all pairs)

dep_lag1: 42%
dep_lag2/MA: 21%
BTC vol (20d): 16%
Cross momentum: 11%
Other: 10%

Lagged correlation (dep_lag1) alone accounts for 42% of feature importance, confirming that persistence is the primary driver. The AR(1) baseline, which uses only that single feature, reaches an average RMSE of 0.0659 against 0.1241 for XGBoost, i.e. roughly half of XGBoost's error.

Model Suite — All 10 Configurations

Model | Type | Role | Key Parameters
Naive_Last | Baseline | Persistence: $\hat{y}_{t+1} = y_t$ | None
AR1 | Baseline | 1-lag autoregression | LinearRegression, 1 feature
HAR | Baseline | Daily / weekly / monthly lags | OLS on dep_lag1, avg5, avg22
Ridge | Linear ML | L2 regularisation (best avg RMSE) | α=0.5 + StandardScaler
ElasticNet | Linear ML | L1+L2 regularisation | α=0.005, l1_ratio=0.5
Ensemble | Adaptive | Inverse-RMSE-weighted blend | 60-step rolling window weights
RF | Tree ensemble | Bagging | 200 trees, max_depth=10, min_leaf=5
GBM | Tree ensemble | Sequential boosting (HistGBM) | 300 iters, lr=0.02, l2_reg=0.1
XGB_GPU | Tree ensemble | GPU-accelerated boosting | 500 trees, subsample=0.8, colsample=0.8
DCC_GARCH | Econometric | Leakage-safe benchmark | Walk-forward re-estimated, Engle 2002
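The three baselines in the table can be sketched with plain NumPy least squares. This is an illustration on a simulated persistent series, with a single fit on the first 800 rows rather than the periodic refits of the thesis pipeline; `har_design` and the AR(1) simulation parameters are assumptions.

```python
import numpy as np

def har_design(y):
    """Rows for t = 22..n-1: intercept, y[t-1], 5-day mean, 22-day mean (all strictly past)."""
    X = np.array([[1.0, y[t - 1], y[t - 5:t].mean(), y[t - 22:t].mean()]
                  for t in range(22, len(y))])
    return X, y[22:]

def rmse(pred, actual):
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

# Simulated persistent target standing in for the Fisher-z correlation.
rng = np.random.default_rng(0)
n, phi = 1500, 0.97
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + 0.05 * rng.standard_normal()

X, target = har_design(y)
split = 800
beta_har, *_ = np.linalg.lstsq(X[:split], target[:split], rcond=None)
beta_ar1, *_ = np.linalg.lstsq(X[:split, :2], target[:split], rcond=None)

scores = {
    "Naive_Last": rmse(X[split:, 1], target[split:]),            # forecast = last value
    "AR1": rmse(X[split:, :2] @ beta_ar1, target[split:]),       # intercept + one lag
    "HAR": rmse(X[split:] @ beta_har, target[split:]),           # daily/weekly/monthly
}
```

On a strongly autoregressive series like this one, the three scores are expected to sit close together, which is the pattern the results section reports for the real target.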
Grid Search · BTC/S&P500, w=30. Cross-validated RMSE across the hyperparameter grid. ElasticNet sits slightly below Ridge due to stronger L1 shrinkage on weak features.
RF Feature Importance · BTC/S&P500, w=30. dep_lag1 dominates; all other features are secondary. The right panel excludes dep_lag1 to show the relative importance of the remaining predictors.

Empirical Results

Key finding: The dependency target is forecastable OOS across all 24 pairs × windows. Best average RMSE belongs to Ridge (0.0656), forming a near-indistinguishable top cluster with AR1 (0.0659) and HAR (0.0659) — a 0.0003 spread within which no model is statistically distinguishable. This convergence confirms that the target has a nearly linear autoregressive structure. All ML models significantly outperform DCC-GARCH (avg RMSE 0.2136 vs 0.0656–0.1241 for ML models).

Average Performance — All 24 Experiments (6 pairs × 4 windows)

Model | Avg RMSE | Avg MAE | Avg R² | Rank | vs DCC-GARCH
Ridge | 0.0656 | 0.0419 | 0.9432 | #1 Best | −69.3%
AR1 | 0.0659 | 0.0403 | 0.9424 | #2 ≈#1 | −69.2%
HAR | 0.0659 | 0.0405 | 0.9424 | #3 ≈#1 | −69.2%
Naive_Last | 0.0666 | 0.0395 | 0.9412 | #4 | −68.8%
ElasticNet | 0.0669 | 0.0434 | 0.9411 | #5 ML | −68.7%
Ensemble | 0.0694 | 0.0462 | 0.9365 | #6 ML | −67.5%
GBM | 0.0886 | 0.0625 | 0.8983 | #7 ML | −58.5%
RF | 0.0905 | 0.0636 | 0.8936 | #8 ML | −57.6%
XGB_GPU | 0.1241 | 0.0864 | 0.7923 | #9 ML | −41.9%
DCC_GARCH | 0.2136 | 0.1683 | 0.3715 | Benchmark |
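The "vs DCC-GARCH" column is the relative change in average RMSE against the benchmark. The short check below recomputes it from the rounded averages in the table, so the last digit can differ slightly from the published column (for AR1 and HAR, for instance).

```python
avg_rmse = {
    "Ridge": 0.0656, "AR1": 0.0659, "HAR": 0.0659, "Naive_Last": 0.0666,
    "ElasticNet": 0.0669, "Ensemble": 0.0694, "GBM": 0.0886, "RF": 0.0905,
    "XGB_GPU": 0.1241, "DCC_GARCH": 0.2136,
}
benchmark = avg_rmse["DCC_GARCH"]

# Percentage change in RMSE relative to DCC-GARCH (negative = lower error).
vs_dcc = {
    model: round(100 * (value / benchmark - 1), 1)
    for model, value in avg_rmse.items()
    if model != "DCC_GARCH"
}
```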

RMSE Comparison · Aggregated All Pairs & Windows. Ridge, AR1, and HAR form an indistinguishable top cluster; all ML models outperform DCC-GARCH.
OOS R² · All Pairs at w=30. All models except DCC-GARCH exceed R²=0.85. The ETH pair shows the highest R² (closer crypto co-movement).
Forecast vs Actual · BTC/S&P500, w=30. All models overlaid on the realized Fisher-z correlation. Persistence models track the realized series closely.
Rolling RMSE Over Time · BTC/S&P500, w=30. All models spike during COVID-19 (2020) and the BTC crash (2022). DCC-GARCH error is persistently higher throughout.

Diebold–Mariano Tests

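The Diebold–Mariano test compares out-of-sample loss series between model pairs, using a Newey–West heteroskedasticity- and autocorrelation-consistent (HAC) variance estimate. A positive DM statistic means the first model has lower forecast errors (is better); a negative statistic means it is worse. For the ML vs DCC-GARCH comparisons the reported p-values fall below the smallest positive double-precision number (p < 10⁻³⁰⁸); the comparisons against Naive_Last are not uniformly significant.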
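A minimal sketch of the statistic with the sign convention described above follows. Squared-error loss, the Bartlett kernel, the rule-of-thumb lag length, and a normal reference distribution are assumptions made here; the thesis code may differ in these details.

```python
import math
import numpy as np

def diebold_mariano(err_first, err_second, lags=None):
    """DM statistic; positive means the first model has the smaller squared errors."""
    e1 = np.asarray(err_first, dtype=float)
    e2 = np.asarray(err_second, dtype=float)
    d = e2 ** 2 - e1 ** 2                        # loss differential
    n = d.size
    if lags is None:
        lags = int(np.floor(4 * (n / 100) ** (2 / 9)))   # common rule of thumb
    d_c = d - d.mean()
    lrv = np.dot(d_c, d_c) / n                   # lag-0 autocovariance
    for k in range(1, lags + 1):
        gamma_k = np.dot(d_c[k:], d_c[:-k]) / n
        lrv += 2 * (1 - k / (lags + 1)) * gamma_k        # Bartlett (Newey-West) weights
    stat = d.mean() / math.sqrt(lrv / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2))        # two-sided normal p-value
    return stat, p_value

# Illustrative errors: the first model is clearly more accurate.
rng = np.random.default_rng(0)
good = 0.05 * rng.standard_normal(500)
bad = 0.20 * rng.standard_normal(500)
stat, p_value = diebold_mariano(good, bad)
stat_swapped, _ = diebold_mariano(bad, good)
```

Swapping the two error series flips the sign of the statistic and leaves the p-value unchanged.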

ML vs Persistence Baseline (Naive_Last)

Positive DM → Ridge is better. Negative → Naive wins. Result depends on window: Ridge beats Naive at w=14, ties at medium windows, falls behind only at w=90.

Ridge vs Naive · BTC/S&P500 · w=14: DM = +5.14, p<0.001
Ridge vs Naive · BTC/S&P500 · w=30: DM = +0.80, p=0.43 (n.s.)
Ridge vs Naive · BTC/S&P500 · w=60: DM = −1.59, p=0.11 (n.s.)
Ridge vs Naive · BTC/S&P500 · w=90: DM = −3.91, p<0.001

ML vs DCC-GARCH Benchmark

Positive DM → ML model is better than DCC-GARCH. Ridge wins in every single experiment. Margin grows with window length.

Ridge vs DCC · BTC/S&P500 · w=14: DM = +25.91, p<10⁻³⁰⁸
Ridge vs DCC · BTC/S&P500 · w=30: DM = +26.86, p<10⁻³⁰⁸
Ridge vs DCC · BTC/S&P500 · w=60: DM = +35.11, p<10⁻³⁰⁸
Ridge vs DCC · BTC/S&P500 · w=90: DM = +39.31, p<10⁻³⁰⁸

Cross-asset DM test (equity vs non-equity): DM = −9.016, p < 0.001. Equity dependency (^GSPC, ^IXIC) is significantly easier to forecast than precious metals or the dollar, consistent with the information-gap hypothesis: crypto features are well-specified for equity stress but incomplete for metals and FX.

DM Test Heatmap · Ridge vs DCC-GARCH. All pairs × all windows. Every cell is positive (ML wins), all p-values < 0.001.

Investor Signal Layer

A second-layer binary classifier predicts stress days on conventional asset markets using Bitcoin dynamics and the Layer 1 dependency forecast. A stress day is a daily loss exceeding 0.75σ of the rolling 20-day negative-return distribution (~8–12% of days). The output is a probability-guided warning — not a directional trade signal.
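One possible reading of the stress-day rule is sketched below: σ is taken as the standard deviation of the negative returns inside the trailing 20-day window, lagged by one day so the threshold never uses the return it is compared with. The function name and these details are assumptions, and the share of flagged days under this reading is not tuned to match the 8–12% quoted above.

```python
import pandas as pd

def stress_labels(returns: pd.Series, window: int = 20, k: float = 0.75) -> pd.Series:
    """1 if the day's loss exceeds k times the trailing downside sigma, else 0."""
    negative_only = returns.where(returns < 0)                 # NaN on non-negative days
    downside_sigma = negative_only.rolling(window, min_periods=2).std().shift(1)
    return (returns < -k * downside_sigma).astype(int)        # NaN thresholds give 0

# Tiny illustrative return series.
r = pd.Series([-0.01, 0.02, -0.03, 0.01, -0.02, -0.10, 0.005])
labels = stress_labels(r)
```

For the classifier, the label of day t+1 would be paired with features known at day t, for example via `labels.shift(-1)`.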

Design principle: The signal is framed as a risk overlay — a supplementary input to the risk-management process. Its operational role is to raise the probability threshold for accepting risk, prompt reductions in gross exposure, or tighten dynamic risk limits when the classified probability of a stress regime is elevated.

S&P 500 Signal Performance (BTC → ^GSPC)

Window | Classifier | Bal. Acc | F1 (down) | AUC | Exit Rate
w=14 | Logit | 0.517 | 0.223 | 0.512 | 31.3%
w=14 | GBM_Cls | 0.509 | 0.156 | 0.521 | 12.9%
w=30 | Logit | 0.531 | 0.243 | 0.532 | 34.6%
w=30 | GBM_Cls | 0.473 | 0.093 | 0.493 | 13.3%
w=60 | Logit | 0.521 | 0.228 | 0.513 | 31.9%
w=90 | Logit | 0.526 | 0.234 | 0.502 | 34.6%

Best: Logit at w=30, with Balanced Accuracy 0.531, F1 (down) = 0.243, AUC = 0.532. The uninformative baseline for Balanced Accuracy and AUC is 0.500. Results are modest but non-trivial given the class imbalance.
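To make the baselines concrete, the sketch below computes balanced accuracy and F1 for two trivial classifiers on a made-up sample with a 10% stress rate. The helper `binary_scores` is illustrative, not the thesis evaluation code.

```python
def binary_scores(y_true, y_pred):
    """Balanced accuracy and F1 for the positive (stress) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"balanced_accuracy": (recall + specificity) / 2, "f1": f1}

y_true = [1] * 10 + [0] * 90                 # 10% stress days (illustrative)
never_flag = binary_scores(y_true, [0] * 100)
always_flag = binary_scores(y_true, [1] * 100)
```

Both trivial rules score 0.5 on balanced accuracy, while their F1 values differ, so the two metrics are best read together with the exit rate.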

Average Signal Performance (all pairs & windows)

Classifier | Bal. Accuracy | F1 (down) | AUC
Logit | 0.511 | 0.214 | 0.513
GBM_Cls | 0.508 | 0.158 | 0.517
RF_Cls | 0.502 | 0.042 | 0.519

Logit achieves the highest F1 (0.214) for the stress class, maintaining sensitivity without collapsing to near-zero exit rate. GBM fires less often but preserves meaningful AUC. RF degrades to near-zero F1 due to probability calibration issues in highly imbalanced settings.

Cross-Asset Signal Quality

Equity (^GSPC, ^IXIC): Strongest signal. Broad equity markets are tightly linked to global risk appetite and funding liquidity — the same forces that drive Bitcoin. Crypto features capture a meaningful portion of equity stress episodes.

Precious metals (GLD, SLV): Attenuated signal. Gold and silver respond to real interest rate differentials, safe-haven demand, and geopolitical risk — factors not fully represented in the crypto-centred feature set.

Dollar index (UUP): Weakest signal. USD dynamics reflect macro and geopolitical forces that are structurally distinct from crypto-market drivers.

Practical implication

A risk-aware investor monitoring BTC dynamics could use the signal layer to anticipate equity stress one day in advance, before stress materialises in conventional markets. For loss functions that assign disproportionate weight to severe drawdowns, even a moderate F1 can generate positive expected utility, although this has not been verified with a portfolio backtest.

Signal · BTC/S&P500, w=30. Upper: stress probability vs threshold. Lower: flagged vs realized stress events. Best overall configuration.
Signal · BTC/S&P500, w=14. The shorter window provides more reactive but noisier stress flags (Logit exit rate 31.3%, against 34.6% at w=30).

Market-Event Case Studies

How does the Bitcoin–asset correlation behave when markets actually break? These eleven episodes from 2020 to 2025 put the thesis's central variable under stress, one event at a time, and a broad split emerges. When the whole market sells off, Bitcoin and equities tend to fall together and the diversification case weakens; when the shock is crypto-specific, the equity correlation usually reacts less. Every figure and statistic below is produced automatically by 08_Market_Events_Showcase.ipynb (Appendix F of the thesis).

Event | Date | Type | BTC 7d | S&P 7d | ρ pre | ρ post | Δρ
COVID “Black Thursday” | 2020-03-12 | Crash | −34.7% | −26.2% | 0.17 | 0.57 | +0.40
Institutional Adoption Wave | 2020-10-08 | Rally | +6.4% | +3.5% | 0.45 | 0.43 | −0.03
China Mining Ban & Musk | 2021-05-19 | Crash | −36.7% | +1.1% | 0.16 | 0.42 | +0.26
Bitcoin ATH $69,000 | 2021-11-10 | ATH | −4.6% | +1.2% | 0.29 | 0.37 | +0.08
Terra/Luna Collapse | 2022-05-09 | Crash | −25.3% | −3.0% | 0.69 | 0.78 | +0.10
FTX Collapse | 2022-11-08 | Crash | −19.4% | +3.0% | 0.55 | 0.46 | −0.09
SVB Banking Crisis | 2023-03-10 | Divergence | +15.5% | −1.6% | 0.37 | 0.20 | −0.17
SEC Approves Spot BTC ETFs | 2024-01-10 | Milestone | −5.1% | −0.1% | −0.04 | 0.19 | +0.22
Bitcoin ATH $73,000 | 2024-03-05 | ATH | +27.1% | +2.1% | 0.17 | 0.12 | −0.05
Trump Election Victory | 2024-11-06 | Rally | +22.0% | +2.6% | 0.42 | 0.70 | +0.28
“Liberation Day” Tariff Shock | 2025-04-02 | Crash | −5.8% | −5.7% | 0.34 | 0.21 | −0.12

BTC 7d / S&P 7d = price change over the 7 days from the event. ρ pre / ρ post = 30-day average BTC–S&P 500 correlation before/after the event; Δρ = post − pre. Positive Δρ (shown in red on the page) = correlation rose (diversification weakened); negative Δρ (shown in green) = decoupling.
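The pre/post comparison can be sketched as below, assuming `rho` is a daily rolling-correlation series and that the event day itself counts towards the post window. Both the helper name and that convention are assumptions for illustration.

```python
import pandas as pd

def event_correlation_shift(rho: pd.Series, event_date: str, span: int = 30):
    """Average rolling correlation over `span` rows before and after an event."""
    # Position of the first observation on or after the event date.
    pos = rho.index.get_indexer([pd.Timestamp(event_date)], method="bfill")[0]
    pre = rho.iloc[max(pos - span, 0):pos].mean()
    post = rho.iloc[pos:pos + span].mean()
    return pre, post, post - pre

# Illustrative step change in correlation on 2020-02-20.
idx = pd.date_range("2020-01-01", periods=100, freq="D")
rho = pd.Series([0.2] * 50 + [0.6] * 50, index=idx)
pre, post, delta = event_correlation_shift(rho, "2020-02-20")
```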

What the Eleven Episodes Reveal

  • Macro risk-off events drive the biggest correlation spikes. In the 2020 COVID crash the 30-day BTC–S&P 500 correlation climbed to 0.57, the largest Δρ in the table (+0.40). During the 2025 Liberation Day tariff shock Bitcoin (−5.8%) and the S&P (−5.7%) fell together within days, although the 30-day average correlation afterwards was lower than before (0.34 → 0.21). These are exactly the periods where the models' rolling RMSE peaks.
  • Crypto-specific shocks move the equity correlation less. The China mining ban, the Luna collapse, and the FTX bankruptcy each inflicted heavy Bitcoin losses while equities held steady. Luna (+0.10) and FTX (−0.09) left the 30-day average almost unchanged; the China mining ban is the exception, with a rise from 0.16 to 0.42 that proved brief and fading.
  • The ETF era keeps correlations elevated. Since the spot-ETF approvals of January 2024, institutional money has traded Bitcoin alongside growth equities, and the BTC–NASDAQ correlation has run higher than in any comparable earlier window.
  • Safe-haven moments happen, but they don't last. When SVB failed in 2023, Bitcoin rallied as equities fell — a genuine decoupling — yet the gap closed again within a few weeks. That short memory is exactly why simple persistence models keep winning out of sample: they re-absorb each regime shift after a brief lag.

Cross-Event Synthesis

Master Timeline 2018–2026. BTC price, S&P 500 / NASDAQ / Gold, and the 30-day BTC–S&P 500 correlation with all 11 events marked. Correlation spikes cluster around macro risk-off episodes, not crypto-specific ones.
Correlation Regime Map. Left: post-event BTC correlation (30-day average) with each conventional asset. Right: change Δρ versus the pre-event average, isolating the regime shift caused by the event.
Return Scatter Grid. BTC vs S&P 500 daily returns for the ±30-day window around each event. Grey = pre-event, coloured = post-event; the within-window ρ annotates each panel.
Event Regime Scatter. Each event in (BTC 7-day return, Δρ) space. Upper-left = macro risk-off crashes; lower-right = crypto-specific rallies that decoupled from equities.

Individual Event Deep-Dives

Each four-panel figure shows normalised price paths · daily log-returns · 30-day & 14-day rolling correlation · pre/post correlation by pair. Each entry names the contemporary news source.

1 · COVID “Black Thursday”. Liquidity-driven sell-off: BTC −35% over 7 days with a −26% S&P. The 30-day BTC–S&P 500 correlation jumped 0.17 → 0.57 and diversification evaporated. Source: Reuters
2 · Institutional Adoption Wave. MicroStrategy / Square / PayPal drove BTC from $10K toward $20K while equities were flat. Correlation held moderate (~0.43), well below the COVID spike: March was a liquidity event, not a structural shift. Source: The Guardian
3 · China Mining Ban & Musk. Mining ban + Musk tweets cut BTC −37% in a week while the S&P barely moved (+1%). A textbook crypto-idiosyncratic shock with only a brief correlation rise (0.16 → 0.42). Source: BBC
4 · Bitcoin ATH $69,000. At the peak, BTC–NASDAQ co-movement was elevated on shared risk appetite. The high correlation foreshadowed the 2022 bear market as the Fed began tightening. Source: BBC
5 · Terra/Luna Collapse. $60B algorithmic-stablecoin implosion: BTC −25% while the S&P fell only 3% on unrelated macro news. Gold was flat; crypto contagion did not reach traditional havens. Source: The Guardian
6 · FTX Collapse. Exchange bankruptcy: BTC −19% over the week while the S&P rose +3% on a soft CPI print. A brief divergence, with the 30-day correlation easing 0.55 → 0.46. Source: BBC
7 · SVB Banking Crisis. Bank failure triggered a +15% BTC rally as both BTC and Gold rose while equities fell. A clear decoupling, with the 30-day BTC–S&P correlation dropping 0.37 → 0.20 (its largest risk-off divergence in the sample). Source: NYT
8 · SEC Approves Spot BTC ETFs. After a decade of rejections, spot ETFs including BlackRock IBIT and Fidelity FBTC were approved. Classic buy-the-rumour, sell-the-news (BTC −5%), but BTC became accessible to US institutions through spot ETFs for the first time. Source: CNBC
9 · Bitcoin ATH $73,000. ETF inflows (IBIT absorbed ~$10B in 7 weeks) drove BTC +27% to a new high. BTC and NASDAQ hit ATHs the same week, suggesting that institutional channels create sustained co-movement. Source: CoinDesk
10 · Trump Election Victory. Pro-crypto platform (proposed BTC strategic reserve) lifted BTC +22% in a week; equities rallied too. Correlation rose sharply (0.42 → 0.70) on a shared political catalyst. Source: Reuters
11 · “Liberation Day” Tariff Shock. Sweeping tariffs sent the S&P −6% and BTC −6% together in days, the fastest equity drop since 2020, with Gold surging. A clear macro risk-off co-crash: BTC behaved as a risk asset, not a haven. Source: Reuters

Conclusions, Limitations & Future Work

C1

Forecastability Confirmed

Time-varying BTC–asset dependence is forecastable one step ahead, out of sample and under strict temporal ordering. The dominant signal is the persistence of the correlation series itself, not cross-asset momentum; this is a substantive scientific result, not a methodological failure.

C2

ML Outperforms DCC-GARCH

Every model in the suite, including the simple baselines, beats DCC-GARCH(1,1) significantly (DM test, p<10⁻³⁰⁸) across all 24 pair × window combinations. DCC-GARCH lags not because it is a bad model but because it estimates the conditional correlation of daily returns, whereas the target here is a smoothed rolling-window correlation forecast one step ahead.

C3

Investor Signal Layer

Crypto-derived features give a modest, above-chance early warning of stress days in equity markets. Logit averages F1=0.214 and AUC=0.513 across all experiments, and the best configuration (BTC/S&P 500, w=30) reaches Balanced Accuracy 0.531. It is best read as a risk-management overlay, not a standalone trading engine, and it fades for precious metals and FX.

Limitations

Single base asset

Only Bitcoin is used as the source asset. Extension to ETH, SOL, and broader crypto baskets is needed to generalise the findings.

One-step-ahead horizon

The one-step horizon structurally favours persistence-driven models. The comparative advantage of ML may be more pronounced at multi-day or multi-week horizons, where the autoregressive signal decays.

Rolling Pearson correlation only

Tail dependence coefficients, copula-based measures, and dynamic partial correlations may reveal features not captured by Pearson correlation.

No portfolio backtest

Transaction costs, position sizing rules, and turnover constraints are not modelled. The signal is evaluated on statistical accuracy only.

Directions for Future Work

Multi-horizon forecasting

Investigate 5-day and 20-day ahead horizons to determine whether ML comparative advantage is more pronounced beyond the one-step persistence regime.

Richer dependence measures

Substitute vine-copula dependence structures and tail dependence coefficients for rolling Pearson correlation to capture joint extreme-return behaviour.

Deep learning extensions

LSTM/Transformer architectures for sequential dependency modelling; HMM-conditioned regime-aware model selection.

Portfolio-level evaluation

Integrate the signal layer into a portfolio optimisation framework with explicit turnover constraints to evaluate drawdown reduction and Calmar/Sortino improvement.

Model Development Journey

The final model is the result of several months of iterative research. Each version taught something concrete about the structure of the forecasting problem.

Version 1 · September 2025 · ~15 features

Minimal Baseline

First working prototype with 7 models (AR1, Ridge, ElasticNet, RF, GBM, XGBoost, Naive_Last). Minimal features: 4 AR lags, short-window volatilities, returns up to lag 5. No DCC-GARCH benchmark. No feature scaling for linear models. Ridge α=1.0, XGBoost lr=0.05.

Ridge RMSE (BTC/SPX w=30): 0.0897 · Models: 7 · DCC-GARCH: not included

Version 2 · November 2025 · + DCC-GARCH

Econometric Benchmark + Signal Layer

Critical fix: DCC-GARCH added with proper walk-forward protocol (previously would have used full-sample future data — invalid comparison). Investor stress-signal layer added as a binary classifier on top of correlation forecasts. No change to ML hyperparameters or features.

DCC RMSE (BTC/SPX w=30): 0.2193 · Models: 8 · Signal layer: added

Version 3 · February 2026 · Stabilisation

Pipeline Robustness

Focused on robustness as a stabilisation checkpoint: XGBoost GPU→CPU fallback, consistent exception handling, clean metric exports. No metric improvements — this confirmed the Version 1–2 plateau was genuine, not an implementation bug.

Key insight: results identical to V2 → the performance ceiling is real, not a bug.

Version 4 (current) · May 2026 · ~35 features

Feature Expansion · HAR · Adaptive Ensemble

Targeted improvements based on lessons from V1–V3:

  • 35+ features — added correlation momentum, z-score, vol ratio, extended lags (lag20, lag60), squared returns
  • HAR model — Heterogeneous AutoRegressive with daily/weekly/monthly components (standard in financial correlation forecasting)
  • Adaptive ensemble — inverse-RMSE weighting over last 60 OOS steps instead of equal weights
  • Ridge rescaled — α 1.0→0.5 + StandardScaler: RMSE dropped from 0.0897 to 0.0651 (nearly matched AR1)
  • Bootstrap CIs + refit-frequency sensitivity sweep
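The adaptive ensemble in the list above can be sketched as follows: each model's weight is proportional to the inverse of its RMSE over the most recent 60 out-of-sample steps. The function names and the small floor on the RMSE are assumptions for illustration.

```python
import numpy as np

def inverse_rmse_weights(past_errors: np.ndarray, lookback: int = 60) -> np.ndarray:
    """past_errors has shape (steps, models); returns weights that sum to 1."""
    recent = past_errors[-lookback:]
    rmse = np.sqrt(np.mean(recent ** 2, axis=0))
    inverse = 1.0 / np.maximum(rmse, 1e-12)      # floor avoids division by zero
    return inverse / inverse.sum()

def blend(forecasts, weights) -> float:
    """Weighted average of the individual model forecasts."""
    return float(np.dot(forecasts, weights))

# Two illustrative models: one with constant error 0.1, one with 0.2.
errors = np.column_stack([np.full(60, 0.1), np.full(60, 0.2)])
weights = inverse_rmse_weights(errors)
combined = blend([0.30, 0.60], weights)
```

The model with half the recent error receives twice the weight, so the blend leans towards whichever model has been more accurate lately.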
Ridge RMSE: 0.0651 (↓27%) · Models incl. HAR: 10 · Auto-tests: 81 · Best R² (ETH w=90): 0.994

Key Lesson from 4 Versions

Expanding features from 15 to 35 did not help tree-based models — XGBoost RMSE was unchanged or slightly worse. Ridge improved dramatically, but only after rescaling and regularisation adjustment, not because of new features. The forecasting target has a nearly linear autoregressive structure. Better performance likely requires multi-step horizons or higher-frequency data, not more features at daily frequency.

Reproducibility

Pipeline Structure

master rad/
├── run_all.py            # Master orchestrator
├── config.yaml           # All parameters
├── thesis_app/
│   ├── main.py           # Entry point
│   ├── pipeline.py       # Core walk-forward loop
│   ├── data_quality.py   # Diagnostics + LaTeX tables
│   └── dcc_module.py     # Leakage-safe DCC-GARCH
├── notebooks/            # 7 analysis notebooks
├── outputs/
│   ├── results/          # CSVs: metrics, DM, signal
│   ├── figures/          # ~75 PNG figures
│   └── tables/           # LaTeX-ready .tex tables
└── thesis/               # LaTeX source → PDF

Key Parameters (config.yaml)

min_train_size: 800              # ~3.2 years
refit_every: 20                  # ~1 calendar month
rolling_windows: [14, 30, 60, 90]
use_fisher_transform: true
forecast_horizon: 1              # one-step-ahead
signal_stress_sigma: 0.75        # ~top 10% stress
bootstrap.n_samples: 500         # CI estimation
xgb_device: "cuda"               # GPU if available

Run Instructions

# Full pipeline + notebooks + LaTeX
python run_all.py

# Pipeline only (~4.5h on GPU)
python thesis_app/main.py

# DCC-GARCH benchmark separately
python thesis_app/dcc_module.py

# LaTeX only (after pipeline)
cd thesis && latexmk -pdf main.tex

Output Files Generated

File | Description
metrics.csv | OOS metrics, all 192 rows
dm_tests.csv | DM statistics, all pairs × windows
signal_metrics.csv | Classifier performance
metrics_with_ci.csv | Bootstrap 95% confidence intervals
refit_sensitivity.csv | Refit frequency sweep [5..63]
cross_asset_dm_test.csv | Equity vs non-equity DM test
*.tex (4 files) | LaTeX-ready tables for thesis
*.png (~75 figures) | All thesis figures

Learning Resources

Selected educational videos and external resources on the key methodologies used in this thesis — DCC-GARCH, walk-forward validation, gradient boosting, and intermarket dependency in finance.

Key Papers

Engle (2002). Dynamic Conditional Correlation. Journal of Business & Economic Statistics, 20(3), 339–350. doi:10.1198/073500102288618487

Diebold & Mariano (1995). Comparing Predictive Accuracy. Journal of Business & Economic Statistics, 13(3), 253–263. doi:10.1080/07350015.1995.10524599

Chen & Guestrin (2016). XGBoost: A Scalable Tree Boosting System. KDD 2016, 785–794. doi:10.1145/2939672.2939785

References

Full bibliography as cited in the thesis, with DOIs or project URLs for each entry. The bracketed tag names the method or tool each reference supports.

  1. Engle, R.F. (2002). Dynamic Conditional Correlation: A Simple Class of Multivariate Generalized Autoregressive Conditional Heteroskedasticity Models. Journal of Business & Economic Statistics, 20(3), 339–350. doi:10.1198/073500102288618487 [DCC-GARCH]
  2. Diebold, F.X., & Mariano, R.S. (1995). Comparing Predictive Accuracy. Journal of Business & Economic Statistics, 13(3), 253–263. doi:10.1080/07350015.1995.10524599 [DM test]
  3. Zou, H., & Hastie, T. (2005). Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society: Series B, 67(2), 301–320. doi:10.1111/j.1467-9868.2005.00503.x [ElasticNet]
  4. Friedman, J.H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics, 29(5), 1189–1232. doi:10.1214/aos/1013203451 [GBM]
  5. Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD Conference, 785–794. doi:10.1145/2939672.2939785 [XGBoost]
  6. Welch, I., & Goyal, A. (2008). A Comprehensive Look at the Empirical Performance of Equity Premium Prediction. Review of Financial Studies, 21(4), 1455–1508. doi:10.1093/rfs/hhm014 [Forecastability]
  7. Hoerl, A.E., & Kennard, R.W. (1970). Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 12(1), 55–67. doi:10.1080/00401706.1970.10488634 [Ridge]
  8. Bollerslev, T. (1986). Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics, 31(3), 307–327. doi:10.1016/0304-4076(86)90063-1 [GARCH]
  9. Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5–32. doi:10.1023/A:1010933404324 [Random Forest]
  10. Fisher, R.A. (1915). Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Population. Biometrika, 10(4), 507–521. doi:10.2307/2331838 [Fisher-z]
  11. Newey, W.K., & West, K.D. (1987). A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica, 55(3), 703–708. doi:10.2307/1913610 [HAC correction]
  12. Pedregosa, F. et al. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830. jmlr.org/papers/v12/pedregosa11a.html [scikit-learn]
  13. Ardia, D., Boudt, K., & Ghalanos, A. (2019). rmgarch: Multivariate GARCH Models in R. CRAN package, version 1.3-9. cran.r-project.org/package=rmgarch [DCC implementation]
  14. Aroussi, R. (2023). yfinance: Yahoo! Finance Market Data Downloader. Python package. pypi.org/project/yfinance [Data source]

About

I'm Bogdan Babaev — originally from Samara, Russia, currently finishing my M.Sc. in Artificial Intelligence at the University of Kragujevac, Serbia (2023–2026).

My research interests sit at the intersection of machine learning and quantitative finance — specifically time series forecasting, intermarket dependencies, and risk-aware modelling. This thesis is the result of about a year of work combining econometrics, walk-forward ML evaluation, and practical signal design.

Before the master's programme I studied and worked in Russia. Moving to Serbia was both a practical and academic decision — the programme in Kragujevac offered solid theoretical foundations and the freedom to pursue an applied quantitative thesis topic.

GPA: 8.0 / 10 · GitHub: github.com/b0gdaan · Email: babaev.bogdan.ru@gmail.com