Chaotic Atmospheric System Climate Prediction
In computational fluid dynamics and climate science, accurate simulation is often limited by computational resources. Global climate models operate on coarse grids, resolving large-scale weather patterns while neglecting small-scale turbulence due to prohibitive costs. However, these small, unobserved scales exert significant influence on the large-scale system through non-linear interactions.
This challenge focuses on the Closure Problem: Can a machine learning model observe only the macro-scale variables of a system and successfully infer the statistical impact of the hidden, sub-grid physics?
Participants must build a generative model capable of emulating the long-term "Climate" (statistical equilibrium) of a chaotic atmospheric system.
Unlike standard time-series forecasting, which prioritizes short-term trajectory accuracy (minimizing Mean Squared Error), this task requires long-horizon autoregressive generation. Because the system is chaotic, small errors in the state grow exponentially, so point-wise prediction is impossible beyond a short horizon. Models are therefore evaluated on their ability to maintain the correct statistical distribution and temporal dynamics over 5,000 future time steps, without collapsing to a mean state or diverging into unphysical noise.
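As a rough illustration of long-horizon autoregressive generation, the sketch below feeds each generated state back in as the input for the next step. The names `rollout`, `step_fn`, and `toy_step` are hypothetical, and the 100-step context length is an assumption chosen only so that the result has 5,100 rows. The toy model is a placeholder that shows the call pattern, not a recommended approach.

```python
import numpy as np

def rollout(step_fn, context, n_steps):
    """Autoregressively extend `context` (shape (T0, D)) by `n_steps` steps.

    `step_fn` is a hypothetical one-step model: it maps the history so far
    (shape (t, D)) to the next state (shape (D,)).
    """
    t0, dim = context.shape
    traj = np.empty((t0 + n_steps, dim), dtype=context.dtype)
    traj[:t0] = context
    for t in range(t0, t0 + n_steps):
        # Only the context and previously generated states are visible here;
        # no ground truth is fed back during generation.
        traj[t] = step_fn(traj[:t])
    return traj

def toy_step(history):
    """Placeholder model (damped persistence), for illustration only."""
    return 0.9 * history[-1]

context = np.ones((100, 11))          # assumed: 100 observed steps, 11 variables
generated = rollout(toy_step, context, n_steps=5000)
print(generated.shape)                # (5100, 11)
```

Note that this toy rule decays to zero, which is exactly the kind of collapse to a mean state that a real model needs to avoid.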
The data for this competition is generated using a modified high-dimensional variant of the Lorenz-96 system, a canonical mathematical model used to study atmospheric circulation and turbulence.
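For orientation, here is a minimal sketch of the standard single-scale Lorenz-96 system, integrated with a fourth-order Runge-Kutta step. It is not the modified multi-scale variant used to generate the competition data, whose equations are not given here. The forcing value `F = 8`, the step size `dt = 0.01`, and the choice of 11 variables are assumptions made for the example.

```python
import numpy as np

def lorenz96_tendency(x, forcing=8.0):
    """Standard Lorenz-96: dX_k/dt = (X_{k+1} - X_{k-2}) * X_{k-1} - X_k + F.

    Indices are periodic, which np.roll implements directly.
    """
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def rk4_step(x, dt, forcing=8.0):
    """Advance the state by one classical Runge-Kutta (RK4) step."""
    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, forcing)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, forcing)
    k4 = lorenz96_tendency(x + dt * k3, forcing)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

# Start near the unstable equilibrium X_k = F with a small perturbation.
rng = np.random.default_rng(0)
state = 8.0 + 0.01 * rng.standard_normal(11)   # 11 variables: an assumption
trajectory = np.empty((2000, 11))
for t in range(2000):
    state = rk4_step(state, dt=0.01)
    trajectory[t] = state
print(trajectory.shape)                        # (2000, 11)
```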
While the underlying simulation involves complex multi-scale interactions (macro, meso, and micro-scales), you are provided only with the Macro-scale variables (X).
The system is driven by unobserved, high-frequency turbulent forcings (hidden variables). These hidden dynamics inject energy into the observed variables, effectively acting as stochastic forcing. Your model must learn to infer the influence of these hidden variables solely from the behavior of the observed data.
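The system is stationary and ergodic: its statistics do not change over time, and time averages along a single long trajectory converge to the same values regardless of the starting point. Consequently, although the specific trajectories in the Training and Test sets are statistically independent (different start times and locations), they are generated by the exact same set of physical equations and parameters.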
The "Climate" (the probability distribution of states and the frequency of oscillations) is identical between the Training and Test files, so a model that correctly captures the invariant dynamics of the Training set should generalize to the Test set.
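One way to see what an identical "Climate" means in practice is to compare time-averaged statistics of two independent trajectories from the same process. The sketch below does this for a simple AR(1) process, used only as a stand-in for a stationary, ergodic system. It is not the competition data, and `climate_summary` and `ar1` are hypothetical helper names.

```python
import numpy as np

def climate_summary(traj, max_lag=50):
    """Summarise a trajectory of shape (T, D) by time-averaged statistics."""
    mean = traj.mean(axis=0)
    std = traj.std(axis=0)
    centered = traj - mean
    variance = np.mean(centered ** 2)
    # Lagged autocorrelation, pooled over the D variables.
    acf = np.array([
        np.mean(centered[: len(traj) - lag] * centered[lag:]) / variance
        for lag in range(max_lag)
    ])
    return mean, std, acf

def ar1(n_steps, dim, phi, seed):
    """Stand-in stationary process: x[t] = phi * x[t-1] + Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n_steps, dim))
    for t in range(1, n_steps):
        x[t] = phi * x[t - 1] + rng.standard_normal(dim)
    return x

# Two independent trajectories (different seeds) of the same process.
mean_a, std_a, acf_a = climate_summary(ar1(20000, 11, phi=0.9, seed=1))
mean_b, std_b, acf_b = climate_summary(ar1(20000, 11, phi=0.9, seed=2))

print("std difference:", abs(std_a.mean() - std_b.mean()))
print("max ACF difference:", np.abs(acf_a - acf_b).max())
```

The individual time steps of the two trajectories are unrelated, yet their summary statistics agree closely. That agreement between independent trajectories is the property this task relies on.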
The dataset is provided as a Kaggle Dataset (bh-2026).
For each of the 500 test scenarios, your model must:
The model must operate blindly during the generation phase, relying solely on its learned dynamics to propagate the system forward.
Participants must submit a single binary NumPy file, submission.npy, containing an array of shape (500, 5100, 11).
import numpy as np
# Output shape must be (500, 5100, 11)
# predictions = model.predict(...)
np.save("submission.npy", predictions)
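Before uploading, it may be worth reloading the saved file and checking it. The helper below is a minimal sketch: `check_submission` is a hypothetical name, and the finite-value check is an assumption about what the scorer will accept, not a documented rule. The demo uses a tiny array and a temporary directory so it runs quickly; for a real submission, call it with the default shape.

```python
import os
import tempfile
import numpy as np

def check_submission(path, expected_shape=(500, 5100, 11)):
    """Load a .npy file and verify its shape and that all values are finite."""
    arr = np.load(path)
    if arr.shape != expected_shape:
        raise ValueError(f"expected shape {expected_shape}, got {arr.shape}")
    if not np.isfinite(arr).all():
        raise ValueError("submission contains NaN or infinite values")
    return arr

# Small demo with a placeholder array (not a real submission).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "submission.npy")
    np.save(demo_path, np.zeros((2, 5, 11), dtype=np.float32))
    checked = check_submission(demo_path, expected_shape=(2, 5, 11))

print(checked.shape)  # (2, 5, 11)
```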
Submissions are evaluated on Statistical Fidelity rather than trajectory Mean Squared Error (MSE). The scoring metric compares the statistical properties of the generated data against a ground-truth climate simulation of 10 million steps.
Score = 0.5 × Spatial Metric + 0.5 × Temporal Metric
(Lower scores indicate better performance)
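The exact definitions of the Spatial and Temporal metrics are not given in this description. For local validation, the sketch below combines two illustrative stand-ins with the same 0.5/0.5 weighting: a histogram distance for the distribution of states and a power-spectrum distance for the temporal dynamics. Both metric functions are assumptions, so scores computed this way will not match the leaderboard.

```python
import numpy as np

def spatial_metric(generated, reference, bins=50):
    """Stand-in: total variation distance between histograms of pooled values."""
    lo = min(generated.min(), reference.min())
    hi = max(generated.max(), reference.max())
    p, _ = np.histogram(generated, bins=bins, range=(lo, hi))
    q, _ = np.histogram(reference, bins=bins, range=(lo, hi))
    return 0.5 * np.abs(p / p.sum() - q / q.sum()).sum()

def _normalised_spectrum(x):
    """Power spectrum along time, averaged over variables, summing to 1."""
    x = x - x.mean(axis=0)
    power = (np.abs(np.fft.rfft(x, axis=0)) ** 2).mean(axis=1)
    return power / power.sum()

def temporal_metric(generated, reference):
    """Stand-in: total variation distance between normalised power spectra."""
    n = min(len(generated), len(reference))
    spec_g = _normalised_spectrum(generated[:n])
    spec_r = _normalised_spectrum(reference[:n])
    return 0.5 * np.abs(spec_g - spec_r).sum()

def score(generated, reference):
    """Equal-weight combination, mirroring the formula above (lower is better)."""
    return 0.5 * spatial_metric(generated, reference) + 0.5 * temporal_metric(generated, reference)

# Toy data: an oscillating reference versus pure white noise.
rng = np.random.default_rng(0)
t = np.arange(5000)[:, None]
reference = np.sin(0.05 * t) + 0.1 * rng.standard_normal((5000, 11))
noise = rng.standard_normal((5000, 11))

print(score(reference, reference))  # 0.0
print(score(noise, reference))      # strictly positive
```

Raw periodograms are noisy, so a more careful local metric would probably smooth or average the spectrum over segments before comparing.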