Competition Info

1. Problem Context: The Closure Problem

In computational fluid dynamics and climate science, accurate simulation is often limited by computational resources. Global climate models operate on coarse grids: they resolve large-scale weather patterns but neglect small-scale turbulence, which would be prohibitively expensive to resolve. However, these small, unobserved scales exert significant influence on the large-scale system through non-linear interactions.

This challenge focuses on the Closure Problem: Can a machine learning model observe only the macro-scale variables of a system and successfully infer the statistical impact of the hidden, sub-grid physics?

2. Objective

Participants must build a generative model capable of emulating the long-term "Climate" (statistical equilibrium) of a chaotic atmospheric system.

Unlike standard time-series forecasting, which prioritizes short-term trajectory accuracy (minimizing Mean Squared Error), this task requires long-horizon autoregressive generation. Because the system is chaotic, small errors in the initial state grow exponentially, so point-wise prediction becomes infeasible beyond a short horizon. Therefore, models are evaluated on their ability to maintain the correct statistical distribution and temporal dynamics over 5,000 future time steps, without collapsing to a mean state or diverging into unphysical noise.
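The two failure modes named above (collapse to a mean state, divergence into noise) can be spotted cheaply before submitting. The sketch below is one possible diagnostic, not part of the official scoring: the function name `rollout_health` and the 500-step window are illustrative choices. It compares the windowed standard deviation of a generated trajectory with a reference standard deviation, for example one computed from the training set.

```python
import numpy as np

def rollout_health(generated, ref_std, window=500):
    """Hypothetical diagnostic: windowed std of a rollout relative to a reference.

    generated: array (n_steps, n_vars), one generated trajectory.
    ref_std:   array (n_vars,), e.g. train.std(axis=(0, 1)).
    Returns an array (n_windows,) holding, for each non-overlapping window,
    the mean over variables of std(window) / ref_std. Ratios drifting toward
    0 suggest collapse to a mean state; ratios far above 1 or NaN/inf values
    suggest divergence.
    """
    n_windows = generated.shape[0] // window
    ratios = []
    for w in range(n_windows):
        chunk = generated[w * window:(w + 1) * window]
        ratios.append(np.mean(chunk.std(axis=0) / ref_std))
    return np.asarray(ratios)
```

A healthy rollout should give ratios that stay roughly constant across all ten windows of a 5,000-step generation, while a model that decays toward its mean produces a steadily shrinking sequence.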

3. The Physics: Lorenz-96

The data for this competition is generated using a modified high-dimensional variant of the Lorenz-96 system, a canonical mathematical model used to study atmospheric circulation and turbulence.
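For orientation, the standard two-scale Lorenz-96 system (not the competition's modified multi-scale variant, whose exact equations and parameters are not given here) couples slow variables $X_k$, $k = 1, \dots, K$, to fast variables $Y_{j,k}$, $j = 1, \dots, J$. Indices are cyclic, and the $Y$ variables form a single cyclic chain of length $JK$. The last term of the first equation is the sub-grid influence that a closure model must infer from $X$ alone:

```latex
\begin{aligned}
\frac{dX_k}{dt} &= X_{k-1}\,(X_{k+1} - X_{k-2}) - X_k + F - \frac{hc}{b}\sum_{j=1}^{J} Y_{j,k}, \\
\frac{dY_{j,k}}{dt} &= -cb\,Y_{j+1,k}\,(Y_{j+2,k} - Y_{j-1,k}) - c\,Y_{j,k} + \frac{hc}{b}\,X_k,
\end{aligned}
```

where $F$ is the large-scale forcing, $h$ the coupling strength, $c$ the time-scale ratio and $b$ the amplitude ratio between the scales.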

Partial Observability

While the underlying simulation involves complex multi-scale interactions (macro, meso, and micro-scales), you are provided only with the Macro-scale variables (X).

The system is driven by unobserved, high-frequency turbulent forcings (hidden variables). These hidden dynamics inject energy into the observed variables, effectively acting as stochastic forcing. Your model must learn to infer the influence of these hidden variables solely from the behavior of the observed data.
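One classic way to mimic "hidden variables acting as stochastic forcing" is to replace the sub-grid coupling with a red-noise term. The sketch below is a toy stand-in, not the competition's generator: it integrates a single-scale Lorenz-96 with K = 11 (chosen only to match the 11 observed variables) and adds an AR(1) process to the forcing. The values of `f`, `dt`, `phi` and `sigma` are illustrative assumptions.

```python
import numpy as np

def l96_tendency(x, forcing):
    """Single-scale Lorenz-96 tendency with cyclic boundaries:
    dx_k/dt = (x_{k+1} - x_{k-2}) * x_{k-1} - x_k + forcing_k."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def simulate_stochastic_l96(n_steps, k=11, f=8.0, dt=0.01, phi=0.98, sigma=0.5, seed=0):
    """Toy model: RK4 for X, with an AR(1) process e(t) standing in for the
    unobserved sub-grid forcing. All parameter values are hypothetical."""
    rng = np.random.default_rng(seed)
    x = f + rng.normal(scale=0.1, size=k)  # small perturbation of the fixed point x = F
    e = np.zeros(k)                        # hidden forcing anomaly
    out = np.empty((n_steps, k))
    for t in range(n_steps):
        g = f + e                          # forcing held fixed during one RK4 step
        k1 = l96_tendency(x, g)
        k2 = l96_tendency(x + 0.5 * dt * k1, g)
        k3 = l96_tendency(x + 0.5 * dt * k2, g)
        k4 = l96_tendency(x + dt * k3, g)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        # AR(1) update with stationary standard deviation sigma
        e = phi * e + sigma * np.sqrt(1.0 - phi**2) * rng.normal(size=k)
        out[t] = x
    return out
```

Such a toy system can be handy for debugging a pipeline end to end, but its statistics will not match the competition data.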

System Stationarity (Crucial Note)

The system is Stationary and Ergodic. This means that although the specific trajectories in the Training and Test sets are statistically independent (different start times and locations), they are generated by the exact same set of physical equations and parameters.

The "Climate" (the probability distribution of states and the frequency content of oscillations) is identical between the Training and Test files. A model that correctly captures the invariant dynamics of the Training set should therefore generalize to the Test set.
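Stationarity can be sanity-checked directly by comparing the marginal distribution of each variable in the two files. A minimal sketch using SciPy's 1-D `wasserstein_distance`; the helper name `per_variable_w1` is hypothetical. With only 500 × 100 test values per variable, some sampling noise is expected, so the distances should be small relative to the training standard deviation rather than exactly zero.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def per_variable_w1(a, b):
    """1-D Wasserstein distance between the marginals of each variable.

    a, b: arrays whose last axis indexes the observed variables, e.g.
    (4000, 1000, 11) and (500, 100, 11). Returns an array (n_vars,).
    """
    a2 = a.reshape(-1, a.shape[-1])
    b2 = b.reshape(-1, b.shape[-1])
    return np.array([wasserstein_distance(a2[:, i], b2[:, i]) for i in range(a2.shape[1])])
```

Usage: `per_variable_w1(np.load("train_ensemble.npy"), np.load("test_history.npy"))`.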

4. Dataset Description

The dataset is provided as a Kaggle Dataset (bh-2026).

1. train_ensemble.npy

Purpose: Model training and validation
Shape: (4000, 1000, 11)
Description: Contains 4,000 independent trajectories, each 1,000 time steps long. The initial states are drawn at random from the system's attractor so that the ensemble covers phase space broadly.
Note: These trajectories are discontinuous. Do not attempt to concatenate them.
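Because the trajectories are discontinuous, training samples should be cut from within a single trajectory, and a validation set is safest split by whole trajectories. The sketch below shows one way to do both; the function names, the context length and the 10% validation fraction are illustrative assumptions.

```python
import numpy as np

def make_windows(data, context, horizon=1):
    """Build (input, target) pairs that never cross a trajectory boundary.

    data: (n_traj, n_steps, n_vars).
    Returns inputs (N, context, n_vars) and targets (N, horizon, n_vars),
    ordered trajectory by trajectory.
    """
    n_traj, n_steps, n_vars = data.shape
    starts = np.arange(n_steps - context - horizon + 1)
    idx_in = starts[:, None] + np.arange(context)[None, :]             # (S, context)
    idx_out = starts[:, None] + context + np.arange(horizon)[None, :]  # (S, horizon)
    x = data[:, idx_in].reshape(-1, context, n_vars)
    y = data[:, idx_out].reshape(-1, horizon, n_vars)
    return x, y

def split_by_trajectory(data, val_fraction=0.1, seed=0):
    """Hold out whole trajectories so train and validation never share one."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(data.shape[0])
    n_val = int(round(val_fraction * data.shape[0]))
    return data[perm[n_val:]], data[perm[:n_val]]
```

Materializing every window of the full training set is memory-hungry (with a 100-step context it is roughly 4 billion values), so in practice one would apply `make_windows` to a subset or sample start indices on the fly.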

2. test_history.npy

Purpose: Inference conditioning
Shape: (500, 100, 11)
Description: Contains 500 independent test scenarios drawn from the same physical system.
Usage: Each sample provides a 100-step history window (t=-99 to t=0). This serves as the "warm-up" period to synchronize the internal state of recurrent models or reservoirs.
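As an illustration of "synchronizing" a reservoir, the sketch below drives a minimal echo state network with one 100-step history and returns its final internal state, which is where closed-loop generation would start. The class name `TinyESN`, the reservoir size and the scaling constants are hypothetical; a recurrent network (GRU, LSTM) would be warmed up the same way, by running it over the history.

```python
import numpy as np

class TinyESN:
    """Minimal echo state network; sizes and scalings are illustrative."""

    def __init__(self, n_in=11, n_res=300, spectral_radius=0.9, input_scale=0.1, seed=0):
        rng = np.random.default_rng(seed)
        w = rng.normal(size=(n_res, n_res))
        # Rescale the recurrent matrix to the requested spectral radius
        w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
        self.w = w
        self.w_in = input_scale * rng.uniform(-1.0, 1.0, size=(n_res, n_in))

    def step(self, r, x):
        """One reservoir update driven by the observed vector x."""
        return np.tanh(self.w @ r + self.w_in @ x)

    def synchronize(self, history):
        """Run the reservoir over a (100, n_in) history window and return the
        final state; its dependence on the arbitrary zero start fades out."""
        r = np.zeros(self.w.shape[0])
        for x in history:
            r = self.step(r, x)
        return r
```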

5. The Inference Task

For each of the 500 test scenarios, your model must:

Synchronize: Ingest the 100-step history window.
Generate: Autoregressively predict the subsequent 5,000 time steps (t=1 to t=5000).

The model must operate blindly during the generation phase, relying solely on its learned dynamics to propagate the system forward.
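The synchronize-then-generate procedure amounts to a closed loop in which each prediction is fed back as input. A minimal sketch for a window-based one-step model; `step_fn` is a hypothetical stand-in for your own model, and the output already has the (batch, 100 + 5000, n_vars) layout required for submission.

```python
import numpy as np

def generate(step_fn, history, n_steps=5000, context=100):
    """Closed-loop rollout: no ground truth is used after t=0.

    step_fn: hypothetical one-step model mapping (batch, context, n_vars)
             to (batch, n_vars).
    history: (batch, n_hist, n_vars) warm-up window, e.g. test_history.
    Returns (batch, n_hist + n_steps, n_vars): history followed by predictions.
    """
    batch, n_hist, n_vars = history.shape
    if context > n_hist:
        raise ValueError("context cannot exceed the history length")
    out = np.empty((batch, n_hist + n_steps, n_vars), dtype=history.dtype)
    out[:, :n_hist] = history
    for t in range(n_hist, n_hist + n_steps):
        window = out[:, t - context:t]  # most recent steps, including predictions
        out[:, t] = step_fn(window)
    return out
```

With `test_history` of shape (500, 100, 11), `generate(step_fn, test_history)` returns an array of shape (500, 5100, 11).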

6. Submission Format

Participants must submit a single binary file.

File Format: NumPy binary (.npy)
Required Shape: (500, 5100, 11)

Structure of the Time Dimension (5100 steps):

Indices 0:100: The history/warm-up window. These steps are not scored, but must be present to maintain array shape. You may simply copy the input data here.
Indices 100:5100: The 5,000 generated prediction steps. These are the only steps evaluated.

Python Example:

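import numpy as np

# Warm-up window, shape (500, 100, 11)
history = np.load("test_history.npy")
generated = model.predict(history)  # your own model's output, shape (500, 5000, 11)
predictions = np.concatenate([history, generated], axis=1)  # shape (500, 5100, 11)
assert predictions.shape == (500, 5100, 11)
np.save("submission.npy", predictions)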
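Since only valid files count toward the daily limit, a quick local check before uploading can save a submission. A minimal sketch; the helper name `validate_submission` is hypothetical, and it checks only the layout stated above plus finiteness of the scored steps. It does not check the warm-up window, because indices 0:100 are not scored.

```python
import numpy as np

def validate_submission(pred, history, n_future=5000):
    """Raise ValueError if pred does not have the required layout."""
    n, h, v = history.shape
    expected = (n, h + n_future, v)
    if pred.shape != expected:
        raise ValueError(f"shape {pred.shape}, expected {expected}")
    if not np.all(np.isfinite(pred[:, h:])):
        raise ValueError("generated steps contain NaN or inf")
```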

7. Evaluation Methodology

Submissions are evaluated on Statistical Fidelity rather than trajectory Mean Squared Error (MSE). The scoring metric compares the statistical properties of the generated data against a ground-truth climate simulation of 10 million steps.

The Climate Score:

Score = 0.5 × Spatial Metric + 0.5 × Temporal Metric

(Lower scores indicate better performance)

Spatial Metric (Wasserstein Distance): Measures the distance between the probability distribution of the generated data and that of the ground truth. This ensures the model captures the correct variance and extreme events (tails of the distribution) without suffering from variance collapse.
Temporal Metric (Power Spectral Density Error): Measures the log-error between the frequency power spectrum of the generated data and the ground truth. This ensures the model replicates the correct oscillatory dynamics and "smoothness" of the physical system, penalizing white noise or unphysical high-frequency artifacts.
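The exact normalization, averaging and frequency range of the official metrics are not specified here, so the following is only a local approximation under stated assumptions: per-variable 1-D Wasserstein distances averaged over variables for the spatial term, and the mean absolute difference of log10 Welch power spectra (averaged over trajectories and variables) for the temporal term. It can be used to compare models on a held-out split of the training data, not to reproduce leaderboard scores.

```python
import numpy as np
from scipy.signal import welch
from scipy.stats import wasserstein_distance

def spatial_metric(gen, ref):
    """Assumed spatial term: mean over variables of the 1-D Wasserstein distance."""
    g = gen.reshape(-1, gen.shape[-1])
    r = ref.reshape(-1, ref.shape[-1])
    return float(np.mean([wasserstein_distance(g[:, i], r[:, i]) for i in range(g.shape[1])]))

def temporal_metric(gen, ref, nperseg=256):
    """Assumed temporal term: mean |log10 PSD_gen - log10 PSD_ref|.

    gen, ref: (n_traj, n_steps, n_vars). Time is moved to the last axis,
    Welch PSDs are computed along it, then averaged over trajectories and variables.
    """
    _, p_gen = welch(np.moveaxis(gen, 1, -1), nperseg=nperseg)
    _, p_ref = welch(np.moveaxis(ref, 1, -1), nperseg=nperseg)
    p_gen = p_gen.mean(axis=(0, 1))
    p_ref = p_ref.mean(axis=(0, 1))
    eps = 1e-12  # guards against log of zero power
    return float(np.mean(np.abs(np.log10(p_gen + eps) - np.log10(p_ref + eps))))

def climate_score(gen, ref):
    """Equal-weight combination, mirroring the Climate Score formula above."""
    return 0.5 * spatial_metric(gen, ref) + 0.5 * temporal_metric(gen, ref)
```

A constant offset illustrates the split between the two terms: it moves the distribution (spatial term) but, because `welch` removes each segment's mean by default, leaves the spectrum (temporal term) unchanged.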

Submission Limits

Daily Limit: 5 submissions per day
Only successful submissions (valid files) count toward limits

Beginner's Hypothesis 2026 for Sophomores