Experiments

Experiment utilities.

run_benchmark_suite(suite, output_dir, seeds, config_dir='configs/benchmarks', config_path=None)

Run a benchmark suite defined by YAML configs.

run_benchmarks_to_table(output_dir, num_samples=1000, num_trajectories=200, seed=0)

Run benchmarks and write CSV/JSONL outputs.
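As a rough illustration of what writing CSV/JSONL outputs from a list of result dictionaries can look like, here is a minimal standard-library sketch. It is not the library's implementation: the helper name `write_results` and the file names `results.csv` and `results.jsonl` are assumptions made for this example, and the result dictionaries are assumed to be flat.

```python
import csv
import json
from pathlib import Path


def write_results(results, output_dir):
    """Write flat result dicts to a CSV file and a JSONL file (hypothetical helper)."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Use the union of keys, in first-seen order, as the CSV header.
    fieldnames = list(dict.fromkeys(key for row in results for key in row))

    csv_path = out / "results.csv"  # file name is an assumption
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(results)  # missing keys are written as empty cells

    jsonl_path = out / "results.jsonl"  # file name is an assumption
    with jsonl_path.open("w", encoding="utf-8") as f:
        for row in results:
            # One JSON object per line.
            f.write(json.dumps(row) + "\n")

    return csv_path, jsonl_path
```

CSV suits quick inspection in a spreadsheet, while JSONL keeps value types intact and can be appended to one record at a time.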

Estimand

Policy value under intervention for each benchmark target policy.

Assumptions: Sequential ignorability, overlap, and known behavior propensities (plus Markov for MDP).

Inputs:

output_dir: Directory for result files.

num_samples: Number of bandit samples.

num_trajectories: Number of MDP trajectories.

seed: RNG seed.

Outputs: List of result dictionaries.

Failure modes: Small samples can yield unstable estimates.
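To make the estimand and the small-sample failure mode concrete, the sketch below estimates a target policy's value on a toy two-action bandit using inverse propensity weighting with known behavior propensities. The source does not say which estimators the benchmarks run, so this is only one illustrative choice; the names `ips_value`, `simulate`, and `always_one`, and the reward probabilities, are all hypothetical.

```python
import random


def ips_value(samples, target_prob):
    """Inverse-propensity-weighted estimate of a target policy's value.

    samples: list of (context, action, reward, behavior_propensity) tuples.
    target_prob: function (context, action) -> probability under the target policy.
    """
    total = 0.0
    for context, action, reward, propensity in samples:
        # Overlap: the behavior policy must give the logged action positive probability.
        if propensity <= 0.0:
            raise ValueError("behavior propensity must be positive (overlap)")
        total += target_prob(context, action) / propensity * reward
    return total / len(samples)


def simulate(num_samples, seed):
    """Log data from a uniform behavior policy over two actions (toy setup)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        action = rng.randrange(2)
        # Action 1 pays off with probability 0.8, action 0 with probability 0.2.
        success_prob = 0.8 if action == 1 else 0.2
        reward = 1.0 if rng.random() < success_prob else 0.0
        samples.append((None, action, reward, 0.5))  # known propensity of 0.5
    return samples


def always_one(context, action):
    """Target policy that always plays action 1; its true value here is 0.8."""
    return 1.0 if action == 1 else 0.0


# Compare the spread of estimates across seeds for small and large sample sizes.
small = [ips_value(simulate(20, s), always_one) for s in range(5)]
large = [ips_value(simulate(20000, s), always_one) for s in range(5)]
print("n=20:   ", [round(v, 3) for v in small])
print("n=20000:", [round(v, 3) for v in large])
```

With 20 samples the estimates typically scatter widely around 0.8, while with 20000 samples they cluster tightly, which is the instability the failure-mode note warns about.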