Experiments
Experiment utilities.
run_benchmark_suite(suite, output_dir, seeds, config_dir='configs/benchmarks', config_path=None)
Run a benchmark suite defined by YAML configs.
run_benchmarks_to_table(output_dir, num_samples=1000, num_trajectories=200, seed=0)
Run benchmarks and write CSV/JSONL outputs.
Estimand
Policy value under intervention for each benchmark target policy.
Assumptions
Sequential ignorability, overlap, and known behavior propensities (plus Markov for MDP).
Inputs
output_dir: Directory for result files.
num_samples: Number of bandit samples.
num_trajectories: Number of MDP trajectories.
seed: RNG seed.
Outputs
List of result dictionaries.
Failure modes
Small samples can yield unstable estimates.
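The small-sample failure mode can be illustrated with a minimal, self-contained sketch. It does not call this package. It simulates a hypothetical two-armed bandit with known behavior propensities and estimates the target policy's value by inverse propensity weighting. The policies, reward means, and the helper names `ips_value` and `spread` are all assumptions made for illustration, and the estimators the benchmarks actually use may differ.

```python
import random
import statistics

# Hypothetical toy bandit; every number here is an illustrative assumption.
BEHAVIOR = [0.8, 0.2]     # known behavior propensities (overlap: both > 0)
TARGET = [0.1, 0.9]       # target policy whose value we want
MEAN_REWARD = [0.2, 0.7]  # true expected reward of each arm

# Policy value under intervention: expected reward if the target policy acted.
TRUE_VALUE = sum(t * m for t, m in zip(TARGET, MEAN_REWARD))


def ips_value(num_samples, seed):
    """Inverse-propensity-weighted estimate of the target policy's value."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Log an action from the behavior policy and a Bernoulli reward.
        action = 0 if rng.random() < BEHAVIOR[0] else 1
        reward = 1.0 if rng.random() < MEAN_REWARD[action] else 0.0
        # Reweight the logged reward by target / behavior propensity.
        total += TARGET[action] / BEHAVIOR[action] * reward
    return total / num_samples


def spread(num_samples, num_seeds=100):
    """Standard deviation of the estimate across independent seeds."""
    estimates = [ips_value(num_samples, s) for s in range(num_seeds)]
    return statistics.pstdev(estimates)


small_spread = spread(50)
large_spread = spread(5000)
print(f"true value: {TRUE_VALUE:.3f}")
print(f"std of estimate, n=50:   {small_spread:.3f}")
print(f"std of estimate, n=5000: {large_spread:.3f}")
```

The estimate is unbiased at any sample size under these assumptions, but its spread across seeds shrinks roughly with the square root of the sample size. For that reason it may be worth running several seeds before trusting results obtained with a small `num_samples` or `num_trajectories`.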