
Repo Cleanup and Research-Grade Recommendations

Scope

This page is maintainer-facing. Researchers should start with Getting Started, the Estimator Selection Guide, and the Estimator Reference.

1. Executive Summary

I removed generated build artifacts and duplicate notebook output assets, and archived legacy documentation that was superseded by the current reference docs. The core package and documentation remain intact, and the repo is now cleaner and easier to navigate.

Before (selected):

.
├── causalrl.egg-info/
├── dist/
├── site/
├── docs/
│   ├── api/
│   ├── methods/
│   └── notebooks/docs/assets/
└── notebooks/docs/assets/

After (selected):

.
├── archive/
│   └── docs/
│       ├── api/
│       └── methods/
├── docs/
│   └── assets/
└── notebooks/

Risks / follow-ups:

  • If you still need the retired docs pages, use archive/docs/api and archive/docs/methods as read-only references.
  • Any removed notebook outputs can be regenerated by running notebooks from the repo root (they already write to docs/assets).
  • site/ can be re-created with mkdocs build when needed.

2. Cleanup Log (Evidence-Based)

Cleanup candidates table (final state)

| Path | Category | Evidence it is old or superseded | Replacement | Risk | Action |
| --- | --- | --- | --- | --- | --- |
| docs/api/ | duplicate / old docs | Not in mkdocs.yml nav; identical content to docs/reference/api | docs/reference/api/ | med | move to archive/ |
| docs/methods/ | old docs | Not in mkdocs.yml nav; overlaps with docs/reference/estimators and diagnostics pages | docs/reference/estimators/, docs/reference/api/diagnostics.md | med | move to archive/ |
| docs/notebooks/docs/assets/ | generated / duplicate | No repo references found; created by running notebooks from docs/notebooks/ | docs/assets/ (regenerate) | low | delete |
| notebooks/docs/assets/ | generated / duplicate | No repo references found; created by running notebooks from notebooks/ | docs/assets/ (regenerate) | low | delete |
| dist/ | generated | Build output; docs/how-to/release.md shows it is generated | python -m build | low | delete |
| site/ | generated | MkDocs output; Makefile uses mkdocs build | mkdocs build | low | delete |
| causalrl.egg-info/ | generated | Generated by editable installs; not referenced in repo | python -m pip install -e . | low | delete |
| .DS_Store | generated | OS metadata; no references | none | low | delete |
| docs/.DS_Store | generated | OS metadata; no references | none | low | delete |
| docs/assets/.DS_Store | generated | OS metadata; no references | none | low | delete |
| docs/notebooks/docs/ | generated / empty | Empty leftover after asset cleanup; no references | none | low | delete |
| notebooks/docs/ | generated / empty | Empty leftover after asset cleanup; no references | none | low | delete |
| notebooks/ | documentation (kept) | README and tutorials link to notebooks; contains jupytext .py sources | none | high | keep |

Files moved to archive

  • docs/api/ -> archive/docs/api/
      • Rationale: superseded by docs/reference/api/ and not linked in mkdocs.yml.
      • Replacement: docs/reference/api/.
  • docs/methods/ -> archive/docs/methods/
      • Rationale: older summaries replaced by estimator reference and diagnostics docs; not linked in mkdocs.yml.
      • Replacement: docs/reference/estimators/ and docs/reference/api/diagnostics.md.

Files deleted (generated or unused)

  • dist/, site/, causalrl.egg-info/
      • Rationale: build artifacts; safe to regenerate.
      • Evidence: Makefile and docs/how-to/release.md describe generation workflows; no repo references.
  • docs/notebooks/docs/assets/, notebooks/docs/assets/
      • Rationale: duplicate notebook outputs created from non-root working directories.
      • Evidence: no references found via rg -n "notebooks/docs/assets" or rg -n "docs/notebooks/docs/assets".
  • .DS_Store, docs/.DS_Store, docs/assets/.DS_Store
      • Rationale: OS metadata files.
  • docs/notebooks/docs/, notebooks/docs/
      • Rationale: empty directories left behind after asset cleanup.

References updated

  • None required (no inbound links to archived paths found).

3. Researcher Review (Real + Fair, Grounded in Repo)

Strengths and novelty (evidence-based)

  • Estimand-first API with explicit assumptions (crl/estimands/policy_value.py, crl/assumptions.py, crl/estimators/base.py).
  • Broad estimator coverage across IS/DR/MRDR/MAGIC/MIS/FQE/DualDICE/Double RL/HCOPE (crl/estimators/).
  • Diagnostics-first design: overlap, ESS, weight tails, and shift metrics (crl/diagnostics/, crl/estimators/diagnostics.py).
  • Sensitivity and confounding modules with proximal OPE example (crl/sensitivity/, crl/confounding/proximal.py).
  • Synthetic benchmarks and experiment harnesses with ground-truth values (crl/benchmarks/, crl/experiments/runner.py).

Unclear or missing (with evidence)

  • CORRECT_MODEL assumption is defined but not required by any estimator (compare crl/assumptions_catalog.py with the output of rg required_assumptions in crl/estimators). This makes model-based identification assumptions implicit rather than explicit.
  • PolicyContrastEstimand is defined and documented, but no estimator or pipeline consumes it (search hits are docs and imports only). NOT FOUND: an estimator or evaluate variant that computes contrasts directly.
  • Continuous-action support is unclear: Policy.action_density exists, but estimators only use action probabilities and dataset contracts assume discrete action_space_n (crl/data/datasets.py, crl/ope.py). NOT FOUND: any estimator path that uses action densities.
  • diagnostics argument in crl/ope.evaluate is effectively a boolean toggle and not a selector for diagnostics families.
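
As a hedged illustration of the last point, a diagnostics selector could accept a subset of named families instead of a boolean. The `run_diagnostics` signature and metric names below are hypothetical sketches, not the current crl API; the real metric implementations live in crl/estimators/diagnostics.py:

```python
from typing import Callable, Iterable, Mapping, Union

# Hypothetical registry of diagnostics families keyed by name.
DIAGNOSTICS: Mapping[str, Callable[[list], float]] = {
    # Effective sample size of importance weights: (sum w)^2 / sum w^2.
    "ess": lambda w: sum(w) ** 2 / sum(x * x for x in w),
    # Crude tail-weight check: the single largest weight.
    "max_weight": lambda w: max(w),
}

def run_diagnostics(weights: list,
                    select: Union[Iterable[str], str] = "all") -> dict:
    """Run only the requested diagnostics families.

    select="all" runs everything, select="none" runs nothing,
    and a list like ["ess"] runs just that subset.
    """
    if select == "none":
        return {}
    names = DIAGNOSTICS.keys() if select == "all" else select
    return {name: DIAGNOSTICS[name](weights) for name in names}

weights = [0.5, 1.0, 2.5]
print(run_diagnostics(weights, select=["ess"]))  # only ESS is computed
```

The same shape would let diagnostics="none" replace the current boolean off switch without breaking callers.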

Missing theory exposition (specific file targets)

  • NOT FOUND: a dedicated cross-fitting explanation despite make_folds being used by DR/WDR/MRDR/Double RL. Proposed target: docs/explanation/cross_fitting.md and add links from docs/reference/estimators/dr.md, docs/reference/estimators/wdr.md, docs/reference/estimators/mrdr.md, docs/reference/estimators/double_rl.md.
  • NOT FOUND: a reference page for bootstrap inference beyond scattered mentions. Proposed target: docs/reference/estimators/bootstrap.md and link from docs/tutorials/confidence_intervals.md.
  • Proximal OPE is implemented but minimally documented in reference form. Proposed target: expand docs/explanation/proximal.md with explicit bridge-function equations aligned to crl/confounding/proximal.py.
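
For the expanded proximal page, one hedged option is to state the standard bandit-case outcome-bridge identification result from the proximal causal inference literature. The notation here (Z a treatment-side proxy, W an outcome-side proxy) is illustrative and has not been checked against crl/confounding/proximal.py:

```latex
% Outcome bridge function h solves a conditional moment restriction
% in the negative-control proxies Z and W:
\mathbb{E}\!\left[\, h(W, a, X) \mid Z, A = a, X \,\right]
  = \mathbb{E}\!\left[\, Y \mid Z, A = a, X \,\right]

% Given existence of such an h (plus completeness conditions), the
% target policy value is identified by averaging the bridge over pi:
V(\pi) = \mathbb{E}\!\left[\, \textstyle\sum_{a} \pi(a \mid X)\, h(W, a, X) \,\right]
```

The expanded page should replace these placeholders with the exact moment restriction the code solves.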

Suggested minimum paper-grade theory notes structure

Use a single template page (e.g., docs/explanation/theory_notes_template.md) and instantiate it for each estimator family:

  1. Problem setup and notation (bandit vs MDP).
  2. Estimand definition (tie to PolicyValueEstimand).
  3. Identification assumptions (explicit mapping to AssumptionSet).
  4. Estimator formula (matching implementation).
  5. Diagnostics and failure modes (overlap, ESS, tail weights).
  6. Inference (bootstrap, HCOPE) and when it is valid.
  7. Implementation details (cross-fitting, model classes, defaults).
  8. Reproducibility checklist (seeds, config, version tags).

4. Practitioner Review (Usability + Reproducibility)

Installation experience

  • pyproject.toml provides a clean install with extras for docs, notebooks, benchmarks, behavior, and adapters.
  • Core install depends on torch, which is heavy for users who only need bandit OPE. Consider clarifying minimal install paths.

API clarity and examples

  • crl.evaluate_ope and crl.api provide a stable surface, and the notebooks cover the main workflows (examples/quickstart, notebooks/).
  • CLI exists (crl/cli.py), but it only runs synthetic benchmarks. NOT FOUND: CLI path for loading real datasets described in docs/how-to/trajectory_dataset_from_parquet.md or docs/how-to/logged_bandit_from_dataframe.md.
  • diagnostics parameter is not actionable beyond on/off; users cannot choose subsets.

Reproducibility checklist

  • Seeds: present (crl/utils/seeding.py, tests/test_seeding.py).
  • Configs: present (configs/, docs/reference/configs.md).
  • Environment lockfile: NOT FOUND. Smallest change is to add a requirements-lock.txt or environment.yml in the repo root.
  • Deterministic torch settings: NOT FOUND. Consider torch.use_deterministic_algorithms opt-in.
  • Dataset provenance in reports: partially present (dataset metadata is optional), but OpeReport does not include dataset.describe() yet.
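
The last checklist item could be closed by attaching the dataset summary to the report at evaluation time. The classes below are hypothetical stand-ins for crl's dataset types and OpeReport, just to sketch the shape of the change:

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class Dataset:
    """Stand-in for a crl dataset that exposes .describe()."""
    n_samples: int
    action_space_n: int

    def describe(self) -> Dict[str, Any]:
        return {"n_samples": self.n_samples, "action_space_n": self.action_space_n}

@dataclass
class OpeReport:
    """Stand-in for the real report; metadata travels with the estimates."""
    estimates: Dict[str, float]
    metadata: Dict[str, Any] = field(default_factory=dict)

def attach_provenance(report: OpeReport, dataset: Dataset, seed: int) -> OpeReport:
    # Record the dataset summary and seed so the report is self-describing.
    report.metadata["dataset"] = dataset.describe()
    report.metadata["seed"] = seed
    return report

report = attach_provenance(OpeReport({"dr": 1.23}), Dataset(10_000, 4), seed=0)
print(report.metadata["dataset"])  # {'n_samples': 10000, 'action_space_n': 4}
```

In the real pipeline this would live inside crl.ope.evaluate, so every export (JSON/HTML) inherits the provenance for free.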

5. Concrete Roadmap (PR-sized tasks)

  1. Add correct_model assumption to model-based estimators. Why: CORRECT_MODEL exists but is never required; model-based estimators implicitly rely on it. Files: crl/estimators/dr.py, crl/estimators/wdr.py, crl/estimators/mrdr.py, crl/estimators/magic.py, crl/estimators/fqe.py, crl/estimators/double_rl.py. Acceptance: estimators raise ValueError when correct_model is missing from AssumptionSet.

  2. Update estimator docs to reflect correct_model requirement. Why: documentation currently omits this assumption for model-based estimators. Files: docs/reference/estimators/dr.md, docs/reference/estimators/wdr.md, docs/reference/estimators/mrdr.md, docs/reference/estimators/magic.md, docs/reference/estimators/fqe.md, docs/reference/estimators/double_rl.md. Acceptance: each page includes correct_model in assumptions section.

  3. Add tests for missing correct_model assumption. Why: no tests enforce assumption gating on model-based estimators. Files: tests/test_estimators_* (new test module or extend existing). Acceptance: tests fail before change and pass after adding assumption checks.

  4. Add evaluate_contrast helper for PolicyContrastEstimand or mark as conceptual only. Why: PolicyContrastEstimand is defined but unused. Files: crl/ope.py, docs/concepts/estimands.md. Acceptance: contrast helper exists and is documented, or docs explicitly state it is not yet supported.

  5. Add example usage for PolicyContrastEstimand. Why: current notebooks mention it but do not show how to compute contrasts. Files: docs/concepts/estimands.md, notebooks/01_estimands_and_assumptions.py. Acceptance: runnable example shows contrast computation or clearly indicates limitation.

  6. Implement diagnostics selection in crl.ope.evaluate. Why: diagnostics argument is currently a boolean toggle. Files: crl/ope.py, crl/estimators/diagnostics.py. Acceptance: diagnostics=["overlap", "ess"] runs only selected metrics; diagnostics="none" runs none.

  7. Add tests for diagnostics selection. Why: no test coverage for diagnostics behavior. Files: tests/test_ope_pipeline.py. Acceptance: tests verify diagnostics keys and empty output when disabled.

  8. Include dataset summary in OpeReport.metadata. Why: dataset classes provide .describe() but report metadata does not store it. Files: crl/ope.py. Acceptance: OpeReport.metadata["dataset"] contains dataset.describe() output.

  9. Add metadata section to OpeReport.to_html. Why: HTML reports show only summary table and diagnostics. Files: crl/ope.py. Acceptance: HTML includes a metadata table with seed, diagnostics config, and dataset summary.

  10. Add tests for OpeReport.to_html metadata. Why: no tests cover OpeReport export. Files: tests/test_report_exports.py (extend) or new test file. Acceptance: HTML output contains expected metadata keys.

  11. Extend CLI to load datasets from CSV/Parquet. Why: docs describe how to load external data, but CLI only supports synthetic benchmarks. Files: crl/cli.py, docs/how-to/run_ope_pipeline.md. Acceptance: crl ope --dataset path --dataset-type bandit runs and writes report.

  12. Add CLI config schema validation. Why: current CLI uses raw YAML dicts with no validation. Files: crl/cli.py (add dataclasses or pydantic model), docs/reference/configs.md. Acceptance: invalid config fails with clear error messages.

  13. Add CLI tests (Typer runner). Why: no tests exercise crl CLI. Files: tests/test_cli.py (new). Acceptance: tests run crl ope with a temp config and output directory.

  14. Add make clean target. Why: build artifacts and generated outputs are easy to re-introduce. Files: Makefile. Acceptance: make clean removes dist/, site/, *.egg-info, and notebook asset outputs.

  15. Add .gitignore entries for notebook-generated assets. Why: docs/notebooks/docs/assets and notebooks/docs/assets are generated outputs. Files: .gitignore. Acceptance: regenerated assets are ignored by git status.

  16. Add a bootstrap inference reference page. Why: bootstrap is implemented but lacks a dedicated reference doc. Files: docs/reference/estimators/bootstrap.md, mkdocs.yml (nav). Acceptance: page appears in Reference nav and documents BootstrapConfig and usage.

  17. Document cross-fitting usage explicitly. Why: cross-fitting is used in code but not clearly surfaced in docs. Files: docs/explanation/cross_fitting.md or add sections in estimator docs. Acceptance: docs mention cross-fitting strategy and default fold counts.

  18. Expand proximal OPE documentation with equations. Why: crl/confounding/proximal.py implements a simplified estimator but docs are light. Files: docs/explanation/proximal.md. Acceptance: page includes bridge-function equations consistent with code.

  19. Clarify continuous-action limitations. Why: policy interface mentions densities, but estimators and datasets are discrete. Files: docs/concepts/dataset_format.md, docs/reference/api/policies.md. Acceptance: docs explicitly state discrete-action assumptions or document continuous support if added.

  20. Add torch seeding test. Why: set_seed seeds torch but is not tested. Files: tests/test_seeding.py. Acceptance: test passes when torch is available and skips when not.

  21. Add diagnostics export to CLI outputs. Why: OpeReport.diagnostics is computed but CLI only writes HTML and summary CSV. Files: crl/cli.py. Acceptance: diagnostics.json is written next to report.html.

  22. Add behavior policy estimation how-to section. Why: behavior policy fitting exists but docs are light outside API reference. Files: docs/how-to/behavior_policy_estimation.md (new), mkdocs.yml (nav). Acceptance: page includes example with fit_behavior_policy and caveats.

  23. Document benchmark outputs and expected files. Why: crl/experiments/runner.py produces multiple outputs but docs show only commands. Files: docs/tutorials/benchmarks.md. Acceptance: docs include an output tree example matching runner outputs.

  24. Add tests for run_benchmark_suite output files. Why: runner writes CSV/HTML/figures but no tests validate outputs. Files: tests/test_benchmarks_integration.py (extend) or new test. Acceptance: tests verify expected files are created in a temp dir.

  25. Add dataset provenance to reports and JSON exports. Why: EstimatorReport and OpeReport do not expose dataset metadata by default. Files: crl/ope.py, crl/estimators/base.py. Acceptance: report JSON/HTML include dataset provenance fields when available.

  26. Add a small reference page for crl/estimators/diagnostics.py output schema. Why: diagnostics structure is not documented beyond examples. Files: docs/reference/api/diagnostics.md. Acceptance: page lists keys and interpretation for overlap, ess, weights.
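
For item 12, a minimal stdlib sketch of the intended validation style follows. The field names are hypothetical; the real schema would mirror the keys used in configs/ and could equally be a pydantic model:

```python
from dataclasses import dataclass, fields
from typing import List

@dataclass
class OpeConfig:
    """Illustrative config schema for the crl ope command."""
    dataset: str
    dataset_type: str          # "bandit" or "trajectory"
    estimators: List[str]
    seed: int = 0

def load_config(raw: dict) -> OpeConfig:
    """Validate a raw YAML dict and fail with a clear error message."""
    allowed = {f.name for f in fields(OpeConfig)}
    unknown = set(raw) - allowed
    if unknown:
        raise ValueError(f"Unknown config keys: {sorted(unknown)}")
    cfg = OpeConfig(**raw)  # missing required keys raise TypeError here
    if cfg.dataset_type not in {"bandit", "trajectory"}:
        raise ValueError(
            f"dataset_type must be 'bandit' or 'trajectory', got {cfg.dataset_type!r}"
        )
    return cfg

cfg = load_config({"dataset": "logs.parquet",
                   "dataset_type": "bandit",
                   "estimators": ["ips", "dr"]})
print(cfg.seed)  # 0 (default applied)
```

Typo'd keys and bad enum values then fail fast with an actionable message instead of surfacing later as a KeyError deep in the pipeline.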

6. OPTIONAL: Demo HTML App

Not added. The repo already uses MkDocs for the docs site, so a standalone HTML page under docs/demo/ would be unlinked and harder to maintain. A better alternative is a MkDocs page (for example, docs/explanation/method_explorer.md) or a notebook-driven demo that uses the existing notebooks/ sources.