Repo Cleanup and Research-Grade Recommendations
Scope
This page is maintainer-facing. Researchers should start with Getting Started, the Estimator Selection Guide, and the Estimator Reference.
1. Executive Summary
I removed generated build artifacts, duplicate notebook output assets, and archived legacy documentation that was superseded by the current reference docs. The core package and documentation remain intact, and the repo is now cleaner and easier to navigate.
Before (selected):
.
├── causalrl.egg-info/
├── dist/
├── site/
├── docs/
│ ├── api/
│ ├── methods/
│ └── notebooks/docs/assets/
└── notebooks/docs/assets/
After (selected):
Risks / follow-ups:
- If you still need the retired docs pages, use archive/docs/api and archive/docs/methods as read-only references.
- Any removed notebook outputs can be regenerated by running notebooks from repo root (they already write to docs/assets).
- site/ can be re-created with mkdocs build when needed.
2. Cleanup Log (Evidence-Based)
Cleanup candidates table (final state)
| Path | Category | Evidence it is old or superseded | Replacement | Risk | Action |
|---|---|---|---|---|---|
| `docs/api/` | duplicate / old docs | Not in mkdocs.yml nav; identical content to docs/reference/api | `docs/reference/api/` | med | move to archive/ |
| `docs/methods/` | old docs | Not in mkdocs.yml nav; overlaps with docs/reference/estimators and diagnostics pages | `docs/reference/estimators/`, `docs/reference/api/diagnostics.md` | med | move to archive/ |
| `docs/notebooks/docs/assets/` | generated / duplicate | No repo references found; created by running notebooks from docs/notebooks/ | `docs/assets/` (regenerate) | low | delete |
| `notebooks/docs/assets/` | generated / duplicate | No repo references found; created by running notebooks from notebooks/ | `docs/assets/` (regenerate) | low | delete |
| `dist/` | generated | Build output; docs/how-to/release.md shows it is generated | `python -m build` | low | delete |
| `site/` | generated | MkDocs output; Makefile uses mkdocs build | `mkdocs build` | low | delete |
| `causalrl.egg-info/` | generated | Generated by editable installs; not referenced in repo | `python -m pip install -e .` | low | delete |
| `.DS_Store` | generated | OS metadata; no references | none | low | delete |
| `docs/.DS_Store` | generated | OS metadata; no references | none | low | delete |
| `docs/assets/.DS_Store` | generated | OS metadata; no references | none | low | delete |
| `docs/notebooks/docs/` | generated / empty | Empty leftover after asset cleanup; no references | none | low | delete |
| `notebooks/docs/` | generated / empty | Empty leftover after asset cleanup; no references | none | low | delete |
| `notebooks/` | documentation (kept) | README and tutorials link to notebooks; contains jupytext .py sources | none | high | keep |
Files moved to archive
- `docs/api/` -> `archive/docs/api/`
  - Rationale: superseded by `docs/reference/api/` and not linked in `mkdocs.yml`.
  - Replacement: `docs/reference/api/`.
- `docs/methods/` -> `archive/docs/methods/`
  - Rationale: older summaries replaced by estimator reference and diagnostics docs; not linked in `mkdocs.yml`.
  - Replacement: `docs/reference/estimators/` and `docs/reference/api/diagnostics.md`.
Files deleted (generated or unused)
- `dist/`, `site/`, `causalrl.egg-info/`
  - Rationale: build artifacts; safe to regenerate.
  - Evidence: `Makefile` and `docs/how-to/release.md` describe generation workflows; no repo references.
- `docs/notebooks/docs/assets/`, `notebooks/docs/assets/`
  - Rationale: duplicate notebook outputs created from non-root working directories.
  - Evidence: no references found via `rg -n "notebooks/docs/assets"` or `rg -n "docs/notebooks/docs/assets"`.
- `.DS_Store`, `docs/.DS_Store`, `docs/assets/.DS_Store`
  - Rationale: OS metadata files.
- `docs/notebooks/docs/`, `notebooks/docs/`
  - Rationale: empty directories left behind after asset cleanup.
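To check whether any of the deleted paths have crept back in, a dry-run scan can list them without removing anything. This is a minimal sketch, assuming it is run from the repo root; the helper name `find_generated_artifacts` and the pattern tuple are illustrative and are not part of the repo.

```python
from __future__ import annotations

from pathlib import Path

# Patterns mirror the deleted paths in the cleanup log; this list is an
# assumption for illustration and should be adjusted if the layout changes.
GENERATED_PATTERNS = (
    "dist",
    "site",
    "*.egg-info",
    "notebooks/docs",
    "docs/notebooks/docs",
    "**/.DS_Store",
)


def find_generated_artifacts(root: Path) -> list[Path]:
    """Return existing paths under `root` that match a generated-artifact pattern."""
    found: set[Path] = set()
    for pattern in GENERATED_PATTERNS:
        found.update(root.glob(pattern))
    return sorted(found)


if __name__ == "__main__":
    # Dry run: print candidates only, delete nothing.
    for path in find_generated_artifacts(Path(".")):
        print(path)
```

Keeping the scan read-only makes it safe to run in CI as a guard; actual deletion can stay in a separate, explicit step.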
References updated
- None required (no inbound links to archived paths found).
3. Researcher Review (Real + Fair, Grounded in Repo)
Strengths and novelty (evidence-based)
- Estimand-first API with explicit assumptions (`crl/estimands/policy_value.py`, `crl/assumptions.py`, `crl/estimators/base.py`).
- Broad estimator coverage across IS/DR/MRDR/MAGIC/MIS/FQE/DualDICE/Double RL/HCOPE (`crl/estimators/`).
- Diagnostics-first design: overlap, ESS, weight tails, and shift metrics (`crl/diagnostics/`, `crl/estimators/diagnostics.py`).
- Sensitivity and confounding modules with proximal OPE example (`crl/sensitivity/`, `crl/confounding/proximal.py`).
- Synthetic benchmarks and experiment harnesses with ground-truth values (`crl/benchmarks/`, `crl/experiments/runner.py`).
Unclear or missing (with evidence)
- `CORRECT_MODEL` assumption is defined but not required by any estimator (see `crl/assumptions_catalog.py` vs `rg required_assumptions` in `crl/estimators`). This makes model-based identification assumptions implicit rather than explicit.
- `PolicyContrastEstimand` is defined and documented, but no estimator or pipeline consumes it (search hits are docs and imports only). NOT FOUND: an estimator or `evaluate` variant that computes contrasts directly.
- Continuous-action support is unclear: `Policy.action_density` exists, but estimators only use action probabilities and dataset contracts assume discrete `action_space_n` (`crl/data/datasets.py`, `crl/ope.py`). NOT FOUND: any estimator path that uses action densities.
- `diagnostics` argument in `crl/ope.evaluate` is effectively a boolean toggle and not a selector for diagnostics families.
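The first gap above (an assumption that is defined but never enforced) can be illustrated with a toy gating check. This is a minimal sketch, not the package's implementation: `ToyAssumptionSet`, `ToyModelBasedEstimator`, and `check_assumptions` are hypothetical stand-ins, and the assumption name `overlap` is used only for illustration. Only the names `required_assumptions` and `correct_model` come from the review itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyAssumptionSet:
    """Hypothetical stand-in for the package's assumption container."""

    names: frozenset = field(default_factory=frozenset)


class ToyModelBasedEstimator:
    """Hypothetical estimator that declares what it needs for identification."""

    # Illustrative names; the real catalog may use different identifiers.
    required_assumptions = ("overlap", "correct_model")

    def check_assumptions(self, assumptions: ToyAssumptionSet) -> None:
        """Raise ValueError if any declared assumption was not supplied."""
        missing = [a for a in self.required_assumptions if a not in assumptions.names]
        if missing:
            raise ValueError(f"Missing required assumptions: {', '.join(missing)}")


estimator = ToyModelBasedEstimator()
# Passes silently: both declared assumptions are present.
estimator.check_assumptions(ToyAssumptionSet(frozenset({"overlap", "correct_model"})))
try:
    estimator.check_assumptions(ToyAssumptionSet(frozenset({"overlap"})))
except ValueError as exc:
    print(exc)  # Missing required assumptions: correct_model
```

Declaring the requirement on the estimator class keeps the identification assumption visible at the call site instead of leaving it implicit in the model-fitting code.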
Missing theory exposition (specific file targets)
- NOT FOUND: a dedicated cross-fitting explanation despite `make_folds` being used by DR/WDR/MRDR/Double RL. Proposed target: `docs/explanation/cross_fitting.md` and add links from `docs/reference/estimators/dr.md`, `docs/reference/estimators/wdr.md`, `docs/reference/estimators/mrdr.md`, `docs/reference/estimators/double_rl.md`.
- NOT FOUND: a reference page for bootstrap inference beyond scattered mentions. Proposed target: `docs/reference/estimators/bootstrap.md` and link from `docs/tutorials/confidence_intervals.md`.
- Proximal OPE is implemented but minimally documented in reference form. Proposed target: expand `docs/explanation/proximal.md` with explicit bridge-function equations aligned to `crl/confounding/proximal.py`.
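A cross-fitting explanation page could open with a toy example like the one below. It is a minimal sketch of the general technique only: `make_folds_sketch` and `cross_fitted_predictions` are hypothetical helpers written for this page, and they make no claim about the signature or behavior of the package's `make_folds`. The nuisance "model" is deliberately trivial (a training-set mean) so that the fold logic stays visible.

```python
from __future__ import annotations

import random
from statistics import fmean


def make_folds_sketch(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds (hypothetical helper)."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_fitted_predictions(rewards: list[float], k: int = 2, seed: int = 0) -> list[float]:
    """Predict each reward with a 'model' fitted only on the other folds.

    A real estimator would fit a Q-function or reward model here instead of
    taking the mean of the training rewards.
    """
    folds = make_folds_sketch(len(rewards), k, seed)
    preds = [0.0] * len(rewards)
    for held_out in folds:
        held = set(held_out)
        train = [r for i, r in enumerate(rewards) if i not in held]
        fitted_mean = fmean(train)  # "fit" on out-of-fold data only
        for i in held_out:
            preds[i] = fitted_mean  # predict on the held-out fold
    return preds
```

The property worth documenting is that no sample's prediction depends on that sample's own outcome, which is what lets doubly robust estimators use flexible nuisance models without overfitting bias.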
Suggested minimum paper-grade theory notes structure
Use a single template page (e.g., docs/explanation/theory_notes_template.md) and instantiate it for each estimator family:
1. Problem setup and notation (bandit vs MDP).
2. Estimand definition (tie to PolicyValueEstimand).
3. Identification assumptions (explicit mapping to AssumptionSet).
4. Estimator formula (matching implementation).
5. Diagnostics and failure modes (overlap, ESS, tail weights).
6. Inference (bootstrap, HCOPE) and when it is valid.
7. Implementation details (cross-fitting, model classes, defaults).
8. Reproducibility checklist (seeds, config, version tags).
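For the diagnostics section of such a note (item 5), a short worked computation helps readers connect the formula to the failure mode. The sketch below uses the standard Kish effective sample size, (sum of weights)^2 / (sum of squared weights); whether the package's ESS diagnostic uses exactly this definition is an assumption, and `effective_sample_size` is a name chosen for this example.

```python
from __future__ import annotations


def effective_sample_size(weights: list[float]) -> float:
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    if not weights:
        raise ValueError("weights must be non-empty")
    if any(w < 0 for w in weights):
        raise ValueError("importance weights must be non-negative")
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    if sum_sq == 0.0:
        return 0.0  # all weights are zero: no effective samples
    return total * total / sum_sq


uniform = effective_sample_size([1.0, 1.0, 1.0, 1.0])  # 4.0: equal weights lose nothing
skewed = effective_sample_size([4.0, 0.0, 0.0, 0.0])   # 1.0: one sample carries all weight
```

The two calls bracket the range: ESS equals the sample count under perfect overlap and collapses toward 1 as the weight distribution grows a heavy tail.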
4. Practitioner Review (Usability + Reproducibility)
Installation experience
- `pyproject.toml` provides a clean install with extras for docs, notebooks, benchmarks, behavior, and adapters.
- Core install depends on `torch`, which is heavy for users who only need bandit OPE. Consider clarifying minimal install paths.
API clarity and examples
- `crl.evaluate_ope` plus `crl.api` provides a stable surface and notebooks cover the main workflows (`examples/quickstart`, `notebooks/`).
- CLI exists (`crl/cli.py`), but it only runs synthetic benchmarks. NOT FOUND: CLI path for loading real datasets described in `docs/how-to/trajectory_dataset_from_parquet.md` or `docs/how-to/logged_bandit_from_dataframe.md`.
- `diagnostics` parameter is not actionable beyond on/off; users cannot choose subsets.
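One way to make the `diagnostics` argument actionable is to normalize whatever the caller passes into an explicit tuple of family names before any metric runs. The sketch below is hypothetical: `resolve_diagnostics`, the `"all"` and `"none"` string values, and the family names in `KNOWN_DIAGNOSTICS` are assumptions for illustration, not the current behavior of the package.

```python
from __future__ import annotations

from typing import Iterable, Union

# Hypothetical family names, chosen for illustration only.
KNOWN_DIAGNOSTICS = ("overlap", "ess", "weights")


def resolve_diagnostics(
    diagnostics: Union[bool, str, Iterable[str], None],
) -> tuple[str, ...]:
    """Normalize a user-facing `diagnostics` argument to a tuple of family names."""
    if diagnostics is True or diagnostics == "all":
        return KNOWN_DIAGNOSTICS
    if diagnostics is None or diagnostics is False or diagnostics == "none":
        return ()
    requested = [diagnostics] if isinstance(diagnostics, str) else list(diagnostics)
    unknown = sorted(set(requested) - set(KNOWN_DIAGNOSTICS))
    if unknown:
        raise ValueError(
            f"Unknown diagnostics: {unknown}; choose from {list(KNOWN_DIAGNOSTICS)}"
        )
    # Preserve canonical order and drop duplicates.
    return tuple(name for name in KNOWN_DIAGNOSTICS if name in requested)
```

Accepting booleans alongside lists keeps existing on/off callers working while allowing subsets such as `["overlap", "ess"]`.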
Reproducibility checklist
- Seeds: present (`crl/utils/seeding.py`, `tests/test_seeding.py`).
- Configs: present (`configs/`, `docs/reference/configs.md`).
- Environment lockfile: NOT FOUND. Smallest change is to add a `requirements-lock.txt` or `environment.yml` in the repo root.
- Deterministic torch settings: NOT FOUND. Consider `torch.use_deterministic_algorithms` opt-in.
- Dataset provenance in reports: partially present (dataset metadata is optional), but `OpeReport` does not include `dataset.describe()` yet.
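The opt-in determinism item could look roughly like the sketch below. `set_seed_sketch` and its `deterministic_torch` flag are hypothetical and do not describe the existing helper in `crl/utils/seeding.py`; the torch calls used (`torch.manual_seed`, `torch.use_deterministic_algorithms`) are part of PyTorch's public API, and NumPy and torch are treated as optional so the function also works in a minimal install.

```python
import os
import random


def set_seed_sketch(seed: int, deterministic_torch: bool = False) -> None:
    """Seed Python, plus NumPy and torch when installed (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)
    if deterministic_torch:
        # Opt-in: ops without a deterministic implementation raise RuntimeError.
        torch.use_deterministic_algorithms(True)
        # Some CUDA (cuBLAS) ops need this when determinism is enforced; it
        # should be set before the first CUDA call to take effect.
        os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
```

Keeping determinism behind a flag avoids surprising users with runtime errors or slowdowns from ops that lack a deterministic kernel.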
5. Concrete Roadmap (PR-sized tasks)
- Add `correct_model` assumption to model-based estimators. Why: `CORRECT_MODEL` exists but is never required; model-based estimators implicitly rely on it. Files: `crl/estimators/dr.py`, `crl/estimators/wdr.py`, `crl/estimators/mrdr.py`, `crl/estimators/magic.py`, `crl/estimators/fqe.py`, `crl/estimators/double_rl.py`. Acceptance: estimators raise `ValueError` when `correct_model` is missing from `AssumptionSet`.
- Update estimator docs to reflect `correct_model` requirement. Why: documentation currently omits this assumption for model-based estimators. Files: `docs/reference/estimators/dr.md`, `docs/reference/estimators/wdr.md`, `docs/reference/estimators/mrdr.md`, `docs/reference/estimators/magic.md`, `docs/reference/estimators/fqe.md`, `docs/reference/estimators/double_rl.md`. Acceptance: each page includes `correct_model` in assumptions section.
- Add tests for missing `correct_model` assumption. Why: no tests enforce assumption gating on model-based estimators. Files: `tests/test_estimators_*` (new test module or extend existing). Acceptance: tests fail before change and pass after adding assumption checks.
- Add `evaluate_contrast` helper for `PolicyContrastEstimand` or mark as conceptual only. Why: `PolicyContrastEstimand` is defined but unused. Files: `crl/ope.py`, `docs/concepts/estimands.md`. Acceptance: contrast helper exists and is documented, or docs explicitly state it is not yet supported.
- Add example usage for `PolicyContrastEstimand`. Why: current notebooks mention it but do not show how to compute contrasts. Files: `docs/concepts/estimands.md`, `notebooks/01_estimands_and_assumptions.py`. Acceptance: runnable example shows contrast computation or clearly indicates limitation.
- Implement diagnostics selection in `crl.ope.evaluate`. Why: `diagnostics` argument is currently a boolean toggle. Files: `crl/ope.py`, `crl/estimators/diagnostics.py`. Acceptance: `diagnostics=["overlap", "ess"]` runs only selected metrics; `diagnostics="none"` runs none.
- Add tests for diagnostics selection. Why: no test coverage for `diagnostics` behavior. Files: `tests/test_ope_pipeline.py`. Acceptance: tests verify diagnostics keys and empty output when disabled.
- Include dataset summary in `OpeReport.metadata`. Why: dataset classes provide `.describe()` but report metadata does not store it. Files: `crl/ope.py`. Acceptance: `OpeReport.metadata["dataset"]` contains `dataset.describe()` output.
- Add metadata section to `OpeReport.to_html`. Why: HTML reports show only summary table and diagnostics. Files: `crl/ope.py`. Acceptance: HTML includes a metadata table with seed, diagnostics config, and dataset summary.
- Add tests for `OpeReport.to_html` metadata. Why: no tests cover `OpeReport` export. Files: `tests/test_report_exports.py` (extend) or new test file. Acceptance: HTML output contains expected metadata keys.
- Extend CLI to load datasets from CSV/Parquet. Why: docs describe how to load external data, but CLI only supports synthetic benchmarks. Files: `crl/cli.py`, `docs/how-to/run_ope_pipeline.md`. Acceptance: `crl ope --dataset path --dataset-type bandit` runs and writes report.
- Add CLI config schema validation. Why: current CLI uses raw YAML dicts with no validation. Files: `crl/cli.py` (add dataclasses or pydantic model), `docs/reference/configs.md`. Acceptance: invalid config fails with clear error messages.
- Add CLI tests (Typer runner). Why: no tests exercise `crl` CLI. Files: `tests/test_cli.py` (new). Acceptance: tests run `crl ope` with a temp config and output directory.
- Add `make clean` target. Why: build artifacts and generated outputs are easy to re-introduce. Files: `Makefile`. Acceptance: `make clean` removes `dist/`, `site/`, `*.egg-info`, and notebook asset outputs.
- Add `.gitignore` entries for notebook-generated assets. Why: `docs/notebooks/docs/assets` and `notebooks/docs/assets` are generated outputs. Files: `.gitignore`. Acceptance: regenerated assets are ignored by git status.
- Add a bootstrap inference reference page. Why: bootstrap is implemented but lacks a dedicated reference doc. Files: `docs/reference/estimators/bootstrap.md`, `mkdocs.yml` (nav). Acceptance: page appears in Reference nav and documents `BootstrapConfig` and usage.
- Document cross-fitting usage explicitly. Why: cross-fitting is used in code but not clearly surfaced in docs. Files: `docs/explanation/cross_fitting.md` or add sections in estimator docs. Acceptance: docs mention cross-fitting strategy and default fold counts.
- Expand proximal OPE documentation with equations. Why: `crl/confounding/proximal.py` implements a simplified estimator but docs are light. Files: `docs/explanation/proximal.md`. Acceptance: page includes bridge-function equations consistent with code.
- Clarify continuous-action limitations. Why: policy interface mentions densities, but estimators and datasets are discrete. Files: `docs/concepts/dataset_format.md`, `docs/reference/api/policies.md`. Acceptance: docs explicitly state discrete-action assumptions or document continuous support if added.
- Add torch seeding test. Why: `set_seed` seeds torch but is not tested. Files: `tests/test_seeding.py`. Acceptance: test passes when torch is available and skips when not.
- Add diagnostics export to CLI outputs. Why: `OpeReport.diagnostics` is computed but CLI only writes HTML and summary CSV. Files: `crl/cli.py`. Acceptance: `diagnostics.json` is written next to `report.html`.
- Add behavior policy estimation how-to section. Why: behavior policy fitting exists but docs are light outside API reference. Files: `docs/how-to/behavior_policy_estimation.md` (new), `mkdocs.yml` (nav). Acceptance: page includes example with `fit_behavior_policy` and caveats.
- Document benchmark outputs and expected files. Why: `crl/experiments/runner.py` produces multiple outputs but docs show only commands. Files: `docs/tutorials/benchmarks.md`. Acceptance: docs include an output tree example matching runner outputs.
- Add tests for `run_benchmark_suite` output files. Why: runner writes CSV/HTML/figures but no tests validate outputs. Files: `tests/test_benchmarks_integration.py` (extend) or new test. Acceptance: tests verify expected files are created in a temp dir.
- Add dataset provenance to reports and JSON exports. Why: `EstimatorReport` and `OpeReport` do not expose dataset metadata by default. Files: `crl/ope.py`, `crl/estimators/base.py`. Acceptance: report JSON/HTML include dataset provenance fields when available.
- Add a small reference page for `crl/estimators/diagnostics.py` output schema. Why: diagnostics structure is not documented beyond examples. Files: `docs/reference/api/diagnostics.md`. Acceptance: page lists keys and interpretation for `overlap`, `ess`, `weights`.
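For the CLI config validation task, a standard-library dataclass is probably enough to meet the "clear error messages" acceptance criterion. The sketch below is illustrative only: `OpeCliConfig`, `load_config`, the field names, and the allowed `dataset_type` values are all assumptions, since the actual config keys are not described on this page.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping

ALLOWED_DATASET_TYPES = ("bandit", "trajectory")  # hypothetical values


@dataclass(frozen=True)
class OpeCliConfig:
    """Hypothetical schema for one CLI run; field names are illustrative."""

    dataset: str
    dataset_type: str = "bandit"
    seed: int = 0

    def __post_init__(self) -> None:
        if not isinstance(self.dataset, str) or not self.dataset:
            raise ValueError("dataset: expected a non-empty path string")
        if self.dataset_type not in ALLOWED_DATASET_TYPES:
            raise ValueError(
                f"dataset_type: expected one of {ALLOWED_DATASET_TYPES}, "
                f"got {self.dataset_type!r}"
            )
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.seed, bool) or not isinstance(self.seed, int):
            raise ValueError(f"seed: expected an integer, got {self.seed!r}")


def load_config(raw: Mapping[str, Any]) -> OpeCliConfig:
    """Validate a raw mapping (for example, parsed YAML) against the schema."""
    known = {f.name for f in fields(OpeCliConfig)}
    unknown = sorted(set(raw) - known)
    if unknown:
        raise ValueError(f"Unknown config keys: {unknown}")
    if "dataset" not in raw:
        raise ValueError("Missing required config key: 'dataset'")
    return OpeCliConfig(**raw)
```

Rejecting unknown keys up front catches typos in YAML files, which is the most common way a raw-dict config silently falls back to defaults.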
6. OPTIONAL: Demo HTML App
Not added. The repo already uses MkDocs for the docs site, so a standalone HTML page under docs/demo/ would be unlinked and harder to maintain. A better alternative is a MkDocs page (for example, docs/explanation/method_explorer.md) or a notebook-driven demo that uses the existing notebooks/ sources.