Integrate Logged Data

This guide shows the shortest path from real logs to a usable off-policy evaluation (OPE) report.

Step 1: Build a dataset

For contextual bandits, use LoggedBanditDataset.from_dataframe:

from crl.data.datasets import LoggedBanditDataset

bandit = LoggedBanditDataset.from_dataframe(
    df,
    context_columns=["x1", "x2"],
    action_column="action",
    reward_column="reward",
    behavior_prob_column="propensity",
)
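
For orientation, df here is expected to hold one row per logged decision. A minimal frame with the column names used above might look like this (values are illustrative):

import pandas as pd

# One row per logged decision: context features, chosen action,
# observed reward, and the behavior policy's probability of that action.
df = pd.DataFrame({
    "x1": [0.2, -1.3, 0.5],
    "x2": [1.0, 0.4, -0.7],
    "action": [0, 2, 1],
    "reward": [1.0, 0.0, 1.0],
    "propensity": [0.25, 0.5, 0.25],
})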

For trajectories, use TrajectoryDataset.from_dataframe with long-form rows:

from crl.data.datasets import TrajectoryDataset

traj = TrajectoryDataset.from_dataframe(
    df,
    observation_columns=["obs"],
    next_observation_columns=["next_obs"],
    action_column="action",
    reward_column="reward",
    behavior_prob_column="propensity",
    discount=0.99,
    action_space_n=4,
)
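
Long-form means one row per timestep. As a sketch, a single three-step episode with the column names used above could be laid out like this:

import pandas as pd

# One row per timestep: observation, action, reward, next observation,
# and the behavior policy's probability of the logged action.
df = pd.DataFrame({
    "obs": [0.0, 0.4, 0.9],
    "next_obs": [0.4, 0.9, 1.2],
    "action": [1, 3, 0],
    "reward": [0.0, 0.0, 1.0],
    "propensity": [0.5, 0.25, 0.25],
})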

If you only have transitions (s, a, r, s'), build a TransitionDataset and include episode_id and timestep columns so the transitions can be grouped back into trajectories for evaluation; a sketch follows. Note that evaluate_ope currently assumes discrete action spaces.
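
Here is that path sketched out, assuming TransitionDataset exposes a from_dataframe mirroring the signatures above and that df additionally carries episode_id and timestep columns; the episode_id_column and timestep_column keyword names are illustrative, not confirmed API:

from crl.data.datasets import TransitionDataset

# episode_id groups transitions into trajectories; timestep orders them
# within each episode. Keyword names are illustrative; check the docs.
transitions = TransitionDataset.from_dataframe(
    df,
    observation_columns=["obs"],
    next_observation_columns=["next_obs"],
    action_column="action",
    reward_column="reward",
    behavior_prob_column="propensity",
    episode_id_column="episode_id",
    timestep_column="timestep",
)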

Step 2: Wrap your policy

If you already have a scikit-learn model that outputs action probabilities:

from crl.policies.base import Policy

policy = Policy.from_sklearn(model, action_space_n=4)
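
For instance, any fitted scikit-learn classifier whose predicted probabilities cover all actions should slot in here (assumption: from_sklearn reads probabilities via predict_proba). Synthetic data for illustration:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative fit only: random contexts and actions, so that
# predict_proba spans all 4 actions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = rng.integers(0, 4, size=100)
model = LogisticRegression(max_iter=1000).fit(X, y)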

If you have a torch model returning logits:

from crl.policies.base import Policy

policy = Policy.from_torch(model, action_space_n=4, device="cpu")
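
As a minimal sketch, the module just needs to map a batch of contexts to one logit per action, sized to match action_space_n:

import torch.nn as nn

# Two context features in, one logit per action out (action_space_n=4).
model = nn.Sequential(
    nn.Linear(2, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
)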

Step 3: Run evaluation

from crl.ope import evaluate_ope

report = evaluate_ope(dataset=bandit, policy=policy)
summary = report.to_dataframe()
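
The report's DataFrame is not further specified here, so inspect it directly; presumably it holds one row per estimator. The same entry point should also accept the trajectory dataset built in Step 1:

# Columns depend on which estimators ran; print to inspect.
print(summary)

# The trajectory dataset from Step 1 goes through the same call.
traj_report = evaluate_ope(dataset=traj, policy=policy)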

Missing propensities

If behavior_action_probs are unavailable, evaluate_ope skips the propensity-based estimators and falls back to estimators that do not require them (for MDPs, FQE). Treat the resulting diagnostics with caution, and prefer logging propensities at decision time in production. A propensity-free sketch follows.
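
Assuming behavior_prob_column is optional in from_dataframe (worth verifying against the signature), the propensity-free path looks like:

# Assumes behavior_prob_column may be omitted; verify against the API.
bandit_no_props = LoggedBanditDataset.from_dataframe(
    df,
    context_columns=["x1", "x2"],
    action_column="action",
    reward_column="reward",
)
report = evaluate_ope(dataset=bandit_no_props, policy=policy)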