Data

Data contracts for CRL.

BanditDataset dataclass

Logged contextual bandit dataset.

Estimand

Not applicable.

Assumptions: None.

Inputs:

- contexts: Array with shape (n, d) or (n,).
- actions: Array with shape (n,) of integer action indices.
- rewards: Array with shape (n,) of observed rewards.
- behavior_action_probs: Array with shape (n,) of propensities for actions.
- action_space_n: Number of discrete actions.
- metadata: Optional dictionary for provenance.

Outputs: Dataset instance with validated fields.

Failure modes: Raises ValueError if shapes mismatch or probabilities are invalid.
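As a hedged illustration of these contracts, the sketch below builds arrays with the documented shapes and re-implements the stated failure modes in plain numpy. `check_bandit_fields` is a hypothetical helper for exposition, not the dataclass's own validator:

```python
import numpy as np

# Illustrative arrays matching the documented shapes; field names mirror
# the constructor, but this is not the library's own code.
n, d = 4, 3
contexts = np.random.default_rng(0).normal(size=(n, d))   # (n, d)
actions = np.array([0, 2, 1, 0])                          # (n,) integer indices
rewards = np.array([1.0, 0.0, 0.5, 1.0])                  # (n,) observed rewards
behavior_action_probs = np.array([0.5, 0.25, 0.25, 0.5])  # (n,) propensities
action_space_n = 3

def check_bandit_fields(contexts, actions, rewards, probs, action_space_n):
    """Sketch of the documented failure modes: shape mismatches and
    invalid probabilities raise ValueError."""
    n = contexts.shape[0]
    for name, arr in [("actions", actions), ("rewards", rewards),
                      ("behavior_action_probs", probs)]:
        if arr.shape != (n,):
            raise ValueError(f"{name} must have shape ({n},), got {arr.shape}")
    if np.any((probs <= 0.0) | (probs > 1.0)):
        raise ValueError("behavior_action_probs must lie in (0, 1]")
    if np.any((actions < 0) | (actions >= action_space_n)):
        raise ValueError("actions must be valid indices into the action space")

check_bandit_fields(contexts, actions, rewards, behavior_action_probs, action_space_n)
```

The propensity check assumes probabilities must lie in (0, 1], which is one plausible reading of "probabilities are invalid".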

discount property

Return discount factor for bandits (1.0).

dones property

Return terminal flags; bandit data is terminal after each sample.

horizon property

Return horizon length for bandits (1).

next_states property

Bandit data has no next-state field.

num_samples property

Return the number of logged samples.

states property

Alias for contexts to match the core Dataset interface.

describe()

Return summary statistics for the dataset.

fingerprint()

Return a stable fingerprint for the dataset.

from_dataframe(df, *, context_columns, action_column='action', reward_column='reward', behavior_prob_column=None, action_space_n=None, metadata=None) classmethod

Create a LoggedBanditDataset from a pandas DataFrame.
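The column-to-array mapping behind such a constructor can be sketched as follows. The frame, its column names (`x0`, `x1`, `prob`), and the inference of `action_space_n` from the data are illustrative assumptions, not the library's guaranteed behavior:

```python
import numpy as np
import pandas as pd

# Hypothetical logged-bandit frame; 'action' and 'reward' follow the
# documented default column names.
df = pd.DataFrame({
    "x0": [0.1, 0.4, 0.2],
    "x1": [1.0, 0.0, 0.5],
    "action": [0, 1, 0],
    "reward": [1.0, 0.0, 1.0],
    "prob": [0.5, 0.5, 0.5],
})

# What a from_dataframe-style constructor plausibly does under the hood:
contexts = df[["x0", "x1"]].to_numpy()   # context_columns -> (n, d)
actions = df["action"].to_numpy()        # action_column -> (n,)
rewards = df["reward"].to_numpy()        # reward_column -> (n,)
probs = df["prob"].to_numpy()            # behavior_prob_column -> (n,)
action_space_n = int(actions.max()) + 1  # plausibly inferred when not given
```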

from_dict(data) classmethod

Create a dataset from a serialized dictionary.

from_numpy(*, contexts, actions, rewards, behavior_action_probs=None, action_space_n=None, metadata=None) classmethod

Create a LoggedBanditDataset from numpy arrays.

from_parquet(path, *, context_columns, action_column='action', reward_column='reward', behavior_prob_column=None, action_space_n=None, metadata=None) classmethod

Create a LoggedBanditDataset from a parquet file.

summary()

Alias for describe().

to_dict()

Serialize dataset to a dictionary of arrays.

validate()

Validate shapes and value ranges.

TrajectoryDataset dataclass

Logged finite-horizon trajectory dataset.

Estimand

Not applicable.

Assumptions: None.

Inputs:

- observations: Array with shape (n, t, d) or (n, t).
- actions: Array with shape (n, t) of integer action indices.
- rewards: Array with shape (n, t) of rewards.
- next_observations: Array with shape matching observations.
- behavior_action_probs: Array with shape (n, t) of propensities.
- mask: Boolean array with shape (n, t) indicating valid steps.
- discount: Discount factor in [0, 1].
- action_space_n: Number of discrete actions.
- state_space_n: Optional number of discrete states for one-hot features.
- metadata: Optional dictionary for provenance.

Outputs: Dataset instance with validated fields.

Failure modes: Raises ValueError if shapes mismatch or probabilities are invalid.
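A hedged example of arrays satisfying these shape contracts, with a ragged second episode encoded via the mask (field names mirror the documentation; the values are illustrative):

```python
import numpy as np

# Two trajectories padded to horizon t=3; the second episode ends after
# two steps, so the mask marks its final slot invalid.
n, t, d = 2, 3, 2
observations = np.zeros((n, t, d))
next_observations = np.zeros((n, t, d))
actions = np.array([[0, 1, 0], [1, 0, 0]])
rewards = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
behavior_action_probs = np.full((n, t), 0.5)
mask = np.array([[True, True, True], [True, True, False]])
discount = 0.99  # must lie in [0, 1]

# The documented properties then reduce to:
num_steps = int(mask.sum())  # number of valid (mask True) steps
horizon = mask.shape[1]      # max steps per trajectory
```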

dones property

Infer terminal flags from the mask (last valid step per trajectory).
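The documented inference (terminal at the last valid step of each trajectory) can be sketched in numpy. `infer_dones` is a hypothetical helper, not the property's actual implementation:

```python
import numpy as np

def infer_dones(mask):
    """A step is terminal iff it is the last mask-True step of its row."""
    mask = np.asarray(mask, dtype=bool)
    # Index of the last valid step in each row, via argmax on the
    # reversed mask (argmax returns the first True position).
    last_valid = mask.shape[1] - 1 - np.argmax(mask[:, ::-1], axis=1)
    dones = np.zeros_like(mask)
    rows = np.arange(mask.shape[0])
    # Indexing through the mask guards rows with no valid steps at all.
    dones[rows, last_valid] = mask[rows, last_valid]
    return dones

mask = np.array([[True, True, True], [True, True, False]])
infer_dones(mask)
# row 0 is terminal at step 2, row 1 at step 1
```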

horizon property

Return the horizon length (max steps).

next_states property

Alias for next_observations to match the core Dataset interface.

num_steps property

Return the number of valid steps (mask True).

num_trajectories property

Return the number of trajectories.

states property

Alias for observations to match the core Dataset interface.

describe()

Return summary statistics for the dataset.

fingerprint()

Return a stable fingerprint for the dataset.

from_dataframe(df, *, episode_id_column='episode_id', timestep_column='timestep', observation_columns, next_observation_columns, action_column='action', reward_column='reward', behavior_prob_column=None, discount=1.0, action_space_n=None, state_space_n=None, metadata=None) classmethod

Create a TrajectoryDataset from a long-form pandas DataFrame.

from_dict(data) classmethod

Create a dataset from a serialized dictionary.

from_numpy(*, observations, actions, rewards, next_observations, mask=None, discount, action_space_n=None, behavior_action_probs=None, state_space_n=None, metadata=None) classmethod

Create a TrajectoryDataset from numpy arrays.

from_parquet(path, *, episode_id_column='episode_id', timestep_column='timestep', observation_columns, next_observation_columns, action_column='action', reward_column='reward', behavior_prob_column=None, discount=1.0, action_space_n=None, state_space_n=None, metadata=None) classmethod

Create a TrajectoryDataset from a parquet file.

summary()

Alias for describe().

to_dict()

Serialize dataset to a dictionary of arrays.

validate()

Validate shapes and value ranges.

TransitionDataset dataclass

Transition dataset with optional episode ids and timesteps.

horizon property

Return horizon inferred from timesteps if available.

num_steps property

Return the number of transitions.

describe()

Return summary statistics for the dataset.

fingerprint()

Return a stable fingerprint for the dataset.

from_dataframe(df, *, state_columns, next_state_columns, action_column='action', reward_column='reward', done_column='done', behavior_prob_column=None, episode_id_column=None, timestep_column=None, discount=1.0, action_space_n=None, metadata=None) classmethod

Create a TransitionDataset from a pandas DataFrame.

from_parquet(path, *, state_columns, next_state_columns, action_column='action', reward_column='reward', done_column='done', behavior_prob_column=None, episode_id_column=None, timestep_column=None, discount=1.0, action_space_n=None, metadata=None) classmethod

Create a TransitionDataset from a parquet file.

summary()

Alias for describe().

to_dict()

Serialize dataset to a dictionary of arrays.

to_trajectory()

Convert transitions to a TrajectoryDataset if episodes are known.
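Assuming episode ids and timesteps are present, the conversion plausibly groups transitions by episode, sorts by timestep, and pads every episode to the longest length while recording validity in a mask. The sketch below shows this for a single rewards field; the real method presumably handles all fields the same way:

```python
import numpy as np

# Flat transition-level arrays; names are illustrative.
episode_ids = np.array([0, 0, 0, 1, 1])
timesteps   = np.array([0, 1, 2, 0, 1])
rewards     = np.array([1.0, 0.0, 1.0, 0.5, 0.5])

ids = np.unique(episode_ids)
horizon = int(max(np.sum(episode_ids == e) for e in ids))
padded = np.zeros((len(ids), horizon))
mask = np.zeros((len(ids), horizon), dtype=bool)
for i, e in enumerate(ids):
    sel = episode_ids == e
    order = np.argsort(timesteps[sel])  # restore within-episode order
    ep = rewards[sel][order]
    padded[i, : len(ep)] = ep
    mask[i, : len(ep)] = True
```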

validate()

Validate shapes, types, and ranges.

fingerprint_dataset(dataset, *, max_bytes=1000000)

Return a stable fingerprint for a dataset.

The fingerprint hashes shapes, dtypes, and a deterministic sample of values.
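One way such a fingerprint can be computed is sketched below. `fingerprint_arrays` is a hypothetical stand-in, and the stride-based subsampling is an assumption consistent with "a deterministic sample of values"; the library's actual scheme may differ:

```python
import hashlib
import numpy as np

def fingerprint_arrays(arrays, max_bytes=1_000_000):
    """Hash shapes, dtypes, and a deterministic, size-capped sample of
    each array's bytes, iterating fields in sorted order for stability."""
    h = hashlib.sha256()
    for name in sorted(arrays):
        arr = np.ascontiguousarray(arrays[name])
        h.update(name.encode())
        h.update(str(arr.shape).encode())
        h.update(str(arr.dtype).encode())
        raw = arr.tobytes()
        stride = max(1, len(raw) // max_bytes)  # seed-free subsample
        h.update(raw[::stride])
    return h.hexdigest()
```

Sorting field names before hashing makes the fingerprint independent of dictionary insertion order, which is one way to keep it "stable" across runs.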