Data¶
Data contracts for CRL.
BanditDataset
dataclass
¶
Logged contextual bandit dataset.
Estimand
Not applicable.
Assumptions
None.
Inputs
contexts: Array with shape (n, d) or (n,).
actions: Array with shape (n,) of integer action indices.
rewards: Array with shape (n,) of observed rewards.
behavior_action_probs: Array with shape (n,) of propensities for the logged actions.
action_space_n: Number of discrete actions.
metadata: Optional dictionary for provenance.
Outputs
Dataset instance with validated fields.
Failure modes
Raises ValueError if shapes mismatch or probabilities are invalid.
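The shape and propensity checks documented above can be sketched in plain numpy. This is an illustration of the stated contract, not the library's actual `validate()` implementation; the `check_bandit_arrays` helper is hypothetical:

```python
import numpy as np

def check_bandit_arrays(contexts, actions, rewards, behavior_action_probs, action_space_n):
    """Validate the documented BanditDataset contract: matching first
    dimensions, integer actions in range, and propensities in (0, 1]."""
    n = contexts.shape[0]
    if actions.shape != (n,) or rewards.shape != (n,):
        raise ValueError("actions and rewards must have shape (n,)")
    if behavior_action_probs.shape != (n,):
        raise ValueError("behavior_action_probs must have shape (n,)")
    if actions.min() < 0 or actions.max() >= action_space_n:
        raise ValueError("action indices must lie in [0, action_space_n)")
    if np.any(behavior_action_probs <= 0) or np.any(behavior_action_probs > 1):
        raise ValueError("propensities must lie in (0, 1]")

rng = np.random.default_rng(0)
contexts = rng.normal(size=(100, 4))     # (n, d) context features
actions = rng.integers(0, 3, size=100)   # (n,) logged action indices
rewards = rng.normal(size=100)           # (n,) observed rewards
probs = np.full(100, 1 / 3)              # uniform behavior policy
check_bandit_arrays(contexts, actions, rewards, probs, action_space_n=3)
```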
discount
property
¶
Return discount factor for bandits (1.0).
dones
property
¶
Bandit data is terminal after each sample.
horizon
property
¶
Return horizon length for bandits (1).
next_states
property
¶
Bandit data has no next-state field.
num_samples
property
¶
Return the number of logged samples.
states
property
¶
Alias for contexts to match the core Dataset interface.
describe()
¶
Return summary statistics for the dataset.
fingerprint()
¶
Return a stable fingerprint for the dataset.
from_dataframe(df, *, context_columns, action_column='action', reward_column='reward', behavior_prob_column=None, action_space_n=None, metadata=None)
classmethod
¶
Create a BanditDataset from a pandas DataFrame.
from_dict(data)
classmethod
¶
Create a dataset from a serialized dictionary.
from_numpy(*, contexts, actions, rewards, behavior_action_probs=None, action_space_n=None, metadata=None)
classmethod
¶
Create a BanditDataset from numpy arrays.
from_parquet(path, *, context_columns, action_column='action', reward_column='reward', behavior_prob_column=None, action_space_n=None, metadata=None)
classmethod
¶
Create a BanditDataset from a parquet file.
summary()
¶
Alias for describe().
to_dict()
¶
Serialize dataset to a dictionary of arrays.
validate()
¶
Validate shapes and value ranges.
TrajectoryDataset
dataclass
¶
Logged finite-horizon trajectory dataset.
Estimand
Not applicable.
Assumptions
None.
Inputs
observations: Array with shape (n, t, d) or (n, t).
actions: Array with shape (n, t) of integer action indices.
rewards: Array with shape (n, t) of rewards.
next_observations: Array with shape matching observations.
behavior_action_probs: Array with shape (n, t) of propensities.
mask: Boolean array with shape (n, t) indicating valid steps.
discount: Discount factor in [0, 1].
action_space_n: Number of discrete actions.
state_space_n: Optional number of discrete states for one-hot features.
metadata: Optional dictionary for provenance.
Outputs
Dataset instance with validated fields.
Failure modes
Raises ValueError if shapes mismatch or probabilities are invalid.
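For discrete observations, the state_space_n field suggests one-hot featurization. A common numpy idiom for that (an assumption about how such features would be built, not the library's internal code):

```python
import numpy as np

state_space_n = 4
observations = np.array([[0, 2, 3],
                         [1, 1, 0]])            # (n, t) integer state indices
# Fancy indexing into an identity matrix yields (n, t, state_space_n) features.
one_hot = np.eye(state_space_n)[observations]
```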
dones
property
¶
Infer terminal flags from the mask (last valid step per trajectory).
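The mask-to-dones inference described above can be sketched as follows, assuming valid steps are left-aligned in a boolean (n, t) mask (an illustration of the documented behavior, not the property's actual code):

```python
import numpy as np

# Two trajectories with horizon t = 4; the second ends after 2 steps.
mask = np.array([[True, True, True, True],
                 [True, True, False, False]])

lengths = mask.sum(axis=1)                            # valid steps per trajectory
dones = np.zeros_like(mask)                           # (n, t) terminal flags
dones[np.arange(mask.shape[0]), lengths - 1] = True   # flag the last valid step

# dones -> [[False, False, False, True],
#           [False, True,  False, False]]
```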
horizon
property
¶
Return the horizon length (max steps).
next_states
property
¶
Alias for next_observations to match the core Dataset interface.
num_steps
property
¶
Return the number of valid steps (mask True).
num_trajectories
property
¶
Return the number of trajectories.
states
property
¶
Alias for observations to match the core Dataset interface.
describe()
¶
Return summary statistics for the dataset.
fingerprint()
¶
Return a stable fingerprint for the dataset.
from_dataframe(df, *, episode_id_column='episode_id', timestep_column='timestep', observation_columns, next_observation_columns, action_column='action', reward_column='reward', behavior_prob_column=None, discount=1.0, action_space_n=None, state_space_n=None, metadata=None)
classmethod
¶
Create a TrajectoryDataset from a long-form pandas DataFrame.
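The long-form layout expected here is one row per (episode_id, timestep) pair, which the constructor pivots into padded (n, t, ...) arrays. A minimal sketch of that reshaping with pandas (the reshaping itself, not the library's internal code; column values are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "episode_id": [0, 0, 1, 1, 1],
    "timestep":   [0, 1, 0, 1, 2],
    "obs_0":      [0.1, 0.2, 0.3, 0.4, 0.5],
    "action":     [1, 0, 2, 1, 0],
    "reward":     [1.0, 0.0, 0.5, 0.0, 1.0],
})

# Pivot to (n, t) with NaN padding for ragged episodes, then derive the mask.
rewards = df.pivot(index="episode_id", columns="timestep", values="reward").to_numpy()
mask = ~np.isnan(rewards)          # True where a step was actually logged
rewards = np.nan_to_num(rewards)   # zero-fill padded steps
```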
from_dict(data)
classmethod
¶
Create a dataset from a serialized dictionary.
from_numpy(*, observations, actions, rewards, next_observations, mask=None, discount, action_space_n=None, behavior_action_probs=None, state_space_n=None, metadata=None)
classmethod
¶
Create a TrajectoryDataset from numpy arrays.
from_parquet(path, *, episode_id_column='episode_id', timestep_column='timestep', observation_columns, next_observation_columns, action_column='action', reward_column='reward', behavior_prob_column=None, discount=1.0, action_space_n=None, state_space_n=None, metadata=None)
classmethod
¶
Create a TrajectoryDataset from a parquet file.
summary()
¶
Alias for describe().
to_dict()
¶
Serialize dataset to a dictionary of arrays.
validate()
¶
Validate shapes and value ranges.
TransitionDataset
dataclass
¶
Transition dataset with optional episode ids and timesteps.
horizon
property
¶
Return horizon inferred from timesteps if available.
num_steps
property
¶
Return the number of transitions.
describe()
¶
Return summary statistics for the dataset.
fingerprint()
¶
Return a stable fingerprint for the dataset.
from_dataframe(df, *, state_columns, next_state_columns, action_column='action', reward_column='reward', done_column='done', behavior_prob_column=None, episode_id_column=None, timestep_column=None, discount=1.0, action_space_n=None, metadata=None)
classmethod
¶
Create a TransitionDataset from a pandas DataFrame.
from_parquet(path, *, state_columns, next_state_columns, action_column='action', reward_column='reward', done_column='done', behavior_prob_column=None, episode_id_column=None, timestep_column=None, discount=1.0, action_space_n=None, metadata=None)
classmethod
¶
Create a TransitionDataset from a parquet file.
summary()
¶
Alias for describe().
to_dict()
¶
Serialize dataset to a dictionary of arrays.
to_trajectory()
¶
Convert transitions to a TrajectoryDataset if episodes are known.
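The conversion above requires known episode ids and timesteps; the core step is scattering flat transition arrays into padded (n, t) arrays plus a validity mask. A sketch of that grouping under those assumptions (the transitions_to_padded helper is hypothetical, not the library's method):

```python
import numpy as np

def transitions_to_padded(episode_ids, timesteps, rewards):
    """Scatter flat transitions into padded (n, t) arrays and a mask,
    mirroring the documented to_trajectory() behavior (sketch only)."""
    episodes, ep_idx = np.unique(episode_ids, return_inverse=True)
    n, t = len(episodes), timesteps.max() + 1
    out = np.zeros((n, t))
    mask = np.zeros((n, t), dtype=bool)
    out[ep_idx, timesteps] = rewards    # place each step at (episode, timestep)
    mask[ep_idx, timesteps] = True      # mark logged steps as valid
    return out, mask

ep = np.array([7, 7, 9, 9, 9])          # two episodes of lengths 2 and 3
ts = np.array([0, 1, 0, 1, 2])
r = np.array([1.0, 0.0, 0.5, 0.0, 1.0])
padded, mask = transitions_to_padded(ep, ts, r)
```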
validate()
¶
Validate shapes, types, and ranges.
fingerprint_dataset(dataset, *, max_bytes=1000000)
¶
Return a stable fingerprint for a dataset.
The fingerprint hashes shapes, dtypes, and a deterministic sample of values.
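One way to realize the shapes/dtypes/deterministic-sample hashing described above (a sketch of the idea, not the library's exact scheme; the byte cap mirrors the max_bytes parameter):

```python
import hashlib
import numpy as np

def fingerprint_arrays(arrays, max_bytes=1_000_000):
    """Hash shapes, dtypes, and a deterministic prefix of each array's bytes."""
    h = hashlib.sha256()
    for name in sorted(arrays):            # fixed field order for stability
        a = np.ascontiguousarray(arrays[name])
        h.update(name.encode())
        h.update(str(a.shape).encode())
        h.update(str(a.dtype).encode())
        h.update(a.tobytes()[:max_bytes])  # deterministic sample: leading bytes
    return h.hexdigest()

data = {"rewards": np.arange(5.0), "actions": np.array([0, 1, 0, 2, 1])}
fp = fingerprint_arrays(data)  # stable across runs for identical data
```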