src.asqi.loaders
Attributes
- logger
Functions
- load_test_cases(path, test_case_class, *, input_mount_path=None): Load and validate test cases from a dataset, yielding one typed instance per row.
Module Contents
- src.asqi.loaders.logger
- src.asqi.loaders.load_test_cases(path: str | pathlib.Path | dict | asqi.schemas.HFDatasetDefinition, test_case_class: type[load_test_cases.T], *, input_mount_path: pathlib.Path | None = None) -> collections.abc.Generator[load_test_cases.T, None, None]
Load and validate test cases from a dataset, yielding one typed instance per row.

Accepts either a plain file path or an HFDatasetDefinition config dict (the same format used in rag_datasets.yaml). Dataset configs require no changes when migrating containers to use this function.

Supported file formats (when path is a file path): JSONL (.jsonl), JSON (.json), CSV (.csv), Parquet (.parquet).

When path is an HFDatasetDefinition or dict, all formats and sources are available: HuggingFace Hub, local files, and column mapping via the config's mapping field.

Multiple schemas from one dataset

A single dataset row can be valid for more than one TestCase schema. For example, a RAG dataset with query, answer, and context columns can validate against both AnsweredRAGTestCase (needs answer only) and ContextualizedRAGTestCase (needs context only). Call load_test_cases once per schema; each call is independent and validates only the fields its schema requires:

```python
from pathlib import Path

# config: a dataset config dict, e.g. an entry from rag_datasets.yaml
p = Path("/input")
answered = list(load_test_cases(config, AnsweredRAGTestCase, input_mount_path=p))
contextual = list(load_test_cases(config, ContextualizedRAGTestCase, input_mount_path=p))
```
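To make the per-schema behavior concrete, here is a minimal standard-library sketch; it is not asqi's implementation. AnsweredStub, ContextualStub and validate_rows are hypothetical stand-ins: the real schemas are Pydantic models, and the real loader's error message format may differ. It shows one row validating against two schemas, each keeping only the fields it declares, and a row lacking a required field raising a ValueError that names the row index and the field.

```python
from dataclasses import dataclass, fields
from typing import Any, Iterator, TypeVar

T = TypeVar("T")


# Hypothetical stand-ins for AnsweredRAGTestCase / ContextualizedRAGTestCase;
# the real asqi schemas are Pydantic models and may declare more fields.
@dataclass
class AnsweredStub:
    query: str
    answer: str


@dataclass
class ContextualStub:
    query: str
    context: str


def validate_rows(rows: list[dict[str, Any]], schema: type[T]) -> Iterator[T]:
    """Yield one schema instance per row, keeping only the fields the schema declares."""
    wanted = [f.name for f in fields(schema)]
    for index, row in enumerate(rows):
        missing = [name for name in wanted if name not in row]
        if missing:
            # Report the row index and the missing field names.
            raise ValueError(f"Row {index}: missing required field(s) {missing}")
        yield schema(**{name: row[name] for name in wanted})


# One row with all three columns satisfies both schemas independently.
rows = [{"query": "q1", "answer": "a1", "context": "c1"}]
answered = list(validate_rows(rows, AnsweredStub))
contextual = list(validate_rows(rows, ContextualStub))
```

Because each call reads only the fields its schema declares, the extra column (context for AnsweredStub, answer for ContextualStub) is simply ignored rather than treated as an error.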
If a row is missing a field required by the schema, validation fails for that row and a ValueError is raised with the row index and the failing field name.

- Args:
  - path: One of:
    - str | Path: direct path to a local dataset file.
    - dict | HFDatasetDefinition: dataset config from rag_datasets.yaml. Column mapping and Hub/local loading are handled automatically.
  - test_case_class: Pydantic model to validate each row against, e.g. AnsweredRAGTestCase, LLMTestCase, or a per-container schema.
  - input_mount_path: Optional base directory prepended to relative file paths.
- Yields:
  Validated instances of test_case_class, one per dataset row.
- Raises:
  - FileNotFoundError: If the resolved file path does not exist.
  - ValueError: If a row fails schema validation (the message includes the row index and field names).
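For the plain-file branch of path, the resolution and format handling described above can be sketched with the standard library. This is an illustration under stated assumptions, not the asqi code: resolve_path and read_rows are hypothetical helpers, a .json file is assumed to hold a top-level array of objects, Parquet is omitted because it needs a third-party reader, and the dict / HFDatasetDefinition branch is not covered.

```python
from __future__ import annotations

import csv
import json
from pathlib import Path
from typing import Any, Iterator


def resolve_path(path: str | Path, input_mount_path: Path | None = None) -> Path:
    """Prepend input_mount_path to relative paths and check that the file exists."""
    resolved = Path(path)
    if input_mount_path is not None and not resolved.is_absolute():
        resolved = input_mount_path / resolved
    if not resolved.exists():
        raise FileNotFoundError(f"Dataset file not found: {resolved}")
    return resolved


def read_rows(path: Path) -> Iterator[dict[str, Any]]:
    """Read raw rows from JSONL, JSON or CSV, choosing the parser by file suffix."""
    suffix = path.suffix.lower()
    if suffix == ".jsonl":
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():  # skip blank lines
                    yield json.loads(line)
    elif suffix == ".json":
        with path.open(encoding="utf-8") as fh:
            yield from json.load(fh)  # assumes a top-level JSON array of objects
    elif suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as fh:
            yield from csv.DictReader(fh)  # header row supplies the keys
    else:
        # Parquet (.parquet) needs a third-party reader such as pyarrow; omitted here.
        raise ValueError(f"Unsupported file format in this sketch: {suffix}")
```

Usage: read_rows(resolve_path("dataset.jsonl", Path("/input"))) yields raw dicts that would then be validated against test_case_class. Note that CSV values arrive as strings, so any type coercion is left to the schema.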
Examples:

```python
# Plain file path
from asqi.loaders import load_test_cases
from asqi.schemas import AnsweredRAGTestCase

for tc in load_test_cases("dataset.jsonl", AnsweredRAGTestCase):
    print(tc.query, tc.answer)

# HFDatasetDefinition config dict: rag_datasets.yaml stays unchanged.
# dataset_config (parsed dataset configs) and input_mount_path (a Path)
# are assumed to be defined by the calling container.
accuracy_rows = list(
    load_test_cases(
        dataset_config["accuracy_dataset"],
        AnsweredRAGTestCase,
        input_mount_path=input_mount_path,
    )
)
```
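A dataset.jsonl for the plain-file example can be written with the standard library: one JSON object per line. The rows below are hypothetical sample data, and the sketch assumes query and answer (the fields the example reads) are what AnsweredRAGTestCase requires; check the schema for any additional required fields.

```python
import json
from pathlib import Path

# Hypothetical sample rows; keys must match the fields the target schema requires.
rows = [
    {"query": "What is the capital of France?", "answer": "Paris"},
    {"query": "What is 2 + 2?", "answer": "4"},
]

# JSONL: serialize each row as one JSON object on its own line.
with Path("dataset.jsonl").open("w", encoding="utf-8") as fh:
    for row in rows:
        fh.write(json.dumps(row) + "\n")
```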