Validation & Compatibility
DataSchema
quprep.validation.schema.DataSchema(features)
Declare expected features, types, and value ranges for pipeline input.
Attach to a Pipeline via schema= to enforce the contract at entry.
Also usable standalone via :meth:`validate`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features | list of FeatureSpec | One spec per expected feature, in column order. | required |
Examples:
>>> schema = DataSchema([
...     FeatureSpec("age", dtype="continuous", min_value=0, max_value=120),
...     FeatureSpec("income", dtype="continuous", min_value=0),
... ])
>>> schema.validate(dataset)  # raises SchemaViolationError on mismatch
Functions
from_dict(data)
classmethod
Build a DataSchema from a list of dicts (e.g. loaded from JSON).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | list[dict] | Each dict must have | required |
Returns:
| Type | Description |
|---|---|
| DataSchema | |
from_json(s)
classmethod
Build a DataSchema from a JSON string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| s | str | JSON string produced by :meth: | required |
Returns:
| Type | Description |
|---|---|
| DataSchema | |
infer(dataset)
classmethod
Infer a DataSchema from an existing Dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | Reference dataset to infer schema from. | required |
Returns:
| Type | Description |
|---|---|
| DataSchema | Schema with inferred names, types, and value ranges. |
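As a rough illustration of what inferring names, types, and value ranges can involve, the sketch below derives them from a small in-memory table. It does not use quprep: `infer_specs`, the dict layout, and the rule for telling "discrete" from "continuous" are assumptions made for this example, and the library's actual inference rules may differ.

```python
import math


def infer_specs(columns):
    """Hypothetical stand-in for schema inference (not quprep's implementation).

    columns maps a column name to a non-empty list of numeric values.
    """
    specs = []
    for name, values in columns.items():
        present = [v for v in values if not math.isnan(v)]
        # Assumption: a column holding only whole numbers is treated as discrete.
        is_discrete = all(float(v).is_integer() for v in present)
        specs.append({
            "name": name,
            "dtype": "discrete" if is_discrete else "continuous",
            "min_value": min(present),
            "max_value": max(present),
            # A column is nullable if any NaN was dropped above.
            "nullable": len(present) < len(values),
        })
    return specs


specs = infer_specs({
    "age": [23.5, 41.0, 67.2],
    "label": [0, 1, 1],
})
for spec in specs:
    print(spec)
```

A schema inferred this way only reflects the reference data, so ranges may need widening by hand before it is used as a contract.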
to_dict()
Serialise this schema to a plain list of dicts.
Each dict has keys name, dtype, and optionally min_value, max_value, and nullable (only included when non-default so the output stays terse).
Returns:
| Type | Description |
|---|---|
| list[dict] | |
to_json(indent=2)
Serialise this schema to a JSON string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| indent | int | JSON indentation level (default 2). | 2 |
Returns:
| Type | Description |
|---|---|
| str | |
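The terse serialisation described for to_dict (required keys always present, optional keys only when non-default) can be sketched with a stand-in dataclass. `Spec` and `spec_to_dict` are hypothetical names for this example; they mirror the documented FeatureSpec fields but are not quprep's implementation.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Spec:
    """Hypothetical stand-in mirroring the documented FeatureSpec fields."""
    name: str
    dtype: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    nullable: bool = False


def spec_to_dict(spec):
    # Always emit the required keys; emit optional keys only when non-default.
    out = {"name": spec.name, "dtype": spec.dtype}
    for f in fields(spec)[2:]:
        value = getattr(spec, f.name)
        if value != f.default:
            out[f.name] = value
    return out


specs = [
    Spec("age", "continuous", min_value=0, max_value=120),
    Spec("label", "discrete"),
]
payload = json.dumps([spec_to_dict(s) for s in specs], indent=2)

# Omitted keys fall back to the dataclass defaults, so the round trip is lossless.
restored = [Spec(**d) for d in json.loads(payload)]
print(payload)
```

Omitting defaults keeps the JSON short without losing information, because the defaults are restored on load.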
validate(dataset)
Validate a Dataset against this schema.
All violations are collected and reported together so the caller gets the full picture in a single error.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | | required |
Raises:
| Type | Description |
|---|---|
| SchemaViolationError | If any violations are found. |
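The collect-then-raise behaviour described for validate can be sketched as follows. This is a minimal illustration under assumptions: `ViolationError`, `validate_rows`, and the dict-based specs are hypothetical stand-ins, and the checks and message wording in quprep itself may differ.

```python
class ViolationError(ValueError):
    """Hypothetical stand-in for a schema violation error."""


def validate_rows(specs, columns):
    """Check columns against specs, collecting every violation before raising."""
    problems = []
    for spec in specs:
        name = spec["name"]
        if name not in columns:
            problems.append(f"missing column {name!r}")
            continue
        lo, hi = spec.get("min_value"), spec.get("max_value")
        for i, v in enumerate(columns[name]):
            if lo is not None and v < lo:
                problems.append(f"{name}[{i}] = {v} is below min_value {lo}")
            if hi is not None and v > hi:
                problems.append(f"{name}[{i}] = {v} is above max_value {hi}")
    if problems:
        # One error carrying all findings, rather than failing on the first.
        raise ViolationError("; ".join(problems))


specs = [
    {"name": "age", "min_value": 0, "max_value": 120},
    {"name": "income", "min_value": 0},
]
try:
    validate_rows(specs, {"age": [34, 150, -2]})
except ViolationError as exc:
    message = str(exc)
    print(message)
```

Reporting everything at once saves the caller from a fix-one-rerun-find-the-next loop.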
FeatureSpec
quprep.validation.schema.FeatureSpec(name, dtype, min_value=None, max_value=None, nullable=False)
dataclass
Specification for a single feature column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Expected column name. | required |
| dtype | str | Expected feature type: | required |
| min_value | float | Minimum allowed value (inclusive). | None |
| max_value | float | Maximum allowed value (inclusive). | None |
| nullable | bool | Whether NaN is permitted. Default False. | False |
SchemaViolationError
quprep.validation.schema.SchemaViolationError
Bases: ValueError
Raised when a Dataset violates a DataSchema contract.
validate_dataset
quprep.validation.validate_dataset(dataset, *, context=...)
warn_qubit_mismatch
quprep.validation.warn_qubit_mismatch(n_features, n_qubits, encoding)
CostEstimate
quprep.validation.cost.CostEstimate(encoding, n_features, n_qubits, gate_count, circuit_depth, two_qubit_gates, nisq_safe, warning)
dataclass
Gate count and circuit depth estimate for an encoder configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
| encoding | str | Name of the encoding method. |
| n_features | int | Number of input features. |
| n_qubits | int | Number of qubits required by this encoding. |
| gate_count | int | Total gate count (1-qubit + 2-qubit) per circuit. |
| circuit_depth | int | Critical-path depth estimate. |
| two_qubit_gates | int | Number of 2-qubit gates (CNOTs). Most relevant for NISQ hardware. |
| nisq_safe | bool | |
| warning | str or None | Human-readable warning if the depth is prohibitively high. |
estimate_cost
quprep.validation.cost.estimate_cost(encoder, n_features)
Estimate gate count and circuit depth for an encoder configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| encoder | BaseEncoder | A configured encoder instance. | required |
| n_features | int | Number of features in the dataset (after any reduction). | required |
Returns:
| Type | Description |
|---|---|
| CostEstimate | |
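To give a feel for the quantities a cost estimate carries, the sketch below fills in the counts for a plain angle encoding, assuming one single-qubit rotation per feature and no entangling gates. `Estimate` and `estimate_angle` are hypothetical stand-ins, and the numbers quprep reports for its own encoders may be computed differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Estimate:
    """Hypothetical stand-in with a subset of the documented CostEstimate fields."""
    encoding: str
    n_features: int
    n_qubits: int
    gate_count: int
    circuit_depth: int
    two_qubit_gates: int
    warning: Optional[str] = None


def estimate_angle(n_features):
    # Assumption: one rotation per feature, one feature per qubit, and no
    # entangling gates, so all rotations fit in a single layer.
    return Estimate(
        encoding="angle",
        n_features=n_features,
        n_qubits=n_features,
        gate_count=n_features,
        circuit_depth=1,
        two_qubit_gates=0,
    )


est = estimate_angle(8)
print(est.n_qubits, est.gate_count, est.circuit_depth, est.two_qubit_gates)
```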
check_compatibility
quprep.validation.compatibility.check_compatibility(encoder, dataset)
Check a dataset for compatibility with an encoder before encoding runs.
Catches hard failures (NaNs) and soft issues (wrong value ranges, missing fit, padding side-effects) upfront so users get a clear message rather than a cryptic error inside the encoder.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| encoder | BaseEncoder | A configured encoder instance. | required |
| dataset | Dataset | Dataset to check. | required |
Returns:
| Type | Description |
|---|---|
| CompatibilityReport | |
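The split between hard failures and soft issues can be sketched with two of the checks named above: NaNs as errors and out-of-range values as warnings. `Report`, `check_rows`, and the default range of [0, pi] are assumptions for this example, not quprep's actual checks or thresholds.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Report:
    """Hypothetical stand-in mirroring the documented CompatibilityReport fields."""
    is_compatible: bool
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


def check_rows(rows, lo=0.0, hi=math.pi):
    errors, warnings = [], []
    values = [v for row in rows for v in row]
    # Hard failure: NaN cannot be encoded at all.
    if any(math.isnan(v) for v in values):
        errors.append("NaN values detected")
    # Soft issue: encoding would run, but values lie outside the assumed range.
    if any(not math.isnan(v) and not lo <= v <= hi for v in values):
        warnings.append(f"values fall outside [{lo:.3f}, {hi:.3f}]")
    return Report(is_compatible=not errors, errors=errors, warnings=warnings)


report = check_rows([[0.1, 2.0], [float("nan"), 7.5]])
print(report.is_compatible, report.errors, report.warnings)
```

Only errors affect `is_compatible` in this sketch; warnings are informational and can be present on a compatible dataset.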
CompatibilityReport
quprep.validation.compatibility.CompatibilityReport(is_compatible, errors=list(), warnings=list())
dataclass
Result of :func:`check_compatibility`.
Attributes:
| Name | Type | Description |
|---|---|---|
| is_compatible | bool | |
| errors | list[str] | Hard failures: encoding will fail or produce wrong results. |
| warnings | list[str] | Soft issues: encoding will run but results may be suboptimal. |
verify_encoding
quprep.validation.compatibility.verify_encoding(encoded, encoder)
Verify post-encoding invariants for a batch of EncodedResult objects.
Checks encoding-specific invariants that, if violated, indicate a silent pipeline misconfiguration such as a wrong normalizer being applied.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| encoded | list[EncodedResult] | Output of | required |
| encoder | BaseEncoder | The encoder used to produce | required |
Returns:
| Type | Description |
|---|---|
| VerificationReport | |
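One example of an encoding-specific invariant is that an amplitude-encoded state vector must have unit L2 norm; a vector that fails this usually points to a missing or wrong normalisation step. The sketch below checks it on plain lists. `check_unit_norm` is a hypothetical helper, and the `passed` key in each result dict is an assumption; whether quprep runs this exact check is not stated in this reference.

```python
import math


def check_unit_norm(vectors, tol=1e-9):
    """Return one result dict per vector; 'passed' is True when the L2 norm is 1."""
    checks = []
    for i, vec in enumerate(vectors):
        norm = math.sqrt(sum(abs(a) ** 2 for a in vec))
        checks.append({
            "name": f"unit_norm[{i}]",
            "passed": math.isclose(norm, 1.0, abs_tol=tol),
            "detail": f"norm = {norm:.6f}",
        })
    return checks


# The first vector is normalised; the second was left unnormalised.
checks = check_unit_norm([[0.6, 0.8], [1.0, 1.0]])
passed = all(c["passed"] for c in checks)
for c in checks:
    print(c["name"], c["passed"], c["detail"])
```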
VerificationReport
quprep.validation.compatibility.VerificationReport(passed, checks=list())
dataclass
Result of :func:`verify_encoding`.
Attributes:
| Name | Type | Description |
|---|---|---|
| passed | bool | |
| checks | list[dict] | One dict per check: |
Examples
Check encoder compatibility before encoding
import quprep as qd
report = qd.check_compatibility(qd.AngleEncoder(rotation="ry"), dataset)
if not report.is_compatible:
    print("Errors:", report.errors)  # e.g. ["NaN values detected"]
print("Warnings:", report.warnings)  # e.g. ["suggest minmax_pi normalizer"]
Verify encoding invariants after encoding
import quprep as qd
enc = qd.AmplitudeEncoder()
encoded = enc.encode_batch(dataset)
report = qd.verify_encoding(encoded, enc)
if not report.passed:
    for check in report.checks:
        print(check["name"], check["detail"])
Define and validate a schema
import quprep as qd
schema = qd.DataSchema([
    qd.FeatureSpec("age", dtype="continuous", min_value=0, max_value=120),
    qd.FeatureSpec("income", dtype="continuous", min_value=0),
    qd.FeatureSpec("label", dtype="discrete"),
])
try:
    schema.validate(dataset)
except qd.SchemaViolationError as e:
    print(e)
Infer schema from data and save
from pathlib import Path
import quprep as qd
schema = qd.DataSchema.infer(dataset)
s = schema.to_json()  # JSON string
Path("schema.json").write_text(s)  # save to file ("schema.json" is a placeholder path)
restored = qd.DataSchema.from_json(s)  # reload