
Validation & Compatibility


DataSchema

quprep.validation.schema.DataSchema(features)

Declare expected features, types, and value ranges for pipeline input.

Attach it to a Pipeline via schema= to enforce the contract when data enters the pipeline, or call validate() directly to use it standalone.

Parameters:

features (list of FeatureSpec, required): One spec per expected feature, in column order.

Examples:

>>> schema = DataSchema([
...     FeatureSpec("age", dtype="continuous", min_value=0, max_value=120),
...     FeatureSpec("income", dtype="continuous", min_value=0),
... ])
>>> schema.validate(dataset)  # raises SchemaViolationError on mismatch

Functions

from_dict(data) classmethod

Build a DataSchema from a list of dicts (e.g. loaded from JSON).

Parameters:

data (list[dict], required): Each dict must have name and dtype; min_value, max_value, and nullable are optional.

Returns:

DataSchema
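The required/optional key split that from_dict describes can be sketched in plain Python. This is not quprep's implementation. Spec and specs_from_dicts are hypothetical stand-ins, and raising KeyError for a missing required key is an assumption, because the docs do not say which exception is used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Spec:
    """Hypothetical stand-in for FeatureSpec (same fields and defaults)."""
    name: str
    dtype: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    nullable: bool = False


REQUIRED_KEYS = ("name", "dtype")
OPTIONAL_KEYS = ("min_value", "max_value", "nullable")


def specs_from_dicts(data: List[dict]) -> List[Spec]:
    """Build one Spec per dict; absent optional keys fall back to the defaults."""
    specs = []
    for i, entry in enumerate(data):
        missing = [k for k in REQUIRED_KEYS if k not in entry]
        if missing:
            # Assumption: a missing required key is reported as a KeyError.
            raise KeyError(f"entry {i} is missing required key(s): {missing}")
        optional = {k: entry[k] for k in OPTIONAL_KEYS if k in entry}
        specs.append(Spec(entry["name"], entry["dtype"], **optional))
    return specs


specs = specs_from_dicts([
    {"name": "age", "dtype": "continuous", "min_value": 0, "max_value": 120},
    {"name": "label", "dtype": "discrete"},
])
print(specs[1])  # Spec(name='label', dtype='discrete', min_value=None, max_value=None, nullable=False)
```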

from_json(s) classmethod

Build a DataSchema from a JSON string.

Parameters:

s (str, required): JSON string produced by to_json().

Returns:

DataSchema

infer(dataset) classmethod

Infer a DataSchema from an existing Dataset.

Parameters:

dataset (Dataset, required): Reference dataset to infer the schema from.

Returns:

DataSchema: Schema with inferred names, types, and value ranges.
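The docs do not say which rules infer() uses to choose a dtype. One plausible set of rules is shown below as a hypothetical helper, infer_spec, that works on a single column of floats. Values all in {0, 1} give 'binary'. All integer-valued gives 'discrete'. Anything else gives 'continuous'. The observed minimum and maximum become the bounds, and nullable is set when a NaN appears. Treat these rules as an assumption, not as quprep's behaviour.

```python
import math
from typing import List


def infer_spec(name: str, values: List[float]) -> dict:
    """Infer a FeatureSpec-style dict from one column (hypothetical rules)."""
    present = [v for v in values if not math.isnan(v)]
    if not present:
        raise ValueError(f"column {name!r} has no non-NaN values to infer from")
    if set(present) <= {0, 1}:
        dtype = "binary"
    elif all(float(v).is_integer() for v in present):
        dtype = "discrete"
    else:
        dtype = "continuous"
    return {
        "name": name,
        "dtype": dtype,
        "min_value": min(present),               # observed range becomes the bounds
        "max_value": max(present),
        "nullable": len(present) < len(values),  # any NaN makes the column nullable
    }


print(infer_spec("age", [23, 41, float("nan"), 67]))
# {'name': 'age', 'dtype': 'discrete', 'min_value': 23, 'max_value': 67, 'nullable': True}
```

In this sketch the bounds are only as wide as the reference data, so a later batch containing a larger value would fail validation against the inferred schema.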

to_dict()

Serialise this schema to a plain list of dicts.

Each dict has keys name, dtype, and optionally min_value, max_value, and nullable (only included when non-default so the output stays terse).

Returns:

list[dict]
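The "only when non-default" rule can be reproduced generically with dataclasses.fields, which exposes each field's declared default. Spec and spec_to_dict are hypothetical stand-ins, not quprep code. The json.dumps(..., indent=2) call at the end is presumably similar to what to_json's indent parameter controls.

```python
import json
from dataclasses import MISSING, dataclass, fields
from typing import Optional


@dataclass
class Spec:
    """Hypothetical stand-in for FeatureSpec (same fields and defaults)."""
    name: str
    dtype: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    nullable: bool = False


def spec_to_dict(spec: Spec) -> dict:
    out = {}
    for f in fields(spec):
        value = getattr(spec, f.name)
        # Fields without a default (name, dtype) are always written;
        # the rest are written only when they differ from their default.
        if f.default is MISSING or value != f.default:
            out[f.name] = value
    return out


specs = [Spec("age", "continuous", min_value=0, max_value=120), Spec("flag", "binary")]
print(json.dumps([spec_to_dict(s) for s in specs], indent=2))
```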

to_json(indent=2)

Serialise this schema to a JSON string.

Parameters:

indent (int, default 2): JSON indentation level.

Returns:

str

validate(dataset)

Validate a Dataset against this schema.

All violations are collected and reported together so the caller gets the full picture in a single error.

Parameters:

dataset (Dataset, required): Dataset to validate.

Raises:

SchemaViolationError: If any violations are found.
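Collecting every violation before raising is a common pattern. Below is a minimal sketch of it. It assumes a dataset can be viewed as a mapping from column name to a list of floats, which is an assumption because quprep's Dataset type is not described here. Spec and validate are hypothetical stand-ins, and dtype checks are omitted for brevity. The local SchemaViolationError subclasses ValueError, matching the documented base class, so a plain except ValueError also catches it.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


class SchemaViolationError(ValueError):
    """Local stand-in; like quprep's, it subclasses ValueError."""


@dataclass
class Spec:
    """Hypothetical stand-in for FeatureSpec; dtype is omitted here."""
    name: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    nullable: bool = False


def validate(specs: List[Spec], columns: Dict[str, List[float]]) -> None:
    violations = []  # collect everything first, raise once at the end
    for spec in specs:
        if spec.name not in columns:
            violations.append(f"missing column {spec.name!r}")
            continue
        for row, v in enumerate(columns[spec.name]):
            if math.isnan(v):
                if not spec.nullable:
                    violations.append(f"{spec.name}[{row}]: NaN not allowed")
            elif spec.min_value is not None and v < spec.min_value:
                violations.append(f"{spec.name}[{row}]: {v} < min_value {spec.min_value}")
            elif spec.max_value is not None and v > spec.max_value:
                violations.append(f"{spec.name}[{row}]: {v} > max_value {spec.max_value}")
    if violations:
        raise SchemaViolationError(
            f"{len(violations)} violation(s):\n" + "\n".join(violations)
        )


specs = [Spec("age", 0, 120), Spec("income", min_value=0)]
try:
    validate(specs, {"age": [30.0, 150.0], "income": [-5.0]})
except ValueError as e:  # also catches SchemaViolationError
    print(e)
```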


FeatureSpec

quprep.validation.schema.FeatureSpec(name, dtype, min_value=None, max_value=None, nullable=False) dataclass

Specification for a single feature column.

Parameters:

name (str, required): Expected column name.
dtype (str, required): Expected feature type: 'continuous', 'discrete', or 'binary'.
min_value (float, default None): Minimum allowed value (inclusive). None means no lower bound.
max_value (float, default None): Maximum allowed value (inclusive). None means no upper bound.
nullable (bool, default False): Whether NaN values are permitted.
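A per-value reading of these fields can be written as a hypothetical helper, value_conforms. As documented, NaN passes only when nullable is True, bounds are inclusive, and None leaves a side unbounded. How quprep checks dtype itself is not documented. Treating 'binary' as exactly 0 or 1 and 'discrete' as integer-valued is an assumption made for this sketch.

```python
import math
from typing import Optional

KNOWN_DTYPES = ("continuous", "discrete", "binary")


def value_conforms(x: float, dtype: str, min_value: Optional[float] = None,
                   max_value: Optional[float] = None, nullable: bool = False) -> bool:
    """Check one value against FeatureSpec-style settings (hypothetical helper)."""
    if dtype not in KNOWN_DTYPES:
        raise ValueError(f"unknown dtype {dtype!r}")
    if math.isnan(x):
        return nullable                  # NaN is acceptable only for nullable features
    if dtype == "binary" and x not in (0, 1):
        return False                     # assumption: binary means exactly 0 or 1
    if dtype == "discrete" and not float(x).is_integer():
        return False                     # assumption: discrete means integer-valued
    if min_value is not None and x < min_value:
        return False                     # lower bound is inclusive
    if max_value is not None and x > max_value:
        return False                     # upper bound is inclusive
    return True


print(value_conforms(120, "continuous", min_value=0, max_value=120))  # True
print(value_conforms(float("nan"), "continuous"))                     # False
```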

SchemaViolationError

quprep.validation.schema.SchemaViolationError

Bases: ValueError

Raised when a Dataset violates a DataSchema contract.


validate_dataset

quprep.validation.validate_dataset(dataset, *, context=...)


warn_qubit_mismatch

quprep.validation.warn_qubit_mismatch(n_features, n_qubits, encoding)


CostEstimate

quprep.validation.cost.CostEstimate(encoding, n_features, n_qubits, gate_count, circuit_depth, two_qubit_gates, nisq_safe, warning) dataclass

Gate count and circuit depth estimate for an encoder configuration.

Attributes:

encoding (str): Name of the encoding method.
n_features (int): Number of input features.
n_qubits (int): Number of qubits required by this encoding.
gate_count (int): Total gate count (1-qubit + 2-qubit) per circuit.
circuit_depth (int): Critical-path depth estimate.
two_qubit_gates (int): Number of 2-qubit gates (CNOTs). Most relevant for NISQ hardware.
nisq_safe (bool): True if circuit_depth < 200 and two_qubit_gates < 50.
warning (str or None): Human-readable warning if the depth is prohibitively high, else None.
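The nisq_safe rule is simple enough to express directly in code. Cost is a hypothetical stand-in that carries only the two fields the rule reads. Both comparisons are strict, so a depth of exactly 200 or exactly 50 two-qubit gates does not count as safe.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    """Hypothetical stand-in holding the two CostEstimate fields the rule uses."""
    circuit_depth: int
    two_qubit_gates: int

    @property
    def nisq_safe(self) -> bool:
        # Both thresholds are strict upper bounds, as documented.
        return self.circuit_depth < 200 and self.two_qubit_gates < 50


print(Cost(circuit_depth=199, two_qubit_gates=49).nisq_safe)  # True
print(Cost(circuit_depth=120, two_qubit_gates=50).nisq_safe)  # False
```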


estimate_cost

quprep.validation.cost.estimate_cost(encoder, n_features)

Estimate gate count and circuit depth for an encoder configuration.

Parameters:

encoder (BaseEncoder, required): A configured encoder instance.
n_features (int, required): Number of features in the dataset (after any reduction).

Returns:

CostEstimate
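The sketch below shows how gate counts for an entangling encoding scale with n_features. It assumes a ZZ-feature-map-style IQP circuit with one qubit per feature, all-to-all ZZ interactions, and each ZZ term decomposed as CNOT, RZ, CNOT. quprep's IQPEncoder may use different connectivity or a different decomposition, so these numbers are illustrative and are not what estimate_cost returns. zz_feature_map_counts is a hypothetical name.

```python
from math import comb


def zz_feature_map_counts(n_features: int, reps: int) -> dict:
    """Gate counts for a ZZ-style IQP circuit (hypothetical layout, not quprep's)."""
    n_qubits = n_features                    # one qubit per feature
    pairs = comb(n_qubits, 2)                # all-to-all ZZ interactions
    one_qubit = n_qubits + n_qubits + pairs  # H layer + RZ layer + one RZ per pair
    two_qubit = 2 * pairs                    # each ZZ term as CNOT, RZ, CNOT
    return {
        "n_qubits": n_qubits,
        "gate_count": reps * (one_qubit + two_qubit),
        "two_qubit_gates": reps * two_qubit,
    }


print(zz_feature_map_counts(8, reps=2))
# {'n_qubits': 8, 'gate_count': 200, 'two_qubit_gates': 112}
```

Two-qubit gates grow with the number of feature pairs. In this model, reducing 16 features to 8 before encoding cuts the two-qubit count at reps=2 from 480 to 112. This is why n_features should be the count after reduction.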

check_compatibility

quprep.validation.compatibility.check_compatibility(encoder, dataset)

Check a dataset for compatibility with an encoder before encoding runs.

Catches hard failures (NaN values) and soft issues (out-of-range values, a missing fit, padding side effects) up front, so users get a clear message instead of a cryptic error from inside the encoder.

Parameters:

encoder (BaseEncoder, required): A configured encoder instance.
dataset (Dataset, required): Dataset to check.

Returns:

CompatibilityReport: is_compatible is False if any hard errors were found.
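The split between hard errors and soft warnings can be sketched for an angle-style encoder. Report and check_rows are hypothetical. The [0, π] target range and the rescaling hint are assumptions loosely modelled on the minmax_pi suggestion in the Examples section. Only the rule that is_compatible is False whenever errors is non-empty comes from the documentation above.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Report:
    """Hypothetical stand-in with CompatibilityReport's three fields."""
    is_compatible: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def check_rows(rows: List[List[float]], lo: float = 0.0, hi: float = math.pi) -> Report:
    errors, warnings = [], []
    values = [v for row in rows for v in row]
    # Hard failure: NaN cannot be turned into a rotation angle.
    if any(math.isnan(v) for v in values):
        errors.append("NaN values detected")
    # Soft issue: encoding still runs, but values outside the assumed range
    # may not be mapped as the normalizer intended.
    finite = [v for v in values if not math.isnan(v)]
    if finite and (min(finite) < lo or max(finite) > hi):
        warnings.append(f"values outside [{lo:.3g}, {hi:.3g}]; consider rescaling")
    return Report(is_compatible=not errors, errors=errors, warnings=warnings)


report = check_rows([[float("nan"), 5.0]])
print(report.is_compatible, report.errors)  # False ['NaN values detected']
```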


CompatibilityReport

quprep.validation.compatibility.CompatibilityReport(is_compatible, errors=list(), warnings=list()) dataclass

Result of check_compatibility().

Attributes:

is_compatible (bool): True if no hard errors were found.
errors (list[str]): Hard failures: encoding will fail or produce wrong results.
warnings (list[str]): Soft issues: encoding will run, but results may be suboptimal.


verify_encoding

quprep.validation.compatibility.verify_encoding(encoded, encoder)

Verify post-encoding invariants for a batch of EncodedResult objects.

Checks encoding-specific invariants that, if violated, indicate a silent pipeline misconfiguration such as a wrong normalizer being applied.

Parameters:

encoded (list[EncodedResult], required): Output of encoder.encode_batch() or PipelineResult.encoded.
encoder (BaseEncoder, required): The encoder used to produce encoded.

Returns:

VerificationReport
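A natural invariant for amplitude encoding is that every encoded state has unit L2 norm, because a valid quantum state must. The sketch below checks that invariant on plain lists of amplitudes rather than EncodedResult objects, whose fields are not documented here. It returns a dict shaped like VerificationReport, with one {'name', 'passed', 'detail'} entry per state. The helper name and the tolerance are assumptions.

```python
import math
from typing import List, Sequence


def verify_unit_norm(states: List[Sequence[complex]], tol: float = 1e-8) -> dict:
    checks = []
    for i, amps in enumerate(states):
        # abs() handles real and complex amplitudes alike.
        norm = math.sqrt(sum(abs(a) ** 2 for a in amps))
        checks.append({
            "name": f"unit_norm[{i}]",
            "passed": abs(norm - 1.0) <= tol,
            "detail": f"||psi|| = {norm:.6g}",
        })
    return {"passed": all(c["passed"] for c in checks), "checks": checks}


report = verify_unit_norm([[0.6, 0.8], [1.0, 1.0]])
print(report["passed"])  # False: the second vector has norm sqrt(2)
```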

VerificationReport

quprep.validation.compatibility.VerificationReport(passed, checks=list()) dataclass

Result of verify_encoding().

Attributes:

passed (bool): True if all invariant checks passed.
checks (list[dict]): One dict per check, with keys 'name', 'passed', and 'detail'.


Examples

Check encoder compatibility before encoding

import quprep as qd

report = qd.check_compatibility(qd.AngleEncoder(rotation="ry"), dataset)
if not report.is_compatible:
    print("Errors:", report.errors)        # e.g. ["NaN values detected"]
print("Warnings:", report.warnings)       # e.g. ["suggest minmax_pi normalizer"]

Verify encoding invariants after encoding

import quprep as qd

enc = qd.AmplitudeEncoder()
encoded = enc.encode_batch(dataset)
report = qd.verify_encoding(encoded, enc)
if not report.passed:
    for check in report.checks:
        print(check["name"], check["detail"])

Define and validate a schema

import quprep as qd

schema = qd.DataSchema([
    qd.FeatureSpec("age",    dtype="continuous", min_value=0, max_value=120),
    qd.FeatureSpec("income", dtype="continuous", min_value=0),
    qd.FeatureSpec("label",  dtype="discrete"),
])

try:
    schema.validate(dataset)
except qd.SchemaViolationError as e:
    print(e)

Infer schema from data and save

import quprep as qd

schema = qd.DataSchema.infer(dataset)
s = schema.to_json()                          # JSON string
with open("schema.json", "w") as f:           # save to file
    f.write(s)
with open("schema.json") as f:                # reload
    restored = qd.DataSchema.from_json(f.read())

Cost estimation

import quprep as qd

cost = qd.estimate_cost(qd.IQPEncoder(), n_features=8)
print(cost.n_qubits)       # 8
print(cost.circuit_depth)  # depends on reps
print(cost.nisq_safe)      # True / False
print(cost.warning)        # str | None