API Reference
QuPrep's public API has four top-level entry points (use import quprep as qd):
| Name | Description |
|---|---|
| qd.prepare() | One-liner: source → circuits |
| qd.Pipeline | Composable pipeline with full stage control |
| qd.recommend() | Encoding recommendation engine |
| qd.compare_encodings() | Side-by-side cost comparison of all encoders |
Everything else is accessed via submodules.
Top-level
import quprep as qd
qd.__version__ # "0.10.0"
qd.prepare(...)
qd.Pipeline(...)
qd.recommend(...)
qd.draw_ascii(...)
qd.draw_matplotlib(...)
# All classes are on the top-level namespace — no sub-imports needed:
qd.AngleEncoder()
qd.Imputer()
qd.PCAReducer()
qd.DataSchema(...)
qd.estimate_cost(...)
qd.compare_encodings(...)
qd.ComparisonResult
qd.fingerprint_pipeline(...)
qd.FingerprintResult
# Ingest
qd.CSVIngester()
qd.NumpyIngester()
# Noise-aware preprocessing (v0.9.0)
qd.NoiseProfile(...)
qd.NoiseAwarePreprocessor(...)
# Encoding quality metrics (v0.9.0)
qd.expressibility(...)
qd.entanglement_capability(...)
qd.kernel_alignment(...)
qd.score_encoding(...)
qd.EncoderMetrics
qd.detect_barren_plateau(...)
qd.BarrenPlateauReport
# Class imbalance (v0.9.0)
qd.ImbalanceHandler()
# New encoders (v0.10.0)
qd.DenseAngleEncoder()
qd.DiscretizedEncoder()
# Quantum-specific preprocessing (v0.10.0)
qd.check_compatibility(...)
qd.CompatibilityReport
qd.verify_encoding(...)
qd.VerificationReport
qd.encoding_sensitivity(...)
qd.SensitivityResult
qd.suggest_pipeline(...)
qd.PipelineSuggestion
qd.preprocessing_report(...)
qd.PreprocessingReport
qd.inspect_encoding(...)
qd.EncodingParams
qd.GateParam
Submodules
| Module | Contents |
|---|---|
| quprep.encode.angle | AngleEncoder |
| quprep.encode.amplitude | AmplitudeEncoder |
| quprep.encode.basis | BasisEncoder |
| quprep.encode.entangled_angle | EntangledAngleEncoder |
| quprep.encode.iqp | IQPEncoder |
| quprep.encode.reupload | ReUploadEncoder |
| quprep.encode.hamiltonian | HamiltonianEncoder |
| quprep.encode.zz_feature_map | ZZFeatureMapEncoder |
| quprep.encode.pauli_feature_map | PauliFeatureMapEncoder |
| quprep.encode.random_fourier | RandomFourierEncoder |
| quprep.encode.tensor_product | TensorProductEncoder |
| quprep.encode.qaoa_problem | QAOAProblemEncoder |
| quprep.encode.dense_angle | DenseAngleEncoder |
| quprep.encode.discretized | DiscretizedEncoder |
| quprep.encode.inspector | inspect_encoding, EncodingParams, GateParam |
| quprep.encode.base | BaseEncoder, EncodedResult |
| quprep.export.qasm_export | QASMExporter |
| quprep.export.qiskit_export | QiskitExporter |
| quprep.export.pennylane_export | PennyLaneExporter |
| quprep.export.cirq_export | CirqExporter |
| quprep.export.tket_export | TKETExporter |
| quprep.export.visualize | draw_ascii, draw_matplotlib |
| quprep.normalize.scalers | Scaler, auto_normalizer |
| quprep.clean.imputer | Imputer |
| quprep.clean.outlier | OutlierHandler |
| quprep.clean.categorical | CategoricalEncoder |
| quprep.clean.selector | FeatureSelector |
| quprep.clean.imbalance | ImbalanceHandler |
| quprep.ingest.csv_ingester | CSVIngester |
| quprep.ingest.numpy_ingester | NumpyIngester |
| quprep.ingest.huggingface_ingester | HuggingFaceIngester |
| quprep.ingest.kaggle_ingester | KaggleIngester |
| quprep.ingest.openml_ingester | OpenMLIngester |
| quprep.ingest.profiler | profile, DatasetProfile, preprocessing_report, PreprocessingReport |
| quprep.preprocess.noise_aware | NoiseProfile, NoiseAwarePreprocessor |
| quprep.metrics | expressibility, entanglement_capability, kernel_alignment, EncoderMetrics, score_encoding, detect_barren_plateau, BarrenPlateauReport, encoding_sensitivity, SensitivityResult |
| quprep.preprocess.window | WindowTransformer |
| quprep.core.dataset | Dataset |
| quprep.core.fingerprint | fingerprint_pipeline, FingerprintResult |
| quprep.core.recommender | recommend, suggest_pipeline, PipelineSuggestion |
| quprep.core.qubit_suggestion | suggest_qubits, QubitSuggestion |
| quprep.compare | compare_encodings, ComparisonResult |
| quprep.validation | DataSchema, FeatureSpec, SchemaViolationError, validate_dataset, warn_qubit_mismatch, QuPrepWarning |
| quprep.validation.compatibility | check_compatibility, CompatibilityReport, verify_encoding, VerificationReport |
| quprep.validation.cost | CostEstimate, estimate_cost |
| quprep.qubo | to_qubo, QUBOResult, qubo_to_ising, ising_to_qubo, IsingResult |
| quprep.qubo.problems | max_cut, knapsack, tsp, portfolio, graph_color, scheduling, number_partition |
| quprep.qubo.solver | solve_brute, solve_sa, SolveResult (classical reference utilities, not in quprep.qubo.__all__) |
| quprep.qubo.qaoa | qaoa_circuit |
| quprep.qubo.constraints | equality_penalty, inequality_penalty |
| quprep.qubo.ising | qubo_to_ising, ising_to_qubo |
| quprep.qubo.utils | add_qubo |
| quprep.qubo.visualize | draw_qubo, draw_ising |
quprep
QuPrep — Quantum Data Preparation.
The missing preprocessing layer between classical datasets and quantum computing frameworks.
Both import styles are supported:
    import quprep
    import quprep as qd  # "quantum data", the preferred short alias
One-liner:
    circuit = qd.prepare("data.csv", encoding="angle", framework="qiskit")
Full pipeline:
    pipeline = qd.Pipeline(
        cleaner=qd.Imputer(strategy="knn"),
        reducer=qd.LDAReducer(n_components=4),
        encoder=qd.AngleEncoder(),
        exporter=qd.QiskitExporter(),
    )
    result = pipeline.fit_transform(df)
    result.summary()
Schema-validated pipeline:
    schema = qd.DataSchema([
        qd.FeatureSpec("age", dtype="continuous", min_value=0, max_value=120),
        qd.FeatureSpec("score", dtype="continuous", min_value=0.0, max_value=1.0),
    ])
    pipeline = qd.Pipeline(encoder=qd.AngleEncoder(), schema=schema)
Recommendation:
    rec = qd.recommend(df, task="classification", qubits=8)
Classes
Pipeline(ingester=None, preprocessor=None, cleaner=None, reducer=None, normalizer=None, encoder=None, exporter=None, schema=None, drift_detector=None)
Composable preprocessing pipeline for quantum data preparation.
Each stage is optional and works independently. You can use just the encoder, just the reducer, or any combination without touching the rest.
sklearn-compatible: supports fit(), transform(), get_params(),
and set_params() in addition to the native fit_transform().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ingester | optional | Data ingestion component. Auto-detected from source type if omitted. | None |
| preprocessor | optional | Preprocessing step applied after ingestion. Accepts a single transformer or a list of transformers applied in order (e.g. | None |
| cleaner | optional | Data cleaning component (Imputer, OutlierHandler, CategoricalEncoder). | None |
| reducer | optional | Dimensionality reduction component (PCA, LDA, etc.). | None |
| normalizer | optional | Normalization component. Auto-selected per encoding if omitted. | None |
| encoder | optional | Quantum encoding component. If omitted, the pipeline returns a processed Dataset. | None |
| exporter | optional | Framework export component. If omitted, the pipeline returns a list of EncodedResult objects. | None |
| schema | DataSchema | Input schema to validate at pipeline entry. Raises SchemaViolationError on mismatch. | None |
Examples:
>>> pipeline = Pipeline(
... encoder=AngleEncoder(),
... exporter=QASMExporter(),
... )
>>> result = pipeline.fit_transform(df)
>>> print(result.circuits[0])
Functions
fingerprint()
Compute a reproducibility fingerprint for this pipeline.
Returns a FingerprintResult (quprep.core.fingerprint.FingerprintResult) containing
a deterministic SHA-256 hash of the full pipeline configuration (stage
classes, parameters, and dependency versions). The hash is stable across
runs for the same configuration and suitable for paper methods sections.
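A minimal sketch of how a deterministic configuration hash of this kind can be built, assuming the configuration reduces to JSON-serializable values. The helper `config_fingerprint`, the `rotation` parameter, and the version string are hypothetical illustrations, not QuPrep's implementation.

```python
import hashlib
import json


def config_fingerprint(stages: dict, versions: dict) -> str:
    """Hash a pipeline description deterministically (illustrative only)."""
    payload = {"stages": stages, "versions": versions}
    # sort_keys plus fixed separators give one canonical byte string,
    # so dict insertion order cannot change the digest.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same configuration written with keys in a different order (hypothetical values).
a = config_fingerprint(
    {"encoder": {"class": "AngleEncoder", "params": {"rotation": "ry"}}},
    {"numpy": "1.26.4"},
)
b = config_fingerprint(
    {"encoder": {"params": {"rotation": "ry"}, "class": "AngleEncoder"}},
    {"numpy": "1.26.4"},
)
print(a == b, len(a))  # True 64
```

Canonical serialization is the design choice that matters here: hashing `repr()` or an unsorted dump would make the digest depend on construction order rather than on the configuration itself.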
Returns:
| Type | Description |
|---|---|
| FingerprintResult | |
Examples:
fit(source, y=None)
Fit all pipeline stages on training data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str, Path, np.ndarray, pd.DataFrame, or Dataset | Training data. | required |
| y | ndarray or array-like | Target labels. Stored in | None |
Returns:
| Type | Description |
|---|---|
| Pipeline | Returns |
fit_transform(source, y=None)
Fit all stages and transform in a single pass.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str, Path, np.ndarray, pd.DataFrame, or Dataset | Input data. | required |
| y | ndarray or array-like | Target labels. Stored in | None |
Returns:
| Type | Description |
|---|---|
| PipelineResult | Contains |
get_params(deep=True)
Return pipeline parameters (sklearn convention).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| deep | bool | Ignored; included for sklearn API compatibility. | True |
Returns:
| Type | Description |
|---|---|
| dict | |
load(path)
classmethod
Load a previously saved pipeline from a file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str or Path | Path to a file created by :meth: | required |
Returns:
| Type | Description |
|---|---|
| Pipeline | |
Raises:
| Type | Description |
|---|---|
| TypeError | If the file does not contain a Pipeline object. |
save(path)
Persist the pipeline (configuration and fitted state) to a file.
Uses Python's pickle protocol. The saved file can be reloaded
with Pipeline.load and applied to new data without re-fitting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str or Path | Destination file path (e.g. | required |
set_params(**params)
Set pipeline parameters (sklearn convention).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **params | object | Parameter names and values. | {} |
Returns:
| Type | Description |
|---|---|
| Pipeline | Returns |
Raises:
| Type | Description |
|---|---|
| ValueError | If an unknown parameter name is given. |
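The get_params/set_params pair follows the scikit-learn estimator convention. The sketch below shows that convention on a hypothetical stand-in class (`TinyPipeline` is not QuPrep's Pipeline, and the string values are placeholders for real stage objects): parameters are exposed as a dict, set_params validates names against it, and set_params returns the object so calls can be chained.

```python
class TinyPipeline:
    """Hypothetical stand-in that mimics the sklearn parameter convention."""

    def __init__(self, encoder=None, exporter=None):
        self.encoder = encoder
        self.exporter = exporter

    def get_params(self, deep=True):
        # deep is accepted for API compatibility and ignored here.
        return {"encoder": self.encoder, "exporter": self.exporter}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"unknown parameter {name!r}")
            setattr(self, name, value)
        return self  # returning self allows chaining


pipe = TinyPipeline(encoder="angle")
same = pipe.set_params(exporter="qasm")
print(same is pipe, pipe.get_params())

try:
    pipe.set_params(reducr="pca")  # deliberate typo
except ValueError as err:
    print("rejected:", err)
```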
stream(source, chunksize=1000)
Apply a fitted pipeline to a large source in chunks without loading it fully into RAM.
The pipeline must be fitted first (via fit or fit_transform).
Normaliser statistics and all other fitted parameters are reused
for every chunk: only transform is called per chunk, not fit.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str, Path, or np.ndarray | | required |
| chunksize | int | Rows per chunk. | 1000 |
Yields:
| Type | Description |
|---|---|
| PipelineResult | One result per chunk. |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If the pipeline has not been fitted. |
Examples:
>>> import numpy as np
>>> import quprep as qd
>>> X = np.random.default_rng(0).uniform(0, 1, (1000, 4))
>>> pipeline = qd.Pipeline(encoder=qd.AngleEncoder(), exporter=qd.QASMExporter())
>>> _ = pipeline.fit(X[:100])
>>> for result in pipeline.stream(X, chunksize=200):
... print(len(result.circuits))
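The fit-once, transform-per-chunk pattern described for stream() can be sketched without QuPrep. The `MinMaxStage` class and the `stream` function below are hypothetical stand-ins; the sketch slices an in-memory list, whereas a real implementation would read each chunk from the source lazily.

```python
class MinMaxStage:
    """Hypothetical stage: learns per-column min/max once, then only transforms."""

    def fit(self, rows):
        cols = list(zip(*rows))
        self.lo = [min(c) for c in cols]
        self.hi = [max(c) for c in cols]
        return self

    def transform(self, rows):
        # Reuses the statistics learned in fit; nothing is re-estimated here.
        return [
            [(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, self.lo, self.hi)]
            for row in rows
        ]


def stream(stage, rows, chunksize=1000):
    """Yield transformed chunks; refuse to run on an unfitted stage."""
    if not hasattr(stage, "lo"):
        raise RuntimeError("stage has not been fitted")
    for start in range(0, len(rows), chunksize):
        yield stage.transform(rows[start:start + chunksize])


rows = [[float(i), float(2 * i)] for i in range(10)]
stage = MinMaxStage().fit(rows)
sizes = [len(chunk) for chunk in stream(stage, rows, chunksize=4)]
print(sizes)  # [4, 4, 2]
```

Because the statistics come from fit, every chunk is scaled identically no matter which rows it contains.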
summary()
Return a human-readable snapshot of the pipeline configuration.
Shows which stages are configured, whether the pipeline has been fitted, the resolved normalizer, and the last cost estimate (if available).
Returns:
| Type | Description |
|---|---|
| str | |
transform(source)
Apply fitted pipeline stages to data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str, Path, np.ndarray, pd.DataFrame, or Dataset | Input data. | required |
Returns:
| Type | Description |
|---|---|
| PipelineResult | |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If the pipeline has not been fitted yet. |
EncoderMetrics(encoding, expressibility, entanglement_capability, kernel_alignment, n_qubits)
dataclass
Data-driven quality metrics for a parameterized encoding on a dataset.
Attributes:
| Name | Type | Description |
|---|---|---|
| encoding | str | Encoder name (e.g. |
| expressibility | float or None | KL divergence from the Haar distribution. Lower = more expressive. |
| entanglement_capability | float or None | Average Meyer-Wallach entanglement measure ∈ [0, 1]. Higher = more entangled. 0 for product-state encodings. |
| kernel_alignment | float or None | Normalised Frobenius alignment of the quantum kernel with class labels, ∈ [−1, 1]. Higher = better class separation. |
| n_qubits | int | Qubit count used for simulation. |
BarrenPlateauReport(encoding, n_qubits, circuit_depth, gradient_variance, risk_level, mitigations=list())
dataclass
Barren plateau risk report for a quantum encoding.
Attributes:
| Name | Type | Description |
|---|---|---|
| encoding | str | Encoder name (lower-case, without "Encoder" suffix). |
| n_qubits | int | Number of qubits determined by cost estimation. |
| circuit_depth | int | Estimated circuit depth. |
| gradient_variance | float | Analytical upper bound on the gradient variance, computed from the formula for the specified cost_type; no simulation is performed. |
| risk_level | str | One of |
| mitigations | list[str] | Suggested mitigation strategies (empty when risk is "none"). |
ImbalanceHandler(strategy='oversample', sampling_strategy='auto', k_neighbors=5, random_state=42)
Balance class distributions before quantum encoding.
Supports four strategies:
- "oversample": random duplication of minority samples (no extra deps).
- "undersample": random removal of majority samples (no extra deps).
- "smote": Synthetic Minority Over-sampling Technique; interpolates in feature space using k-nearest neighbours (requires scikit-learn, already a core dependency).
- "adasyn": Adaptive Synthetic sampling; focuses synthetic samples on harder-to-learn regions (requires imbalanced-learn: pip install quprep[imbalance]).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| strategy | ('oversample', 'undersample', 'smote', 'adasyn') | Resampling strategy. | "oversample" |
| sampling_strategy | float or 'auto' | | 'auto' |
| k_neighbors | int | Number of nearest neighbours for SMOTE and ADASYN. | 5 |
| random_state | int | Seed for reproducibility. | 42 |
Examples:
>>> import numpy as np
>>> import quprep as qd
>>> from quprep.core.dataset import Dataset
>>> rng = np.random.default_rng(0)
>>> X = rng.uniform(0, 1, (110, 4))
>>> y = np.array([0] * 100 + [1] * 10)
>>> ds = Dataset(data=X, labels=y)
>>> handler = qd.ImbalanceHandler(strategy="smote")
>>> ds_bal = handler.fit_transform(ds)
>>> from collections import Counter
>>> print(Counter(ds_bal.labels))
Counter({0: 100, 1: 100})
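For intuition about the simplest strategy, here is a minimal standard-library sketch of random oversampling: minority rows are duplicated at random until every class matches the largest one, then rows and labels are shuffled together. The function `random_oversample` is a hypothetical illustration and makes no claim about how ImbalanceHandler is implemented.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=42):
    """Duplicate minority-class rows until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw with replacement from the existing rows of this class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)

    # Shuffle rows and labels with one permutation so pairs stay aligned.
    order = list(range(len(out_rows)))
    rng.shuffle(order)
    return [out_rows[i] for i in order], [out_labels[i] for i in order]


rows = [[i / 10] for i in range(12)]
labels = [0] * 10 + [1] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Unlike SMOTE or ADASYN, this creates no new feature vectors; every added row is an exact copy of an existing minority row.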
Functions
fit(dataset)
Compute class distribution and target count from dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | Must have | required |
fit_transform(dataset)
Fit and transform in one step.
transform(dataset)
Apply the fitted resampling strategy to dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | | required |
Returns:
| Type | Description |
|---|---|
| Dataset | New Dataset with resampled data and labels (shuffled). |
Functions
prepare(source, *, encoding='angle', framework='qasm', ingester=None, preprocessor=None, **kwargs)
Convert a dataset to quantum circuits in one call.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str, Path, np.ndarray, pd.DataFrame, or Dataset | Input data: file path, in-memory array/frame, or a pre-loaded Dataset. For image directories, text files, or graph data pass a modality ingester via the | required |
| encoding | str | Encoding method. One of: 'angle' (default), 'entangled_angle', 'amplitude', 'basis', 'iqp', 'reupload', 'hamiltonian', 'zz_feature_map', 'pauli_feature_map', 'random_fourier', 'tensor_product', 'qaoa_problem'. Plugin encoders registered via :func: | 'angle' |
| framework | str | Export target. One of: 'qasm' (default, no deps), 'qiskit', 'pennylane', 'cirq', 'tket', 'braket', 'qsharp', 'iqm'. Plugin exporters registered via :func: | 'qasm' |
| ingester | ingester object | A modality ingester instance whose When | None |
| **kwargs | | Extra keyword arguments forwarded to the encoder/exporter constructor. Common options: | {} |
Returns:
| Type | Description |
|---|---|
| PipelineResult | Object with |
recommend(source, *, task='classification', qubits=None, use_metrics=False, **kwargs)
Recommend the best encoding for a dataset and task.
Scores all encodings against the dataset profile (feature count, binary/continuous fraction, missing rate, sparsity, correlations, sample count) and the target task, then returns the highest-scoring option with ranked alternatives.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str, Path, np.ndarray, pd.DataFrame, or Dataset | Input data. Accepts anything the pipeline ingester accepts. | required |
| task | str | Target task: | 'classification' |
| qubits | int | Maximum qubit budget. Encodings that exceed this are heavily penalised. | None |
| use_metrics | bool | When | False |
| **kwargs | | Reserved for future use (e.g. | {} |
Returns:
| Type | Description |
|---|---|
| EncodingRecommendation | Top recommendation with alternatives list. |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
compare_encodings(source, *, include=None, exclude=None, task=None, qubits=None)
Compare all (or selected) encoding methods on source and return side-by-side stats.
No circuits are generated — costs are estimated analytically from the dataset shape, so this is fast even for large datasets.
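To illustrate what "estimated analytically from the dataset shape" can mean, the sketch below computes textbook qubit counts from the feature count alone: one qubit per feature for angle and basis encoding, and ceil(log2(d)) qubits for amplitude encoding of d features. The function `qubits_needed` is hypothetical, and QuPrep's own estimator may use different or more detailed formulas.

```python
def qubits_needed(encoding: str, n_features: int) -> int:
    """Textbook qubit counts for three common encodings (illustrative)."""
    if n_features < 1:
        raise ValueError("n_features must be positive")
    if encoding in ("angle", "basis"):
        # One qubit per feature.
        return n_features
    if encoding == "amplitude":
        # 2**q amplitudes must hold n_features values: q = ceil(log2(n_features)).
        # (n - 1).bit_length() computes that ceiling with integer arithmetic.
        return max(1, (n_features - 1).bit_length())
    raise ValueError(f"no formula for encoding {encoding!r}")


for name in ("angle", "basis", "amplitude"):
    print(name, qubits_needed(name, n_features=20))
```

No circuit is built at any point, which is why an estimate of this kind stays cheap regardless of the number of samples.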
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str, numpy.ndarray, pandas.DataFrame, or Dataset | Input data; same formats accepted by :class: | required |
| include | list[str] or None | Encoder names to include. If | None |
| exclude | list[str] or None | Encoder names to exclude. Applied after include. | None |
| task | str or None | If provided, the recommended encoder for this task is starred in the output table. Passed to :func: | None |
| qubits | int or None | Maximum qubit budget. Encoders requiring more qubits have | None |
Returns:
| Type | Description |
|---|---|
| ComparisonResult | |
Examples:
Compare all encoders on a CSV, highlight the best for classification:
>>> import quprep as qd
>>> result = qd.compare_encodings("data.csv", task="classification", qubits=8)
>>> print(result)
>>> result.best(prefer="nisq")
Compare a subset:
draw_ascii(encoded, width=80)
Return an ASCII circuit diagram for an EncodedResult.
No additional dependencies required.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| encoded | EncodedResult | | required |
| width | int | Reserved for future use (target line width hint). Default 80. | 80 |
Returns:
| Type | Description |
|---|---|
| str | Multi-line ASCII string. Print with |
draw_matplotlib(encoded, filename=None)
Draw a matplotlib circuit diagram.
Requires: pip install quprep[viz]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| encoded | EncodedResult | | required |
| filename | str or Path | If provided, the figure is saved to this file (PNG, PDF, SVG) and the function returns None. If None, the matplotlib Figure object is returned. | None |
Returns:
| Type | Description |
|---|---|
| Figure or None | Figure object if filename is None; None after saving to file. |
entanglement_capability(encoder, dataset, *, n_samples=200, seed=None)
Estimate the entanglement capability of an encoding.
Returns the average Meyer-Wallach measure over randomly sampled data points. Ranges from 0 (product state, e.g. plain angle encoding) to 1 (maximally entangled).
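The Meyer-Wallach measure of an n-qubit pure state is Q = 2(1 − (1/n) Σ_k Tr ρ_k²), where ρ_k is the reduced density matrix of qubit k. The standard-library sketch below evaluates it for a single statevector; `meyer_wallach` is a hypothetical helper (not QuPrep's code), and it assumes bit k of the amplitude index corresponds to qubit k.

```python
import math


def meyer_wallach(state, n_qubits):
    """Meyer-Wallach measure Q = 2 * (1 - mean single-qubit purity)."""
    total_purity = 0.0
    for k in range(n_qubits):
        mask = 1 << k
        rho = [[0j, 0j], [0j, 0j]]  # reduced density matrix of qubit k
        for i, amp_i in enumerate(state):
            a = (i >> k) & 1
            for b in (0, 1):
                # j has the same remaining bits as i, with bit k set to b.
                j = (i & ~mask) | (b << k)
                rho[a][b] += amp_i * state[j].conjugate()
        # Tr(rho^2) equals the sum of squared magnitudes for a Hermitian matrix.
        total_purity += sum(abs(rho[a][b]) ** 2 for a in (0, 1) for b in (0, 1))
    return 2.0 * (1.0 - total_purity / n_qubits)


s = 1 / math.sqrt(2)
product_state = [1 + 0j, 0j, 0j, 0j]      # |00>
bell_state = [s + 0j, 0j, 0j, s + 0j]     # (|00> + |11>) / sqrt(2)

q_product = meyer_wallach(product_state, 2)
q_bell = meyer_wallach(bell_state, 2)
print(round(q_product, 6), round(q_bell, 6))  # 0.0 1.0
```

The two test states hit the ends of the range: the product state gives 0 and the Bell state gives 1. The function documented here averages this quantity over sampled data points.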
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| encoder | encoder instance | | required |
| dataset | Dataset | | required |
| n_samples | int | Default 200. | 200 |
| seed | int | | None |
Returns:
| Type | Description |
|---|---|
| float or None | Average MW measure ∈ [0, 1]. |
References
Sim et al. (2019) https://doi.org/10.1002/qute.201900070
kernel_alignment(encoder, dataset, *, max_samples=300, seed=None)
Compute the normalised kernel alignment between the quantum kernel and labels.
Measures how well the encoding separates classes by comparing the quantum kernel matrix K (where K[i,j] = |⟨ψ(xᵢ)|ψ(xⱼ)⟩|²) to the ideal label kernel K_y (where K_y[i,j] = yᵢ·yⱼ).
The alignment is:
$$
A(K, K_y) = \frac{\langle K, K_y \rangle_F}{\|K\|_F \, \|K_y\|_F}
$$
Higher values indicate the encoding separates classes better.
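The alignment formula can be checked by hand on small matrices. The sketch below is a plain-Python illustration that assumes labels encoded as ±1; the helper names `frobenius_inner` and `alignment` are hypothetical, and the kernel matrices are made up for the example rather than produced by an encoder.

```python
import math


def frobenius_inner(a, b):
    """Frobenius inner product: sum of elementwise products."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def alignment(K, y):
    """A(K, K_y) with the ideal label kernel K_y[i][j] = y[i] * y[j]."""
    K_y = [[yi * yj for yj in y] for yi in y]
    numerator = frobenius_inner(K, K_y)
    denominator = math.sqrt(frobenius_inner(K, K) * frobenius_inner(K_y, K_y))
    return numerator / denominator


y = [1, 1, -1, -1]

# Kernel that is 1 within a class and 0 across classes.
K_block = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Kernel that cannot tell any two points apart.
K_flat = [[1] * 4 for _ in range(4)]

print(round(alignment(K_block, y), 4))  # 0.7071
print(alignment(K_flat, y))             # 0.0
```

The block kernel scores 8 / (√8 · 4) = 1/√2, while the uninformative kernel scores 0 because the positive and negative entries of K_y cancel.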
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| encoder | encoder instance | A fitted QuPrep encoder. | required |
| dataset | Dataset | Must have | required |
| max_samples | int | Subsample the dataset to at most this many points for efficiency. Default 300. | 300 |
| seed | int | | None |
Returns:
| Type | Description |
|---|---|
| float or None | Alignment score ∈ [−1, 1]. |
score_encoding(encoder, dataset, *, n_samples=200, seed=None)
Compute all data-driven quality metrics for one encoder on a dataset.
Encoders that require fitting (e.g. RandomFourierEncoder) are
automatically fitted on the dataset before metric computation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| encoder | encoder instance | | required |
| dataset | Dataset | | required |
| n_samples | int | Samples used for expressibility and entanglement. Default 200. | 200 |
| seed | int | | None |
Returns:
| Type | Description |
|---|---|
| EncoderMetrics | |
detect_barren_plateau(encoder, dataset, *, cost_type='global')
Analytically estimate barren plateau risk for a quantum encoding.
No circuit simulation is performed. Risk is derived from qubit count using the theoretical gradient variance bounds:
- Global cost (McClean et al. 2018): Var[∂C/∂θ] ≤ 2^(1−n), exponential decay with qubit count.
- Local cost (Cerezo et al. 2021): Var[∂C/∂θ] ≈ 1/n², polynomial decay; strongly preferred for large circuits.
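The two scaling laws above are easy to tabulate. The sketch below evaluates them directly for a few qubit counts; the function names are hypothetical, and the mapping from these values to a risk_level is not reproduced here because the source does not state its thresholds.

```python
def global_cost_bound(n_qubits: int) -> float:
    """Upper bound 2^(1 - n) on gradient variance for a global cost."""
    return 2.0 ** (1 - n_qubits)


def local_cost_scaling(n_qubits: int) -> float:
    """Approximate 1 / n^2 gradient variance scaling for a local cost."""
    return 1.0 / n_qubits ** 2


for n in (4, 8, 16, 32):
    print(f"n={n:2d}  global<={global_cost_bound(n):.3e}  local~{local_cost_scaling(n):.3e}")
```

At n = 8 the global bound is 2⁻⁷ ≈ 7.8e-3 and the local estimate is 1/64 ≈ 1.6e-2, the same order of magnitude. By n = 32 the global bound has fallen to about 4.7e-10, while the local estimate is still about 9.8e-4.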
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| encoder | BaseEncoder | A QuPrep encoder. Does not need to be fitted. | required |
| dataset | Dataset | Used only to determine qubit count and circuit depth via cost estimation. | required |
| cost_type | ('global', 'local') | Cost function type used during training. | "global" |
Returns:
| Type | Description |
|---|---|
| BarrenPlateauReport | |
Examples:
>>> import numpy as np
>>> import quprep as qd
>>> from quprep.core.dataset import Dataset
>>> ds = Dataset(data=np.random.default_rng(0).uniform(0, 1, (50, 8)))
>>> report = qd.detect_barren_plateau(qd.IQPEncoder(), ds)
>>> print(report.risk_level)
mild
References
McClean J.R. et al. "Barren plateaus in quantum neural network training landscapes." Nature Communications 9, 4812 (2018).
Cerezo M. et al. "Cost function dependent barren plateaus in shallow parametrized quantum circuits." Nature Communications 12, 1791 (2021).