API Reference

QuPrep's public API has four top-level entry points (use import quprep as qd):

Name                     Description
qd.prepare()             One-liner: source → circuits
qd.Pipeline              Composable pipeline with full stage control
qd.recommend()           Encoding recommendation engine
qd.compare_encodings()   Side-by-side cost comparison of all encoders

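Most other classes and functions are also re-exported on the top-level namespace (see Top-level below); the remainder are accessed via submodules.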


Top-level

import quprep as qd

qd.__version__         # "0.10.0"
qd.prepare(...)
qd.Pipeline(...)
qd.recommend(...)
qd.draw_ascii(...)
qd.draw_matplotlib(...)

# All classes are on the top-level namespace — no sub-imports needed:
qd.AngleEncoder()
qd.Imputer()
qd.PCAReducer()
qd.DataSchema(...)
qd.estimate_cost(...)
qd.compare_encodings(...)
qd.ComparisonResult
qd.fingerprint_pipeline(...)
qd.FingerprintResult

# Ingest
qd.CSVIngester()
qd.NumpyIngester()

# Noise-aware preprocessing (v0.9.0)
qd.NoiseProfile(...)
qd.NoiseAwarePreprocessor(...)

# Encoding quality metrics (v0.9.0)
qd.expressibility(...)
qd.entanglement_capability(...)
qd.kernel_alignment(...)
qd.score_encoding(...)
qd.EncoderMetrics
qd.detect_barren_plateau(...)
qd.BarrenPlateauReport

# Class imbalance (v0.9.0)
qd.ImbalanceHandler()

# New encoders (v0.10.0)
qd.DenseAngleEncoder()
qd.DiscretizedEncoder()

# Quantum-specific preprocessing (v0.10.0)
qd.check_compatibility(...)
qd.CompatibilityReport
qd.verify_encoding(...)
qd.VerificationReport
qd.encoding_sensitivity(...)
qd.SensitivityResult
qd.suggest_pipeline(...)
qd.PipelineSuggestion
qd.preprocessing_report(...)
qd.PreprocessingReport
qd.inspect_encoding(...)
qd.EncodingParams
qd.GateParam

Submodules

Module Contents
quprep.encode.angle AngleEncoder
quprep.encode.amplitude AmplitudeEncoder
quprep.encode.basis BasisEncoder
quprep.encode.entangled_angle EntangledAngleEncoder
quprep.encode.iqp IQPEncoder
quprep.encode.reupload ReUploadEncoder
quprep.encode.hamiltonian HamiltonianEncoder
quprep.encode.zz_feature_map ZZFeatureMapEncoder
quprep.encode.pauli_feature_map PauliFeatureMapEncoder
quprep.encode.random_fourier RandomFourierEncoder
quprep.encode.tensor_product TensorProductEncoder
quprep.encode.qaoa_problem QAOAProblemEncoder
quprep.encode.dense_angle DenseAngleEncoder
quprep.encode.discretized DiscretizedEncoder
quprep.encode.inspector inspect_encoding, EncodingParams, GateParam
quprep.encode.base BaseEncoder, EncodedResult
quprep.export.qasm_export QASMExporter
quprep.export.qiskit_export QiskitExporter
quprep.export.pennylane_export PennyLaneExporter
quprep.export.cirq_export CirqExporter
quprep.export.tket_export TKETExporter
quprep.export.visualize draw_ascii, draw_matplotlib
quprep.normalize.scalers Scaler, auto_normalizer
quprep.clean.imputer Imputer
quprep.clean.outlier OutlierHandler
quprep.clean.categorical CategoricalEncoder
quprep.clean.selector FeatureSelector
quprep.clean.imbalance ImbalanceHandler
quprep.ingest.csv_ingester CSVIngester
quprep.ingest.numpy_ingester NumpyIngester
quprep.ingest.huggingface_ingester HuggingFaceIngester
quprep.ingest.kaggle_ingester KaggleIngester
quprep.ingest.openml_ingester OpenMLIngester
quprep.ingest.profiler profile, DatasetProfile, preprocessing_report, PreprocessingReport
quprep.preprocess.noise_aware NoiseProfile, NoiseAwarePreprocessor
quprep.metrics expressibility, entanglement_capability, kernel_alignment, EncoderMetrics, score_encoding, detect_barren_plateau, BarrenPlateauReport, encoding_sensitivity, SensitivityResult
quprep.preprocess.window WindowTransformer
quprep.core.dataset Dataset
quprep.core.fingerprint fingerprint_pipeline, FingerprintResult
quprep.core.recommender recommend, suggest_pipeline, PipelineSuggestion
quprep.core.qubit_suggestion suggest_qubits, QubitSuggestion
quprep.compare compare_encodings, ComparisonResult
quprep.validation DataSchema, FeatureSpec, SchemaViolationError, validate_dataset, warn_qubit_mismatch, QuPrepWarning
quprep.validation.compatibility check_compatibility, CompatibilityReport, verify_encoding, VerificationReport
quprep.validation.cost CostEstimate, estimate_cost
quprep.qubo to_qubo, QUBOResult, qubo_to_ising, ising_to_qubo, IsingResult
quprep.qubo.problems max_cut, knapsack, tsp, portfolio, graph_color, scheduling, number_partition
quprep.qubo.solver solve_brute, solve_sa, SolveResult (classical reference utilities — not in quprep.qubo.__all__)
quprep.qubo.qaoa qaoa_circuit
quprep.qubo.constraints equality_penalty, inequality_penalty
quprep.qubo.ising qubo_to_ising, ising_to_qubo
quprep.qubo.utils add_qubo
quprep.qubo.visualize draw_qubo, draw_ising

quprep

QuPrep — Quantum Data Preparation.

The missing preprocessing layer between classical datasets and quantum computing frameworks.

Both import styles are supported:

import quprep
import quprep as qd   # "quantum data" — preferred short alias

One-liner:

circuit = qd.prepare("data.csv", encoding="angle", framework="qiskit")

Full pipeline:

pipeline = qd.Pipeline(
    cleaner=qd.Imputer(strategy="knn"),
    reducer=qd.LDAReducer(n_components=4),
    encoder=qd.AngleEncoder(),
    exporter=qd.QiskitExporter(),
)
result = pipeline.fit_transform(df)
result.summary()

Schema-validated pipeline:

schema = qd.DataSchema([
    qd.FeatureSpec("age", dtype="continuous", min_value=0, max_value=120),
    qd.FeatureSpec("score", dtype="continuous", min_value=0.0, max_value=1.0),
])
pipeline = qd.Pipeline(encoder=qd.AngleEncoder(), schema=schema)

Recommendation:

rec = qd.recommend(df, task="classification", qubits=8)

Classes

Pipeline(ingester=None, preprocessor=None, cleaner=None, reducer=None, normalizer=None, encoder=None, exporter=None, schema=None, drift_detector=None)

Composable preprocessing pipeline for quantum data preparation.

Each stage is optional and works independently. You can use just the encoder, just the reducer, or any combination without touching the rest.

sklearn-compatible: supports fit(), transform(), get_params(), and set_params() in addition to the native fit_transform().

Parameters:

Name Type Description Default
ingester optional

Data ingestion component. Auto-detected from source type if omitted.

None
preprocessor optional

Preprocessing step applied after ingestion. Accepts a single transformer or a list of transformers applied in order (e.g. [WindowTransformer(), ...]).

None
cleaner optional

Data cleaning component (Imputer, OutlierHandler, CategoricalEncoder).

None
reducer optional

Dimensionality reduction component (PCA, LDA, etc.).

None
normalizer optional

Normalization component. Auto-selected per encoding if omitted.

None
encoder optional

Quantum encoding component. Returns a processed Dataset if omitted.

None
exporter optional

Framework export component. Returns EncodedResult list if omitted.

None
schema DataSchema

Input schema to validate at pipeline entry. Raises SchemaViolationError on mismatch.

None

Examples:

>>> pipeline = Pipeline(
...     encoder=AngleEncoder(),
...     exporter=QASMExporter(),
... )
>>> result = pipeline.fit_transform(df)
>>> print(result.circuits[0])
Functions
fingerprint()

Compute a reproducibility fingerprint for this pipeline.

Returns a FingerprintResult (quprep.core.fingerprint.FingerprintResult) containing a deterministic SHA-256 hash of the full pipeline configuration (stage classes, parameters, and dependency versions). The hash is stable across runs for the same configuration and suitable for paper methods sections.

Returns:

Type Description
FingerprintResult

Examples:

>>> fp = pipeline.fingerprint()
>>> print(fp.hash)
>>> fp.save("experiment.json")
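One way to obtain a hash that is stable across runs is to serialize the configuration in a canonical form before hashing it. The sketch below illustrates that idea with the standard library only; the `config_fingerprint` helper and the dictionary layout are hypothetical and are not QuPrep's actual implementation.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Return a SHA-256 hex digest of a JSON-serializable config (hypothetical helper)."""
    # sort_keys and fixed separators give one canonical byte string per config,
    # so the digest does not depend on dict insertion order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative layout only: stage classes, parameters, dependency versions.
config_a = {
    "stages": {"encoder": {"class": "AngleEncoder", "params": {"rotation": "ry"}}},
    "versions": {"numpy": "1.26.4"},
}
config_b = {
    "versions": {"numpy": "1.26.4"},
    "stages": {"encoder": {"params": {"rotation": "ry"}, "class": "AngleEncoder"}},
}

digest_a = config_fingerprint(config_a)
digest_b = config_fingerprint(config_b)
print(digest_a == digest_b)  # same configuration, different key order
```

Changing any parameter or version string changes the digest, which is what makes such a value usable as a reproducibility marker.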
fit(source, y=None)

Fit all pipeline stages on training data.

Parameters:

Name Type Description Default
source str, Path, np.ndarray, pd.DataFrame, or Dataset

Training data.

required
y ndarray or array-like

Target labels. Stored in Dataset.labels and passed to FeatureSelector when using the 'mutual_info' method. Ignored if labels are already embedded in the Dataset (e.g. via CSVIngester(target_columns=...)).

None

Returns:

Type Description
Pipeline

Returns self for chaining (sklearn convention).

fit_transform(source, y=None)

Fit all stages and transform in a single pass.

Parameters:

Name Type Description Default
source str, Path, np.ndarray, pd.DataFrame, or Dataset

Input data.

required
y ndarray or array-like

Target labels. Stored in Dataset.labels and passed to FeatureSelector when using the 'mutual_info' method. Ignored if labels are already embedded in the Dataset.

None

Returns:

Type Description
PipelineResult

Contains dataset (processed), encoded (list of EncodedResult or None), and circuits (framework-specific circuit objects or None).

get_params(deep=True)

Return pipeline parameters (sklearn convention).

Parameters:

Name Type Description Default
deep bool

Ignored — included for sklearn API compatibility.

True

Returns:

Type Description
dict
load(path) classmethod

Load a previously saved pipeline from a file.

Parameters:

Name Type Description Default
path str or Path

Path to a file created by Pipeline.save().

required

Returns:

Type Description
Pipeline

Raises:

Type Description
TypeError

If the file does not contain a Pipeline object.

save(path)

Persist the pipeline (configuration and fitted state) to a file.

Uses Python's pickle protocol. The saved file can be reloaded with Pipeline.load() and applied to new data without re-fitting.

Parameters:

Name Type Description Default
path str or Path

Destination file path (e.g. 'pipeline.pkl'). Parent directories are created automatically.

required
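The save/load contract described here (parent directories created on save, a TypeError on load when the file holds something else) can be sketched with the standard library. The `save_state` and `load_state` helpers are hypothetical; to keep the sketch independent of where a class is defined, it pickles a tagged state dictionary rather than a pipeline instance, which is an assumption and not how QuPrep stores pipelines.

```python
import pickle
import tempfile
from pathlib import Path


def save_state(state: dict, path) -> None:
    """Pickle a tagged state dictionary, creating parent directories first."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump({"kind": "pipeline", "state": state}, fh)


def load_state(path) -> dict:
    """Load a state dictionary and reject files that hold anything else."""
    with Path(path).open("rb") as fh:
        payload = pickle.load(fh)  # only unpickle files you trust
    if not (isinstance(payload, dict) and payload.get("kind") == "pipeline"):
        raise TypeError("file does not contain a saved pipeline state")
    return payload["state"]


with tempfile.TemporaryDirectory() as tmp:
    # "models" does not exist yet; save_state creates it.
    target = Path(tmp) / "models" / "pipeline.pkl"
    save_state({"fitted": True, "feature_min": [0.0, 1.0]}, target)
    restored = load_state(target)
    print(restored["fitted"])
```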
set_params(**params)

Set pipeline parameters (sklearn convention).

Parameters:

Name Type Description Default
**params object

Parameter names and values.

{}

Returns:

Type Description
Pipeline

Returns self.

Raises:

Type Description
ValueError

If an unknown parameter name is given.
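For readers unfamiliar with the scikit-learn parameter convention, the following minimal sketch shows the behaviour described for get_params() and set_params(): a dictionary of constructor parameters, a chainable setter, and a ValueError for unknown names. `StageContainer` is a hypothetical stand-in, not the Pipeline class itself.

```python
class StageContainer:
    """Hypothetical container following the scikit-learn parameter convention."""

    _param_names = ("cleaner", "encoder", "exporter")

    def __init__(self, cleaner=None, encoder=None, exporter=None):
        self.cleaner = cleaner
        self.encoder = encoder
        self.exporter = exporter

    def get_params(self, deep=True):
        # `deep` is accepted for API compatibility and ignored here.
        return {name: getattr(self, name) for name in self._param_names}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self._param_names:
                raise ValueError(f"unknown parameter: {name!r}")
            setattr(self, name, value)
        return self  # returning self enables chaining


container = StageContainer(encoder="angle")
container.set_params(exporter="qasm").set_params(cleaner="imputer")
print(container.get_params())
```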

stream(source, chunksize=1000)

Apply a fitted pipeline to a large source in chunks without loading it fully into RAM.

The pipeline must be fitted first (via fit() or fit_transform()). Normalizer statistics and all other fitted parameters are reused for every chunk; only transform is called per chunk, not fit.

Parameters:

Name Type Description Default
source str, Path, or np.ndarray
  • A file path is read in CSV chunks via quprep.ingest.csv_ingester.CSVIngester.
  • A NumPy array is sliced in row chunks via quprep.ingest.numpy_ingester.NumpyIngester.
required
chunksize int

Rows per chunk.

1000

Yields:

Type Description
PipelineResult

One result per chunk.

Raises:

Type Description
RuntimeError

If the pipeline has not been fitted.

Examples:

>>> import numpy as np
>>> import quprep as qd
>>> X = np.random.default_rng(0).uniform(0, 1, (1000, 4))
>>> pipeline = qd.Pipeline(encoder=qd.AngleEncoder(), exporter=qd.QASMExporter())
>>> _ = pipeline.fit(X[:100])
>>> for result in pipeline.stream(X, chunksize=200):
...     print(len(result.circuits))
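The fit-once, transform-per-chunk pattern behind stream() can be illustrated without QuPrep. The sketch below uses a hypothetical `MinMaxStream` scaler on a plain list; it only demonstrates that the fitted statistics are reused for every chunk and that an unfitted object raises RuntimeError.

```python
from typing import Iterator, List, Optional


class MinMaxStream:
    """Hypothetical scaler that is fitted once and applied chunk by chunk."""

    def __init__(self) -> None:
        self.lo: Optional[float] = None
        self.hi: Optional[float] = None

    def fit(self, values: List[float]) -> "MinMaxStream":
        self.lo, self.hi = min(values), max(values)
        return self

    def stream(self, values: List[float], chunksize: int = 1000) -> Iterator[List[float]]:
        if self.lo is None or self.hi is None:
            raise RuntimeError("fit() must be called before stream()")
        span = (self.hi - self.lo) or 1.0  # avoid division by zero
        for start in range(0, len(values), chunksize):
            chunk = values[start:start + chunksize]
            # Reuse the fitted statistics; nothing is re-fitted per chunk.
            yield [(v - self.lo) / span for v in chunk]


scaler = MinMaxStream().fit([0.0, 10.0])
chunks = list(scaler.stream([0.0, 2.5, 5.0, 7.5, 10.0], chunksize=2))
print([len(c) for c in chunks])
```

Because `stream` in this sketch is a generator, the fitted check runs when the first chunk is requested rather than when the method is called.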
summary()

Return a human-readable snapshot of the pipeline configuration.

Shows which stages are configured, whether the pipeline has been fitted, the resolved normalizer, and the last cost estimate (if available).

Returns:

Type Description
str
transform(source)

Apply fitted pipeline stages to data.

Parameters:

Name Type Description Default
source str, Path, np.ndarray, pd.DataFrame, or Dataset

Input data.

required

Returns:

Type Description
PipelineResult

Raises:

Type Description
RuntimeError

If the pipeline has not been fitted yet.

EncoderMetrics(encoding, expressibility, entanglement_capability, kernel_alignment, n_qubits) dataclass

Data-driven quality metrics for a parameterized encoding on a dataset.

Attributes:

Name Type Description
encoding str

Encoder name (e.g. 'iqp').

expressibility float or None

KL divergence from the Haar distribution. Lower = more expressive. None if the circuit is too large to simulate.

entanglement_capability float or None

Average Meyer-Wallach entanglement measure ∈ [0, 1]. Higher = more entangled. 0 for product-state encodings. None if unsupported.

kernel_alignment float or None

Normalised Frobenius alignment of the quantum kernel with class labels, ∈ [−1, 1]. Higher = better class separation. None if labels are not available.

n_qubits int

Qubit count used for simulation.

BarrenPlateauReport(encoding, n_qubits, circuit_depth, gradient_variance, risk_level, mitigations=list()) dataclass

Barren plateau risk report for a quantum encoding.

Attributes:

Name Type Description
encoding str

Encoder name (lower-case, without "Encoder" suffix).

n_qubits int

Number of qubits determined by cost estimation.

circuit_depth int

Estimated circuit depth.

gradient_variance float

Analytical bound on the gradient variance, derived from the formula for the specified cost_type. No simulation is performed.

risk_level str

One of "none", "mild", "high", "severe".

mitigations list[str]

Suggested mitigation strategies (empty when risk is "none").

ImbalanceHandler(strategy='oversample', sampling_strategy='auto', k_neighbors=5, random_state=42)

Balance class distributions before quantum encoding.

Supports four strategies:

  • "oversample" — random duplication of minority samples (no extra deps).
  • "undersample" — random removal of majority samples (no extra deps).
  • "smote" — Synthetic Minority Over-sampling Technique; interpolates in feature space using k-nearest neighbours (requires scikit-learn, already a core dependency).
  • "adasyn": Adaptive Synthetic sampling (ADASYN); focuses synthetic samples on harder-to-learn regions (requires imbalanced-learn: pip install quprep[imbalance]).
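The simplest of the four strategies, random oversampling, can be sketched in a few lines of standard-library Python. The `random_oversample` function below is a hypothetical illustration of "random duplication of minority samples" followed by a shuffle; it is not ImbalanceHandler's code.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=42):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        # Sample with replacement from the class's own rows (k is 0 for the majority).
        extra = rng.choices(pool, k=target - count)
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    # Shuffle rows and labels together so duplicates are not grouped at the end.
    order = list(range(len(out_rows)))
    rng.shuffle(order)
    return [out_rows[i] for i in order], [out_labels[i] for i in order]


rows = [[float(i)] for i in range(12)]
labels = [0] * 10 + [1] * 2
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))
```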

Parameters:

Name Type Description Default
strategy ('oversample', 'undersample', 'smote', 'adasyn')

Resampling strategy.

"oversample"
sampling_strategy float or 'auto'
  • "auto" balances all classes to the majority class count (oversampling) or the minority class count (undersampling).
  • A float r targets majority_count × r samples per class for oversampling, or minority_count / r for undersampling.
'auto'
k_neighbors int

Number of nearest neighbours for SMOTE and ADASYN.

5
random_state int

Seed for reproducibility.

42
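As a worked example of the sampling_strategy arithmetic described above, the hypothetical helper below computes the per-class target size for a 100-versus-10 label split. Rounding to the nearest integer is an assumption made for the sketch; the source does not state how fractional targets are handled.

```python
from collections import Counter


def target_count(labels, strategy, sampling_strategy="auto"):
    """Per-class target size following the parameter description (hypothetical helper)."""
    counts = Counter(labels)
    majority, minority = max(counts.values()), min(counts.values())
    if strategy == "oversample":
        # "auto" -> majority count; float r -> majority_count * r
        return majority if sampling_strategy == "auto" else round(majority * sampling_strategy)
    if strategy == "undersample":
        # "auto" -> minority count; float r -> minority_count / r
        return minority if sampling_strategy == "auto" else round(minority / sampling_strategy)
    raise ValueError(f"unsupported strategy: {strategy!r}")


labels = [0] * 100 + [1] * 10
print(target_count(labels, "oversample"))        # 100
print(target_count(labels, "oversample", 0.5))   # 50
print(target_count(labels, "undersample"))       # 10
print(target_count(labels, "undersample", 0.5))  # 20
```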

Examples:

>>> import numpy as np
>>> import quprep as qd
>>> from quprep.core.dataset import Dataset
>>> rng = np.random.default_rng(0)
>>> X = rng.uniform(0, 1, (110, 4))
>>> y = np.array([0] * 100 + [1] * 10)
>>> ds = Dataset(data=X, labels=y)
>>> handler = qd.ImbalanceHandler(strategy="smote")
>>> ds_bal = handler.fit_transform(ds)
>>> from collections import Counter
>>> print(Counter(ds_bal.labels))
Counter({0: 100, 1: 100})
Functions
fit(dataset)

Compute class distribution and target count from dataset.

Parameters:

Name Type Description Default
dataset Dataset

Must have labels set (1-D array, single-target only).

required
fit_transform(dataset)

Fit and transform in one step.

transform(dataset)

Apply the fitted resampling strategy to dataset.

Parameters:

Name Type Description Default
dataset Dataset
required

Returns:

Type Description
Dataset

New Dataset with resampled data and labels (shuffled).

Functions

prepare(source, *, encoding='angle', framework='qasm', ingester=None, preprocessor=None, **kwargs)

Convert a dataset to quantum circuits in one call.

Parameters:

Name Type Description Default
source str, Path, np.ndarray, pd.DataFrame, or Dataset

Input data: a file path, in-memory array/frame, or a pre-loaded Dataset. For image directories, text files, or graph data, pass a modality ingester via the ingester parameter.

required
encoding str

Encoding method. One of: 'angle' (default), 'entangled_angle', 'amplitude', 'basis', 'iqp', 'reupload', 'hamiltonian', 'zz_feature_map', 'pauli_feature_map', 'random_fourier', 'tensor_product', 'qaoa_problem'. Plugin encoders registered via register_encoder() are also accepted.

'angle'
framework str

Export target. One of: 'qasm' (default, no deps), 'qiskit', 'pennylane', 'cirq', 'tket', 'braket', 'qsharp', 'iqm'. Plugin exporters registered via register_exporter() are also accepted.

'qasm'
ingester ingester object

A modality ingester instance whose load(source) method is called before encoding. Use this for non-tabular data:

qd.prepare("images/", encoding="angle", ingester=qd.ImageIngester())
qd.prepare(texts, encoding="angle", ingester=qd.TextIngester())
qd.prepare(adj, encoding="angle", ingester=qd.GraphIngester(n_features=8))

When None (default) the pipeline auto-detects CSV, NumPy arrays, and DataFrames.

None
**kwargs

Extra keyword arguments forwarded to the encoder/exporter constructor. Common options: rotation ('ry'/'rx'/'rz'), pad (amplitude), threshold (basis), reps, layers, p, connectivity.

{}

Returns:

Type Description
PipelineResult

Object with .circuits, .encoded, .dataset, and .circuit (first sample).

recommend(source, *, task='classification', qubits=None, use_metrics=False, **kwargs)

Recommend the best encoding for a dataset and task.

Scores all encodings against the dataset profile (feature count, binary/continuous fraction, missing rate, sparsity, correlations, sample count) and the target task, then returns the highest-scoring option with ranked alternatives.

Parameters:

Name Type Description Default
source str, Path, np.ndarray, pd.DataFrame, or Dataset

Input data. Accepts anything the pipeline ingester accepts.

required
task str

Target task: 'classification', 'regression', 'qaoa', 'kernel', or 'simulation'. Default 'classification'.

'classification'
qubits int

Maximum qubit budget. Encodings that exceed this are heavily penalised.

None
use_metrics bool

When True and n_features ≤ 12, augment heuristic scores with data-driven circuit metrics (expressibility, entanglement capability, and — for labelled datasets — kernel alignment). Adds a few seconds of simulation time; disabled by default.

False
**kwargs

Reserved for future use (e.g. backend='ibm_brisbane').

{}

Returns:

Type Description
EncodingRecommendation

Top recommendation with alternatives list.

Raises:

Type Description
ValueError

If task is not one of the supported values.

compare_encodings(source, *, include=None, exclude=None, task=None, qubits=None)

Compare all (or selected) encoding methods on source and return side-by-side stats.

No circuits are generated — costs are estimated analytically from the dataset shape, so this is fast even for large datasets.
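To make "estimated analytically from the dataset shape" concrete, the sketch below computes textbook qubit counts from the feature count alone: one qubit per feature for angle and basis encoding, and ⌈log2 d⌉ qubits for amplitude encoding of d features. The `textbook_qubits` helper is hypothetical, and QuPrep's own cost model may use different or additional quantities.

```python
import math


def textbook_qubits(encoding: str, n_features: int) -> int:
    """Textbook qubit count for a few encodings (hypothetical helper)."""
    if encoding in ("angle", "basis"):
        # One qubit per feature.
        return n_features
    if encoding == "amplitude":
        # d amplitudes fit into ceil(log2(d)) qubits; at least one qubit.
        return max(1, math.ceil(math.log2(n_features)))
    raise ValueError(f"no textbook formula in this sketch for {encoding!r}")


n_features = 8
for name in ("angle", "basis", "amplitude"):
    print(name, textbook_qubits(name, n_features))
```

No circuit is built at any point, which is why this kind of estimate stays cheap regardless of the number of samples.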

Parameters:

Name Type Description Default
source str, numpy.ndarray, pandas.DataFrame, or Dataset

Input data, in the same formats accepted by quprep.Pipeline.

required
include list[str] or None

Encoder names to include. If None, all 14 encoders are compared. Valid names: "angle", "amplitude", "basis", "iqp", "reupload", "entangled_angle", "hamiltonian", "qaoa_problem", "zz_feature_map", "pauli_feature_map", "random_fourier", "tensor_product", "dense_angle", "discretized".

None
exclude list[str] or None

Encoder names to exclude. Applied after include.

None
task str or None

If provided, the recommended encoder for this task is starred in the output table. Passed to quprep.recommend(). Valid: "classification", "regression", "qaoa", "kernel", "simulation".

None
qubits int or None

Maximum qubit budget. Encoders requiring more qubits have nisq_safe set to False and a budget warning added to their row.

None

Returns:

Type Description
ComparisonResult

Examples:

Compare all encoders on a CSV, highlight the best for classification:

>>> import quprep as qd
>>> result = qd.compare_encodings("data.csv", task="classification", qubits=8)
>>> print(result)
>>> result.best(prefer="nisq")

Compare a subset:

>>> result = qd.compare_encodings(X, include=["angle", "iqp", "amplitude"])

draw_ascii(encoded, width=80)

Return an ASCII circuit diagram for an EncodedResult.

No additional dependencies required.

Parameters:

Name Type Description Default
encoded EncodedResult
required
width int

Reserved for future use (target line width hint). Default 80.

80

Returns:

Type Description
str

Multi-line ASCII string. Print with print(draw_ascii(encoded)).

draw_matplotlib(encoded, filename=None)

Draw a matplotlib circuit diagram.

Requires: pip install quprep[viz]

Parameters:

Name Type Description Default
encoded EncodedResult
required
filename str or Path

Save to file if provided (PNG, PDF, SVG). Returns None. If None, returns the matplotlib Figure object.

None

Returns:

Type Description
Figure or None

Figure object if filename is None; None after saving to file.

entanglement_capability(encoder, dataset, *, n_samples=200, seed=None)

Estimate the entanglement capability of an encoding.

Returns the average Meyer-Wallach measure over randomly sampled data points. Ranges from 0 (product state, e.g. plain angle encoding) to 1 (maximally entangled).

Parameters:

Name Type Description Default
encoder encoder instance
required
dataset Dataset
required
n_samples int

Number of randomly sampled data points to average over. Default 200.

200
seed int
None

Returns:

Type Description
float or None

Average MW measure ∈ [0, 1]. None if unsupported or too large.
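For a single pure state, the Meyer-Wallach measure can be written as Q = 2(1 − (1/n) Σ_k Tr ρ_k²), where ρ_k is the reduced density matrix of qubit k. The sketch below evaluates that formula for a state vector in pure Python; the `meyer_wallach` function and its bit-ordering convention are assumptions for illustration, and averaging over sampled data points as described above is left out.

```python
import math


def meyer_wallach(state, n_qubits):
    """Meyer-Wallach measure Q = 2 * (1 - mean_k Tr(rho_k^2)) of a pure state.

    `state` is a list of 2**n_qubits amplitudes; bit k of an index is qubit k.
    """
    purity_sum = 0.0
    for k in range(n_qubits):
        mask = 1 << k
        r00 = r11 = 0.0
        r01 = 0j
        for idx in range(len(state)):
            if idx & mask:
                continue  # visit each (qubit k = 0, qubit k = 1) pair once
            a0 = complex(state[idx])
            a1 = complex(state[idx | mask])
            r00 += abs(a0) ** 2
            r11 += abs(a1) ** 2
            r01 += a0 * a1.conjugate()
        # Tr(rho^2) for a 2x2 Hermitian matrix.
        purity_sum += r00 ** 2 + r11 ** 2 + 2 * abs(r01) ** 2
    return 2.0 * (1.0 - purity_sum / n_qubits)


s = 1 / math.sqrt(2)
product_state = [1, 0, 0, 0]  # |00>
bell_state = [s, 0, 0, s]     # (|00> + |11>) / sqrt(2)
print(round(meyer_wallach(product_state, 2), 6))
print(round(meyer_wallach(bell_state, 2), 6))
```

The product state sits at the lower end of the range and the Bell state at the upper end, matching the interpretation of the [0, 1] interval given above.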

References

Sim et al. (2019) https://doi.org/10.1002/qute.201900070

kernel_alignment(encoder, dataset, *, max_samples=300, seed=None)

Compute the normalised kernel alignment between the quantum kernel and labels.

Measures how well the encoding separates classes by comparing the quantum kernel matrix K (where K[i,j] = |⟨ψ(xᵢ)|ψ(xⱼ)⟩|²) to the ideal label kernel K_y (where K_y[i,j] = yᵢ·yⱼ).

The alignment is:

\[
A(K, K_y) = \frac{\langle K, K_y \rangle_F}{\|K\|_F \, \|K_y\|_F}
\]

Higher values indicate the encoding separates classes better.
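The alignment formula can be checked by hand on a small example. The sketch below computes A(K, K_y) for 4 samples with labels in {+1, −1}; the `kernel_alignment_score` helper, the ±1 label coding, and the two toy kernel matrices are assumptions chosen for illustration and do not come from a simulated encoder.

```python
import math


def kernel_alignment_score(K, y):
    """Frobenius alignment between kernel matrix K and label kernel y_i * y_j."""
    n = len(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    inner = sum(K[i][j] * y[i] * y[j] for i, j in pairs)          # <K, K_y>_F
    norm_k = math.sqrt(sum(K[i][j] ** 2 for i, j in pairs))       # ||K||_F
    norm_y = math.sqrt(sum((y[i] * y[j]) ** 2 for i, j in pairs))  # ||K_y||_F
    return inner / (norm_k * norm_y)


y = [1, 1, -1, -1]

# Same-class states identical, different-class states orthogonal.
K_block = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Every sample mapped to the same state: the kernel carries no label information.
K_flat = [[1] * 4 for _ in range(4)]

print(round(kernel_alignment_score(K_block, y), 4))
print(round(kernel_alignment_score(K_flat, y), 4))
```

For the block kernel, ⟨K, K_y⟩_F = 8, ‖K‖_F = √8 and ‖K_y‖_F = 4, so the alignment is 8 / (4√8) = 1/√2; for the flat kernel the inner product is (Σ yᵢ)² = 0.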

Parameters:

Name Type Description Default
encoder encoder instance

A fitted QuPrep encoder.

required
dataset Dataset

Must have dataset.labels populated.

required
max_samples int

Subsample the dataset to at most this many points for efficiency. Default 300.

300
seed int
None

Returns:

Type Description
float or None

Alignment score ∈ [−1, 1]. None if labels are missing, encoding unsupported, or n_qubits > metrics.MAX_QUBITS.

score_encoding(encoder, dataset, *, n_samples=200, seed=None)

Compute all data-driven quality metrics for one encoder on a dataset.

Encoders that require fitting (e.g. RandomFourierEncoder) are automatically fitted on the dataset before metric computation.

Parameters:

Name Type Description Default
encoder encoder instance
required
dataset Dataset
required
n_samples int

Samples used for expressibility and entanglement. Default 200.

200
seed int
None

Returns:

Type Description
EncoderMetrics

detect_barren_plateau(encoder, dataset, *, cost_type='global')

Analytically estimate barren plateau risk for a quantum encoding.

No circuit simulation is performed. Risk is derived from qubit count using the theoretical gradient variance bounds:

  • Global cost (McClean et al. 2018): Var[∂C/∂θ] ≤ 2^(1−n) — exponential decay with qubit count.
  • Local cost (Cerezo et al. 2021): Var[∂C/∂θ] ≈ 1/n² — polynomial decay; strongly preferred for large circuits.
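The two scalings quoted above can be tabulated directly to see how quickly they diverge. The sketch below evaluates 2^(1−n) and 1/n² for a few qubit counts; the function names are hypothetical, and the thresholds that map these values to a risk_level are not reproduced here because the source does not state them.

```python
def global_cost_bound(n_qubits: int) -> float:
    """Upper bound 2^(1 - n) quoted for a global cost function."""
    return 2.0 ** (1 - n_qubits)


def local_cost_scale(n_qubits: int) -> float:
    """Approximate 1 / n^2 scaling quoted for a local cost function."""
    return 1.0 / n_qubits ** 2


for n in (4, 8, 16, 32):
    print(f"n={n:2d}  global <= {global_cost_bound(n):.3e}  local ~ {local_cost_scale(n):.3e}")
```

At n = 8 the two values are still within a factor of two of each other (2⁻⁷ versus 1/64), while at n = 32 the global bound is many orders of magnitude smaller than the local scaling.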

Parameters:

Name Type Description Default
encoder BaseEncoder

A QuPrep encoder. Does not need to be fitted.

required
dataset Dataset

Used only to determine qubit count and circuit depth via cost estimation.

required
cost_type ('global', 'local')

Cost function type used during training.

"global"

Returns:

Type Description
BarrenPlateauReport

Examples:

>>> import numpy as np
>>> import quprep as qd
>>> from quprep.core.dataset import Dataset
>>> ds = Dataset(data=np.random.default_rng(0).uniform(0, 1, (50, 8)))
>>> report = qd.detect_barren_plateau(qd.IQPEncoder(), ds)
>>> print(report.risk_level)
mild
References

McClean J.R. et al. "Barren plateaus in quantum neural network training landscapes." Nature Communications 9, 4812 (2018).

Cerezo M. et al. "Cost function dependent barren plateaus in shallow parametrized quantum circuits." Nature Communications 12, 1791 (2021).