Pipeline

The `Pipeline` class chains all preprocessing stages. Every stage is optional, so you configure only the stages you need.

Pipeline

`quprep.core.pipeline.Pipeline(ingester=None, preprocessor=None, cleaner=None, reducer=None, normalizer=None, encoder=None, exporter=None, schema=None, drift_detector=None)`

Composable preprocessing pipeline for quantum data preparation.

Each stage is optional and works independently. You can use just the encoder, just the reducer, or any combination without touching the rest.
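
To picture how optional stages combine, the stdlib-only sketch below runs a fixed sequence of stages, skips any that are `None`, and fits each stage on the output of the one before it. It is a hypothetical illustration rather than quprep's implementation: `TinyPipeline`, `Center`, and `Double` are made-up names, and real quprep stages operate on `Dataset` objects instead of lists of rows.

```python
from typing import Optional, Protocol


class Stage(Protocol):
    """Hypothetical stage interface: learn state in fit(), apply it in transform()."""

    def fit(self, rows: list[list[float]]) -> "Stage": ...
    def transform(self, rows: list[list[float]]) -> list[list[float]]: ...


class Center:
    """Toy stateful stage: subtract the per-column mean learned during fit()."""

    def fit(self, rows):
        n = len(rows)
        self.means_ = [sum(col) / n for col in zip(*rows)]
        return self

    def transform(self, rows):
        return [[x - m for x, m in zip(row, self.means_)] for row in rows]


class Double:
    """Toy stateless stage."""

    def fit(self, rows):
        return self

    def transform(self, rows):
        return [[2 * x for x in row] for row in rows]


class TinyPipeline:
    """Hypothetical sketch: run only the configured stages, in a fixed order."""

    def __init__(self, cleaner: Optional[Stage] = None, normalizer: Optional[Stage] = None):
        self.cleaner = cleaner
        self.normalizer = normalizer
        self._fitted = False

    def _stages(self):
        # Unconfigured (None) stages are skipped entirely.
        return [s for s in (self.cleaner, self.normalizer) if s is not None]

    def fit(self, rows):
        for stage in self._stages():
            # Each stage is fitted on the output of the previous stage.
            rows = stage.fit(rows).transform(rows)
        self._fitted = True
        return self

    def transform(self, rows):
        if not self._fitted:
            raise RuntimeError("Call fit() first.")
        for stage in self._stages():
            rows = stage.transform(rows)
        return rows


train = [[1.0, 10.0], [3.0, 30.0]]
only_center = TinyPipeline(cleaner=Center()).fit(train)
both = TinyPipeline(cleaner=Center(), normalizer=Double()).fit(train)
print(only_center.transform([[2.0, 20.0]]))  # [[0.0, 0.0]]
print(both.transform([[4.0, 40.0]]))         # [[4.0, 40.0]]
```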

The class is sklearn-compatible: in addition to `fit_transform()`, it supports `fit()`, `transform()`, `get_params()`, and `set_params()`.

Parameters:

Name	Type	Description	Default
`ingester`	`optional`	Data ingestion component. Auto-detected from source type if omitted.	`None`
`preprocessor`	`optional`	Preprocessing step applied after ingestion. Accepts a single transformer or a list of transformers applied in order (e.g. `[WindowTransformer(), ...]`).	`None`
`cleaner`	`optional`	Data cleaning component (Imputer, OutlierHandler, CategoricalEncoder).	`None`
`reducer`	`optional`	Dimensionality reduction component (PCA, LDA, etc.).	`None`
`normalizer`	`optional`	Normalization component. Auto-selected per encoding if omitted.	`None`
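`encoder`	`optional`	Quantum encoding component. If omitted, no encoding is performed and the result holds only the processed Dataset (`encoded` and `circuits` are None).	`None`
`exporter`	`optional`	Framework export component. If omitted, `circuits` is None and the encoded samples are returned in `encoded` as a list of EncodedResult.	`None`
`schema`	`DataSchema`	Input schema validated at pipeline entry. Raises SchemaViolationError on mismatch.	`None`
`drift_detector`	`optional`	Drift detector (e.g. `DriftDetector`). Drift is checked on every `transform()` call, a QuPrepWarning is issued when drift is detected, and the report is stored in `PipelineResult.drift_report`.	`None`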

Examples:

>>> from quprep import AngleEncoder, Pipeline, QASMExporter
>>> pipeline = Pipeline(
...     encoder=AngleEncoder(),
...     exporter=QASMExporter(),
... )
>>> result = pipeline.fit_transform(df)  # df: a pandas DataFrame
>>> print(result.circuits[0])

Source code in quprep/core/pipeline.py

def __init__(
    self,
    ingester=None,
    preprocessor=None,
    cleaner=None,
    reducer=None,
    normalizer=None,
    encoder=None,
    exporter=None,
    schema=None,
    drift_detector=None,
):
    self.ingester = ingester
    self.preprocessor = preprocessor
    self.cleaner = cleaner
    self.reducer = reducer
    self.normalizer = normalizer
    self.encoder = encoder
    self.exporter = exporter
    self.schema = schema
    self.drift_detector = drift_detector
    self._fitted = False
    self._resolved_normalizer = None
    self._last_cost = None
    self._last_audit_log = None
    self._last_drift_report = None

Functions

`fingerprint()`

Compute a reproducibility fingerprint for this pipeline.

Returns a `FingerprintResult` (`quprep.core.fingerprint.FingerprintResult`) containing a deterministic SHA-256 hash of the full pipeline configuration (stage classes, parameters, and dependency versions). The hash is identical across runs as long as the configuration and dependency versions are unchanged, which makes it suitable for citing in a paper's methods section.

Returns:

Type	Description
`FingerprintResult`

Examples:

>>> fp = pipeline.fingerprint()
>>> print(fp.hash)
>>> fp.save("experiment.json")

Source code in quprep/core/pipeline.py

def fingerprint(self):
    """
    Compute a reproducibility fingerprint for this pipeline.

    Returns a :class:`~quprep.core.fingerprint.FingerprintResult` containing
    a deterministic SHA-256 hash of the full pipeline configuration (stage
    classes, parameters, and dependency versions). The hash is stable across
    runs for the same configuration and suitable for paper methods sections.

    Returns
    -------
    FingerprintResult

    Examples
    --------
    >>> fp = pipeline.fingerprint()
    >>> print(fp.hash)
    >>> fp.save("experiment.json")
    """
    from quprep.core.fingerprint import fingerprint_pipeline
    return fingerprint_pipeline(self)

`fit(source, y=None)`

Fit all pipeline stages on training data.

Parameters:

Name	Type	Description	Default
`source`	`str, Path, np.ndarray, pd.DataFrame, or Dataset`	Training data.	required
`y`	`ndarray or array-like`	Target labels. Stored in `Dataset.labels` and passed to `FeatureSelector` when using the `'mutual_info'` method. Ignored if labels are already embedded in the Dataset (e.g. via `CSVIngester(target_columns=...)`).	`None`

Returns:

Type	Description
`Pipeline`	Returns `self` for chaining (sklearn convention).

Source code in quprep/core/pipeline.py

def fit(self, source, y=None) -> Pipeline:
    """
    Fit all pipeline stages on training data.

    Parameters
    ----------
    source : str, Path, np.ndarray, pd.DataFrame, or Dataset
        Training data.
    y : np.ndarray or array-like, optional
        Target labels. Stored in ``Dataset.labels`` and passed to
        ``FeatureSelector`` when using the ``'mutual_info'`` method.
        Ignored if labels are already embedded in the Dataset (e.g. via
        ``CSVIngester(target_columns=...)``).

    Returns
    -------
    Pipeline
        Returns ``self`` for chaining (sklearn convention).
    """
    import numpy as np
    dataset = self._ingest(source)
    if y is not None and dataset.labels is None:
        dataset.labels = np.asarray(y)
    self._validate_entry(dataset)
    self._fit_stages(dataset)
    self._fitted = True
    return self

`fit_transform(source, y=None)`

Fit all stages and transform in a single pass.

Parameters:

Name	Type	Description	Default
`source`	`str, Path, np.ndarray, pd.DataFrame, or Dataset`	Input data.	required
`y`	`ndarray or array-like`	Target labels. Stored in `Dataset.labels` and passed to `FeatureSelector` when using the `'mutual_info'` method. Ignored if labels are already embedded in the Dataset.	`None`

Returns:

Type	Description
`PipelineResult`	Contains `dataset` (processed), `encoded` (list of EncodedResult or None), and `circuits` (framework-specific circuit objects or None).

Source code in quprep/core/pipeline.py

def fit_transform(self, source, y=None) -> PipelineResult:
    """
    Fit all stages and transform in a single pass.

    Parameters
    ----------
    source : str, Path, np.ndarray, pd.DataFrame, or Dataset
        Input data.
    y : np.ndarray or array-like, optional
        Target labels. Stored in ``Dataset.labels`` and passed to
        ``FeatureSelector`` when using the ``'mutual_info'`` method.
        Ignored if labels are already embedded in the Dataset.

    Returns
    -------
    PipelineResult
        Contains ``dataset`` (processed), ``encoded`` (list of EncodedResult
        or None), and ``circuits`` (framework-specific circuit objects or None).
    """
    import numpy as np
    dataset = self._ingest(source)
    if y is not None and dataset.labels is None:
        dataset.labels = np.asarray(y)
    self._validate_entry(dataset)
    dataset = self._fit_stages(dataset)
    self._fitted = True
    return self._encode_export(dataset)

`get_params(deep=True)`

Return pipeline parameters (sklearn convention).

Parameters:

Name	Type	Description	Default
`deep`	`bool`	Ignored — included for sklearn API compatibility.	`True`

Returns:

Type	Description
`dict`

Source code in quprep/core/pipeline.py

def get_params(self, deep: bool = True) -> dict:
    """
    Return pipeline parameters (sklearn convention).

    Parameters
    ----------
    deep : bool
        Ignored — included for sklearn API compatibility.

    Returns
    -------
    dict
    """
    return {
        "ingester": self.ingester,
        "preprocessor": self.preprocessor,
        "cleaner": self.cleaner,
        "reducer": self.reducer,
        "normalizer": self.normalizer,
        "encoder": self.encoder,
        "exporter": self.exporter,
        "schema": self.schema,
        "drift_detector": self.drift_detector,
    }

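`load(path)` `classmethod`

Load a previously saved pipeline from a file. The file is read with `pickle`, so only load files from trusted sources: unpickling untrusted data can execute arbitrary code.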

Parameters:

Name	Type	Description	Default
`path`	`str or Path`	Path to a file created by `Pipeline.save()`.	required

Returns:

Type	Description
`Pipeline`

Raises:

Type	Description
`TypeError`	If the file does not contain a Pipeline object.

Source code in quprep/core/pipeline.py

@classmethod
def load(cls, path: str | Path) -> Pipeline:
    """
    Load a previously saved pipeline from a file.

    Parameters
    ----------
    path : str or Path
        Path to a file created by :meth:`Pipeline.save`.

    Returns
    -------
    Pipeline

    Raises
    ------
    TypeError
        If the file does not contain a Pipeline object.
    """
    import pickle

    with open(Path(path), "rb") as f:
        obj = pickle.load(f)  # noqa: S301
    if not isinstance(obj, cls):
        raise TypeError(
            f"Expected a Pipeline object, got {type(obj).__name__}."
        )
    return obj

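`save(path)`

Persist the pipeline (configuration and fitted state) to a file.

Uses Python's `pickle` module with the highest available protocol. The saved file can be reloaded with `Pipeline.load()` and applied to new data without re-fitting. Because the file is a pickle, loading it generally requires compatible versions of Python, quprep, and the stage dependencies.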

Parameters:

Name	Type	Description	Default
`path`	`str or Path`	Destination file path (e.g. `'pipeline.pkl'`). Parent directories are created automatically.	required
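
The save/load contract described here and under `load(path)` (create parent directories, pickle with the highest protocol, reject a file that does not hold the expected type) can be sketched with the standard library. `save_obj` and `load_obj` are hypothetical helpers, and a `SimpleNamespace` stands in for a fitted pipeline; as with any pickle file, only load files you trust.

```python
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_obj(obj, path):
    """Pickle obj to path, creating parent directories as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_obj(path, expected_type):
    """Unpickle path and check the result's type. Only use on trusted files."""
    with open(Path(path), "rb") as f:
        obj = pickle.load(f)  # can execute arbitrary code if the file is malicious
    if not isinstance(obj, expected_type):
        raise TypeError(f"Expected {expected_type.__name__}, got {type(obj).__name__}.")
    return obj


# A SimpleNamespace stands in for a fitted pipeline holding learned state.
fitted = SimpleNamespace(mean_=2.0, scale_=0.5)

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "nested" / "dir" / "pipeline.pkl"
    save_obj(fitted, target)  # "nested/dir" is created on the fly
    restored = load_obj(target, SimpleNamespace)

    save_obj({"not": "a pipeline"}, Path(tmp) / "other.pkl")
    try:
        load_obj(Path(tmp) / "other.pkl", SimpleNamespace)
    except TypeError as exc:
        error_message = str(exc)

print(restored.mean_, restored.scale_)  # 2.0 0.5
print(error_message)  # Expected SimpleNamespace, got dict.
```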

Source code in quprep/core/pipeline.py

def save(self, path: str | Path) -> None:
    """
    Persist the pipeline (configuration and fitted state) to a file.

    Uses Python's ``pickle`` protocol. The saved file can be reloaded
    with :meth:`Pipeline.load` and applied to new data without re-fitting.

    Parameters
    ----------
    path : str or Path
        Destination file path (e.g. ``'pipeline.pkl'``). Parent
        directories are created automatically.
    """
    import pickle

    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(self, f, protocol=pickle.HIGHEST_PROTOCOL)

`set_params(**params)`

Set pipeline parameters (sklearn convention).

Parameters:

Name	Type	Description	Default
`**params`	`object`	Parameter names and values.	`{}`

Returns:

Type	Description
`Pipeline`	Returns `self`.

Raises:

Type	Description
`ValueError`	If an unknown parameter name is given.
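
The `get_params()`/`set_params()` pair is what sklearn-style tooling relies on to inspect and rebuild estimators. The sketch below mirrors the validation logic of the source code shown for `set_params` using a hypothetical `Config` class, and shows the common clone pattern `type(obj)(**obj.get_params())`, which builds a fresh, unfitted object with the same settings. The stage values are placeholder strings; this is an illustration, not quprep code.

```python
class Config:
    """Hypothetical object following the get_params/set_params convention."""

    def __init__(self, reducer=None, encoder=None):
        self.reducer = reducer
        self.encoder = encoder
        self._fitted = False  # private state is NOT a parameter

    def get_params(self, deep=True):
        return {"reducer": self.reducer, "encoder": self.encoder}

    def set_params(self, **params):
        valid = set(self.get_params())
        for key, value in params.items():
            if key not in valid:
                raise ValueError(f"Invalid parameter '{key}'. Valid parameters: {sorted(valid)}")
            setattr(self, key, value)
        return self  # enables chaining


original = Config(reducer="pca", encoder="angle")
original._fitted = True

# Update a parameter, then clone: same parameters, fresh (unfitted) state.
original.set_params(encoder="amplitude")
clone = type(original)(**original.get_params())
print(clone.get_params())  # {'reducer': 'pca', 'encoder': 'amplitude'}
print(clone._fitted)       # False

try:
    original.set_params(encodr="basis")  # a typo is rejected, not silently stored
except ValueError as exc:
    print(exc)
```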

Source code in quprep/core/pipeline.py

def set_params(self, **params) -> Pipeline:
    """
    Set pipeline parameters (sklearn convention).

    Parameters
    ----------
    **params
        Parameter names and values.

    Returns
    -------
    Pipeline
        Returns ``self``.

    Raises
    ------
    ValueError
        If an unknown parameter name is given.
    """
    valid = set(self.get_params())
    for key, value in params.items():
        if key not in valid:
            raise ValueError(
                f"Invalid parameter '{key}'. Valid parameters: {sorted(valid)}"
            )
        setattr(self, key, value)
    return self

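`stream(source, chunksize=1000)`

Apply a fitted pipeline to a large source chunk by chunk. A CSV file is read in chunks rather than loaded into memory at once; a NumPy array is already in memory and is processed in row slices.

The pipeline must be fitted first (via `fit()` or `fit_transform()`). Normalizer statistics and all other fitted parameters are reused for every chunk: only `transform` is called per chunk, never `fit`. Because `stream()` is a generator, its input checks run when iteration starts, not when `stream()` is called, so the errors listed under Raises surface on the first iteration.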

Parameters:

Name	Type	Description	Default
`source`	`str, Path, or np.ndarray`	A file path is read in CSV chunks via `quprep.ingest.csv_ingester.CSVIngester`. A NumPy array is sliced in row chunks via `quprep.ingest.numpy_ingester.NumpyIngester`.	required
`chunksize`	`int`	Rows per chunk.	`1000`

Yields:

Type	Description
`PipelineResult`	One result per chunk.

Raises:

Type	Description
`RuntimeError`	If the pipeline has not been fitted.
`TypeError`	If `source` is neither a file path (str or Path) nor a NumPy array.

Examples:

>>> import numpy as np
>>> import quprep as qd
>>> X = np.random.default_rng(0).uniform(0, 1, (1000, 4))
>>> pipeline = qd.Pipeline(encoder=qd.AngleEncoder(), exporter=qd.QASMExporter())
>>> _ = pipeline.fit(X[:100])
>>> for result in pipeline.stream(X, chunksize=200):
...     print(len(result.circuits))
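
Two properties of `stream()` are easiest to see in isolation: fitted statistics are reused for every chunk, and, because the method is a generator, the unfitted-pipeline check only fires once iteration starts. This is a minimal stdlib sketch with a hypothetical `ChunkedScaler`; it is not quprep's implementation.

```python
class ChunkedScaler:
    """Hypothetical transform that is fitted once and then applied per chunk."""

    def __init__(self):
        self.min_ = None
        self.max_ = None

    def fit(self, values):
        # Statistics come from the training data only.
        self.min_, self.max_ = min(values), max(values)
        return self

    def stream(self, values, chunksize=1000):
        # This is a generator, so the check below runs on the first next(),
        # not when stream() is called.
        if self.min_ is None:
            raise RuntimeError("Not fitted. Call fit() first.")
        span = self.max_ - self.min_
        for start in range(0, len(values), chunksize):
            chunk = values[start:start + chunksize]
            # Reuse the training statistics for every chunk; never re-fit.
            yield [(v - self.min_) / span for v in chunk]


scaler = ChunkedScaler()
gen = scaler.stream([1.0, 2.0])  # no error yet: the generator has not started
try:
    next(gen)
    raised_lazily = False
except RuntimeError:
    raised_lazily = True
print(raised_lazily)  # True

scaler.fit([0.0, 10.0])
chunks = list(scaler.stream([0.0, 5.0, 10.0, 20.0, 2.5], chunksize=2))
print(chunks)  # [[0.0, 0.5], [1.0, 2.0], [0.25]]
```

Note that the value 20.0 maps to 2.0: streamed chunks are scaled with the training range, not re-normalized to their own range.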

Source code in quprep/core/pipeline.py

def stream(self, source, chunksize: int = 1000):
    """
    Apply a fitted pipeline to a large source in chunks without loading
    it fully into RAM.

    The pipeline **must be fitted first** (via :meth:`fit` or
    :meth:`fit_transform`).  Normaliser statistics and all other fitted
    parameters are reused for every chunk — only ``transform`` is called
    per chunk, not ``fit``.

    Parameters
    ----------
    source : str, Path, or np.ndarray
        - A file path is read in CSV chunks via
          :class:`~quprep.ingest.csv_ingester.CSVIngester`.
        - A NumPy array is sliced in row chunks via
          :class:`~quprep.ingest.numpy_ingester.NumpyIngester`.
    chunksize : int
        Rows per chunk.

    Yields
    ------
    PipelineResult
        One result per chunk.

    Raises
    ------
    RuntimeError
        If the pipeline has not been fitted.

    Examples
    --------
    >>> import numpy as np
    >>> import quprep as qd
    >>> X = np.random.default_rng(0).uniform(0, 1, (1000, 4))
    >>> pipeline = qd.Pipeline(encoder=qd.AngleEncoder(), exporter=qd.QASMExporter())
    >>> _ = pipeline.fit(X[:100])
    >>> for result in pipeline.stream(X, chunksize=200):
    ...     print(len(result.circuits))
    """
    if not self._fitted:
        raise RuntimeError(
            "Pipeline has not been fitted. Call fit() or fit_transform() first."
        )
    from pathlib import Path

    import numpy as np

    if isinstance(source, (str, Path)):
        from quprep.ingest.csv_ingester import CSVIngester
        chunk_gen = CSVIngester().stream(source, chunksize=chunksize)
    elif isinstance(source, np.ndarray):
        from quprep.ingest.numpy_ingester import NumpyIngester
        chunk_gen = NumpyIngester().stream(source, chunksize=chunksize)
    else:
        raise TypeError(
            f"source must be a file path or np.ndarray, got {type(source).__name__}"
        )

    for chunk in chunk_gen:
        yield self._apply_stages(chunk)

`summary()`

Return a human-readable snapshot of the pipeline configuration.

Shows which stages are configured, whether the pipeline has been fitted, the resolved normalizer, and the last cost estimate (if available).

Returns:

Type	Description
`str`

Source code in quprep/core/pipeline.py

def summary(self) -> str:
    """
    Return a human-readable snapshot of the pipeline configuration.

    Shows which stages are configured, whether the pipeline has been
    fitted, the resolved normalizer, and the last cost estimate (if
    available).

    Returns
    -------
    str
    """
    lines = ["Pipeline"]
    lines.append(f"  fitted       : {'yes' if self._fitted else 'no'}")

    stage_names = [
        ("ingester",     self.ingester),
        ("preprocessor", self.preprocessor),
        ("cleaner",      self.cleaner),
        ("reducer",      self.reducer),
        ("normalizer",   self._resolved_normalizer or self.normalizer),
        ("encoder",      self.encoder),
        ("exporter",     self.exporter),
    ]
    for name, stage in stage_names:
        if stage is not None:
            lines.append(f"  {name:<12} : {type(stage).__name__}")

    if self.schema is not None:
        lines.append(f"  schema       : {len(self.schema.features)} feature(s)")

    if self._last_cost is not None:
        c = self._last_cost
        lines.append(
            f"  cost         : {c.encoding} | "
            f"{c.n_qubits} qubits | "
            f"depth {c.circuit_depth} | "
            f"gates {c.gate_count} | "
            f"NISQ-safe {'yes' if c.nisq_safe else 'NO'}"
        )

    return "\n".join(lines)

`transform(source)`

Apply fitted pipeline stages to data.

Parameters:

Name	Type	Description	Default
`source`	`str, Path, np.ndarray, pd.DataFrame, or Dataset`	Input data.	required

Returns:

Type	Description
`PipelineResult`

Raises:

Type	Description
`RuntimeError`	If the pipeline has not been fitted yet.

Source code in quprep/core/pipeline.py

def transform(self, source) -> PipelineResult:
    """
    Apply fitted pipeline stages to data.

    Parameters
    ----------
    source : str, Path, np.ndarray, pd.DataFrame, or Dataset
        Input data.

    Returns
    -------
    PipelineResult

    Raises
    ------
    RuntimeError
        If the pipeline has not been fitted yet.
    """
    if not self._fitted:
        raise RuntimeError(
            "Pipeline has not been fitted. Call fit() or fit_transform() first."
        )
    dataset = self._ingest(source)
    return self._apply_stages(dataset)

PipelineResult

`quprep.core.pipeline.PipelineResult(dataset, encoded, circuits, cost=None, audit_log=None, drift_report=None, stages=None)`

Output of `Pipeline.fit_transform()` and `Pipeline.transform()`. `Pipeline.stream()` yields one `PipelineResult` per chunk.

Attributes:

Name	Type	Description
`dataset`	`Dataset`	The processed Dataset after all pipeline stages (post-normalization).
`encoded`	`list[EncodedResult] or None`	One EncodedResult per sample. None if no encoder was configured.
`circuits`	`list or None`	Exported circuit objects (framework-specific). None if no exporter was configured.
`cost`	`CostEstimate or None`	Gate-count and NISQ-safety estimate for the chosen encoder. None if no encoder was configured.
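`audit_log`	`list[dict] or None`	One entry per preprocessing stage that ran, in order. Each dict has keys: `stage`, `n_samples_in`, `n_features_in`, `n_samples_out`, `n_features_out`. None if no preprocessing stages ran.
`drift_report`	`object or None`	Report produced by the configured `drift_detector` during `transform()`, with attributes such as `overall_drift` and `drifted_features`. None when no drift check was run, for example when no drift detector is configured.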
`stages`	`dict[str, Dataset]`	Intermediate datasets keyed by stage: `'input'`, `'after_cleaner'`, `'after_reducer'`, `'after_normalizer'`. Only stages that ran are included. Useful for debugging individual pipeline steps.

Attributes

`circuit` `property`

First item in the batch — convenience for single-sample use.

Returns the first exported circuit if an exporter was configured, otherwise the first EncodedResult if only an encoder was configured, otherwise None.
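
The fallback order of the `circuit` property can be written out in a few lines. `ResultSketch` is a hypothetical stand-in holding only the two relevant fields, with strings in place of circuit and EncodedResult objects; treating an empty list like None is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ResultSketch:
    """Hypothetical stand-in for PipelineResult with the two relevant fields."""

    encoded: Optional[list[Any]] = None
    circuits: Optional[list[Any]] = None

    @property
    def circuit(self):
        # Prefer exported circuits, then encoded results, else None.
        if self.circuits:
            return self.circuits[0]
        if self.encoded:
            return self.encoded[0]
        return None


print(ResultSketch(encoded=["enc0"], circuits=["qasm0"]).circuit)  # qasm0
print(ResultSketch(encoded=["enc0"]).circuit)                      # enc0
print(ResultSketch().circuit)                                      # None
```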

Functions

`summary()`

Return a human-readable report of the pipeline result.

Includes the audit log as a formatted table (if any preprocessing stages ran) and the cost estimate breakdown (if an encoder was used).

Returns:

Type	Description
`str`

FingerprintResult

`quprep.core.fingerprint.FingerprintResult(config, hash_hex)`

Output of `fingerprint_pipeline()`.

Attributes:

Name	Type	Description
`config`	`dict`	Full pipeline configuration (stages + dependency versions). This is the dict that was hashed — no timestamp, fully deterministic.
`hash`	`str`	SHA-256 hex digest of the canonical JSON serialisation of `config`.

Functions

`save(path, format='json')`

Write the fingerprint to a file.

Parameters:

Name	Type	Description	Default
`path`	`str`	Destination file path.	required
`format`	`str`	Output format, either `'json'` or `'yaml'`. YAML output requires pyyaml.	`'json'`

`to_dict()`

Return the config augmented with the hash and a UTC timestamp. The timestamp is added only to this output, not to the hashed config, so it does not affect `hash`.

`to_json(indent=2)`

Return a JSON string (hash + timestamp + config).

`to_yaml()`

Return a YAML string (requires pyyaml).

fingerprint_pipeline

`quprep.core.fingerprint.fingerprint_pipeline(pipeline)`

Compute a reproducibility fingerprint for `pipeline`.

The fingerprint captures the class name and constructor parameters of every configured stage (ingester, preprocessor, cleaner, reducer, normalizer, encoder, exporter, schema, drift_detector) plus the installed versions of key dependencies. The resulting SHA-256 hash is deterministic: the same stage configuration with the same dependency versions always produces the same hash, regardless of when or where the pipeline runs. Upgrading one of the captured dependencies therefore changes the hash.

Parameters:

Name	Type	Description	Default
`pipeline`	`Pipeline`	A `Pipeline` instance (fitted or unfitted).	required

Returns:

Type	Description
`FingerprintResult`	Contains `config` (serialisable dict) and `hash` (SHA-256 hex string).

Examples:

>>> import quprep as qd
>>> pipeline = qd.Pipeline(encoder=qd.AngleEncoder(), exporter=qd.QASMExporter())
>>> fp = qd.fingerprint_pipeline(pipeline)
>>> print(fp.hash)
>>> fp.save("experiment.json")
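
To make the determinism claim concrete, here is a minimal stdlib sketch of the general technique: record each configured stage's class name and parameters, add dependency versions, serialise the result to canonical JSON (sorted keys, fixed separators), and hash it with SHA-256. The stage classes, the `fingerprint_config` helper, and the exact canonical form are assumptions for illustration; quprep's own serialisation may differ.

```python
import hashlib
import json
import platform


class AngleStage:
    """Hypothetical stage whose constructor arguments are its parameters."""

    def __init__(self, rotation="ry"):
        self.rotation = rotation


class PCAStage:
    def __init__(self, n_components=4):
        self.n_components = n_components


def fingerprint_config(stages, versions):
    """Return (config, sha256 hex digest) for a dict of name -> stage or None."""
    config = {
        "stages": {
            name: {"class": type(stage).__name__, "params": vars(stage)}
            for name, stage in stages.items()
            if stage is not None  # unconfigured stages are left out
        },
        "versions": versions,  # no timestamp: the config must be reproducible
    }
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return config, hashlib.sha256(canonical.encode("utf-8")).hexdigest()


versions = {"python": platform.python_version()}
_, h1 = fingerprint_config({"reducer": PCAStage(4), "encoder": AngleStage("ry")}, versions)
# Same configuration listed in a different order, plus an unset stage: same hash.
_, h2 = fingerprint_config(
    {"encoder": AngleStage("ry"), "reducer": PCAStage(4), "cleaner": None}, versions
)
# One parameter changed: different hash.
_, h3 = fingerprint_config({"reducer": PCAStage(3), "encoder": AngleStage("ry")}, versions)
# Same stages, different dependency version: different hash.
_, h4 = fingerprint_config(
    {"reducer": PCAStage(4), "encoder": AngleStage("ry")}, {"python": "0.0.0"}
)

print(h1 == h2, h1 == h3, h1 == h4)  # True False False
```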

Examples

Minimal: encode only

import quprep as qd

pipeline = qd.Pipeline(encoder=qd.AngleEncoder())
result = pipeline.fit_transform(data)

result.encoded       # list[EncodedResult]
result.encoded[0].parameters   # rotation angles for first sample
result.encoded[0].metadata     # {"n_qubits": 4, "depth": 1, ...}

Full: clean + encode + export

import quprep as qd

pipeline = qd.Pipeline(
    cleaner=qd.Imputer(strategy="knn"),
    encoder=qd.AngleEncoder(rotation="ry"),
    exporter=qd.QASMExporter(),
)
result = pipeline.fit_transform("data.csv")
result.circuits[0]   # QASM string for first sample

With schema validation

import quprep as qd

schema = qd.DataSchema([
    qd.FeatureSpec("age",    dtype="continuous", min_value=0, max_value=120),
    qd.FeatureSpec("income", dtype="continuous", min_value=0),
])
pipeline = qd.Pipeline(encoder=qd.AngleEncoder(), schema=schema)
result = pipeline.fit_transform("data.csv")

print(result.cost.nisq_safe)   # True / False
print(result.summary())        # audit table + cost breakdown
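
Conceptually, entry validation checks every declared feature before any stage runs. The sketch below illustrates that idea with hypothetical `Spec`, `SchemaError`, and `validate` names; they are not quprep's `FeatureSpec`, `DataSchema`, or `SchemaViolationError`, and the real checks (for example dtype handling) may differ.

```python
from dataclasses import dataclass
from typing import Optional


class SchemaError(ValueError):
    """Hypothetical stand-in for a schema-violation exception."""


@dataclass
class Spec:
    name: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def validate(rows: list[dict], specs: list[Spec]) -> None:
    """Raise SchemaError on the first missing feature or out-of-range value."""
    for i, row in enumerate(rows):
        for spec in specs:
            if spec.name not in row:
                raise SchemaError(f"row {i}: missing feature '{spec.name}'")
            value = row[spec.name]
            if spec.min_value is not None and value < spec.min_value:
                raise SchemaError(f"row {i}: {spec.name}={value} < {spec.min_value}")
            if spec.max_value is not None and value > spec.max_value:
                raise SchemaError(f"row {i}: {spec.name}={value} > {spec.max_value}")


specs = [Spec("age", 0, 120), Spec("income", 0)]
validate([{"age": 34, "income": 52000.0}], specs)  # passes silently

try:
    validate([{"age": 34, "income": 1.0}, {"age": 130, "income": 1.0}], specs)
except SchemaError as exc:
    message = str(exc)
print(message)  # row 1: age=130 > 120
```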

sklearn-style fit / transform split

import quprep as qd

pipeline = qd.Pipeline(
    reducer=qd.PCAReducer(n_components=4),
    encoder=qd.AngleEncoder(),
)
pipeline.fit(X_train)
r_train = pipeline.transform(X_train)
r_test  = pipeline.transform(X_test)

Explicit normalizer

import quprep as qd

pipeline = qd.Pipeline(
    encoder=qd.AngleEncoder(),
    normalizer=qd.Scaler("zscore"),  # override auto-selection
)

Saving and loading a fitted pipeline

import quprep as qd

pipeline = qd.Pipeline(
    reducer=qd.PCAReducer(n_components=4),
    encoder=qd.AngleEncoder(),
)
pipeline.fit(X_train)
pipeline.save("pipeline.pkl")

# Later — in a different process or deployment
loaded = qd.Pipeline.load("pipeline.pkl")
result = loaded.transform(X_new)

The parent directory is created automatically. All fitted state (reducer, normalizer, encoder) is preserved.

With drift detection

import quprep as qd

det = qd.DriftDetector(mean_threshold=3.0, std_threshold=2.0)

pipeline = qd.Pipeline(
    encoder=qd.AngleEncoder(),
    drift_detector=det,
)
pipeline.fit(X_train)
result = pipeline.transform(X_test)

print(result.drift_report.overall_drift)      # True / False
print(result.drift_report.drifted_features)   # list of feature names

Drift is checked automatically on every transform() call. A QuPrepWarning is issued when drift is detected. The drift detector state is preserved through save()/load().
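
The precise rule behind `mean_threshold` and `std_threshold` is not documented here. Purely as an assumption for illustration, the sketch below flags a feature when its new mean lies more than `mean_threshold` training standard deviations from the training mean, or when its standard deviation grows or shrinks by more than a factor of `std_threshold`. `drifted_features` is a hypothetical helper, not part of quprep.

```python
import statistics


def drifted_features(train_cols, new_cols, mean_threshold=3.0, std_threshold=2.0):
    """Return names of features that drift under the ASSUMED rule described above."""
    drifted = []
    for name, train in train_cols.items():
        new = new_cols[name]
        mu, sd = statistics.mean(train), statistics.stdev(train)
        new_mu, new_sd = statistics.mean(new), statistics.stdev(new)
        # Mean shift measured in units of the training standard deviation.
        if sd > 0:
            mean_shift = abs(new_mu - mu) / sd
        else:
            mean_shift = 0.0 if new_mu == mu else float("inf")
        # Spread change as a ratio >= 1, whichever direction it moved.
        lo, hi = sorted((sd, new_sd))
        if lo > 0:
            ratio = hi / lo
        else:
            ratio = 1.0 if hi == 0 else float("inf")
        if mean_shift > mean_threshold or ratio > std_threshold:
            drifted.append(name)
    return drifted


train = {"age": [30, 32, 34, 36, 38], "income": [40, 42, 44, 46, 48]}
new = {"age": [31, 33, 35, 37, 39], "income": [70, 72, 74, 76, 78]}
print(drifted_features(train, new))  # ['income']
```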

Time series pipeline (v0.7.0)

import quprep as qd

pipeline = qd.Pipeline(
    ingester=qd.TimeSeriesIngester(time_column="date"),
    preprocessor=qd.WindowTransformer(window_size=8, step=1),
    encoder=qd.AngleEncoder(),
)
result = pipeline.fit_transform("sensor_data.csv")

print(len(result.encoded))                        # n_windows
print(result.encoded[0].metadata["n_qubits"])     # window_size × n_features

The preprocessor stage runs after ingestion and before cleaning/reduction. It is designed for shape-changing transforms like WindowTransformer.
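
The shape change a window transform performs can be sketched as a sliding window: with `n` time steps, windows of `window_size` rows advanced by `step` give `(n - window_size) // step + 1` windows, each flattened to `window_size × n_features` values, consistent with the `n_qubits` comment in the example above when each value is encoded on its own qubit. `sliding_windows` is a hypothetical stdlib illustration, not `WindowTransformer` itself, and the row-major flattening order is an assumption.

```python
def sliding_windows(rows, window_size, step=1):
    """Hypothetical sketch: flatten each window of consecutive rows (row-major)."""
    if window_size > len(rows):
        return []
    windows = []
    for start in range(0, len(rows) - window_size + 1, step):
        window = rows[start:start + window_size]
        windows.append([value for row in window for value in row])
    return windows


# 10 time steps x 2 features (e.g. temperature, pressure).
series = [[float(t), float(10 * t)] for t in range(10)]
out = sliding_windows(series, window_size=8, step=1)
print(len(out))     # 3  -> (10 - 8) // 1 + 1
print(len(out[0]))  # 16 -> window_size * n_features
print(out[1][:4])   # [1.0, 10.0, 2.0, 20.0]
```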

Sparse data (v0.7.0)

import scipy.sparse as sp
import quprep as qd

sparse_matrix = sp.csr_matrix(X)
result = qd.Pipeline(encoder=qd.AngleEncoder()).fit_transform(sparse_matrix)

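scipy.sparse matrices are accepted as input to fit(), transform(), and fit_transform(). They are converted to dense arrays at ingestion, so a large sparse matrix can need far more memory once densified. stream() accepts only file paths and NumPy arrays, so convert a sparse matrix with .toarray() before streaming it.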

Labels and multi-label (v0.7.0)

import quprep as qd

# Attach labels at fit_transform time
result = qd.Pipeline(encoder=qd.AngleEncoder()).fit_transform(X, y=y)
print(result.dataset.labels)   # preserved through all stages

# Or embed labels in the Dataset via CSVIngester
from quprep.ingest.csv_ingester import CSVIngester

pipeline = qd.Pipeline(
    ingester=CSVIngester(target_columns="label"),
    encoder=qd.AngleEncoder(),
)
result = pipeline.fit_transform("data.csv")
print(result.dataset.labels.shape)   # (n_samples,)

For FeatureSelector(method="mutual_info"), labels in dataset.labels are used automatically — no separate labels= argument needed.
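
For intuition about what a `'mutual_info'` score measures, the sketch below computes the mutual information between a discrete feature and the labels, I(X;Y) = Σ p(x,y) log(p(x,y) / (p(x) p(y))), directly from counts. This plug-in estimate for discrete values is a simplification chosen for illustration; `FeatureSelector` may use a different estimator, particularly for continuous features.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    px, py = Counter(feature), Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        # p(x,y) / (p(x) p(y)) simplifies to count * n / (count_x * count_y).
        mi += pxy * math.log(count * n / (px[x] * py[y]))
    return mi


labels = [0, 0, 1, 1]
informative = ["a", "a", "b", "b"]  # determines the label exactly
irrelevant = ["a", "b", "a", "b"]   # independent of the label
print(round(mutual_information(informative, labels), 4))  # 0.6931 (= ln 2)
print(mutual_information(irrelevant, labels))             # 0.0
```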

Inspecting intermediate stages (v0.10.0)

PipelineResult.stages gives access to the Dataset after each pipeline step:

result = qd.Pipeline(
    cleaner=qd.OutlierHandler(),
    reducer=qd.PCAReducer(n_components=4),
    encoder=qd.AngleEncoder(),
).fit_transform(df)

print(result.stages["input"].data.shape)           # raw input
print(result.stages["after_cleaner"].data.shape)   # post outlier removal
print(result.stages["after_reducer"].data.shape)   # post PCA
print(result.stages["after_normalizer"].data.shape) # pre-encoding

API consistency additions (v0.10.0)

Feature names after selection:

selector = qd.FeatureSelector(method="variance", threshold=0.01)
selector.fit(dataset)
print(selector.get_feature_names_out())  # ['age', 'income', ...]
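
A variance-threshold selector keeps the features whose variance exceeds `threshold` and can report which names survived, which is what `get_feature_names_out()` exposes. The hypothetical `select_by_variance` below sketches that logic; using population variance and a strict `>` comparison are assumptions, not documented quprep behaviour.

```python
import statistics


def select_by_variance(rows, names, threshold=0.01):
    """Return (kept_names, reduced_rows), keeping columns with variance > threshold."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if statistics.pvariance(col) > threshold]
    kept_names = [names[i] for i in keep]
    reduced = [[row[i] for i in keep] for row in rows]
    return kept_names, reduced


rows = [[25, 1.0, 30000], [40, 1.0, 52000], [61, 1.0, 87000]]
names, reduced = select_by_variance(rows, ["age", "constant", "income"])
print(names)    # ['age', 'income']
print(reduced)  # [[25, 30000], [40, 52000], [61, 87000]]
```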

Outlier mask:

handler = qd.OutlierHandler(method="iqr", action="remove")
handler.fit_transform(dataset)
print(handler.outlier_mask_)  # bool array, True = outlier row
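
Under the common IQR rule (Tukey's fences), a value is an outlier if it falls below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR. The hypothetical `iqr_outlier_mask` below builds a per-row mask in the same spirit as `outlier_mask_`; the 1.5 multiplier, the quartile method (`statistics.quantiles` with its default `'exclusive'` method), and flagging a row when any of its features is an outlier are assumptions rather than documented quprep behaviour.

```python
import statistics


def iqr_outlier_mask(rows, k=1.5):
    """Return one bool per row: True if any feature lies outside the IQR fences."""
    fences = []
    for col in zip(*rows):
        q1, _, q3 = statistics.quantiles(col, n=4)  # quartiles ('exclusive' method)
        iqr = q3 - q1
        fences.append((q1 - k * iqr, q3 + k * iqr))
    return [
        any(not (lo <= value <= hi) for value, (lo, hi) in zip(row, fences))
        for row in rows
    ]


rows = [[10.0], [11.0], [12.0], [13.0], [12.0], [11.0], [95.0]]
mask = iqr_outlier_mask(rows)
print(mask)  # [False, False, False, False, False, False, True]
kept = [row for row, is_outlier in zip(rows, mask) if not is_outlier]  # like action="remove"
```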

Reverse normalisation:

scaler = qd.Scaler("zscore")
scaled = scaler.fit_transform(dataset)
original = scaler.inverse_transform(scaled)  # back to original scale
# Supported: minmax, minmax_pi, minmax_2pi, minmax_pm_pi, zscore
# Not supported: l2, binary, pm_one (not invertible)
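
The split between invertible and non-invertible scalings follows from what the scaler stores. Min-max and z-score are affine maps whose fitted parameters (min and max, mean and standard deviation) are kept, so they can be undone up to floating-point rounding. L2 normalization divides each sample by its own norm, which is discarded, so different inputs can map to the same output. The helper names below are hypothetical; only the arithmetic is the point.

```python
import math
import statistics


def zscore_fit(col):
    return statistics.mean(col), statistics.pstdev(col)


def zscore(col, mean, std):
    return [(v - mean) / std for v in col]


def zscore_inverse(scaled, mean, std):
    # The affine map x -> (x - mean) / std is undone by z -> z * std + mean.
    return [z * std + mean for z in scaled]


def minmax_pi(col, lo, hi):
    # Map [lo, hi] onto [0, pi], e.g. for rotation angles.
    return [(v - lo) / (hi - lo) * math.pi for v in col]


def minmax_pi_inverse(scaled, lo, hi):
    return [s / math.pi * (hi - lo) + lo for s in scaled]


def l2_normalize(sample):
    norm = math.sqrt(sum(v * v for v in sample))
    return [v / norm for v in sample]


col = [2.0, 4.0, 6.0, 8.0]
mean, std = zscore_fit(col)
restored = zscore_inverse(zscore(col, mean, std), mean, std)
print(all(math.isclose(a, b) for a, b in zip(col, restored)))  # True

back = minmax_pi_inverse(minmax_pi(col, 2.0, 8.0), 2.0, 8.0)
print(all(math.isclose(a, b) for a, b in zip(col, back)))       # True

# L2: two different samples give identical outputs, so no inverse can exist.
print(l2_normalize([3.0, 4.0]) == l2_normalize([6.0, 8.0]))     # True
```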

Categorical cardinality control:

# Warn when a column has > 20 unique categories
# Group categories appearing fewer than 5 times as "_other"
encoder = qd.CategoricalEncoder(
    strategy="onehot",
    cardinality_threshold=20,
    min_frequency=5,
)
encoder.fit_transform(dataset)
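
One way to read the two cardinality controls: categories seen fewer than `min_frequency` times are merged into a single `"_other"` bucket before one-hot encoding, and a warning is raised when a column has more than `cardinality_threshold` distinct categories. The hypothetical `onehot_with_grouping` below implements that reading with the standard library; the warning class and whether the threshold is checked before or after grouping are assumptions.

```python
import warnings
from collections import Counter


def onehot_with_grouping(values, min_frequency=5, cardinality_threshold=20):
    """Hypothetical sketch: group rare categories, then one-hot encode."""
    counts = Counter(values)
    if len(counts) > cardinality_threshold:  # checked on raw categories (assumption)
        warnings.warn(f"{len(counts)} unique categories > {cardinality_threshold}")
    grouped = [v if counts[v] >= min_frequency else "_other" for v in values]
    categories = sorted(set(grouped))
    rows = [[1 if g == c else 0 for c in categories] for g in grouped]
    return categories, rows


values = ["red"] * 6 + ["blue"] * 5 + ["teal", "mauve"]
categories, rows = onehot_with_grouping(values, min_frequency=5)
print(categories)  # ['_other', 'blue', 'red']
print(rows[-1])    # [1, 0, 0]  ('mauve' was grouped into '_other')

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    onehot_with_grouping([str(i) for i in range(25)], min_frequency=1)
print(len(caught))  # 1
```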

Explained variance (LDA):

pipeline = qd.Pipeline(reducer=qd.LDAReducer(n_components=3, labels=y))  # LDA yields at most n_classes - 1 components
pipeline.fit(dataset)
print(pipeline.reducer.explained_variance_ratio_)  # also available on PCAReducer

Reproducibility fingerprinting (v0.8.0)

import quprep as qd

pipeline = qd.Pipeline(
    cleaner=qd.Imputer(strategy="knn"),
    reducer=qd.PCAReducer(n_components=4),
    encoder=qd.AngleEncoder(rotation="ry"),
    exporter=qd.QASMExporter(),
)

fp = pipeline.fingerprint()

print(fp.hash)       # sha256 hex — stable across runs for the same config
fp.save("experiment.json")          # JSON (default)
fp.save("experiment.yaml", format="yaml")   # YAML (requires pyyaml)
print(fp.to_json())  # full JSON string including hash and UTC timestamp

The hash captures every stage class, all constructor parameters, and installed dependency versions. It is deterministic: the same configuration with the same dependency versions always produces the same hash, regardless of when or where it runs. Including it in a paper's methods section lets readers confirm that they are running exactly the same pipeline configuration; the hash does not cover the input data itself.
