Pipeline

The Pipeline class chains all preprocessing stages. Each stage is optional — use only the stages you need.


Pipeline

quprep.core.pipeline.Pipeline(ingester=None, preprocessor=None, cleaner=None, reducer=None, normalizer=None, encoder=None, exporter=None, schema=None, drift_detector=None)

Composable preprocessing pipeline for quantum data preparation.

Each stage is optional and works independently. You can use just the encoder, just the reducer, or any combination without touching the rest.

sklearn-compatible: supports fit(), transform(), get_params(), and set_params() in addition to the native fit_transform().

Parameters:

Name Type Description Default
ingester optional

Data ingestion component. Auto-detected from source type if omitted.

None
preprocessor optional

Preprocessing step applied after ingestion. Accepts a single transformer or a list of transformers applied in order (e.g. [WindowTransformer(), ...]).

None
cleaner optional

Data cleaning component (Imputer, OutlierHandler, CategoricalEncoder).

None
reducer optional

Dimensionality reduction component (PCA, LDA, etc.).

None
normalizer optional

Normalization component. Auto-selected per encoding if omitted.

None
encoder optional

Quantum encoding component. If omitted, the result contains only the processed Dataset and encoded is None.

None
exporter optional

Framework export component. If omitted, the result contains the list of EncodedResult and circuits is None.

None
schema DataSchema

Input schema to validate at pipeline entry. Raises SchemaViolationError on mismatch.

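None
drift_detector optional

Drift detection component (e.g. DriftDetector). When set, drift is checked on every transform() call and reported in PipelineResult.drift_report.

None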

Examples:

>>> pipeline = Pipeline(
...     encoder=AngleEncoder(),
...     exporter=QASMExporter(),
... )
>>> result = pipeline.fit_transform(df)
>>> print(result.circuits[0])
Source code in quprep/core/pipeline.py
def __init__(
    self,
    ingester=None,
    preprocessor=None,
    cleaner=None,
    reducer=None,
    normalizer=None,
    encoder=None,
    exporter=None,
    schema=None,
    drift_detector=None,
):
    self.ingester = ingester
    self.preprocessor = preprocessor
    self.cleaner = cleaner
    self.reducer = reducer
    self.normalizer = normalizer
    self.encoder = encoder
    self.exporter = exporter
    self.schema = schema
    self.drift_detector = drift_detector
    self._fitted = False
    self._resolved_normalizer = None
    self._last_cost = None
    self._last_audit_log = None
    self._last_drift_report = None

Functions

fingerprint()

Compute a reproducibility fingerprint for this pipeline.

Returns a FingerprintResult (quprep.core.fingerprint.FingerprintResult) containing a deterministic SHA-256 hash of the full pipeline configuration (stage classes, parameters, and dependency versions). The hash is stable across runs for the same configuration and suitable for paper methods sections.

Returns:

Type Description
FingerprintResult

Examples:

>>> fp = pipeline.fingerprint()
>>> print(fp.hash)
>>> fp.save("experiment.json")
Source code in quprep/core/pipeline.py
def fingerprint(self):
    """
    Compute a reproducibility fingerprint for this pipeline.

    Returns a :class:`~quprep.core.fingerprint.FingerprintResult` containing
    a deterministic SHA-256 hash of the full pipeline configuration (stage
    classes, parameters, and dependency versions). The hash is stable across
    runs for the same configuration and suitable for paper methods sections.

    Returns
    -------
    FingerprintResult

    Examples
    --------
    >>> fp = pipeline.fingerprint()
    >>> print(fp.hash)
    >>> fp.save("experiment.json")
    """
    from quprep.core.fingerprint import fingerprint_pipeline
    return fingerprint_pipeline(self)

fit(source, y=None)

Fit all pipeline stages on training data.

Parameters:

Name Type Description Default
source str, Path, np.ndarray, pd.DataFrame, or Dataset

Training data.

required
y np.ndarray or array-like, optional

Target labels. Stored in Dataset.labels and passed to FeatureSelector when using the 'mutual_info' method. Ignored if labels are already embedded in the Dataset (e.g. via CSVIngester(target_columns=...)).

None

Returns:

Type Description
Pipeline

Returns self for chaining (sklearn convention).

Source code in quprep/core/pipeline.py
def fit(self, source, y=None) -> Pipeline:
    """
    Fit all pipeline stages on training data.

    Parameters
    ----------
    source : str, Path, np.ndarray, pd.DataFrame, or Dataset
        Training data.
    y : np.ndarray or array-like, optional
        Target labels. Stored in ``Dataset.labels`` and passed to
        ``FeatureSelector`` when using the ``'mutual_info'`` method.
        Ignored if labels are already embedded in the Dataset (e.g. via
        ``CSVIngester(target_columns=...)``).

    Returns
    -------
    Pipeline
        Returns ``self`` for chaining (sklearn convention).
    """
    import numpy as np
    dataset = self._ingest(source)
    if y is not None and dataset.labels is None:
        dataset.labels = np.asarray(y)
    self._validate_entry(dataset)
    self._fit_stages(dataset)
    self._fitted = True
    return self

fit_transform(source, y=None)

Fit all stages and transform in a single pass.

Parameters:

Name Type Description Default
source str, Path, np.ndarray, pd.DataFrame, or Dataset

Input data.

required
y np.ndarray or array-like, optional

Target labels. Stored in Dataset.labels and passed to FeatureSelector when using the 'mutual_info' method. Ignored if labels are already embedded in the Dataset.

None

Returns:

Type Description
PipelineResult

Contains dataset (processed), encoded (list of EncodedResult or None), and circuits (framework-specific circuit objects or None).

Source code in quprep/core/pipeline.py
def fit_transform(self, source, y=None) -> PipelineResult:
    """
    Fit all stages and transform in a single pass.

    Parameters
    ----------
    source : str, Path, np.ndarray, pd.DataFrame, or Dataset
        Input data.
    y : np.ndarray or array-like, optional
        Target labels. Stored in ``Dataset.labels`` and passed to
        ``FeatureSelector`` when using the ``'mutual_info'`` method.
        Ignored if labels are already embedded in the Dataset.

    Returns
    -------
    PipelineResult
        Contains ``dataset`` (processed), ``encoded`` (list of EncodedResult
        or None), and ``circuits`` (framework-specific circuit objects or None).
    """
    import numpy as np
    dataset = self._ingest(source)
    if y is not None and dataset.labels is None:
        dataset.labels = np.asarray(y)
    self._validate_entry(dataset)
    dataset = self._fit_stages(dataset)
    self._fitted = True
    return self._encode_export(dataset)

get_params(deep=True)

Return pipeline parameters (sklearn convention).

Parameters:

Name Type Description Default
deep bool

Ignored — included for sklearn API compatibility.

True

Returns:

Type Description
dict
Source code in quprep/core/pipeline.py
def get_params(self, deep: bool = True) -> dict:
    """
    Return pipeline parameters (sklearn convention).

    Parameters
    ----------
    deep : bool
        Ignored — included for sklearn API compatibility.

    Returns
    -------
    dict
    """
    return {
        "ingester": self.ingester,
        "preprocessor": self.preprocessor,
        "cleaner": self.cleaner,
        "reducer": self.reducer,
        "normalizer": self.normalizer,
        "encoder": self.encoder,
        "exporter": self.exporter,
        "schema": self.schema,
        "drift_detector": self.drift_detector,
    }

load(path) classmethod

Load a previously saved pipeline from a file.

Parameters:

Name Type Description Default
path str or Path

Path to a file created by Pipeline.save().

required

Returns:

Type Description
Pipeline

Raises:

Type Description
TypeError

If the file does not contain a Pipeline object.

Source code in quprep/core/pipeline.py
@classmethod
def load(cls, path: str | Path) -> Pipeline:
    """
    Load a previously saved pipeline from a file.

    Parameters
    ----------
    path : str or Path
        Path to a file created by :meth:`Pipeline.save`.

    Returns
    -------
    Pipeline

    Raises
    ------
    TypeError
        If the file does not contain a Pipeline object.
    """
    import pickle

    with open(Path(path), "rb") as f:
        obj = pickle.load(f)  # noqa: S301
    if not isinstance(obj, cls):
        raise TypeError(
            f"Expected a Pipeline object, got {type(obj).__name__}."
        )
    return obj

save(path)

Persist the pipeline (configuration and fitted state) to a file.

Uses Python's pickle protocol. The saved file can be reloaded with Pipeline.load() and applied to new data without re-fitting.

Parameters:

Name Type Description Default
path str or Path

Destination file path (e.g. 'pipeline.pkl'). Parent directories are created automatically.

required
Source code in quprep/core/pipeline.py
def save(self, path: str | Path) -> None:
    """
    Persist the pipeline (configuration and fitted state) to a file.

    Uses Python's ``pickle`` protocol. The saved file can be reloaded
    with :meth:`Pipeline.load` and applied to new data without re-fitting.

    Parameters
    ----------
    path : str or Path
        Destination file path (e.g. ``'pipeline.pkl'``). Parent
        directories are created automatically.
    """
    import pickle

    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(self, f, protocol=pickle.HIGHEST_PROTOCOL)

set_params(**params)

Set pipeline parameters (sklearn convention).

Parameters:

Name Type Description Default
**params object

Parameter names and values.

{}

Returns:

Type Description
Pipeline

Returns self.

Raises:

Type Description
ValueError

If an unknown parameter name is given.

Source code in quprep/core/pipeline.py
def set_params(self, **params) -> Pipeline:
    """
    Set pipeline parameters (sklearn convention).

    Parameters
    ----------
    **params
        Parameter names and values.

    Returns
    -------
    Pipeline
        Returns ``self``.

    Raises
    ------
    ValueError
        If an unknown parameter name is given.
    """
    valid = set(self.get_params())
    for key, value in params.items():
        if key not in valid:
            raise ValueError(
                f"Invalid parameter '{key}'. Valid parameters: {sorted(valid)}"
            )
        setattr(self, key, value)
    return self

stream(source, chunksize=1000)

Apply a fitted pipeline to a large source in chunks without loading it fully into RAM.

The pipeline must be fitted first (via fit() or fit_transform()). Normalizer statistics and all other fitted parameters are reused for every chunk: only transform is called per chunk, not fit.

Parameters:

Name Type Description Default
source str, Path, or np.ndarray

  • A file path is read in CSV chunks via CSVIngester (quprep.ingest.csv_ingester.CSVIngester).
  • A NumPy array is sliced in row chunks via NumpyIngester (quprep.ingest.numpy_ingester.NumpyIngester).

required
chunksize int

Rows per chunk.

1000

Yields:

Type Description
PipelineResult

One result per chunk.

Raises:

Type Description
RuntimeError

If the pipeline has not been fitted.

TypeError

If source is neither a file path nor an np.ndarray.

Examples:

>>> import numpy as np
>>> import quprep as qd
>>> X = np.random.default_rng(0).uniform(0, 1, (1000, 4))
>>> pipeline = qd.Pipeline(encoder=qd.AngleEncoder(), exporter=qd.QASMExporter())
>>> _ = pipeline.fit(X[:100])
>>> for result in pipeline.stream(X, chunksize=200):
...     print(len(result.circuits))
Source code in quprep/core/pipeline.py
def stream(self, source, chunksize: int = 1000):
    """
    Apply a fitted pipeline to a large source in chunks without loading
    it fully into RAM.

    The pipeline **must be fitted first** (via :meth:`fit` or
    :meth:`fit_transform`).  Normaliser statistics and all other fitted
    parameters are reused for every chunk — only ``transform`` is called
    per chunk, not ``fit``.

    Parameters
    ----------
    source : str, Path, or np.ndarray
        - A file path is read in CSV chunks via
          :class:`~quprep.ingest.csv_ingester.CSVIngester`.
        - A NumPy array is sliced in row chunks via
          :class:`~quprep.ingest.numpy_ingester.NumpyIngester`.
    chunksize : int
        Rows per chunk.

    Yields
    ------
    PipelineResult
        One result per chunk.

    Raises
    ------
    RuntimeError
        If the pipeline has not been fitted.

    Examples
    --------
    >>> import numpy as np
    >>> import quprep as qd
    >>> X = np.random.default_rng(0).uniform(0, 1, (1000, 4))
    >>> pipeline = qd.Pipeline(encoder=qd.AngleEncoder(), exporter=qd.QASMExporter())
    >>> _ = pipeline.fit(X[:100])
    >>> for result in pipeline.stream(X, chunksize=200):
    ...     print(len(result.circuits))
    """
    if not self._fitted:
        raise RuntimeError(
            "Pipeline has not been fitted. Call fit() or fit_transform() first."
        )
    from pathlib import Path

    import numpy as np

    if isinstance(source, (str, Path)):
        from quprep.ingest.csv_ingester import CSVIngester
        chunk_gen = CSVIngester().stream(source, chunksize=chunksize)
    elif isinstance(source, np.ndarray):
        from quprep.ingest.numpy_ingester import NumpyIngester
        chunk_gen = NumpyIngester().stream(source, chunksize=chunksize)
    else:
        raise TypeError(
            f"source must be a file path or np.ndarray, got {type(source).__name__}"
        )

    for chunk in chunk_gen:
        yield self._apply_stages(chunk)
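
The chunking loop and the fitted check in stream() can be imitated with a short, self-contained sketch. It uses a plain list in place of a NumPy array or CSV file, and iter_chunks and stream_sketch are hypothetical names, not part of quprep. One detail follows directly from the source above: because stream() contains yield, it is a generator function, so the RuntimeError for an unfitted pipeline surfaces when iteration starts rather than at the call itself.

```python
from typing import Iterator, Sequence


def iter_chunks(rows: Sequence, chunksize: int = 1000) -> Iterator[Sequence]:
    """Yield consecutive slices of at most ``chunksize`` rows."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    for start in range(0, len(rows), chunksize):
        yield rows[start:start + chunksize]


def stream_sketch(fitted: bool, rows: Sequence, chunksize: int = 1000):
    """Hypothetical stand-in for stream(): check fitted state, then chunk."""
    if not fitted:
        raise RuntimeError("Pipeline has not been fitted.")
    for chunk in iter_chunks(rows, chunksize):
        yield len(chunk)  # a real pipeline would transform the chunk here


rows = list(range(1000))
print(list(stream_sketch(True, rows, chunksize=300)))  # [300, 300, 300, 100]

lazy = stream_sketch(False, rows)  # no error yet: the body has not started
try:
    next(lazy)                     # the fitted check runs on first iteration
except RuntimeError as exc:
    print("raised on first iteration:", exc)
```

The last chunk is simply shorter when the row count is not a multiple of chunksize, so callers should not assume a fixed chunk length.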

summary()

Return a human-readable snapshot of the pipeline configuration.

Shows which stages are configured, whether the pipeline has been fitted, the resolved normalizer, and the last cost estimate (if available).

Returns:

Type Description
str
Source code in quprep/core/pipeline.py
def summary(self) -> str:
    """
    Return a human-readable snapshot of the pipeline configuration.

    Shows which stages are configured, whether the pipeline has been
    fitted, the resolved normalizer, and the last cost estimate (if
    available).

    Returns
    -------
    str
    """
    lines = ["Pipeline"]
    lines.append(f"  fitted       : {'yes' if self._fitted else 'no'}")

    stage_names = [
        ("ingester",     self.ingester),
        ("preprocessor", self.preprocessor),
        ("cleaner",      self.cleaner),
        ("reducer",      self.reducer),
        ("normalizer",   self._resolved_normalizer or self.normalizer),
        ("encoder",      self.encoder),
        ("exporter",     self.exporter),
    ]
    for name, stage in stage_names:
        if stage is not None:
            lines.append(f"  {name:<12} : {type(stage).__name__}")

    if self.schema is not None:
        lines.append(f"  schema       : {len(self.schema.features)} feature(s)")

    if self._last_cost is not None:
        c = self._last_cost
        lines.append(
            f"  cost         : {c.encoding} | "
            f"{c.n_qubits} qubits | "
            f"depth {c.circuit_depth} | "
            f"gates {c.gate_count} | "
            f"NISQ-safe {'yes' if c.nisq_safe else 'NO'}"
        )

    return "\n".join(lines)

transform(source)

Apply fitted pipeline stages to data.

Parameters:

Name Type Description Default
source str, Path, np.ndarray, pd.DataFrame, or Dataset

Input data.

required

Returns:

Type Description
PipelineResult

Raises:

Type Description
RuntimeError

If the pipeline has not been fitted yet.

Source code in quprep/core/pipeline.py
def transform(self, source) -> PipelineResult:
    """
    Apply fitted pipeline stages to data.

    Parameters
    ----------
    source : str, Path, np.ndarray, pd.DataFrame, or Dataset
        Input data.

    Returns
    -------
    PipelineResult

    Raises
    ------
    RuntimeError
        If the pipeline has not been fitted yet.
    """
    if not self._fitted:
        raise RuntimeError(
            "Pipeline has not been fitted. Call fit() or fit_transform() first."
        )
    dataset = self._ingest(source)
    return self._apply_stages(dataset)

PipelineResult

quprep.core.pipeline.PipelineResult(dataset, encoded, circuits, cost=None, audit_log=None, drift_report=None, stages=None)

Output of Pipeline.fit_transform() and Pipeline.transform(). Pipeline.stream() yields one PipelineResult per chunk.

Attributes:

Name Type Description
dataset Dataset

The processed Dataset after all pipeline stages (post-normalization).

encoded list[EncodedResult] or None

One EncodedResult per sample. None if no encoder was configured.

circuits list or None

Exported circuit objects (framework-specific). None if no exporter was configured.

cost CostEstimate or None

Gate-count and NISQ-safety estimate for the chosen encoder. None if no encoder was configured.

audit_log list[dict] or None

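One entry per preprocessing stage that ran, in order. Each dict has keys: stage, n_samples_in, n_features_in, n_samples_out, n_features_out. None if no preprocessing stages ran.

drift_report optional

Drift report produced when a drift_detector is configured. Exposes overall_drift and drifted_features. None by default.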

stages dict[str, Dataset]

Intermediate datasets keyed by stage: 'input', 'after_cleaner', 'after_reducer', 'after_normalizer'. Only stages that ran are included. Useful for debugging individual pipeline steps.

Attributes

circuit property

First item in the batch — convenience for single-sample use.

Returns the first exported circuit if an exporter was configured, otherwise the first EncodedResult if only an encoder was configured, otherwise None.
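
The fallback order described for circuit can be written out as a small sketch. ResultSketch is a hypothetical stand-in with only two fields, not the real PipelineResult, and treating an empty list like None is an assumption made here for brevity.

```python
class ResultSketch:
    """Hypothetical stand-in for PipelineResult with only two fields."""

    def __init__(self, encoded=None, circuits=None):
        self.encoded = encoded
        self.circuits = circuits

    @property
    def circuit(self):
        # Prefer exported circuits, then encoded results, then None.
        # Assumption: an empty list is treated the same as None here.
        if self.circuits:
            return self.circuits[0]
        if self.encoded:
            return self.encoded[0]
        return None


print(ResultSketch(encoded=["enc0"], circuits=["qasm0"]).circuit)  # qasm0
print(ResultSketch(encoded=["enc0"]).circuit)                      # enc0
print(ResultSketch().circuit)                                      # None
```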

Functions

summary()

Return a human-readable report of the pipeline result.

Includes the audit log as a formatted table (if any preprocessing stages ran) and the cost estimate breakdown (if an encoder was used).

Returns:

Type Description
str

FingerprintResult

quprep.core.fingerprint.FingerprintResult(config, hash_hex)

Output of fingerprint_pipeline().

Attributes:

Name Type Description
config dict

Full pipeline configuration (stages + dependency versions). This is the dict that was hashed — no timestamp, fully deterministic.

hash str

SHA-256 hex digest of the canonical JSON serialisation of config.

Functions

save(path, format='json')

Write the fingerprint to a file.

Parameters:

Name Type Description Default
path str

Destination file path.

required
format {'json', 'yaml'}

Output format.

"json"

to_dict()

Return the config augmented with the hash and a UTC timestamp.

to_json(indent=2)

Return a JSON string (hash + timestamp + config).

to_yaml()

Return a YAML string (requires pyyaml).


fingerprint_pipeline

quprep.core.fingerprint.fingerprint_pipeline(pipeline)

Compute a reproducibility fingerprint for pipeline.

The fingerprint captures the class name and constructor parameters of every configured stage (ingester, preprocessor, cleaner, reducer, normalizer, encoder, exporter, schema, drift_detector) plus the installed versions of key dependencies. The resulting SHA-256 hash is deterministic: the same configuration always produces the same hash regardless of when or where the pipeline runs.

Parameters:

Name Type Description Default
pipeline Pipeline

A Pipeline instance (fitted or unfitted).

required

Returns:

Type Description
FingerprintResult

Contains config (serialisable dict) and hash (SHA-256 hex string).

Examples:

>>> import quprep as qd
>>> pipeline = qd.Pipeline(encoder=qd.AngleEncoder(), exporter=qd.QASMExporter())
>>> fp = qd.fingerprint_pipeline(pipeline)
>>> print(fp.hash)
>>> fp.save("experiment.json")

Examples

Minimal — encode only

import quprep as qd

pipeline = qd.Pipeline(encoder=qd.AngleEncoder())
result = pipeline.fit_transform(data)

result.encoded       # list[EncodedResult]
result.encoded[0].parameters   # rotation angles for first sample
result.encoded[0].metadata     # {"n_qubits": 4, "depth": 1, ...}

Full — clean + encode + export

import quprep as qd

pipeline = qd.Pipeline(
    cleaner=qd.Imputer(strategy="knn"),
    encoder=qd.AngleEncoder(rotation="ry"),
    exporter=qd.QASMExporter(),
)
result = pipeline.fit_transform("data.csv")
result.circuits[0]   # QASM string for first sample

With schema validation

import quprep as qd

schema = qd.DataSchema([
    qd.FeatureSpec("age",    dtype="continuous", min_value=0, max_value=120),
    qd.FeatureSpec("income", dtype="continuous", min_value=0),
])
pipeline = qd.Pipeline(encoder=qd.AngleEncoder(), schema=schema)
result = pipeline.fit_transform("data.csv")

print(result.cost.nisq_safe)   # True / False
print(result.summary())        # audit table + cost breakdown (summary() returns a str)

sklearn-style fit / transform split

import quprep as qd

pipeline = qd.Pipeline(
    reducer=qd.PCAReducer(n_components=4),
    encoder=qd.AngleEncoder(),
)
pipeline.fit(X_train)
r_train = pipeline.transform(X_train)
r_test  = pipeline.transform(X_test)
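
The point of splitting fit from transform is that statistics are learned once, from training data only, and then reused unchanged. A minimal sketch of that idea, assuming a z-score stage (ZScoreSketch is a hypothetical class, not a quprep component):

```python
import statistics


class ZScoreSketch:
    """Hypothetical stage: learns mean and std in fit(), reuses them later."""

    def fit(self, values):
        self.mean_ = statistics.fmean(values)
        # Fall back to 1.0 so constant features do not divide by zero.
        self.std_ = statistics.pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
x_test = [6.0, 7.0]

stage = ZScoreSketch().fit(x_train)        # statistics come from x_train only
z_test = stage.transform(x_test)           # x_test never influences them
print([round(z, 3) for z in z_test])       # [2.121, 2.828]
```

Calling fit_transform on the test set instead would recompute the mean and standard deviation from test data, which leaks information and makes train and test encodings incomparable.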

Explicit normalizer

import quprep as qd

pipeline = qd.Pipeline(
    encoder=qd.AngleEncoder(),
    normalizer=qd.Scaler("zscore"),  # override auto-selection
)

Saving and loading a fitted pipeline

import quprep as qd

pipeline = qd.Pipeline(
    reducer=qd.PCAReducer(n_components=4),
    encoder=qd.AngleEncoder(),
)
pipeline.fit(X_train)
pipeline.save("pipeline.pkl")

# Later — in a different process or deployment
loaded = qd.Pipeline.load("pipeline.pkl")
result = loaded.transform(X_new)

The parent directory is created automatically. All fitted state (reducer, normalizer, encoder) is preserved.
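
The save/load behaviour (parent directories created on save, a type check on load) can be tried without quprep installed. The sketch below persists a plain dict of fitted state rather than a Pipeline object, and save_state and load_state are hypothetical helper names.

```python
import pickle
import tempfile
from pathlib import Path


def save_state(state: dict, path) -> None:
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create missing parents
    with open(path, "wb") as f:
        pickle.dump(state, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_state(path) -> dict:
    with open(Path(path), "rb") as f:
        obj = pickle.load(f)  # unpickling runs code: only load trusted files
    if not isinstance(obj, dict):
        raise TypeError(f"Expected a dict, got {type(obj).__name__}.")
    return obj


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "models" / "v1" / "state.pkl"  # parents do not exist yet
    save_state({"mean_": 3.0, "std_": 1.5}, target)
    print(load_state(target))  # {'mean_': 3.0, 'std_': 1.5}
```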

With drift detection

import quprep as qd

det = qd.DriftDetector(mean_threshold=3.0, std_threshold=2.0)

pipeline = qd.Pipeline(
    encoder=qd.AngleEncoder(),
    drift_detector=det,
)
pipeline.fit(X_train)
result = pipeline.transform(X_test)

print(result.drift_report.overall_drift)      # True / False
print(result.drift_report.drifted_features)   # list of feature names

Drift is checked automatically on every transform() call. A QuPrepWarning is issued when drift is detected. The drift detector state is preserved through save()/load().
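
This page does not define how mean_threshold and std_threshold are interpreted, so the following is only one plausible reading, written as a standalone sketch: a feature counts as drifted when its mean moves by more than mean_threshold training standard deviations, or when its standard deviation grows by more than a factor of std_threshold. The functions fit_reference and check_drift are hypothetical and do not reflect DriftDetector internals.

```python
import statistics


def fit_reference(columns):
    """Store (mean, std) per feature from the training data."""
    return {
        name: (statistics.fmean(values), statistics.pstdev(values) or 1.0)
        for name, values in columns.items()
    }


def check_drift(reference, columns, mean_threshold=3.0, std_threshold=2.0):
    """Assumed rule: flag a feature on a large mean shift or std ratio."""
    drifted = []
    for name, values in columns.items():
        ref_mean, ref_std = reference[name]
        mean_shift = abs(statistics.fmean(values) - ref_mean) / ref_std
        std_ratio = statistics.pstdev(values) / ref_std
        if mean_shift > mean_threshold or std_ratio > std_threshold:
            drifted.append(name)
    return {"overall_drift": bool(drifted), "drifted_features": drifted}


train = {"age": [30, 40, 50], "income": [1.0, 2.0, 3.0]}
new = {"age": [38, 41, 44], "income": [10.0, 11.0, 12.0]}

reference = fit_reference(train)
print(check_drift(reference, new))
# {'overall_drift': True, 'drifted_features': ['income']}
```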

Time series pipeline (v0.7.0)

import quprep as qd

pipeline = qd.Pipeline(
    ingester=qd.TimeSeriesIngester(time_column="date"),
    preprocessor=qd.WindowTransformer(window_size=8, step=1),
    encoder=qd.AngleEncoder(),
)
result = pipeline.fit_transform("sensor_data.csv")

print(len(result.encoded))                        # n_windows
print(result.encoded[0].metadata["n_qubits"])     # window_size × n_features

The preprocessor stage runs after ingestion and before cleaning/reduction. It is designed for shape-changing transforms like WindowTransformer.
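
To see why a windowing step changes the shape of the data, here is a minimal sliding-window sketch in plain Python. It assumes each window is flattened into one sample of window_size × n_features values, which matches the qubit count noted in the example above. The function sliding_windows is hypothetical, and the real WindowTransformer may order or pad values differently.

```python
def sliding_windows(rows, window_size, step=1):
    """Flatten each run of ``window_size`` consecutive rows into one sample."""
    if window_size < 1 or step < 1:
        raise ValueError("window_size and step must be positive")
    windows = []
    for start in range(0, len(rows) - window_size + 1, step):
        window = rows[start:start + window_size]
        windows.append([value for row in window for value in row])
    return windows


# 10 time steps, 2 features per step.
rows = [[float(t), float(10 * t)] for t in range(10)]
windows = sliding_windows(rows, window_size=8, step=1)

print(len(windows))     # 3   -> (10 - 8) // 1 + 1 windows
print(len(windows[0]))  # 16  -> window_size * n_features values per sample
```

Both the number of samples and the number of features change, which is why this kind of transform has to run before cleaning and reduction rather than after.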

Sparse data (v0.7.0)

import scipy.sparse as sp
import quprep as qd

sparse_matrix = sp.csr_matrix(X)
result = qd.Pipeline(encoder=qd.AngleEncoder()).fit_transform(sparse_matrix)

scipy.sparse matrices are accepted anywhere a NumPy array is expected. They are converted to dense at ingestion.

Labels and multi-label (v0.7.0)

import quprep as qd

# Attach labels at fit_transform time
result = qd.Pipeline(encoder=qd.AngleEncoder()).fit_transform(X, y=y)
print(result.dataset.labels)   # preserved through all stages

# Or embed labels in the Dataset via CSVIngester
from quprep.ingest.csv_ingester import CSVIngester

pipeline = qd.Pipeline(
    ingester=CSVIngester(target_columns="label"),
    encoder=qd.AngleEncoder(),
)
result = pipeline.fit_transform("data.csv")
print(result.dataset.labels.shape)   # (n_samples,)

For FeatureSelector(method="mutual_info"), labels in dataset.labels are used automatically — no separate labels= argument needed.

Inspecting intermediate stages (v0.10.0)

PipelineResult.stages gives access to the Dataset after each pipeline step:

result = qd.Pipeline(
    cleaner=qd.OutlierHandler(),
    reducer=qd.PCAReducer(n_components=4),
    encoder=qd.AngleEncoder(),
).fit_transform(df)

print(result.stages["input"].data.shape)           # raw input
print(result.stages["after_cleaner"].data.shape)   # post outlier removal
print(result.stages["after_reducer"].data.shape)   # post PCA
print(result.stages["after_normalizer"].data.shape) # pre-encoding

API consistency additions (v0.10.0)

Feature names after selection:

selector = qd.FeatureSelector(method="variance", threshold=0.01)
selector.fit(dataset)
print(selector.get_feature_names_out())  # ['age', 'income', ...]

Outlier mask:

handler = qd.OutlierHandler(method="iqr", action="remove")
handler.fit_transform(dataset)
print(handler.outlier_mask_)  # bool array, True = outlier row
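
For readers unfamiliar with the IQR rule, the sketch below builds a row-level boolean mask in plain Python. It assumes the common 1.5 × IQR fences and flags a row when any feature falls outside them. Both choices are assumptions, as is the helper name iqr_outlier_mask, and OutlierHandler may use different defaults.

```python
import statistics


def iqr_outlier_mask(rows, k=1.5):
    """Return one bool per row; True means some feature is outside the fences."""
    mask = [False] * len(rows)
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
        spread = k * (q3 - q1)
        low, high = q1 - spread, q3 + spread
        for i, value in enumerate(column):
            if value < low or value > high:
                mask[i] = True
    return mask


rows = [[10, 1.0], [12, 1.1], [11, 0.9], [13, 1.0], [12, 1.2], [11, 1.0], [95, 1.1]]
mask = iqr_outlier_mask(rows)
print(mask)  # [False, False, False, False, False, False, True]

kept = [row for row, is_outlier in zip(rows, mask) if not is_outlier]
print(len(kept))  # 6
```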

Reverse normalisation:

scaler = qd.Scaler("zscore")
scaled = scaler.fit_transform(dataset)
original = scaler.inverse_transform(scaled)  # back to original scale
# Supported: minmax, minmax_pi, minmax_2pi, minmax_pm_pi, zscore
# Not supported: l2, binary, pm_one (not invertible)
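
The split between invertible and non-invertible methods comes down to whether the fitted statistics are enough to undo the mapping. A small sketch, using hypothetical helper functions rather than qd.Scaler itself: min-max scaling stores the minimum and maximum and can be reversed, whereas L2 normalisation divides each sample by its own length, which is not kept.

```python
import math


def minmax_fit(values):
    return min(values), max(values)


def minmax_transform(values, lo, hi):
    span = (hi - lo) or 1.0  # avoid dividing by zero for constant features
    return [(v - lo) / span for v in values]


def minmax_inverse(scaled, lo, hi):
    span = (hi - lo) or 1.0
    return [s * span + lo for s in scaled]


def l2_normalize(vector):
    norm = math.hypot(*vector)
    return [v / norm for v in vector]


values = [2.0, 4.0, 10.0]
lo, hi = minmax_fit(values)
scaled = minmax_transform(values, lo, hi)
print(scaled)                          # [0.0, 0.25, 1.0]
print(minmax_inverse(scaled, lo, hi))  # [2.0, 4.0, 10.0]

# Two different samples collapse to the same unit vector, so the
# original length cannot be recovered from the output alone.
print(l2_normalize([3.0, 4.0]))  # [0.6, 0.8]
print(l2_normalize([6.0, 8.0]))  # [0.6, 0.8]
```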

Categorical cardinality control:

# Warn when a column has > 20 unique categories
# Group categories appearing fewer than 5 times as "_other"
encoder = qd.CategoricalEncoder(
    strategy="onehot",
    cardinality_threshold=20,
    min_frequency=5,
)
encoder.fit_transform(dataset)
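
The effect of min_frequency on one-hot width can be shown with a few lines of plain Python. This is a sketch of the grouping idea only: group_rare and one_hot are hypothetical helpers, and the real CategoricalEncoder may order columns or handle ties differently.

```python
from collections import Counter


def group_rare(values, min_frequency=5, other="_other"):
    """Replace categories seen fewer than ``min_frequency`` times."""
    counts = Counter(values)
    return [v if counts[v] >= min_frequency else other for v in values]


def one_hot(values):
    """Return (sorted category names, one indicator row per value)."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


colours = ["red"] * 6 + ["blue"] * 5 + ["teal"] * 2 + ["plum"]

raw_categories, _ = one_hot(colours)
grouped_categories, encoded = one_hot(group_rare(colours, min_frequency=5))

print(raw_categories)      # ['blue', 'plum', 'red', 'teal']
print(grouped_categories)  # ['_other', 'blue', 'red']
print(encoded[-1])         # [1, 0, 0]  -> 'plum' now falls under '_other'
```

Fewer one-hot columns matter here because, with encodings such as angle encoding, each extra feature typically costs an extra qubit.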

Explained variance (LDA):

pipeline = qd.Pipeline(reducer=qd.LDAReducer(n_components=3, labels=y))
pipeline.fit(dataset)
print(pipeline.reducer.explained_variance_ratio_)  # also available on PCAReducer

Reproducibility fingerprinting (v0.8.0)

import quprep as qd

pipeline = qd.Pipeline(
    cleaner=qd.Imputer(strategy="knn"),
    reducer=qd.PCAReducer(n_components=4),
    encoder=qd.AngleEncoder(rotation="ry"),
    exporter=qd.QASMExporter(),
)

fp = pipeline.fingerprint()

print(fp.hash)       # sha256 hex — stable across runs for the same config
fp.save("experiment.json")          # JSON (default)
fp.save("experiment.yaml", format="yaml")   # YAML (requires pyyaml)
print(fp.to_json())  # full JSON string including hash and UTC timestamp

The hash captures every stage class, all constructor parameters, and installed dependency versions. It is deterministic — the same configuration always produces the same hash regardless of when or where it runs. Include it in paper methods sections to make experiments exactly reproducible.
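
The determinism claim rests on hashing a canonical serialisation of the configuration. The sketch below shows the general technique with the standard library. The config layout, the version string, and the function name config_hash are made up for illustration, and the exact canonical form quprep uses is not specified on this page, so hashes produced here will not match fp.hash.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """SHA-256 hex digest of a canonical JSON serialisation of ``config``."""
    # sort_keys and fixed separators make the serialisation independent of
    # dict insertion order and whitespace, so equal configs hash equally.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical config layout, for illustration only.
config_a = {
    "encoder": {"class": "AngleEncoder", "params": {"rotation": "ry"}},
    "versions": {"numpy": "1.26.4"},
}
# Same content, different key order.
config_b = {
    "versions": {"numpy": "1.26.4"},
    "encoder": {"params": {"rotation": "ry"}, "class": "AngleEncoder"},
}
# One parameter changed.
config_c = {
    "encoder": {"class": "AngleEncoder", "params": {"rotation": "rx"}},
    "versions": {"numpy": "1.26.4"},
}

print(config_hash(config_a) == config_hash(config_b))  # True: key order is irrelevant
print(config_hash(config_a) == config_hash(config_c))  # False: a parameter differs
```

Keeping the timestamp out of the hashed dict, as the FingerprintResult description notes for config, is what allows the same configuration to hash identically on different days.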