Skip to content

Encoding Comparison & Smart Recommendations

QuPrep can compare all encoding methods side-by-side and recommend the best one for your dataset — analytically, before a single circuit is built.


Encoding comparison

compare_encodings() profiles your dataset and runs cost estimation for every encoder, returning a table of qubit count, gate count, circuit depth, 2-qubit gates, and NISQ safety.

import quprep as qd

result = qd.compare_encodings("data.csv")
print(result)
Encoding            Qubits    Gate Count  Depth    2Q Gates   NISQ Safe
------------------  --------  -----------  -------  ----------  ----------
angle               8         8            1        0           Yes
amplitude           3         16           8        8           Yes
basis               8         8            1        0           Yes
iqp                 8         84           64       28          Yes
reupload            8         24           24       0           Yes
entangled_angle     8         15           8        7           Yes
hamiltonian         8         8            8        0           Yes

No circuits are generated — costs are estimated analytically, so comparison is fast even for large datasets.


Filtering encoders

# Only compare a subset
result = qd.compare_encodings(X, include=["angle", "iqp", "amplitude"])

# Exclude encoders you know won't fit
result = qd.compare_encodings(X, exclude=["amplitude", "hamiltonian"])

Task-aware recommendation

Pass task= to highlight the best encoding for your use case:

result = qd.compare_encodings("data.csv", task="classification")
print(result)
# The recommended encoding is starred in the table

Valid tasks: classification, regression, qaoa, kernel, simulation.


Qubit budget

result = qd.compare_encodings("data.csv", qubits=8)
# Encoders requiring more than 8 qubits have nisq_safe=False and a budget warning

Picking the best

best = result.best(prefer="nisq")    # NISQ-safe, then lowest depth (default)
best = result.best(prefer="depth")   # Globally shallowest circuit
best = result.best(prefer="gates")   # Fewest total gates
best = result.best(prefer="qubits")  # Fewest qubits
print(best.encoding, best.n_qubits, best.circuit_depth)

Programmatic access

for row in result.to_dict():
    print(row["encoding"], row["nisq_safe"], row["circuit_depth"])

CLI

quprep compare data.csv
quprep compare data.csv --task classification --qubits 8
quprep compare data.csv --include angle,iqp,amplitude
quprep compare data.csv --exclude amplitude,hamiltonian

Smart recommendation engine

recommend() goes beyond a fixed lookup table — it adapts its scores based on what it finds in your data:

Signal How it affects the recommendation
n_samples > 500 Penalises amplitude (expensive state prep per sample); rewards reupload
n_samples < 20 Penalises reupload (high expressivity → overfitting risk)
missing_rate > 10% Penalises amplitude (requires exact unit norm)
Negative values in data Rewards amplitude (handles negatives naturally via superposition); penalises basis (all negatives → 0 after binarization)
Sparse data (many zeros) Boosts basis encoding
Correlated features Boosts IQP and entangled angle (entanglement captures inter-feature structure)
n_features > 15 Penalises IQP (depth grows as O(d²))
rec = qd.recommend("data.csv", task="classification", qubits=8)
print(rec)
# Recommended encoding : iqp
# Qubits needed        : 8
# Circuit depth        : O(d²·reps)
# NISQ safe            : yes
# Score                : 54.0
# Reason               : best fit for classification tasks; continuous features
#                        map naturally to rotation angles; NISQ-safe (shallow circuit).
# Alternatives         :
#   angle            score=45.0  O(d)
#   reupload         score=45.0  O(d·layers)
#   ...

The reason field always explains which dataset signals drove the recommendation.


Simulation-based re-ranking (use_metrics=True)

Pass use_metrics=True to augment the heuristic scores with circuit-level metrics computed by simulating each encoding on samples from your data:

rec = qd.recommend("data.csv", task="classification", use_metrics=True)
print(rec)

When n_features ≤ 12, QuPrep simulates the candidate encodings with a lightweight numpy statevector backend and adds the following bonuses to each encoding's heuristic score:

Metric Bonus range Direction
Expressibility (KL divergence) up to +8 lower KL → more expressive → higher bonus
Entanglement capability up to +6 classification / kernel tasks only
Kernel alignment up to ±12 higher alignment → better class separation

The recommendation is then re-ranked by the combined score. For datasets with more than 12 features the metrics pass is skipped and the heuristic scores are used unchanged.


Combining both

# Compare first, then apply the best one
result = qd.compare_encodings("data.csv", task="classification", qubits=8)
best = result.best(prefer="nisq")

pipeline = qd.Pipeline(
    encoder=getattr(qd, f"{best.encoding.title().replace('_', '')}Encoder")(),
    exporter=qd.QASMExporter(),
)
pipeline_result = pipeline.fit_transform("data.csv")

Or use recommend() directly with .apply():

rec = qd.recommend("data.csv", task="classification")
pipeline_result = rec.apply("data.csv")

Reproducibility fingerprinting

Once you've chosen an encoding, lock in the exact configuration with a deterministic hash — useful for paper methods sections and experiment logs.

import quprep as qd

pipeline = qd.Pipeline(
    cleaner=qd.Imputer(strategy="knn"),
    reducer=qd.PCAReducer(n_components=4),
    encoder=qd.AngleEncoder(rotation="ry"),
    exporter=qd.QASMExporter(),
)

fp = pipeline.fingerprint()
print(fp.hash)   # sha256 hex — same config always produces the same hash

Or via the standalone function:

fp = qd.fingerprint_pipeline(pipeline)

What the hash captures

  • Class name and all constructor parameters for every configured stage
  • Installed versions of key dependencies (numpy, scikit-learn, scipy, qiskit, pennylane, etc.)
  • QuPrep version and Python version

The hash is stable across runs — the timestamp is excluded so that the same configuration always produces the same hash regardless of when or where it runs.

Exporting for a paper

# JSON (default) — attach to supplementary material
fp.save("experiment.json")

# YAML — requires pyyaml
fp.save("experiment.yaml", format="yaml")

# Inline JSON string
print(fp.to_json())

Example JSON output:

{
  "hash": "sha256:a3f7c1...",
  "timestamp": "2026-04-10T10:23:00+00:00",
  "quprep_version": "0.9.0",
  "python_version": "3.12.0",
  "stages": {
    "cleaner":  { "class": "Imputer",    "params": { "strategy": "knn" } },
    "reducer":  { "class": "PCAReducer", "params": { "n_components": 4 } },
    "encoder":  { "class": "AngleEncoder","params": { "rotation": "ry" } },
    "exporter": { "class": "QASMExporter","params": {} }
  },
  "dependencies": {
    "numpy": "1.26.4",
    "scikit-learn": "1.4.2"
  }
}

Include the hash value in your paper's methods section. Readers can reproduce your exact setup by inspecting the JSON and recreating the pipeline.