QuPrep
The missing preprocessing layer between classical datasets and quantum computing frameworks.
QuPrep converts classical datasets into a quantum-circuit-ready format. It is not a quantum computing framework, simulator, or training tool; it is the preprocessing step that feeds into Qiskit, PennyLane, Cirq, TKET, and any other quantum workflow.
- Install in seconds: no quantum framework required for the core install.
- Zero to circuit in one line
- Not sure which encoding?
- QUBO & quantum optimization
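To make "circuit-ready" concrete, here is a minimal sketch of angle encoding written in plain Python. It is not QuPrep's implementation: the helper names `minmax_to_angles` and `angle_encode_qasm` and the sample values are hypothetical. It assumes the common convention of MinMax-scaling each feature to [0, π] and applying one `ry` rotation per qubit.

```python
import math

def minmax_to_angles(row, lo, hi):
    """Scale each feature from [lo[i], hi[i]] to a rotation angle in [0, pi]."""
    angles = []
    for x, a, b in zip(row, lo, hi):
        span = b - a
        # A constant feature carries no information; map it to angle 0.
        t = 0.0 if span == 0 else (x - a) / span
        # Clamp so values outside the fitted range stay inside [0, pi].
        angles.append(math.pi * min(max(t, 0.0), 1.0))
    return angles

def angle_encode_qasm(angles):
    """Emit an OpenQASM 3.0 program with one ry rotation per feature."""
    lines = [
        "OPENQASM 3.0;",
        'include "stdgates.inc";',
        f"qubit[{len(angles)}] q;",
    ]
    lines += [f"ry({theta:.6f}) q[{i}];" for i, theta in enumerate(angles)]
    return "\n".join(lines)

# Hypothetical sample row with per-feature minima and maxima.
sample = [5.1, 3.5, 1.4]
feature_min = [4.3, 2.0, 1.0]
feature_max = [7.9, 4.4, 6.9]

qasm = angle_encode_qasm(minmax_to_angles(sample, feature_min, feature_max))
print(qasm)
```

The result is a plain string, so it can be handed to any toolchain that accepts OpenQASM 3.0 without a quantum framework being installed at encoding time.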
Pipeline stages
| Stage | Since | Description |
|---|---|---|
| Ingest | v0.1.0 | CSV, TSV, NumPy arrays, Pandas DataFrames |
| Clean | v0.1.0 | Missing values, outliers, categoricals, feature selection |
| Normalize | v0.1.0 | Auto-selected per encoding (L2, MinMax, Z-score, binary) |
| Encode | v0.1.0 | Angle, Amplitude, Basis |
| Export | v0.1.0 | OpenQASM 3.0, Qiskit |
| Reduce | v0.2.0 | PCA, LDA, DFT, t-SNE, UMAP, hardware-aware |
| Encode+ | v0.2.0 | IQP, Entangled Angle, Data re-uploading, Hamiltonian |
| Export+ | v0.2.0 | PennyLane, Cirq, TKET, ASCII + matplotlib visualization |
| Recommend | v0.2.0 | Automatic encoding selection for your dataset and task |
| QUBO | v0.3.0 | QUBO/Ising, 7 problem formulations, solvers, QAOA, D-Wave export |
| Validate | v0.4.0 | Input validation, schema enforcement, cost estimation, sklearn fit/transform, import quprep as qd |
| Intelligence | v0.5.0 | Qubit suggestion, encoding comparison, data drift detection, pipeline save/load, batch QASM export |
| Encode++ | v0.6.0 | ZZFeatureMap, PauliFeatureMap, RandomFourier, TensorProduct, QAOAProblem encoders |
| Export++ | v0.6.0 | Amazon Braket, Q# (Azure Quantum), IQM native format |
| Plugins | v0.6.0 | register_encoder / register_exporter — custom encoders/exporters via prepare() |
| Modalities | v0.7.0 | Time series, sparse matrices, multi-label, image, text (TF-IDF + sentence-transformers), graph (lossy feature extraction + lossless graph state encoding) |
| Connectors | v0.8.0 | HuggingFace datasets, OpenML, Kaggle — load any public dataset in one line |
| CLI tools | v0.8.0 | quprep inspect (dataset profile), quprep benchmark (encoder comparison table) |
| Reproducibility | v0.8.0 | fingerprint_pipeline() — deterministic SHA-256 hash of pipeline config for paper methods sections |
| Noise-aware preprocessing | v0.9.0 | Assign high-variance features to least-noisy qubits; minimise SWAP count given hardware topology; remap angles away from 0/π poles |
| Encoding quality metrics | v0.9.0 | Simulation-based expressibility, entanglement capability, and kernel alignment scores; use_metrics=True in recommend() for data-driven re-ranking |
| Class imbalance | v0.9.0 | ImbalanceHandler — random oversample/undersample, SMOTE, ADASYN as a clean/ stage |
| Barren plateau detection | v0.9.0 | detect_barren_plateau() — analytical gradient variance bound before training; risk levels + mitigation suggestions |
| Streaming ingestion | v0.9.0 | CSVIngester.stream(), NumpyIngester.stream(), Pipeline.stream() — process datasets larger than RAM in chunks |
| API polish | v0.10.0 | Scaler.inverse_transform(), OutlierHandler.outlier_mask_, FeatureSelector.get_feature_names_out(), LDAReducer.explained_variance_ratio_, CategoricalEncoder high-cardinality grouping, PipelineResult.stages per-step snapshots |
| Quantum preprocessing | v0.10.0 | check_compatibility(), verify_encoding(), encoding_sensitivity(), suggest_pipeline(), preprocessing_report(), inspect_encoding() — quantum-aware dataset audit and circuit inspection |
| New encoders | v0.10.0 | DenseAngleEncoder (2 features/qubit via Ry+Rz), DiscretizedEncoder (continuous → binary, QUBO-ready) |
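The Normalize stage is described as auto-selected per encoding. As an illustration of why the encoding dictates the normalization, the sketch below prepares a row for amplitude encoding, which needs a unit L2 norm and a vector length that is a power of two. The function name `amplitude_prepare` is hypothetical, and zero-padding is an assumed convention, not necessarily what QuPrep does internally.

```python
import math

def amplitude_prepare(row):
    """Pad a feature vector to a power-of-two length and L2-normalize it.

    Returns the amplitudes and the number of qubits needed to hold them.
    """
    if not row:
        raise ValueError("row must contain at least one feature")
    # Smallest n with 2**n >= len(row); at least one qubit.
    n_qubits = max(1, (len(row) - 1).bit_length())
    padded = list(row) + [0.0] * (2 ** n_qubits - len(row))
    norm = math.sqrt(sum(x * x for x in padded))
    if norm == 0.0:
        raise ValueError("cannot amplitude-encode an all-zero vector")
    return [x / norm for x in padded], n_qubits

amplitudes, n_qubits = amplitude_prepare([3.0, 0.0, 4.0])
print(n_qubits, amplitudes)
```

Three features fit into two qubits here, because amplitude encoding stores 2^n values in n qubits, whereas angle encoding would use one qubit per feature.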
Supported frameworks
| Framework | Install | Output type |
|---|---|---|
| OpenQASM 3.0 | (no extra deps) | str |
| Qiskit | quprep[qiskit] | QuantumCircuit |
| PennyLane | quprep[pennylane] | qml.QNode |
| Cirq | quprep[cirq] | cirq.Circuit |
| TKET | quprep[tket] | pytket.Circuit |
| Amazon Braket | quprep[braket] | braket.circuits.Circuit |
| Q# / Azure Quantum | quprep[qsharp] | str (Q# 1.0 source) |
| IQM | quprep[iqm] | dict (PRX+CZ JSON) |
| D-Wave Ocean | (via .to_dwave()) | BQM dict |
What QuPrep does NOT do
QuPrep is intentionally narrow in scope. It does not:
- Train quantum machine learning models
- Simulate quantum circuits
- Execute on quantum hardware
- Optimize variational parameters
- Replace Qiskit, PennyLane, Cirq, or any other framework
It prepares your data. Everything else is your framework's job.
CLI
```bash
# Profile a dataset (shape, types, missing, sparsity, recommendation)
quprep inspect data.csv
quprep inspect data.csv --task kernel --qubits 8

# Benchmark all encoders (gate count, depth, timing)
quprep benchmark data.csv --task classification
quprep benchmark data.csv --include angle,iqp,amplitude --output bench.json

# Encode a CSV to OpenQASM 3.0
quprep convert data.csv --encoding angle

# Get an encoding recommendation
quprep recommend data.csv --task classification --qubits 8

# QUBO problems
quprep qubo maxcut --adjacency "0,1,1;1,0,1;1,1,0" --solve
quprep qubo qaoa maxcut --adjacency "0,1,1;1,0,1;1,1,0" --p 2 --output circuit.qasm
```
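For readers unfamiliar with what a MaxCut QUBO looks like, the following sketch builds one from the same adjacency string used in the commands above and solves it by exhaustive search. It is a textbook formulation in plain Python, not QuPrep's code: `parse_adjacency`, `maxcut_qubo`, and `brute_force` are hypothetical names, and the semicolon-separated row format is inferred from the CLI example.

```python
from itertools import product

def parse_adjacency(text):
    """Parse 'a,b,c;d,e,f;...' into a list of rows of floats."""
    return [[float(v) for v in row.split(",")] for row in text.split(";")]

def maxcut_qubo(adj):
    """Build an upper-triangular QUBO dict whose minimum is the maximum cut.

    Each edge (i, j) with weight w contributes -w * (x_i + x_j - 2*x_i*x_j),
    which is -w when the endpoints are on opposite sides and 0 otherwise.
    """
    n = len(adj)
    qubo = {}
    for i in range(n):
        for j in range(i + 1, n):
            w = adj[i][j]
            if w == 0:
                continue
            qubo[(i, i)] = qubo.get((i, i), 0.0) - w
            qubo[(j, j)] = qubo.get((j, j), 0.0) - w
            qubo[(i, j)] = qubo.get((i, j), 0.0) + 2.0 * w
    return qubo

def brute_force(qubo, n):
    """Exhaustively minimize the QUBO energy; only viable for small n."""
    def energy(bits):
        return sum(c * bits[i] * bits[j] for (i, j), c in qubo.items())
    best = min(product((0, 1), repeat=n), key=energy)
    return best, energy(best)

adj = parse_adjacency("0,1,1;1,0,1;1,1,0")
bits, energy = brute_force(maxcut_qubo(adj), len(adj))
cut_size = -energy
print(bits, cut_size)
```

The graph is a triangle, so the best partition separates one vertex from the other two and cuts two of the three edges. Exhaustive search scales as 2^n, which is why larger instances are handed to dedicated solvers, QAOA circuits, or annealers.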