Dimensionality Reduction¶
Current quantum hardware is limited to tens or hundreds of qubits. Most real-world datasets have far more features. Reduction brings your data into a range that hardware can handle.
Why it matters¶
| Encoding | Qubits needed for d features |
|---|---|
| Angle | d qubits |
| Amplitude | \(\log_2(d)\) qubits |
| Basis | d qubits |
| IQP | d qubits |
If your dataset has 512 features and your backend has 127 qubits, angle encoding is infeasible without reduction.
PCA¶
Good general-purpose default. n_components can be an integer (exact) or float (variance fraction, e.g. 0.95).
import quprep as qd
pipeline = qd.Pipeline(
reducer=qd.PCAReducer(n_components=8),
encoder=qd.AngleEncoder(),
)
result = pipeline.fit_transform("data.csv")
# Access explained variance after fit
print(pipeline.reducer.explained_variance_ratio_)
LDA¶
Maximises class separability. Research shows LDA outperforms PCA for quantum classification tasks. Requires class labels. Maximum components: n_classes − 1.
import quprep as qd
pipeline = qd.Pipeline(
reducer=qd.LDAReducer(n_components=4, labels=y),
encoder=qd.AngleEncoder(),
)
result = pipeline.fit_transform("data.csv")
# Access explained variance after fit
print(pipeline.reducer.explained_variance_ratio_)
Spectral (DFT)¶
Row-wise FFT — keeps the first n_components frequency magnitudes. Outputs are always ≥ 0. Best for time-series and signal data.
import quprep as qd
pipeline = qd.Pipeline(
reducer=qd.SpectralReducer(n_components=8),
encoder=qd.AngleEncoder(),
)
t-SNE¶
Preserves local structure. Best suited to 2–3 qubit circuits and visualization tasks.
Warning
t-SNE requires perplexity < n_samples. For small datasets use a lower perplexity.
UMAP¶
Faster than t-SNE and scales to large datasets. Preserves both local and global structure.
Raises ImportError with install hint if umap-learn is not installed.
Hardware-aware reduction¶
Automatically calculates the qubit budget for a target backend and reduces to fit. No manual calculation needed.
import quprep as qd
# By backend name
pipeline = qd.Pipeline(
reducer=qd.HardwareAwareReducer(backend="ibm_brisbane"),
encoder=qd.AngleEncoder(),
)
# By qubit count
pipeline = qd.Pipeline(
reducer=qd.HardwareAwareReducer(backend=16),
encoder=qd.AngleEncoder(),
)
Supported backends: ibm_brisbane (127), ibm_kyiv (127), ibm_torino (133), ionq_harmony (11), quantinuum_h1 (20), and more. Pass an integer to specify any qubit count directly.
References
Mancilla, J., & Pere, C. (2022). A preprocessing perspective for quantum machine learning classification advantage in finance using NISQ algorithms. Entropy, 24(11), 1656. doi:10.3390/e24111656