Class Imbalance
quprep.clean.imbalance
Class imbalance handling — oversample, undersample, SMOTE, ADASYN.
Classes
ImbalanceHandler(strategy='oversample', sampling_strategy='auto', k_neighbors=5, random_state=42)
Balance class distributions before quantum encoding.
Supports four strategies:
- "oversample": random duplication of minority samples (no extra dependencies).
- "undersample": random removal of majority samples (no extra dependencies).
- "smote": Synthetic Minority Over-sampling Technique; interpolates in feature space using k-nearest neighbours (requires scikit-learn, already a core dependency).
- "adasyn": Adaptive Synthetic Sampling; focuses synthetic samples on harder-to-learn regions (requires `imbalanced-learn`: `pip install quprep[imbalance]`).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `strategy` | `('oversample', 'undersample', 'smote', 'adasyn')` | Resampling strategy. | `"oversample"` |
| `sampling_strategy` | `float` or `'auto'` | | `'auto'` |
| `k_neighbors` | `int` | Number of nearest neighbours for SMOTE and ADASYN. | `5` |
| `random_state` | `int` | Seed for reproducibility. | `42` |
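To make the role of `k_neighbors` concrete, the following is a minimal sketch of SMOTE-style interpolation in plain Python. It is not quprep's implementation: the function name `smote_like_samples` and its arguments are hypothetical, and a brute-force distance search stands in for the scikit-learn neighbour search the library relies on.

```python
import math
import random


def smote_like_samples(minority, n_new, k_neighbors=5, seed=42):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration only; not part of quprep.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    # Cannot use more neighbours than there are other minority points.
    k = min(k_neighbors, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
new_points = smote_like_samples(minority, n_new=6, k_neighbors=2)
print(len(new_points))  # 6
```

Because every synthetic point lies on a segment between two existing minority points, a larger `k_neighbors` spreads new samples over a wider neighbourhood, while a smaller value keeps them close to the original points.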
Examples:
>>> import numpy as np
>>> import quprep as qd
>>> from quprep.core.dataset import Dataset
>>> rng = np.random.default_rng(0)
>>> X = rng.uniform(0, 1, (110, 4))
>>> y = np.array([0] * 100 + [1] * 10)
>>> ds = Dataset(data=X, labels=y)
>>> handler = qd.ImbalanceHandler(strategy="smote")
>>> ds_bal = handler.fit_transform(ds)
>>> from collections import Counter
>>> print(Counter(ds_bal.labels))
Counter({0: 100, 1: 100})
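For intuition about the two dependency-free strategies, the sketch below shows random oversampling and random undersampling in plain Python. The helper names `random_oversample` and `random_undersample` are hypothetical and the code is an illustration of the general techniques, not quprep's own implementation; it assumes balancing to the largest (or smallest) class.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=42):
    """Duplicate random minority rows until each class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        extra = rng.choices(pool, k=target - n)  # sampling with replacement
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_undersample(rows, labels, seed=42):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        kept = rng.sample(pool, k=target)  # sampling without replacement
        out_rows.extend(kept)
        out_labels.extend([cls] * len(kept))
    return out_rows, out_labels


rows = [[float(i)] for i in range(12)]
labels = [0] * 10 + [1] * 2

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
print(Counter(over_labels))   # Counter({0: 10, 1: 10})
print(Counter(under_labels))  # Counter({0: 2, 1: 2})
```

Oversampling keeps every original row but repeats minority rows, whereas undersampling discards majority rows, so the choice trades duplicated information against lost information.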
Functions
fit(dataset)
Compute class distribution and target count from dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `Dataset` | Must have | required |
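As a rough illustration of what a fit step of this kind might compute, the sketch below derives a class distribution and a target count from a list of labels. The function `class_distribution` is hypothetical, and the rule used for the target (largest class when adding samples, smallest class when removing them) is an assumption rather than documented quprep behaviour.

```python
from collections import Counter


def class_distribution(labels, strategy="oversample"):
    """Return per-class counts and an assumed per-class target count."""
    counts = Counter(labels)
    if strategy == "undersample":
        # Assumption: shrink every class to the size of the smallest one.
        target = min(counts.values())
    else:
        # Assumption: grow every class to the size of the largest one.
        target = max(counts.values())
    return dict(counts), target


labels = [0] * 100 + [1] * 10
counts, target = class_distribution(labels, strategy="smote")
print(counts, target)  # {0: 100, 1: 10} 100
```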
fit_transform(dataset)
Fit and transform in one step.
transform(dataset)
Apply the fitted resampling strategy to dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `Dataset` | | required |
Returns:
| Type | Description |
|---|---|
| `Dataset` | New Dataset with resampled data and labels (shuffled). |
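Since the returned data and labels are shuffled, each row must stay paired with its label. A minimal sketch of one way to do that, using a single shared permutation, is shown below; `shuffle_together` is a hypothetical helper and not a quprep function.

```python
import random


def shuffle_together(rows, labels, seed=42):
    """Shuffle rows and labels with one shared permutation so pairs stay aligned."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)
    return [rows[i] for i in order], [labels[i] for i in order]


rows = [[0.1], [0.2], [0.3], [0.4]]
labels = [0, 0, 1, 1]
shuffled_rows, shuffled_labels = shuffle_together(rows, labels)

# Every (row, label) pair from the input is still present after shuffling.
pairs_before = sorted(zip(rows, labels))
pairs_after = sorted(zip(shuffled_rows, shuffled_labels))
print(pairs_before == pairs_after)  # True
```

Shuffling matters after resampling because synthetic or duplicated rows are typically appended at the end, which would otherwise leave the classes in contiguous blocks.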