Class Imbalance

quprep.clean.imbalance

Class imbalance handling — oversample, undersample, SMOTE, ADASYN.

Classes

ImbalanceHandler(strategy='oversample', sampling_strategy='auto', k_neighbors=5, random_state=42)

Balance class distributions before quantum encoding.

Supports four strategies:

  • "oversample" — random duplication of minority samples (no extra deps).
  • "undersample" — random removal of majority samples (no extra deps).
  • "smote" — Synthetic Minority Over-sampling Technique; interpolates in feature space using k-nearest neighbours (requires scikit-learn, already a core dependency).
  • "adasyn" — Adaptive Synthetic sampling (ADASYN); generates more synthetic samples around minority points that are harder to learn (requires imbalanced-learn: pip install quprep[imbalance]).
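To make the "smote" bullet concrete, the interpolation step can be sketched in plain NumPy. This is an illustrative sketch only, not quprep's implementation: the function name smote_sketch, the brute-force neighbour search, and the clamping of k to the number of available neighbours are all assumptions made for the example.

```python
import numpy as np


def smote_sketch(X_min, n_new, k_neighbors=5, random_state=42):
    """Create n_new synthetic rows by interpolating between minority neighbours.

    Hypothetical helper for illustration; not part of quprep.
    """
    rng = np.random.default_rng(random_state)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    k = min(k_neighbors, n - 1)  # assumption: clamp k to the available neighbours
    if k < 1:
        raise ValueError("need at least two minority samples to interpolate")

    # Brute-force squared Euclidean distances between all minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # indices, shape (n, k)

    base = rng.integers(0, n, size=n_new)         # seed sample for each new row
    pick = rng.integers(0, k, size=n_new)         # which neighbour to move towards
    gap = rng.uniform(0.0, 1.0, size=(n_new, 1))  # interpolation fraction
    target = X_min[neighbours[base, pick]]
    return X_min[base] + gap * (target - X_min[base])


# Usage: 10 minority samples, 90 synthetic rows to reach 100 in total.
X_min = np.random.default_rng(0).uniform(0, 1, (10, 4))
synthetic = smote_sketch(X_min, n_new=90)
```

Because each synthetic row lies on the segment between two existing minority samples, it never leaves the bounding box of the minority class. ADASYN uses the same interpolation but chooses seed samples non-uniformly.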

Parameters:

Name | Type | Description | Default
strategy | ('oversample', 'undersample', 'smote', 'adasyn') | Resampling strategy. | "oversample"
sampling_strategy | float or 'auto' | "auto" balances all classes to the majority class count (oversampling) or the minority class count (undersampling). A float r targets majority_count × r samples per class for oversampling, or minority_count / r for undersampling. | 'auto'
k_neighbors | int | Number of nearest neighbours for SMOTE and ADASYN. | 5
random_state | int | Seed for reproducibility. | 42
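The sampling_strategy arithmetic can be checked with a small worked example. The helper below is a hypothetical sketch, not a quprep function. It assumes the result is rounded to the nearest integer and does not model how classes already above the target are treated, since this page specifies neither.

```python
def target_count(class_counts, strategy="oversample", sampling_strategy="auto"):
    """Return the per-class sample count targeted by the resampler (sketch)."""
    majority = max(class_counts.values())
    minority = min(class_counts.values())
    undersampling = strategy == "undersample"

    if sampling_strategy == "auto":
        # Balance to the minority count when removing, the majority count when adding.
        return minority if undersampling else majority

    r = float(sampling_strategy)
    if r <= 0:
        raise ValueError("sampling_strategy must be positive")
    # Assumption: round to the nearest integer.
    return round(minority / r) if undersampling else round(majority * r)


counts = {0: 100, 1: 10}
print(target_count(counts, "oversample", "auto"))   # 100
print(target_count(counts, "undersample", "auto"))  # 10
print(target_count(counts, "oversample", 0.5))      # 50  (100 × 0.5)
print(target_count(counts, "undersample", 0.5))     # 20  (10 / 0.5)
```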

Examples:

>>> from collections import Counter
>>> import numpy as np
>>> import quprep as qd
>>> from quprep.core.dataset import Dataset
>>> rng = np.random.default_rng(0)
>>> X = rng.uniform(0, 1, (110, 4))  # 110 samples, 4 features
>>> y = np.array([0] * 100 + [1] * 10)  # 10:1 class imbalance
>>> ds = Dataset(data=X, labels=y)
>>> handler = qd.ImbalanceHandler(strategy="smote")
>>> ds_bal = handler.fit_transform(ds)
>>> counts = Counter(np.asarray(ds_bal.labels).tolist())  # plain ints, not NumPy scalars
>>> print(sorted(counts.items()))  # sorted, because the returned rows are shuffled
[(0, 100), (1, 100)]

Functions

fit(dataset)

Compute class distribution and target count from dataset.

Parameters:

Name | Type | Description | Default
dataset | Dataset | Must have labels set (1-D array, single-target only). | required
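The class distribution that fit computes can be illustrated with NumPy alone. This is a minimal sketch under the assumption that labels are array-like. The helper name class_distribution and the exact error message are hypothetical and not taken from quprep.

```python
import numpy as np


def class_distribution(labels):
    """Count samples per class, rejecting anything that is not a 1-D label array."""
    labels = np.asarray(labels)
    if labels.ndim != 1:
        raise ValueError("labels must be a 1-D array (single-target only)")
    classes, counts = np.unique(labels, return_counts=True)
    # Convert NumPy scalars to plain Python values for readable keys.
    return {cls.item(): int(n) for cls, n in zip(classes, counts)}


dist = class_distribution(np.array([0] * 100 + [1] * 10))
print(dist)  # {0: 100, 1: 10}
```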
fit_transform(dataset)

Fit on dataset and return the resampled Dataset in one step.

transform(dataset)

Apply the fitted resampling strategy to dataset.

Parameters:

Name | Type | Description | Default
dataset | Dataset | Dataset to resample. | required

Returns:

Type | Description
Dataset | New Dataset with resampled data and labels (shuffled).
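For intuition about what a resample-then-shuffle step involves, here is a generic NumPy sketch of the "oversample" strategy with "auto" balancing. It is not quprep's code: random_oversample is a hypothetical helper that works on raw arrays rather than a Dataset. The point to note is that a single permutation is applied to both the data and the labels, so each row keeps its label.

```python
import numpy as np


def random_oversample(X, y, random_state=42):
    """Duplicate minority rows up to the majority count, then shuffle (sketch)."""
    rng = np.random.default_rng(random_state)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()

    keep = [np.arange(len(y))]  # start with every original row
    for cls, count in zip(classes, counts):
        if count < target:
            pool = np.flatnonzero(y == cls)
            # Sample with replacement: duplicates are the whole point here.
            keep.append(rng.choice(pool, size=target - count, replace=True))

    idx = rng.permutation(np.concatenate(keep))  # one shuffle order for X and y
    return X[idx], y[idx]


X = np.arange(110 * 4).reshape(110, 4)
y = np.array([0] * 100 + [1] * 10)
X_bal, y_bal = random_oversample(X, y)
```

Returning new arrays instead of modifying X and y in place mirrors the documented behaviour of transform, which returns a new Dataset.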