Implementing easy-to-use methods for classical and novel tabular data augmentation and synthesis.
Description
tabular_augmentation provides classical and novel data augmentation methods for tabular data behind a simple
interface, and is aimed in particular at few-shot learning settings.
Usage
The snippets below assume that the few-shot training set (x_few_train, y_few_train), the test set (x_test, y_test), a random seed (seed), and the tabular_model_test evaluation helper are already defined or imported.
SMOTE-based methods
from tabular_augmentation import smote_augmentation
method = 'SVMSMOTE'
x_synthesis, y_synthesis = smote_augmentation(
    x_few_train, y_few_train, method, seed=seed,
    oversample_num=100, positive_ratio=None,
    knn_neighbors=3)
tabular_model_test(x_synthesis, y_synthesis, x_test, y_test, model_name='xgb')
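To make the idea behind the SMOTE family concrete, here is a minimal, dependency-free sketch of plain SMOTE-style interpolation: each synthetic row is a random point on the segment between a minority-class row and one of its k nearest minority neighbors. This is not the package's implementation, and it does not reproduce the SVMSMOTE variant selected above. The names smote_sketch, minority, n_new and k are hypothetical and used only for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Illustrative SMOTE-style oversampling (hypothetical helper,
    not part of tabular_augmentation)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority row as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to it.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        # Interpolate: a random point on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


minority_rows = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote_sketch(minority_rows, n_new=5, k=3, seed=42)
```

Because every synthetic row is a convex combination of two real rows, each feature value stays within the range observed in the minority class.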
Mixup-based methods
from tabular_augmentation import mixup_augmentation_with_weight
method = 'vanilla'
x_synthesis, y_synthesis, sample_weight = mixup_augmentation_with_weight(
    x_few_train, y_few_train, oversample_num=200, alpha=1, beta=1,
    mixup_type=method, seed=seed, rebalanced_ita=1)
tabular_model_test(x_synthesis, y_synthesis, x_test, y_test, model_name='xgb', sample_weight=sample_weight)
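As a rough illustration of what vanilla mixup does, the sketch below draws a mixing coefficient from a Beta distribution and forms a convex combination of two randomly chosen rows and of their labels. It assumes that alpha and beta in the call above are the two Beta shape parameters (the original mixup paper uses a single parameter, Beta(alpha, alpha)). How the package derives sample_weight from the mixed labels is not shown in this README, so the sketch simply returns soft labels. mixup_sketch is a hypothetical name, not a package function.

```python
import random


def mixup_sketch(x, y, n_new, alpha=1.0, beta=1.0, seed=0):
    """Illustrative vanilla mixup for tabular rows (hypothetical helper,
    not part of tabular_augmentation)."""
    rng = random.Random(seed)
    x_mix, y_mix = [], []
    for _ in range(n_new):
        # Choose two rows at random (they may coincide).
        i = rng.randrange(len(x))
        j = rng.randrange(len(x))
        # Mixing coefficient in [0, 1], drawn from Beta(alpha, beta).
        lam = rng.betavariate(alpha, beta)
        x_mix.append([lam * a + (1.0 - lam) * b for a, b in zip(x[i], x[j])])
        # Soft label: the same convex combination applied to the labels.
        y_mix.append(lam * y[i] + (1.0 - lam) * y[j])
    return x_mix, y_mix


x_demo = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]]
y_demo = [0, 1, 1]
x_mix, y_mix = mixup_sketch(x_demo, y_demo, n_new=4, alpha=1.0, beta=1.0, seed=7)
```

With alpha = beta = 1 the coefficient is uniform on [0, 1]; smaller shape values push it toward 0 or 1, so the mixed rows stay closer to one of the two originals.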
CTGAN/TVAE-based methods
CTGAN, TVAE, DeltaTVAE and DiffTVAE generate synthetic data through the sdv_synthesis function; ConditionalTVAE uses the sdv_synthesis_cvae function instead.
from tabular_augmentation import sdv_synthesis, sdv_synthesis_cvae
method = 'CTGAN'
x_synthesis, y_synthesis = sdv_synthesis(
    x_few_train, y_few_train, method, oversample_num=5000,
    seed=seed, init_synthesizer=True, positive_ratio=0.5,
)
tabular_model_test(x_synthesis, y_synthesis, x_test, y_test, model_name='xgb')
TabDDPM-based methods
from tabular_augmentation import ddpm_synthesis
method = "DDPM"
x_synthesis, y_synthesis = ddpm_synthesis(
    x_few_train, y_few_train, method, oversample_num=5000, seed=seed,
    init_synthesizer=True, positive_ratio=None, train_steps=10000)
tabular_model_test(x_synthesis, y_synthesis, x_test, y_test, model_name='xgb')
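For readers unfamiliar with diffusion models, the sketch below shows the Gaussian forward (noising) process that DDPM-style models apply to numerical features: a row x_0 is turned into x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where eps is standard normal noise and abar_t is the running product of (1 - beta_s). The model is then trained to reverse this corruption. This is only an illustration under stated assumptions: a linear noise schedule and 1000 diffusion steps are chosen here for simplicity, all function names are hypothetical, and nothing below is taken from ddpm_synthesis or its train_steps argument.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * t for t in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Draw x_t given x_0 for one row of numerical features."""
    scale = math.sqrt(abar[t])        # how much of the signal survives
    sigma = math.sqrt(1.0 - abar[t])  # standard deviation of the added noise
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)
abar = alpha_bars(linear_betas(1000))
row = [0.5, -1.2, 3.0]
slightly_noisy = q_sample(row, 0, abar, rng)   # close to the original row
almost_noise = q_sample(row, 999, abar, rng)   # close to pure Gaussian noise
```

At the last step abar is close to zero, so x_t carries almost no information about the original row; sampling new rows amounts to starting from noise and running the learned reverse process.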
Example
For details, please refer to example.ipynb
Cite
SMOTE
Mixup
[ICLR '18] mixup: Beyond Empirical Risk Minimization
[ICLR '22] Noisy Feature Mixup
[ECCV '20] Remix: Rebalanced Mixup
CTGAN/TVAE
[NeurIPS '19] Modeling Tabular Data using Conditional GAN
TabDDPM
[ICML '23] TabDDPM: Modelling Tabular Data with Diffusion Models
Download files
File details
Details for the file tabular_augmentation-0.0.18.tar.gz.
File metadata
- Download URL: tabular_augmentation-0.0.18.tar.gz
- Upload date:
- Size: 59.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fe443b4c481bc5c12199783e10d580994e22e4ee57ebfe536dcd428ec13904f6 |
| MD5 | 4f05794e935242824e00e6b9c1683397 |
| BLAKE2b-256 | 3b00e2293b84278e2b905621e230ec2471819fa9b2e3a49407c1c534d824625d |
File details
Details for the file tabular_augmentation-0.0.18-py3-none-any.whl.
File metadata
- Download URL: tabular_augmentation-0.0.18-py3-none-any.whl
- Upload date:
- Size: 73.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d066325d585cd67aac99e7fa6a842253ef338f0ac5c2cebda19c0e4d248add7e |
| MD5 | 75535700e3261ffa8f5c5e87da2e5d49 |
| BLAKE2b-256 | c644e0c5f8f0412831ad53edf6393e231c58fd33c793592fb189b09d2b6ee3b1 |