# GaLore AdamW

Memory-efficient PyTorch optimizer using Gradient Low-Rank Projection (GaLore) for training LLMs on consumer GPUs. Gradients are projected onto a low-rank subspace before optimizer states are computed, cutting optimizer-state memory by 8-32x for large weight matrices.
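As a rough sense of scale: a 4096×4096 fp32 weight normally carries two full-size Adam moments (2 × 4096 × 4096 × 4 B ≈ 134 MB); projected to rank 128, the moments shrink to 2 × 128 × 4096 × 4 B ≈ 4.2 MB, about a 32x reduction, with the 4096×128 projection matrix adding roughly 2 MB on top. The dimensions here are illustrative, not benchmarked.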
## Features

- GaLore projection: Randomized SVD projects gradients onto a low-rank subspace (see the sketch after this list)
- 8-bit quantized states: Optional int8 quantization of the Adam moments m and v (4x extra savings)
- Sophia-style clipping: Optional diagonal Hessian clipping for stability
- Adaptive rank: Automatically scales the projection rank to each layer's size
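
For intuition, here is a minimal sketch of what a GaLore-style Adam step looks like for one 2-D gradient. It is not the package's actual code: the function name, the `state` handling, and the omission of bias correction are illustrative assumptions. The randomized SVD is `torch.svd_lowrank`, which requires `rank <= min(grad.shape)`.

```python
import torch

def galore_adam_update(grad, state, lr=1e-3, rank=128,
                       betas=(0.9, 0.999), eps=1e-8, update_proj_gap=200):
    """One GaLore-style Adam step for a 2-D gradient (sketch only)."""
    step = state.get("step", 0)
    # Refresh the projection basis every `update_proj_gap` steps with a
    # randomized SVD; reuse the cached basis in between.
    if "P" not in state or step % update_proj_gap == 0:
        U, _, _ = torch.svd_lowrank(grad, q=rank)
        state["P"] = U                      # (m, rank) basis ("std" mode)
    P = state["P"]
    g = P.T @ grad                          # project: (rank, n), not (m, n)
    # Standard Adam moments, kept in the low-rank space; this is where
    # the memory saving comes from.
    m = betas[0] * state.get("m", torch.zeros_like(g)) + (1 - betas[0]) * g
    v = betas[1] * state.get("v", torch.zeros_like(g)) + (1 - betas[1]) * g * g
    state.update(step=step + 1, m=m, v=v)
    update = m / (v.sqrt() + eps)           # (bias correction omitted)
    return lr * (P @ update)                # project back to full shape
```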
## Installation

```bash
pip install galore-adamw
```
## Quick Start

```python
from galore_adamw import GaLoreAdamW, GaLoreConfig

# Basic usage
cfg = GaLoreConfig(lr=1e-3, rank=128)
optimizer = GaLoreAdamW(model.parameters(), cfg)

# With 8-bit states + Sophia clipping (maximum memory savings)
cfg = GaLoreConfig(
    lr=1e-3,
    rank=128,
    use_8bit_states=True,
    use_sophia_clip=True,
)
optimizer = GaLoreAdamW(model.parameters(), cfg)

# Training loop (standard PyTorch)
for batch in dataloader:
    loss = model(batch)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```
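
The clipping rule behind `use_sophia_clip` is not spelled out in this README. One plausible Sophia-style form, sketched below under loud assumptions (the helper name, the diagonal Hessian estimate `h`, and the exact formula are all hypothetical), preconditions the first moment by the curvature estimate and clamps every coordinate to `±sophia_rho`:

```python
import torch

# Hypothetical sketch; not necessarily the package's actual update rule.
# `m` is the Adam first moment and `h` a diagonal Hessian estimate that
# the optimizer would maintain; `rho` plays the role of sophia_rho.
def sophia_clipped_update(m: torch.Tensor, h: torch.Tensor,
                          rho: float = 0.03, eps: float = 1e-12) -> torch.Tensor:
    # Divide by the curvature estimate, then clamp each coordinate so no
    # single parameter moves more than rho in one step.
    return torch.clamp(m / h.clamp(min=eps), min=-rho, max=rho)
```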
## Configuration

| Parameter | Default | Description |
|---|---|---|
| `lr` | `1e-3` | Learning rate |
| `rank` | `128` | Max rank of the gradient projection |
| `update_proj_gap` | `200` | Steps between SVD re-projections |
| `scale` | `1.0` | Scale factor for the projected update |
| `proj_type` | `"std"` | `"std"` (project rows) or `"reverse"` (project columns) |
| `adaptive_rank` | `True` | Auto-scale the rank per layer size |
| `use_8bit_states` | `False` | Quantize m, v to int8 |
| `use_sophia_clip` | `False` | Diagonal Hessian clipping |
| `sophia_rho` | `0.03` | Clipping threshold |
| `weight_decay` | `0.01` | Decoupled weight decay |
| `max_grad_norm` | `1.0` | Gradient norm clipping (`0` = disabled) |
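
With `use_8bit_states=True`, the Adam moments are held in int8, which is where the quoted 4x extra saving comes from (1 byte per element instead of 4). The package's exact scheme is not documented here; a minimal absmax-quantization sketch with hypothetical helper names:

```python
import torch

def quantize_int8(x: torch.Tensor):
    # Scale so the largest magnitude maps to 127, then round to int8.
    scale = x.abs().amax().clamp(min=1e-12) / 127.0
    q = torch.round(x / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

m = torch.randn(128, 4096)        # a low-rank first moment
q, s = quantize_int8(m)           # 1 byte/element instead of 4
m_approx = dequantize_int8(q, s)  # dequantize on the fly during the step
```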
## Memory Stats

```python
stats = optimizer.get_memory_stats()
print(f"GaLore: {stats['galore_state_mb']:.1f} MB")
print(f"AdamW: {stats['adamw_state_mb']:.1f} MB")
print(f"Savings: {stats['savings_ratio']:.1f}x")
```
## License
MIT
File details
Details for the file galore_adamw-0.1.0.tar.gz.
File metadata
- Download URL: galore_adamw-0.1.0.tar.gz
- Upload date:
- Size: 9.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `6a9f33abb055c289eb77deb13c6fe18ed0c6fc4ea0db9fceaecdc3f15a5d92fc` |
| MD5 | `244706870058c193cacc3a64ea5fb160` |
| BLAKE2b-256 | `24fa0d51dbcaabc08bd08838cdf6792712cc175f5bf61547a46c0f4fea2228b1` |
File details
Details for the file galore_adamw-0.1.0-py3-none-any.whl.
File metadata
- Download URL: galore_adamw-0.1.0-py3-none-any.whl
- Upload date:
- Size: 9.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `e381afe501271ee9de3961642ea9a0fca09bdd8635d14d456d8227872e033ccd` |
| MD5 | `aa525fd7e23dd7a2bf2f8da4ca44e93e` |
| BLAKE2b-256 | `67c250f04b667973a44a664e4d4a7aef88f1352e6ba55c15b2c5c57166d2f6ee` |