Project description

GaLore AdamW

Memory-efficient PyTorch optimizer using Gradient Low-Rank Projection (GaLore) for training LLMs on consumer GPUs.

GaLore projects each weight matrix's gradient onto a low-rank subspace before computing the Adam optimizer states, shrinking optimizer-state memory by 8-32x for large weight matrices.
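
The headline ratio is just shape arithmetic; a quick back-of-envelope check with illustrative numbers (not library code):

# For a d x d weight with projection rank r, Adam's m and v shrink
# from d*d floats each to r*d floats each.
d, r = 4096, 128                    # rank=128 is the package default
adamw_state  = 2 * d * d            # full-rank first and second moments
galore_state = 2 * r * d            # moments kept in the rank-r subspace
print(adamw_state / galore_state)   # -> 32.0, the top of the 8-32x range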

Features

  • GaLore projection: Randomized SVD projects gradients to a low-rank subspace (a sketch of this step follows the list)
  • 8-bit quantized states: Optional int8 quantization of Adam m/v (4x extra savings)
  • Sophia-style clipping: Optional diagonal Hessian clipping for stability
  • Adaptive rank: Automatically scales projection rank per layer size
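
To make the projection bullet concrete, here is a minimal sketch of the idea using torch.svd_lowrank; galore_project and the shapes are illustrative assumptions, not the package's internals:

import torch

def galore_project(grad: torch.Tensor, rank: int) -> torch.Tensor:
    # Randomized SVD of the gradient; the top-`rank` left singular vectors
    # give an orthonormal basis for the "std" (row-space) projection.
    U, S, V = torch.svd_lowrank(grad, q=rank)
    return U  # shape (d_out, rank)

grad = torch.randn(1024, 512)          # gradient of a (d_out, d_in) weight
P = galore_project(grad, rank=128)     # refreshed every update_proj_gap steps
low_rank_grad = P.T @ grad             # (rank, d_in): Adam m/v live at this size
# ... run the usual Adam moment updates on low_rank_grad ...
full_update = P @ low_rank_grad        # project the step back to full shape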

Installation

pip install galore-adamw

Quick Start

from galore_adamw import GaLoreAdamW, GaLoreConfig

# Basic usage
cfg = GaLoreConfig(lr=1e-3, rank=128)
optimizer = GaLoreAdamW(model.parameters(), cfg)

# With 8-bit states + Sophia clipping (maximum memory savings)
cfg = GaLoreConfig(
    lr=1e-3,
    rank=128,
    use_8bit_states=True,
    use_sophia_clip=True,
)
optimizer = GaLoreAdamW(model.parameters(), cfg)

# Training loop (standard PyTorch)
for batch in dataloader:
    loss = model(batch)    # assumes the model returns a scalar loss
    loss.backward()
    optimizer.step()       # gradients are projected inside step()
    optimizer.zero_grad()

Configuration

Parameter        Default  Description
lr               1e-3     Learning rate
rank             128      Maximum rank of the gradient projection
update_proj_gap  200      Steps between SVD re-projections
scale            1.0      Scale factor for the projected update
proj_type        "std"    "std" (project rows) or "reverse" (project columns)
adaptive_rank    True     Auto-scale rank to each layer's size
use_8bit_states  False    Quantize m and v to int8
use_sophia_clip  False    Diagonal Hessian clipping
sophia_rho       0.03     Clipping threshold
weight_decay     0.01     Decoupled weight decay
max_grad_norm    1.0      Max gradient norm for clipping (0 = disabled)
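
Assuming each row above maps onto a GaLoreConfig keyword of the same name (the Quick Start confirms several of them), a fully spelled-out configuration with the defaults looks like:

from galore_adamw import GaLoreConfig

cfg = GaLoreConfig(
    lr=1e-3,
    rank=128,               # upper bound on the projection rank
    update_proj_gap=200,    # recompute the SVD basis every 200 steps
    scale=1.0,
    proj_type="std",        # project rows; "reverse" projects columns
    adaptive_rank=True,     # scale the rank with each layer's size
    use_8bit_states=False,
    use_sophia_clip=False,
    sophia_rho=0.03,
    weight_decay=0.01,      # decoupled, as in AdamW
    max_grad_norm=1.0,      # 0 disables gradient clipping
)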

Memory Stats

stats = optimizer.get_memory_stats()
print(f"GaLore: {stats['galore_state_mb']:.1f} MB")
print(f"AdamW:  {stats['adamw_state_mb']:.1f} MB")
print(f"Savings: {stats['savings_ratio']:.1f}x")

License

MIT

Download files

Download the file for your platform.

Source Distribution

galore_adamw-0.1.0.tar.gz (9.5 kB)

Built Distribution

galore_adamw-0.1.0-py3-none-any.whl (9.9 kB)

File details

Details for the file galore_adamw-0.1.0.tar.gz.

File metadata

  • Download URL: galore_adamw-0.1.0.tar.gz
  • Size: 9.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for galore_adamw-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       6a9f33abb055c289eb77deb13c6fe18ed0c6fc4ea0db9fceaecdc3f15a5d92fc
MD5          244706870058c193cacc3a64ea5fb160
BLAKE2b-256  24fa0d51dbcaabc08bd08838cdf6792712cc175f5bf61547a46c0f4fea2228b1

File details

Details for the file galore_adamw-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: galore_adamw-0.1.0-py3-none-any.whl
  • Size: 9.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for galore_adamw-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       e381afe501271ee9de3961642ea9a0fca09bdd8635d14d456d8227872e033ccd
MD5          aa525fd7e23dd7a2bf2f8da4ca44e93e
BLAKE2b-256  67c250f04b667973a44a664e4d4a7aef88f1352e6ba55c15b2c5c57166d2f6ee
