AfriLink SDK — One-line access to GPUs, models and datasets from your notebook
Version: 0.7.5
Last Updated: May 08, 2026
Train & Finetune on HPC from Any Notebook
AfriLink SDK gives you one-line access to A100 GPUs for training and finetuning across text, vision and multimodal models. Works on Google Colab, Kaggle, Jupyter, VS Code, and any Python environment.
| Capability | API | What It Does |
|---|---|---|
| Training | `client.train()` | Run any training script (YOLOv8, custom PyTorch, etc.) on HPC A100s |
| Finetuning | `client.finetune()` | LoRA/QLoRA LLM fine-tuning with one line |
pip install afrilink-sdk
Quick Start — Finetune an LLM
from afrilink import AfriLinkClient
client = AfriLinkClient()
client.authenticate() # prompts for DataSpires email/password
import pandas as pd
data = pd.DataFrame({"text": [
"Below is an instruction...\n\n### Response:\nHere is the answer..."
]})
job = client.finetune(
model="qwen2.5-0.5b",
training_mode="low",
data=data,
gpus=1,
time_limit="01:00:00",
)
result = job.run(wait=True)
if result["status"] == "completed":
client.download_model(result["job_id"], "./my-model")
Quick Start — Train a Vision Model
from afrilink import AfriLinkClient
client = AfriLinkClient()
client.authenticate()
# Submit a YOLOv8 training job to an A100 GPU
job = client.train(
script="train_yolo.py", # your training script
container="afrilink-yolo", # pre-built container with YOLOv8 + PyTorch
data="./dataset.tar.gz", # dataset (uploaded automatically)
data_config="dataset.yaml", # config file (e.g. YOLO dataset.yaml)
gpus=1,
time_limit="02:00:00",
)
result = job.run(wait=True)
# Check logs and download results
print(job.get_logs(tail=50))
client.transfer.download_file("$WORK/runs/best.pt", "./best.pt")
Installation
pip install afrilink-sdk
The package has zero required dependencies. Optional libraries (requests, torch, transformers, peft) are imported only at the point you actually use them, and they come pre-installed in most notebook environments.
Authentication
AfriLink uses a two-phase auth flow. Both phases happen inside a single client.authenticate() call:
| Phase | What happens | User action |
|---|---|---|
| 1. DataSpires | Validates your DataSpires account for billing/telemetry | Enter email + password when prompted |
| 2. HPC | Headless Selenium browser automation gets SSH certificates via Smallstep | Fully automatic (org credentials auto-provisioned) |
from afrilink import AfriLinkClient
client = AfriLinkClient()
client.authenticate() # prompts for DataSpires creds, then auto-handles HPC
# Or pass credentials explicitly:
client.authenticate(
dataspires_email="you@example.com",
dataspires_password="...",
)
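If you share or commit your notebooks, you can avoid hardcoding the password by reading it from the environment or prompting for it yourself. A minimal sketch using the same dataspires_email / dataspires_password parameters shown above; the environment variable names are hypothetical:

import os
from getpass import getpass
from afrilink import AfriLinkClient

client = AfriLinkClient()
client.authenticate(
    dataspires_email=os.environ.get("DATASPIRES_EMAIL", "you@example.com"),
    # fall back to an interactive prompt if the variable is not set
    dataspires_password=os.environ.get("DATASPIRES_PASSWORD") or getpass("DataSpires password: "),
)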
After authentication you get:
- SSH certificate valid for ~12 hours (the SDK warns you before it expires — see Session Recovery)
- SLURM job manager ready to submit jobs
- SCP transfer manager ready to move files
- Telemetry tracker logging GPU-minutes to your DataSpires account
Check remaining credits
You can check your DataSpires balance after DataSpires-only auth (no HPC session) or after full authenticate().
from afrilink import AfriLinkClient
client = AfriLinkClient()
client.authenticate_dataspires() # DataSpires only
balance = client.check_credits()
if balance is None:
print("Not authenticated")
else:
print(f"Remaining credits: ${balance:.2f}")
Built-in User Guide
The SDK ships with an inline reference manual you can query from any notebook cell using a slash-style syntax:
import afrilink
afrilink/help # top-level index of all topics
afrilink/quickstart # step-by-step getting started guide
afrilink/auth # authentication & session management
afrilink/finetune # finetune job parameters & training modes
afrilink/credits # remaining credit balance
afrilink/specs # available models and GPU requirements
afrilink/datasets # dataset formats and upload
afrilink/transfer # SCP upload/download commands
afrilink/jobs # SLURM job management
Each page prints a formatted reference to your notebook output — no internet connection required.
API Reference
AfriLinkClient
Main entry point. Created once per notebook session.
| Method | Description |
|---|---|
| `authenticate()` | Full auth flow (DataSpires + HPC) |
| `train(script, container, data, gpus, ...)` | Create a TrainJob (general-purpose training) |
| `finetune(model, training_mode, data, gpus, ...)` | Create a FinetuneJob (LLM fine-tuning) |
| `download_model(job_id, local_dir)` | Download trained adapter weights |
| `upload_dataset(local_path, dataset_name)` | Upload dataset to HPC |
| `list_containers()` | List available training containers |
| `list_available_models(size=None)` | List models in the registry |
| `list_available_datasets()` | List datasets in the registry |
| `get_model_requirements(model, training_mode)` | GPU/memory recommendations |
| `list_jobs()` | List SLURM queue |
| `recover_session(download_dir=None)` | Re-authenticate + check/download tracked jobs |
| `cancel_job(job_id)` | Cancel a running job |
| `run_command(command)` | Run arbitrary shell command on HPC login node |
| `get_queue_status()` | SLURM partition info |
| `check_credits()` | Remaining DataSpires balance in USD (or None if not authenticated) |
| `cert_minutes_remaining` | Minutes until SSH certificate expires |
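Once authenticated, a few of these calls can be combined to see what the cluster offers before submitting anything. A minimal sketch using only methods from the table above; the exact shape of each return value is not documented here, so the prints are illustrative:

from afrilink import AfriLinkClient

client = AfriLinkClient()
client.authenticate()

print(client.list_available_models(size="tiny"))   # models in the "tiny" size class
print(client.list_containers())                    # pre-built training containers
print(client.get_queue_status())                   # SLURM partition info
print(client.list_jobs())                          # anything already in the queue
print(f"{client.cert_minutes_remaining:.0f} minutes left on the SSH certificate")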
client.train()
General-purpose training on HPC. Use this for any training framework (YOLOv8, custom PyTorch, etc.). For LoRA/QLoRA LLM fine-tuning, use client.finetune() instead.
job = client.train(
script="train_yolo.py", # local Python script to upload and run
container="afrilink-yolo", # pre-built container (see Available Containers)
data="./dataset/", # local path, archive, DataFrame, or remote HPC path
data_config="dataset.yaml", # config file (e.g. YOLO dataset.yaml)
gpus=1, # number of A100 GPUs (1-32)
time_limit="04:00:00", # max wallclock (HH:MM:SS)
script_args="--epochs 100", # CLI arguments passed to your script
extra_files=["weights.pt"], # additional files to upload
memory_gb=64, # RAM per node (default: 64)
container_env={"KEY": "val"}, # extra environment variables
)
Available Containers:
| Name | Frameworks | Use Case |
|---|---|---|
| `afrilink-yolo` | Ultralytics, PyTorch, torchvision | Object detection, segmentation, pose estimation |
| `afrilink-finetune` | PyTorch, Transformers, PEFT, bitsandbytes | LLM fine-tuning |
Alternatively, pass a full SIF path as container= to use your own Singularity image.
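For example, if you have already built and staged your own image on the cluster, point container= at its full path instead of a registry name. A minimal sketch; the SIF path below is hypothetical:

job = client.train(
    script="train.py",
    container="$WORK/containers/my-image.sif",   # your own Singularity image
    data="./dataset/",
    gpus=1,
    time_limit="02:00:00",
)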
Data handling:
| Input type | What happens |
|---|---|
| Local directory | Uploaded via SCP |
| `.tar.gz` / `.zip` archive | Uploaded and extracted on HPC |
| Single file | Uploaded to job directory |
| `pandas.DataFrame` | Serialised to JSONL, uploaded |
| Path starting with `$` or `/` | Treated as remote HPC path (no upload) |
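For instance, a dataset you have already staged on the cluster can be referenced by its remote path so nothing is re-uploaded. A minimal sketch; the remote dataset path below is hypothetical:

job = client.train(
    script="train_yolo.py",
    container="afrilink-yolo",
    data="$WORK/datasets/coco-subset",   # starts with $, so treated as a remote path (no upload)
    data_config="dataset.yaml",
    gpus=1,
    time_limit="02:00:00",
)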
script_content= alternative: Instead of a file path, pass inline Python code as a string:
job = client.train(
script_content="from ultralytics import YOLO\nYOLO('yolov8n.pt').train(data='coco.yaml')",
container="afrilink-yolo",
gpus=1,
)
TrainJob
Returned by client.train().
| Method / Property | Description |
|---|---|
| `run(wait=True)` | Submit to SLURM. `wait=True` polls until done. |
| `cancel()` | Cancel the SLURM job |
| `get_logs(tail=100)` | Fetch recent log lines |
| `estimated_cost_usd()` | Estimate max cost based on GPUs and time limit |
| `status` | Current status string |
| `job_id` | AfriLink job ID (8-char UUID prefix) |
| `slurm_job_id` | SLURM numeric job ID (set after `run()`) |
run() returns a dict:
{
"job_id": "a1b2c3d4",
"slurm_job_id": "12345678",
"status": "completed", # or "submitted" if wait=False
"output_dir": "/path/...",
}
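If you prefer not to block the notebook, submit with wait=False and poll the job yourself. A minimal sketch reusing the job created above and only the TrainJob methods and properties listed in the table; the terminal status strings other than "completed" and "submitted" are assumptions here:

import time

result = job.run(wait=False)          # returns immediately with status "submitted"
print(job.estimated_cost_usd())       # upper bound given gpus and time_limit

while job.status not in ("completed", "failed", "cancelled"):
    print(job.get_logs(tail=10))      # peek at recent output
    time.sleep(60)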
client.finetune()
job = client.finetune(
model="qwen2.5-0.5b", # model ID from registry
training_mode="low", # "low" | "medium" | "high"
data=my_dataframe, # pandas DataFrame, HF Dataset, or file path
gpus=1, # number of A100 GPUs
time_limit="01:00:00", # max wallclock (HH:MM:SS)
backend="cineca", # HPC backend cluster
output_dir=None, # default: $WORK/finetune_outputs
)
HPC Backends:
| Backend | Provider | Region | Status |
|---|---|---|---|
| `cineca` | CINECA Leonardo (EuroHPC) | Bologna, Italy | Available (default) |
| `eversetech` | EverseTech | Variable | Coming soon |
| `agh` | AGH | Variable | Coming soon |
| `acf` | ACF | Variable | Coming soon |
CINECA Leonardo — Hardware Specs:
Each GPU node on the Leonardo Booster partition (where AfriLink jobs run):
| Component | Specification |
|---|---|
| GPU per node | 4x NVIDIA A100 (custom) |
| GPU memory | 64 GB HBM2e per GPU (256 GB per node) |
| FP64 performance | 11.2 TFLOPS per GPU |
| FP32 performance | 22.4 TFLOPS per GPU |
| CPU cores per node | 32 |
| System RAM per node | 512 GB DDR4 |
| RAM per GPU (effective) | ~128 GB (shared, not partitioned) |
| Node interconnect | 200 Gb/s HDR InfiniBand |
| SLURM partition | boost_usr_prod |
Per-GPU memory guide:
| Model size | Training mode | Min GPUs recommended |
|---|---|---|
| 0.5B - 1B | low (QLoRA 4-bit) | 1 |
| 3B - 7B | low | 1 |
| 3B - 7B | high (bf16) | 2-4 |
| 13B | low | 2 |
| 13B | high | 4 |
| 30B+ | low or high | 4 |
Billing: $2.00 / GPU-hour, charged per completed GPU-minute (minimum 1 minute). Credits deducted automatically from your DataSpires balance.
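As a worked example of that rate: a finetune on 2 GPUs that runs for 45 minutes is charged 2 x 45 = 90 GPU-minutes, i.e. 1.5 GPU-hours x $2.00 = $3.00. The sketch below just mirrors the stated formula; it is not the SDK's internal billing code:

GPU_HOUR_USD = 2.00

def estimated_charge(gpus: int, minutes_used: float) -> float:
    """$2.00 per GPU-hour, charged per completed GPU-minute (minimum 1 minute)."""
    gpu_minutes = max(1.0, gpus * minutes_used)
    return gpu_minutes / 60.0 * GPU_HOUR_USD

print(estimated_charge(2, 45))   # 2 GPUs x 45 min = 90 GPU-minutes = 1.5 GPU-hours -> 3.0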
Training modes:
| Mode | Strategy | Quantization | Typical GPUs |
|---|---|---|---|
| `low` | QLoRA (rank 8) | 4-bit | 1 |
| `medium` | LoRA (rank 16) | 8-bit / none | 1-2 |
| `high` | LoRA (rank 64) + DDP/FSDP | none | 2-4+ |
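For a larger model where 4-bit quantization is not wanted, the same call shape from the quick start applies; only training_mode, gpus and time_limit change. A minimal sketch reusing the data DataFrame from the quick start (the time limit is illustrative):

job = client.finetune(
    model="ministral-3b",
    training_mode="high",      # LoRA rank 64, no quantization, DDP/FSDP across GPUs
    data=data,                 # same DataFrame as in the quick start
    gpus=2,
    time_limit="04:00:00",
)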
FinetuneJob
Returned by client.finetune().
| Method / Property | Description |
|---|---|
| `run(wait=True)` | Submit to SLURM. `wait=True` polls until done. |
| `cancel()` | Cancel the SLURM job |
| `get_logs(tail=100)` | Fetch recent log lines |
| `status` | Current status string |
| `job_id` | AfriLink job ID (8-char UUID prefix) |
| `slurm_job_id` | SLURM numeric job ID (set after `run()`) |
run() returns a dict:
{
"job_id": "a1b2c3d4",
"slurm_job_id": "12345678",
"status": "completed", # or "submitted" if wait=False
"output_dir": "/path/...",
"model_path": "/path/...",
}
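After a blocking run, the returned dict gives you what you need to fetch results. A short sketch combining the keys above with the transfer helper shown later in Data Transfer; the exact file layout under model_path is not specified here:

result = job.run(wait=True)
if result["status"] == "completed":
    print(client.transfer.list_remote_files(result["model_path"]))   # inspect remote outputs
    client.download_model(result["job_id"], "./my-model")            # pull the adapter locally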
Session Watchdog
The SDK monitors your SSH certificate in the background and prints a warning as it approaches expiry (at 60, 30, 15, and 5 minutes remaining). You can check time remaining at any point:
print(f"{client.cert_minutes_remaining:.0f} minutes remaining on SSH certificate")
Session Recovery
SSH certificates expire after ~12 hours. The SDK monitors this automatically and warns you before expiry. When you see the warning — or when you return to a notebook after being away — call recover_session() to re-authenticate and pick up where you left off:
# Re-authenticate and check on all tracked jobs
recovery = client.recover_session("./recovered-models")
# recovery.re_authenticated — True if fresh SSH cert was obtained
# recovery.jobs — status of each tracked SLURM job
# recovery.files_retrieved — list of model dirs downloaded for completed jobs
What recover_session() does:
- Re-authenticates with CINECA — gets a fresh SSH certificate without re-entering credentials
- Checks all tracked SLURM jobs — reports status of every job submitted in this session
- Downloads completed models — if you pass a download_dir, finished adapters are pulled automatically
- Registers email notification — for jobs still running, you'll get an email when they finish
Your SLURM jobs keep running on the cluster even after your certificate expires — you just need fresh credentials to check on them or download results.
# Minimal usage (just re-auth, no download)
client.recover_session()
# With download directory for completed jobs
client.recover_session("./my-models")
client.download_model()
client.download_model(result["job_id"], "./my-model")
Downloads adapter files (adapter_config.json, adapter_model.safetensors, tokenizer files) flat into the target directory — ready for PeftModel.from_pretrained().
Working With Your Model
Once you have downloaded adapter weights with client.download_model(), the adapter directory is ready for standard Hugging Face tooling.
GGUF Conversion & Ollama
Convert your adapter to GGUF format for use with Ollama or llama.cpp:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# 1. Merge adapter into base model
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
model = PeftModel.from_pretrained(base, "./my-model")
merged = model.merge_and_unload()
merged.save_pretrained("./my-model-merged")
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B").save_pretrained("./my-model-merged")
# 2. Convert to GGUF (requires llama.cpp — see llama.cpp repo for build instructions)
# python convert_hf_to_gguf.py ./my-model-merged --outfile my-model.gguf --outtype f16
# 3. Quantize (optional, 4-bit)
# ./llama-quantize my-model.gguf my-model-q4.gguf Q4_K_M
# 4. Run with Ollama
# Create a Modelfile:
# FROM ./my-model-q4.gguf
# ollama create my-model -f Modelfile
# ollama run my-model
Publishing to Hugging Face Hub
from huggingface_hub import HfApi
api = HfApi(token="hf_...")
repo_id = "your-username/my-finetuned-model"
api.create_repo(repo_id, exist_ok=True)
# Option A — adapter only (small, loads on top of base model)
api.upload_folder(folder_path="./my-model", repo_id=repo_id)
# Option B — full merged model
api.upload_folder(folder_path="./my-model-merged", repo_id=repo_id)
# Option C — GGUF file
api.upload_file(path_or_fileobj="./my-model-q4.gguf",
path_in_repo="my-model-q4.gguf",
repo_id=repo_id)
Model & Dataset Registry
# List all models
client.list_available_models()
# Filter by size
client.list_available_models(size="tiny") # tiny | small | medium | large
# List datasets
client.list_available_datasets()
# Resource requirements
client.get_model_requirements("qwen2.5-0.5b", "low")
Available models (v0.1.0):
| ID | Name | Type | Params | Min VRAM |
|---|---|---|---|---|
| `qwen2.5-0.5b` | Qwen 2.5 0.5B | text | 0.5B | 4 GB |
| `gemma-3-270m` | Gemma 3 270M | text | 0.27B | 2 GB |
| `llama-3.2-1b` | Llama 3.2 1B | text | 1.0B | 4 GB |
| `deepseek-r1-1.5b` | DeepSeek R1 1.5B | text | 1.5B | 6 GB |
| `ministral-3b` | Ministral 3B | text | 3.3B | 8 GB |
| `florence-2-base` | Florence 2 Base | vision | 0.23B | 4 GB |
| `smolvlm-256m` | SmolVLM 256M | vision | 0.26B | 2 GB |
| `moondream2` | Moondream 2 | vision | 1.9B | 8 GB |
| `internvl2-1b` | InternVL2 1B | vision | 1.0B | 4 GB |
| `llava-1.5-7b` | LLaVA 1.5 7B | vision | 7.0B | 16 GB |
Data Transfer
# Upload a dataset
client.upload_dataset("./train.jsonl", dataset_name="my-data")
# Download model weights
client.download_model("a1b2c3d4", "./my-model")
# List remote files
client.transfer.list_remote_files("$WORK/finetune_outputs/")
# Run shell commands on HPC
client.run_command("squeue -u $USER")
Dataset Formats
client.finetune(data=...) accepts:
| Type | How it's handled |
|---|---|
| `pandas.DataFrame` | Serialised to JSONL, uploaded via SCP |
| `datasets.Dataset` | Saved to disk, uploaded via SCP |
| `str` (local path) | Uploaded via SCP |
| `str` (starts with `$`) | Treated as remote HPC path (no upload) |
Your DataFrame should have a text column with the full prompt+response formatted as a single string (Alpaca-style or chat template).
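A minimal way to build such a DataFrame from instruction/response pairs, following the Alpaca-style prompt elided in the quick start (the exact template wording and the example pairs below are up to you):

import pandas as pd

pairs = [
    ("What is the capital of Kenya?", "Nairobi is the capital of Kenya."),
    ("Summarise: the cat sat on the mat.", "A cat sat on a mat."),
]

data = pd.DataFrame({
    "text": [
        f"Below is an instruction...\n\n### Instruction:\n{q}\n\n### Response:\n{a}"
        for q, a in pairs
    ]
})

job = client.finetune(model="qwen2.5-0.5b", training_mode="low", data=data, gpus=1,
                      time_limit="01:00:00")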
Architecture
Notebook Interface High Performance Compute
+--------------+ SSH/SCP +------------------+
| AfriLink SDK | -------------------> | Login Node |
| | (Smallstep certs) | +- SLURM sbatch |
| DataSpires | | +- $WORK/ |
| (billing) | | | +- containers|
| | | | +- datasets |
+--------------+ | | +- finetune_ |
| | outputs/ |
| | +- {jobid}|
| +- Singularity |
| container |
| (A100 GPUs) |
+------------------+
Publishing to PyPI
For maintainers:
cd afrilink-sdk
pip install build twine
# Build wheel + sdist
python -m build
# Upload to PyPI (requires PyPI API token)
twine upload dist/*
You'll need a PyPI account at https://pypi.org and an API token configured in ~/.pypirc or passed via --username __token__ --password pypi-....
License
MIT