
Llama CPP

This is a Python package for Llama CPP (https://github.com/ggml-org/llama.cpp).

Installation

Install the package from PyPI, download a pre-built wheel from the releases page, or build it from source.

pip install llama-cpp-pydist

Usage

This section provides a basic overview of how to use the llama_cpp_pydist library.

Deploying Windows Binaries

If you are on Windows, the package attempts to automatically deploy pre-compiled binaries. You can also manually trigger this process.

from llama_cpp import deploy_windows_binary

# Specify the target directory for the binaries
# This is typically within your Python environment's site-packages
# or a custom location if you prefer.
target_dir = "./my_llama_cpp_binaries" 

if deploy_windows_binary(target_dir):
    print(f"Windows binaries deployed successfully to {target_dir}")
else:
    print("Failed to deploy Windows binaries or no binaries were found for your system.")

# Once deployed, you would typically add the directory containing llama.dll (or similar)
# to your system's PATH or ensure your application can find it.
# For example, if llama.dll is in target_dir/bin:
# import os
# os.environ["PATH"] += os.pathsep + os.path.join(target_dir, "bin")
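Making the deployed DLLs discoverable can be sketched as follows. Note that on Windows with Python 3.8+, ctypes resolves DLL dependencies through explicitly registered directories rather than PATH, so os.add_dll_directory is the supported mechanism; the bin_dir path here is an assumption matching the example above.

```python
import os
import sys

# Hypothetical location of the deployed binaries, matching target_dir above.
bin_dir = os.path.abspath(os.path.join("./my_llama_cpp_binaries", "bin"))

# Python 3.8+ on Windows no longer searches PATH for DLL dependencies;
# directories must be registered explicitly (and must exist).
if sys.platform == "win32" and os.path.isdir(bin_dir):
    os.add_dll_directory(bin_dir)

# Appending to PATH still helps child processes locate the DLLs.
os.environ["PATH"] = bin_dir + os.pathsep + os.environ.get("PATH", "")
```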

Conversion Library Installation

To perform Hugging Face to GGUF model conversions, you need to install additional Python libraries. You can install them via pip:

pip install transformers numpy torch safetensors sentencepiece

Alternatively, you can install them programmatically in Python:

from llama_cpp.install_conversion_libs import install_conversion_libs

if install_conversion_libs():
    print("Conversion libraries installed successfully.")
else:
    print("Failed to install conversion libraries.")
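Before running a conversion, you can also verify with only the standard library that the dependencies are importable; the package list here simply mirrors the pip command above:

```python
import importlib.util

# The conversion dependencies listed earlier; adjust if your models
# need additional tokenizer libraries.
REQUIRED = ["transformers", "numpy", "torch", "safetensors", "sentencepiece"]

# find_spec returns None for packages that are not installed.
missing = [name for name in REQUIRED if importlib.util.find_spec(name) is None]
if missing:
    print("Missing conversion libraries:", ", ".join(missing))
else:
    print("All conversion libraries are importable.")
```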

Converting Hugging Face Models to GGUF

This package provides a utility to convert Hugging Face models (including those using Safetensors) into the GGUF format, which is used by llama.cpp. This process leverages the conversion scripts from the underlying llama.cpp submodule.

1. Install Conversion Libraries:

Before converting models, ensure the required Python libraries are installed, for example with the install_conversion_libs helper shown in the Conversion Library Installation section above.

2. Convert the Model:

Once the dependencies are installed, you can use the convert_hf_to_gguf function:

from llama_cpp import convert_hf_to_gguf

# Specify the Hugging Face model name or local path
model_name_or_path = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # Example: A small model from Hugging Face Hub
# Or, a local path: model_name_or_path = "/path/to/your/hf_model_directory"

output_directory = "./converted_gguf_models" # Directory to save the GGUF file
output_filename = "tinyllama_1.1b_chat_q8_0.gguf" # Optional: specify a filename
quantization_type = "q8_0"  # Example: 8-bit quantization. The llama.cpp convert script also accepts "f32", "f16", and "bf16"; K-quants such as "q4_K_M" require a separate llama-quantize step.

print(f"Starting conversion for model: {model_name_or_path}")
success, result_message = convert_hf_to_gguf(
    model_path_or_name=model_name_or_path,
    output_dir=output_directory,
    output_filename=output_filename, # Can be None to auto-generate
    outtype=quantization_type
)

if success:
    print(f"Model converted successfully! GGUF file saved at: {result_message}")
else:
    print(f"Model conversion failed: {result_message}")

# The `result_message` will contain the path to the GGUF file on success,
# or an error message on failure.

If a model name is provided, this function downloads the model from the Hugging Face Hub (unless it is already in the local Hugging Face cache) and then invokes the convert_hf_to_gguf.py script bundled with llama.cpp.
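As a quick sanity check on the result (not part of this package's API), GGUF files begin with the four-byte ASCII magic "GGUF", which you can verify before shipping the file elsewhere:

```python
def looks_like_gguf(path):
    # GGUF files start with the 4-byte magic b"GGUF"; the format version
    # and metadata key/value pairs follow.
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Example (hypothetical path from the conversion above):
# looks_like_gguf("./converted_gguf_models/tinyllama_1.1b_chat_q8_0.gguf")
```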

For more detailed examples and advanced usage, please refer to the documentation of the underlying llama.cpp project and explore the examples provided there.

Building and Development

For instructions on how to build the package from source, update the llama.cpp submodule, or other development-related tasks, please see BUILDING.md.
