
Implementation of Nested Hierarchical Pitman-Yor Language Model for Python


Nested Hierarchical Pitman-Yor Language Model: Unsupervised Segmentation for Python

A fast Cython implementation of the Nested Hierarchical Pitman-Yor Language Model (NHPYLM) for unsupervised segmentation and classification.

The library provides two models:

  • NHPYLMModel: Performs unsupervised segmentation.
  • NHPYLMClassesModel: Extends unsupervised segmentation with classification. Each class has its own NHPYLM submodel, trained on the sequences of that class. At inference time, every submodel segments the input sequence, and the class of the submodel under which the sequence is most probable is predicted.

For example, given the input string "Thecatquietlyobservedthebirdsoutsidebeforeleapingontothewindowsill.", the model outputs the segmented sentence "The cat quietly observed the birds outside before leaping onto the windowsill." In addition, NHPYLMClassesModel can predict the class of the sentence, for instance "observation".
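Even a short unsegmented string admits a very large number of candidate segmentations, so they cannot be scored one by one. The plain-Python sketch below, which is independent of the nhpylm package, counts them with a small dynamic program; `count_segmentations` and its `max_len` limit on segment length are hypothetical names used only for illustration.

```python
from functools import lru_cache


def count_segmentations(text: str, max_len: int) -> int:
    """Number of ways to split `text` into contiguous segments of length 1..max_len."""

    @lru_cache(maxsize=None)
    def count_from(i: int) -> int:
        # The empty suffix has exactly one segmentation: no further segments.
        if i == len(text):
            return 1
        # Choose the next segment text[i:j], then count the ways to split the rest.
        return sum(count_from(j) for j in range(i + 1, min(i + max_len, len(text)) + 1))

    return count_from(0)


sentence = "Thecatquietlyobservedthebirdsoutsidebeforeleapingontothewindowsill."
print(count_segmentations("abcd", 4))  # 8, i.e. 2**3 ways to place boundaries
print(count_segmentations(sentence, 10))
```

Without a length limit, a string of n characters has 2**(n - 1) segmentations, one for each subset of the n - 1 possible boundary positions.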

Usage

Install the nhpylm package

pip install nhpylm

NHPYLMModel (Segmentation)

from nhpylm.models import NHPYLMModel

train_x = ["aaaaaaaaa", "aaaabbbbbbaaa", "aaabbbbbcbaaa"]
dev_x = ["aaaaaaabbba", "abaaaaaccaaa", "bbbaaa"]
test_x = ["bbb", "aaaaa", "aaaaaaa"]
epochs = 20

# Init model
model = NHPYLMModel(7, init_d=0.5, init_theta=2.0,
                    init_a=6.0, init_b=0.83333333,
                    beta_stops=1.0, beta_passes=1.0,
                    d_a=1.0, d_b=1.0, theta_alpha=1.0, theta_beta=1.0)
# Train the model
model.train(train_x, dev_x, epochs, True, True, print_each_nth_iteration=10)


# Predictions
train_segmentation, train_perplexity = model.predict_segments(train_x)
print("Train Perplexity: {}".format(train_perplexity))
print(train_segmentation)
dev_segmentation, dev_perplexity = model.predict_segments(dev_x)
print("Dev Perplexity: {}".format(dev_perplexity))
print(dev_segmentation)
test_segmentation, test_perplexity = model.predict_segments(test_x)
print("Test Perplexity: {}".format(test_perplexity))
print(test_segmentation)
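Each call above reports a perplexity for its data split. The package's exact definition is not stated on this page; the sketch below assumes per-character perplexity, a common convention for character-level segmentation models: the exponential of the negative total log-probability divided by the total number of characters. The helper name and the `log_probs` input (natural-log sequence probabilities) are hypothetical.

```python
import math
from typing import Sequence


def per_character_perplexity(sequences: Sequence[str], log_probs: Sequence[float]) -> float:
    """exp(-sum(log p(x)) / total characters), using natural-log probabilities (assumed definition)."""
    if len(sequences) != len(log_probs):
        raise ValueError("need one log-probability per sequence")
    total_chars = sum(len(s) for s in sequences)
    return math.exp(-sum(log_probs) / total_chars)


# Hypothetical values: every character is predicted with probability 1/2.
seqs = ["bbb", "aaaaa"]
logs = [3 * math.log(0.5), 5 * math.log(0.5)]
print(per_character_perplexity(seqs, logs))  # approximately 2.0
```

Lower perplexity means the model finds the sequences less surprising; a value of 2.0 corresponds to an average uncertainty of a fair coin flip per character.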

NHPYLMClassesModel (Segmentation & Classification)

A classification model built on class-conditional NHPYLM segmentation: one NHPYLM submodel is trained for each class label.

from nhpylm.models import NHPYLMClassesModel

train_x = ["aaaaaaaaa", "aaaabbbbbbaaa", "aaabbbbbcbaaa"]
train_y = ["class1", "class1", "class2"]
dev_x = ["aaaaaaabbba", "abaaaaaccaaa", "bbbaaa"]
dev_y = ["class2", "class2", "class1"]
test_x = ["bbb", "aaaaa", "aaaaaaa"]
test_y = ["class2", "class1", "class1"]
epochs = 20

# Init model
model = NHPYLMClassesModel(7, init_d=0.5, init_theta=2.0,
                           init_a=6.0, init_b=0.83333333,
                           beta_stops=1.0, beta_passes=1.0,
                           d_a=1.0, d_b=1.0, theta_alpha=1.0, theta_beta=1.0)
# Train the model
model.train(train_x, dev_x, train_y, dev_y, epochs, True, True, print_each_nth_iteration=10)


# Predictions
train_segmentation, train_perplexity, train_mode_prediction = model.predict_segments_classes(train_x)
print("Train Perplexity: {}".format(train_perplexity))
print(train_mode_prediction)
print(train_segmentation)
dev_segmentation, dev_perplexity, dev_mode_prediction = model.predict_segments_classes(dev_x)
print("Dev Perplexity: {}".format(dev_perplexity))
print(dev_mode_prediction)
print(dev_segmentation)
test_segmentation, test_perplexity, test_mode_prediction = model.predict_segments_classes(test_x)
print("Test Perplexity: {}".format(test_perplexity))
print(test_mode_prediction)
print(test_segmentation)
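To turn the printed class predictions into a single score, the predictions can be compared with the gold labels. The helper below is a hypothetical sketch, not part of the nhpylm package; it assumes the predictions come back as one label per input sequence, in input order.

```python
from typing import Hashable, Sequence


def accuracy(predicted: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Fraction of positions where the predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold labels must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy of an empty label list")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Illustrative labels only, not actual model output: 2 of 3 match.
print(accuracy(["class2", "class1", "class2"], ["class2", "class1", "class1"]))
```

Under that assumption, `accuracy(list(test_mode_prediction), test_y)` would give the test accuracy of the model trained above.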

Install this package locally (from the root of the source directory)

pip install .

About NHPYLM

NHPYLM predicts the most probable segmentation of each input sequence using a nested structure of Hierarchical Pitman-Yor Language Models (HPYLMs). The outer HPYLM operates at the segment level, learning distributions over melodic or textual units (segments). The inner HPYLM models character-level (or tone-level) sequences and serves as the base distribution of the segment-level model, so segments never seen during training still receive a nonzero probability.

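Training uses Gibbs sampling: the segmentation of each training sequence is repeatedly resampled from its posterior distribution given the current model, and the model statistics are updated accordingly, so the segmentations are refined over iterations. At inference time, the most probable segmentation is computed with the Viterbi algorithm.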
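The Viterbi step can be sketched as a dynamic program over segment end positions. This is a minimal illustration, not the package's implementation: it assumes a unigram segment model for simplicity (an n-gram segment model would also need the previous segment in the DP state), and `viterbi_segment`, `toy_logprob`, and the toy lexicon are hypothetical.

```python
import math
from typing import Callable, List


def viterbi_segment(text: str, segment_logprob: Callable[[str], float], max_len: int) -> List[str]:
    """Return the highest-scoring segmentation of `text` under a unigram segment model."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last segment in that split
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + segment_logprob(text[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Recover the segments by following back-pointers from the end of the string.
    segments = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(text[start:end])
        end = start
    return segments[::-1]


LEXICON = {"the": 0.2, "cat": 0.1, "sat": 0.1}


def toy_logprob(segment: str) -> float:
    # Known segments get their lexicon probability; unknown ones pay a per-character penalty.
    if segment in LEXICON:
        return math.log(LEXICON[segment])
    return len(segment) * math.log(1e-4)


print(viterbi_segment("thecatsat", toy_logprob, max_len=4))  # ['the', 'cat', 'sat']
```

The search costs O(n * max_len) calls to the scoring function instead of examining every possible segmentation.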

For classification, one NHPYLM model is trained per category (e.g., chant modes). An unknown sequence is segmented by every model, and the class of the model under which the sequence is most probable is assigned.
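The decision rule is an argmax over per-class scores. In the sketch below, each class model is stood in for by a hypothetical scoring function returning a log-probability; with real NHPYLM submodels this would be some probability of the sequence under that class's model, and the sketch does not claim which quantity the package uses. Equal class priors are assumed, and the character-unigram scorers are toy stand-ins.

```python
import math
from typing import Callable, Dict, Mapping


def classify(sequence: str, class_scorers: Dict[str, Callable[[str], float]]) -> str:
    """Return the class whose model gives `sequence` the highest log-probability."""
    if not class_scorers:
        raise ValueError("need at least one class model")
    return max(class_scorers, key=lambda label: class_scorers[label](sequence))


def unigram_scorer(char_probs: Mapping[str, float]) -> Callable[[str], float]:
    """Toy per-class model: sum of character log-probabilities (unknown chars get 1e-6)."""
    def score(seq: str) -> float:
        return sum(math.log(char_probs.get(ch, 1e-6)) for ch in seq)
    return score


scorers = {
    "class1": unigram_scorer({"a": 0.9, "b": 0.1}),
    "class2": unigram_scorer({"a": 0.3, "b": 0.7}),
}
print(classify("aaaaa", scorers))  # class1
print(classify("bbb", scorers))    # class2
```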

How to cite

Anonymized for Submission



Download files


Source Distribution

nhpylm-0.0.1.1.tar.gz (540.6 kB view details)

Uploaded Source

File details

Details for the file nhpylm-0.0.1.1.tar.gz.

File metadata

  • Download URL: nhpylm-0.0.1.1.tar.gz
  • Upload date:
  • Size: 540.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for nhpylm-0.0.1.1.tar.gz
Algorithm Hash digest
SHA256 0308bb582ec3fc8f4801598332753918aad25d5b6c1152de906d6283f03350fa
MD5 537dd3dba0ca3d016be468bfe67bb9e1
BLAKE2b-256 766d8d8191128803683e4845cb6426c23ec9dfacdd9a6bca85b0ca7d1a6bf14f

