Speech-driven bots and services (e.g. Telegram) with pluggable speech matching.
The speechbot package is a framework for building speech-first chatbots with a block tree. It is designed for bots where a user moves between options by speaking a word.
The bot is configured with:
a block tree that defines steps and word labels for moving between blocks
a speech engine that matches incoming voice messages to those labels
This repository also contains a Telegram service implementation so the bot can be run as a chat bot.
At a high level, speechbot runs a voice menu. A JSON file named tree.json lists the conversation steps and the spoken words that move between them.
CLI guide
This section focuses on running the bot and configuring tree.json.
Quick start
Install the package.
Set TELEGRAM_BOT_TOKEN.
Save a minimal tree.json.
Run the bot and complete the setup prompts for missing audio.
Minimal tree.json:
{
  "root_id": "root",
  "blocks": [
    {
      "id": "root",
      "prompt_text": "Say hello or help",
      "edges": [
        {"word": "hello", "to": "hello"},
        {"word": "help", "to": "help"}
      ]
    },
    {
      "id": "hello",
      "prompt_text": "Last word: {last_word}",
      "edges": []
    },
    {
      "id": "help",
      "prompt_text": "Help menu",
      "edges": []
    }
  ]
}
Run from the directory that contains tree.json:
export TELEGRAM_BOT_TOKEN="123456:ABC..."
python3 -m speechbot telegram --tree tree.json --data-dir data \
--speech-engine speechmatching --debug-users <admin_user_id>
Expected messages (example):
Bot: <audio message for "Say hello or help">
Bot: Say hello or help
User: <voice message>
Bot: Heard: hello
Bot: Last word: hello
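The exchange above can be reproduced without Telegram or a speech engine. The following is a minimal sketch, not the package's implementation: `step` is a hypothetical helper, the matched word is passed in as a plain string, and the way the context keys `last_word`, `from_block_id` and `block_id` are filled in is an assumption based on the Context section.

```python
# Minimal tree from Quick start, inlined so the sketch is self-contained.
TREE = {
    "root_id": "root",
    "blocks": [
        {"id": "root", "prompt_text": "Say hello or help",
         "edges": [{"word": "hello", "to": "hello"},
                   {"word": "help", "to": "help"}]},
        {"id": "hello", "prompt_text": "Last word: {last_word}", "edges": []},
        {"id": "help", "prompt_text": "Help menu", "edges": []},
    ],
}


def step(tree, state, heard_word):
    """Hypothetical helper: follow the edge labelled heard_word, if any."""
    blocks = {block["id"]: block for block in tree["blocks"]}
    current = blocks[state["block_id"]]
    for edge in current["edges"]:
        if edge["word"] == heard_word:
            # Assumption: the bot records these three keys on every move.
            context = dict(state["context"])
            context.update(
                last_word=heard_word,
                from_block_id=current["id"],
                block_id=edge["to"],
            )
            return {"block_id": edge["to"], "context": context}
    return state  # no matching edge: stay on the current block


state = {"block_id": TREE["root_id"], "context": {}}
state = step(TREE, state, "hello")
blocks = {block["id"]: block for block in TREE["blocks"]}
prompt = blocks[state["block_id"]]["prompt_text"].format(**state["context"])
print(prompt)  # prints "Last word: hello"
```

A word that has no edge on the current block leaves the state unchanged in this sketch; what the real bot replies in that case is not covered here.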
Requirements
Install the package from PyPI [pypi]:
pip install speechbot
The Telegram service requires a Telegram bot token. To get a token, create a bot with BotFather [botfather] in Telegram and copy the token it returns. BotFather can be found by searching for @BotFather in Telegram. Store the token in TELEGRAM_BOT_TOKEN before starting the bot.
The default speech engine uses the speechmatching package. Typical requirements include:
ffmpeg available on PATH (Telegram voice messages are compressed audio files);
the Python dependencies for speechmatching;
Docker access for the default Docker-based speech model.
Docker image
The CLI is available as the Docker image aukesch/speechbot. This can be used to run the bot without installing the Python package locally.
Example run:
docker pull aukesch/speechbot
docker run --rm \
-e TELEGRAM_BOT_TOKEN="123456:ABC..." \
-v "$PWD":/work \
-w /work \
--entrypoint speechbot \
aukesch/speechbot \
telegram --tree tree.json --data-dir data --speech-engine speechmatching \
--debug-users <admin_user_id>
Example Dockerfile:
FROM aukesch/speechbot
COPY . /app
WORKDIR /app
ENTRYPOINT ["speechbot"]
CMD ["telegram", "--tree", "tree.json", "--data-dir", "data", \
     "--speech-engine", "speechmatching"]
Run the bot
Export the token, then start the Telegram service:
export TELEGRAM_BOT_TOKEN="123456:ABC..."
python3 -m speechbot telegram --tree examples/basic/tree.json \
--data-dir data --speech-engine speechmatching \
--debug-users <admin_user_id>
On startup, speechbot checks that all required assets exist. If word recordings, prompt recordings, or referenced media files are missing, the bot starts a guided setup process in Telegram to collect them.
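To see ahead of time which word recordings the guided setup would ask for, a check along these lines can be run against a tree and a data directory. This is a sketch under assumptions: `missing_word_recordings` is a hypothetical helper, and "at least one file in data/recordings/<word>/" is assumed to be what counts as an existing recording. Prompt recordings and media files are not checked.

```python
import tempfile
from pathlib import Path


def missing_word_recordings(tree, data_dir):
    """Return word labels with no file under <data_dir>/recordings/<word>/."""
    words = {
        edge["word"]
        for block in tree["blocks"]
        for edge in block.get("edges", [])
    }
    missing = []
    for word in sorted(words):
        word_dir = Path(data_dir) / "recordings" / word
        has_file = word_dir.is_dir() and any(
            path.is_file() for path in word_dir.iterdir()
        )
        if not has_file:
            missing.append(word)
    return missing


tree = {
    "root_id": "root",
    "blocks": [
        {"id": "root", "prompt_text": "Say hello or help",
         "edges": [{"word": "hello", "to": "root"},
                   {"word": "help", "to": "root"}]},
    ],
}

# Demo in a throwaway directory: only "hello" has a recording.
with tempfile.TemporaryDirectory() as data_dir:
    hello_dir = Path(data_dir) / "recordings" / "hello"
    hello_dir.mkdir(parents=True)
    (hello_dir / "sample.ogg").write_bytes(b"placeholder")
    result = missing_word_recordings(tree, data_dir)

print(result)  # ['help']
```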
While collecting recordings and uploads, you can send multiple voice messages per word recording. For prompt recordings and media files, sending another upload replaces the previous one. The /next command moves on, /skip moves on without saving, /status shows remaining items, and /done finishes setup.
While setup is active, the bot switches to a temporary setup tree. When --debug-users is set, setup is limited to those user identifiers. Setup is limited to one user per chat, and other users see a busy message until setup completes. When all required assets exist, the bot returns to the main tree and normal interaction continues.
User commands
/start resets the state to the tree root.
/undo restores the previous state snapshot.
The tree cannot be moved through using text messages. Text messages only replay the prompt for the current block.
Example setup transcript
This is a short example of the prompt recording setup process:
Bot: Prompt setup (1/2). Send a voice message (or audio file) reading this text aloud:
Bot:
Bot: Hello there
Bot:
Bot: Send /next to continue (after at least 1 recording), /skip to move on without saving, /status for progress. Sending another recording replaces the previous one.
User: <voice message>
Bot: Saved prompt recording. Send another recording to replace it, or /next for the next prompt.
User: /next
Bot: Prompt setup (2/2). Send a voice message (or audio file) reading this text aloud:
Bot:
Bot: Welcome
Bot:
Bot: Send /next to continue (after at least 1 recording), /skip to move on without saving, /status for progress. Sending another recording replaces the previous one.
User: <voice message>
Bot: Saved prompt recording. All required prompt recordings exist. Send another recording to replace it, or /done (or /next) to finish setup.
User: /done
Bot: Prompt setup complete. Continuing...
Shop builder
The shop builder is an interactive admin process that runs inside Telegram and writes a new tree to data/shop/tree.json:
python3 examples/shop_builder/main.py telegram --data-dir data
When the shop is published (/publish), the builder also creates a .zip package (data/shop/shop.zip) containing tree.json and all referenced shop media under data/. The builder sends that zip back via Telegram.
To run the generated shop directly from that zip, start speechbot without a local tree.json and upload the zip as a document:
python3 -m speechbot telegram --data-dir data --debug-users <admin_user_id>
The generated tree can also be run directly by referencing its path:
python3 -m speechbot telegram --tree data/shop/tree.json \
--data-dir data --speech-engine speechmatching
CLI reference
The CLI accepts one required argument and several optional arguments.
Required argument:
service: service name. Only telegram is supported.
Optional arguments:
--token: service token. If not set, the TELEGRAM_BOT_TOKEN environment variable is used.
--poll-timeout-s: long polling timeout in seconds.
--tree: path to the block tree JSON file. If not set, the BOT_TREE environment variable is used; if that is also unset, the path defaults to tree.json.
--data-dir: root data directory. If not set, the BOT_DATA environment variable is used; if that is also unset, the directory defaults to data.
--debug-users: comma-separated Telegram user identifiers that are allowed to run debug commands. If not set, the BOT_DEBUG_USERS environment variable is used.
--speech-engine: matcher engine under speechbot/matchers. If not set, the BOT_SPEECH_ENGINE environment variable is used; if that is also unset, the engine defaults to speechmatching.
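The fallback order described above (command-line flag, then environment variable, then built-in default) can be sketched with argparse. This is an illustration only and not the contents of speechbot/cli.py: `build_parser` and `parse_debug_users` are hypothetical names, `--poll-timeout-s` is left out because its default is not stated here, and treating user identifiers as integers is an assumption.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose defaults come from the environment mapping."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="speechbot")
    parser.add_argument("service", choices=["telegram"])
    parser.add_argument("--token", default=env.get("TELEGRAM_BOT_TOKEN"))
    parser.add_argument("--tree", default=env.get("BOT_TREE", "tree.json"))
    parser.add_argument("--data-dir", default=env.get("BOT_DATA", "data"))
    parser.add_argument("--debug-users",
                        default=env.get("BOT_DEBUG_USERS", ""))
    parser.add_argument("--speech-engine",
                        default=env.get("BOT_SPEECH_ENGINE", "speechmatching"))
    return parser


def parse_debug_users(raw):
    """Split a comma-separated identifier list, ignoring empty parts."""
    return [int(part) for part in raw.split(",") if part.strip()]


# A flag wins over the environment; the environment wins over the default.
args = build_parser(env={"BOT_TREE": "shop.json"}).parse_args(
    ["telegram", "--debug-users", "123,456"]
)
print(args.tree)                            # shop.json (from the environment)
print(args.data_dir)                        # data (built-in default)
print(parse_debug_users(args.debug_users))  # [123, 456]
```

Passing the environment as a plain mapping keeps the sketch easy to test without touching the real process environment.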
Tree.json reference
The tree JSON file defines blocks and how to move between blocks. Each block has a list of edges that map a word label to a destination block id.
Each block is a step in the conversation. The prompt_text is shown when the block is active. Each edge is a spoken option that moves to another block. A recording is needed for each word label under data/recordings/<word>/. A minimal example appears in Quick start.
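Because every edge names a destination block id, a tree can be checked for dangling references before the bot is started. The sketch below is not part of speechbot: `validate_tree` is a hypothetical helper that only looks at `root_id`, block `id` values and edge targets.

```python
def validate_tree(tree):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    blocks = tree.get("blocks", [])
    ids = [block.get("id") for block in blocks]
    known = set(ids)
    if len(known) != len(ids):
        problems.append("duplicate block id")
    if tree.get("root_id") not in known:
        problems.append(
            "root_id {!r} is not a block id".format(tree.get("root_id"))
        )
    for block in blocks:
        for edge in block.get("edges", []):
            if edge.get("to") not in known:
                problems.append(
                    "block {!r}: edge {!r} points to unknown block {!r}".format(
                        block.get("id"), edge.get("word"), edge.get("to")
                    )
                )
    return problems


good_tree = {
    "root_id": "root",
    "blocks": [
        {"id": "root", "prompt_text": "Say help",
         "edges": [{"word": "help", "to": "help"}]},
        {"id": "help", "prompt_text": "Help menu", "edges": []},
    ],
}
bad_tree = {
    "root_id": "root",
    "blocks": [
        {"id": "root", "prompt_text": "Say help",
         "edges": [{"word": "help", "to": "missing"}]},
    ],
}

print(validate_tree(good_tree))  # []
print(validate_tree(bad_tree))
```

To check a file on disk, the same function can be given the result of `json.load` on an opened tree.json.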
Structure
The top level keys are:
root_id: id of the entry block.
blocks: list of block objects.
Each block supports:
id: unique block id.
prompt_text: text shown to the user when the block is active. If empty, the default prompt "Say one of the available options." is used.
edges: list of edges in the form {"word": "...", "to": "..."}.
on_enter: optional section for actions and context updates.
The on_enter section supports:
text: a text message that is sent when entering the block.
photo: list of photo paths to send.
video: list of video paths to send.
audio: list of audio paths to send.
context: map to set context keys.
context_inc: map of context keys to the amounts by which their numeric values are increased.
context_delete: list of keys to remove from context.
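The three context fields can be read as a small update procedure. The sketch below shows one plausible interpretation and is not taken from the package: `apply_on_enter_context` is a hypothetical name, the order (set, then increase, then delete) is an assumption, and so is starting a missing counter at 0.

```python
def apply_on_enter_context(context, on_enter):
    """Return a new context with the on_enter context updates applied."""
    updated = dict(context)
    # Assumption: plain assignments are applied first.
    updated.update(on_enter.get("context", {}))
    # Assumption: a key that does not exist yet counts up from 0.
    for key, amount in on_enter.get("context_inc", {}).items():
        updated[key] = updated.get(key, 0) + amount
    # Deleting a key that is absent is treated as a no-op.
    for key in on_enter.get("context_delete", []):
        updated.pop(key, None)
    return updated


on_enter = {
    "context": {"item": "tea"},
    "context_inc": {"count": 1},
    "context_delete": ["draft"],
}
result = apply_on_enter_context({"count": 2, "draft": "x"}, on_enter)
print(result)  # {'count': 3, 'item': 'tea'}
```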
If multiple photos or videos are provided, the service sends them as an album. Text fields in prompt_text and on_enter.text are formatted with text.format(**context).
Media paths are resolved from the current working directory, or relative to the tree JSON file location if the file is not found.
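The lookup order for media paths can be sketched as follows. `resolve_media_path` is a hypothetical helper, and returning None when neither location has the file is an assumption; the real bot may report the file as missing in a different way.

```python
import tempfile
from pathlib import Path


def resolve_media_path(media_path, tree_path):
    """Try the working directory first, then the tree file's directory."""
    direct = Path(media_path)  # relative paths resolve against the cwd
    if direct.exists():
        return direct
    beside_tree = Path(tree_path).parent / media_path
    if beside_tree.exists():
        return beside_tree
    return None  # assumption: the caller handles a missing file


# Demo: the photo only exists next to the tree file, not in the cwd.
with tempfile.TemporaryDirectory() as root:
    shop_dir = Path(root) / "shop"
    (shop_dir / "media").mkdir(parents=True)
    tree_file = shop_dir / "tree.json"
    tree_file.write_text("{}")
    (shop_dir / "media" / "speechbot_demo_photo.jpg").write_bytes(b"jpg")

    found = resolve_media_path("media/speechbot_demo_photo.jpg", tree_file)
    missing = resolve_media_path("media/speechbot_no_such_file.jpg", tree_file)

print(found.name)  # speechbot_demo_photo.jpg
print(missing)     # None
```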
Files on disk
| Path | Purpose |
|---|---|
| tree.json | Block tree definition |
| data/state/ | Per-user state JSON files |
| data/recordings/<word>/ | Word recordings for matching |
| data/prompts/prompt_<sha256>/ | Prompt recordings for prompts |
| data/inbox/ | Downloaded service media |
| data/media/ | Media referenced by tree.json |
Prompt recordings are sent as a spoken version of the prompt text. They are not used for word matching.
State and debug
Context
Per-user context is stored in the user state file and is carried across blocks. The bot updates some keys automatically:
last_word
from_block_id
block_id
Text can include simple formatting expressions, which are filled in with text.format(**context). If formatting fails (for example because a key is missing), no user-visible error is raised and the original, unformatted text is sent instead.
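The fallback behavior can be illustrated with a small wrapper around str.format. `format_text` is a hypothetical name and the exact set of exceptions the bot catches is an assumption; the sketch only shows the "keep the original string on failure" idea.

```python
def format_text(text, context):
    """Format text with the context, keeping the original text on failure."""
    try:
        return text.format(**context)
    except (KeyError, IndexError, ValueError):
        # KeyError: missing named key; IndexError: positional field such as
        # "{0}"; ValueError: malformed format string such as a lone "{".
        return text


print(format_text("Last word: {last_word}", {"last_word": "hello"}))
# prints "Last word: hello"
print(format_text("Last word: {last_word}", {}))
# prints "Last word: {last_word}"
```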
Undo history
The bot keeps a limited history of previous states. Users can restore the most recent state using the /undo command.
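A bounded history of this kind is commonly built on a fixed-length queue. The sketch below is not the package's state code: `StateHistory` is a hypothetical class and the limit of 2 is chosen only to keep the demo short.

```python
import copy
from collections import deque


class StateHistory:
    """Keep at most max_length previous state snapshots."""

    def __init__(self, max_length):
        # A deque with maxlen drops the oldest snapshot automatically.
        self._snapshots = deque(maxlen=max_length)

    def push(self, state):
        # Deep copy so later changes to the state do not alter the snapshot.
        self._snapshots.append(copy.deepcopy(state))

    def undo(self, current_state):
        """Return the most recent snapshot, or current_state if none is left."""
        if not self._snapshots:
            return current_state
        return self._snapshots.pop()

    def __len__(self):
        return len(self._snapshots)


history = StateHistory(max_length=2)
for block_id in ["root", "hello", "help"]:
    history.push({"block_id": block_id})

# "root" was dropped when the third snapshot arrived.
restored = history.undo({"block_id": "current"})
print(restored)      # {'block_id': 'help'}
print(len(history))  # 1
```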
Debug commands
Debug commands can be enabled for specific Telegram user identifiers, for example with:
python3 -m speechbot telegram --tree examples/basic/tree.json \
--data-dir data --debug-users 123,456
When enabled, the following commands are available:
/debug shows the raw state information.
/where shows the current block id.
/context shows the current context map.
/history shows the history length.
Developer guide
This section is for extending speechbot in Python.
Custom blocks
Blocks can be implemented in Python by using speechbot.blocks.CustomBlock and setting a block_id class attribute. Custom blocks run inside the bot like normal blocks, but they should override handle to implement custom logic. The handle method receives the incoming message, the user state, and a runtime object. When running under the standard bot, that runtime object is the speechbot.bot.Bot instance.
Custom blocks are used by example code such as the shop builder, where the interactive logic is written in Python rather than purely in JSON.
Example:
from speechbot.blocks import CustomBlock
from speechbot.protocol import OutgoingText


class HelloBlock(CustomBlock):
    block_id = 'hello'

    def __init__(self, prompt_text='Say hello'):
        CustomBlock.__init__(self, prompt_text)

    def handle(self, incoming, state, runtime):
        return ([OutgoingText(
            chat_id=incoming.chat_id,
            text='Hello from Python.'
        )], None)

    def on_enter_actions(self, incoming, state, runtime):
        return [OutgoingText(
            chat_id=incoming.chat_id,
            text='Entering the hello block.'
        )]
Custom services
The CLI only uses the Telegram service. To use another service, write a custom service that connects the platform to the bot message handler.
A custom service needs to:
receive messages and map them to Incoming classes from speechbot.protocol
include service, chat_id, user_id and message_id along with any metadata in meta
download media to disk and set path for IncomingVoice, IncomingAudio, IncomingPhoto, IncomingVideo and IncomingDocument
call the bot message handler and run every returned Outgoing action
map OutgoingMediaGroup to an album when supported, or send each item
Example:
from speechbot.protocol import IncomingText, OutgoingText


class DummyService:
    def __init__(self):
        self._message_handler = None

    def run(self, message_handler):
        self._message_handler = message_handler
        # Build one fake incoming message instead of polling a platform.
        incoming = IncomingText(
            service='dummy',
            chat_id=1,
            user_id=1,
            message_id=1,
            data='hello'
        )
        actions = message_handler(incoming)
        self._send(actions)

    def _send(self, actions):
        # Only text actions are handled in this example.
        for action in actions:
            if isinstance(action, OutgoingText):
                self._send_text(action.chat_id, action.text)

    def _send_text(self, chat_id, text):
        print('send to {}: {}'.format(chat_id, text))
Most services will also need a run loop like speechbot.services.telegram.TelegramService.run. The built-in CLI does not know about new services, so create a custom entrypoint or extend speechbot/cli.py to add a new service option.
Speech engines
speechmatching is the default matcher, but additional engines can be added. Create a module under speechbot/matchers that implements SpeechEngine from speechbot.matchers. The engine must provide add_recording and match. If debug output is needed, implement set_debug and get_last_debug similar to the speechmatching engine.
Add the engine to load_speech_engine in speechbot/matchers/__init__.py so the --speech-engine option can find it.
Example:
from speechbot.matchers import SpeechEngine


class DummyEngine(SpeechEngine):
    def __init__(self):
        self._labels = set()

    def add_recording(self, identifier, path):
        self._labels.add(identifier)

    def match(self, voice_path, identifiers=None):
        if identifiers is None:
            selected_identifiers = list(self._labels)
        else:
            selected_identifiers = [
                i for i in identifiers if i in self._labels
            ]
        if not selected_identifiers:
            return None
        return selected_identifiers[0]