OpenNMT Tokenizer as TensorFlow Operations
OpenNMT Tokenizer TensorFlow Ops
DISCLAIMER: This package is not published by the OpenNMT authors.
Full credit for OpenNMT Tokenizer and OpenNMT-tf goes to their respective authors.
This project aims to wrap OpenNMT Tokenizer into TensorFlow Ops.
It is primarily intended as an addition to the OpenNMT-tf framework, removing the need to apply tokenization and/or detokenization outside of a serving environment (e.g. TensorFlow Serving).
Compatibility
- TensorFlow 2.1, 2.2
- OpenNMT-tf >= 2.6.0, for usage in conjunction with OpenNMT-tf
Installation
Prerequisites:
- A Linux environment (manylinux2014 eligible)
- Python 3.5, 3.6, 3.7 or 3.8
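The prerequisites above can be checked before installing. The following is a minimal sketch using only the standard library; the helper name is_supported is hypothetical and not part of this package, and the check does not verify manylinux2014 (glibc) compatibility.

```python
import platform
import sys

# Python (major, minor) versions listed in the prerequisites above.
SUPPORTED_PYTHON = {(3, 5), (3, 6), (3, 7), (3, 8)}


def is_supported(system, python_version):
    """Return True if the OS name and Python version match the prerequisites.

    Hypothetical helper for illustration; it only compares the OS name and
    the (major, minor) part of the version.
    """
    return system == "Linux" and tuple(python_version[:2]) in SUPPORTED_PYTHON


if __name__ == "__main__":
    ok = is_supported(platform.system(), sys.version_info)
    print("supported" if ok else "unsupported")
```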
Install the package with pip:
pip install tensorflow-onmttok-ops
Usage
Available Tokenizer options
The majority of the OpenNMT Tokenizer options are available.
However, providing BPE or SentencePiece models is not supported,
and by extension, setting the tokenizer mode to none is not supported.
You therefore cannot use the following options:
- bpe_model_path
- sp_model_path
- sp_nbest_size
- sp_alpha
- vocabulary_path
- vocabulary_threshold
Note: Tokenizer options are defined at graph construction time and are constants.
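Because options are fixed when the graph is built, it can help to reject unsupported ones before calling the ops. The sketch below is an illustration only: check_tokenizer_options is a hypothetical helper, not part of this package, and it simply encodes the restrictions listed above.

```python
# Options the section above lists as unavailable in this package.
UNSUPPORTED_OPTIONS = frozenset({
    "bpe_model_path",
    "sp_model_path",
    "sp_nbest_size",
    "sp_alpha",
    "vocabulary_path",
    "vocabulary_threshold",
})


def check_tokenizer_options(mode, **options):
    """Raise ValueError if the mode or any option is listed as unsupported.

    Hypothetical helper: returns the options (including the mode) as a
    dict when everything is allowed.
    """
    if mode == "none":
        raise ValueError("mode 'none' is not supported")
    rejected = sorted(UNSUPPORTED_OPTIONS.intersection(options))
    if rejected:
        raise ValueError("unsupported options: " + ", ".join(rejected))
    return dict(options, mode=mode)
```

For example, check_tokenizer_options("conservative", case_feature=True) returns the options unchanged, while passing bpe_model_path raises a ValueError.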
Tokenization
import tensorflow_onmttok as tf_onmttok
tokens = tf_onmttok.tokenize(["Hello, how are you?"], mode="conservative")
Detokenization
import tensorflow_onmttok as tf_onmttok
text = tf_onmttok.detokenize(["How", "are", "you", "?"], mode="space")
With OpenNMT-tf
Usage with OpenNMT-tf is pretty straightforward.
This package comes with a built-in tokenizer that makes use of the ops.
- Before training your model, register the tokenizer as follows:
  from tensorflow_onmttok import register_opennmt_in_graph_tokenizer
  register_opennmt_in_graph_tokenizer()
  See the complete example.
- Now that the tokenizer is registered, you can use the OpenNMTInGraphTokenizer class instead of OpenNMTTokenizer in your tokenization configuration files, e.g.:
  type: OpenNMTInGraphTokenizer
  params:
    mode: conservative
    case_feature: true
- That's it! You can now train your model as usual. Your ExportedModel will now expect a text input instead of tokens and length.
  Note: Tokenization resources will not be exported to the assets.extra directory.
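Since the exported model takes raw text, a serving client no longer needs to tokenize on its side. As a rough sketch, a request body for the TensorFlow Serving REST predict endpoint could be built as follows. The host, port and model name are placeholders, build_predict_request is a hypothetical helper, and passing one string per instance is an assumption about the exported signature; only the text input name comes from the description above.

```python
import json


def build_predict_request(sentences, host="localhost", port=8501, model_name="my_model"):
    """Return (url, body) for a TensorFlow Serving REST predict call.

    host, port and model_name are placeholder values; "text" is the
    input name the exported model is described as expecting.
    """
    url = "http://{}:{}/v1/models/{}:predict".format(host, port, model_name)
    # Row format: one object per input sentence, keyed by the input name.
    body = json.dumps({"instances": [{"text": s} for s in sentences]})
    return url, body


url, body = build_predict_request(["Hello, how are you?"])
print(url)
print(body)
```

The body can then be sent with any HTTP client as a POST request to the returned URL.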
Build TF Serving with these Ops
This guide shows how to build TensorFlow Serving with these ops.
Prerequisites
- You have already cloned the TF Serving >= 2.1.0 repository and have all the tools installed for building it
- You have installed CMake 3.1.0 or newer
Building
Add the Ops sources
First, download the release of your choice.
Inside the TF Serving sources folder, create a directory
named custom_ops and copy the content of the tensorflow_onmttok
directory into it.
$ cd <tf_serving_sources>
$ mkdir tensorflow_serving/custom_ops
$ cp -r <op_sources>/tensorflow_onmttok tensorflow_serving/custom_ops
Reference the Ops
Edit tensorflow_serving/model_servers/BUILD to reference
the Ops build target:
SUPPORTED_TENSORFLOW_OPS = [
...
"//tensorflow_serving/custom_ops/tensorflow_onmttok:onmttok_ops"
]
Build OpenNMT Tokenizer from sources
The last step is to build a static version of the
OpenNMT Tokenizer library.
This repository provides a shell script
that will build it with CMake.
$ cd <op_sources>
$ chmod +x build_tokenizer.sh && ./build_tokenizer.sh
Note: Pass the sudo argument to the build_tokenizer.sh script to execute the make install command with sudo.
Build TensorFlow Serving
You can now build TensorFlow Serving as usual.