SAR Regexp Engine for Python

SAR - Simple API for Regexp

Regexp SAR - a Python module providing a multi-match, event-driven regular expression engine

Description

SAR is a different way of handling regular expressions: it runs many regular expressions at once, limited only by the available memory. Each regexp is added together with a callback, and the callbacks are called on every match, in the same order in which the matches appear in the text.
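The callback ordering described above can be sketched with the standard `re` module. This is only an illustration of the idea, not regexp_sar's implementation: `run_all` is a hypothetical helper, and it scans the text once per pattern rather than running all patterns at once.

```python
import re

def run_all(text, handlers):
    """Call each handler for its pattern's matches, in text order.

    `handlers` is a list of (pattern, callback) pairs. `run_all` is a
    hypothetical helper built on `re`; it is not part of regexp_sar.
    """
    events = []
    for index, (pattern, callback) in enumerate(handlers):
        for m in re.finditer(pattern, text):
            events.append((m.start(), index, m.end(), callback))
    # Order by start offset, then by the order in which patterns were added.
    for start, _index, end, callback in sorted(events, key=lambda e: e[:3]):
        callback(start, end)

text = "hello world 123"
seen = []
run_all(text, [
    (r"[a-z]+", lambda a, b: seen.append(("word", text[a:b]))),
    (r"[0-9]+", lambda a, b: seen.append(("number", text[a:b]))),
])
print(seen)
# [('word', 'hello'), ('word', 'world'), ('number', '123')]
```

Each callback receives only the two positions, so it slices the original string itself, as the callbacks in this README do.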

Install

Before installation, make sure you have the latest version of pip:

pip install --upgrade pip

Install regexp-sar:

pip install regexp-sar

Import

from regexp_sar import RegexpSar

Example

'''
This example finds and prints the second match of each regexp,
together with the name of the regexp that matched.
'''

from regexp_sar import RegexpSar

sar = RegexpSar()

# string to be matched against
match_str = "hello world 123 abc 456 789"

# list of regexps: the first item in each pair is the regexp,
# the second item is a unique name for that regexp
# (raw strings keep the backslashes intact without escape warnings)
regexps = [
    [r'\w+', 'word'],
    [r'\d+', 'number'],
]

def find_second_match(description):
    match_count = 0

    # inner function, closing over match_count and description
    def callback(from_pos, to_pos):
        nonlocal match_count
        match_count += 1
        if match_count == 2:
            print("Match: " + str(description) + ": " + match_str[from_pos:to_pos])
        # resume matching right after the current match
        sar.continue_from(to_pos)
    return callback

# add each regexp with its own callback
for regexp, description in regexps:
    sar.add_regexp(regexp, find_second_match(description))

# run match
sar.match(match_str)
'''
Output:
    Match: word: world
    Match: number: 456
'''

Methods

constructor

Creates a new sar instance with its own regexps and callbacks. Several instances can exist at the same time.

add_regexp

Adds a regexp to the sar instance. Receives 2 parameters:

  • regexp - the regexp to add
  • callback - the callback that will be called on each match. The callback receives 2 parameters:
    • from_pos - the start position of the match in the matched string
    • to_pos - the end position of the match in the matched string (exclusive: the character at to_pos is not part of the match)
sar = RegexpSar()
sar.add_regexp('abc', lambda from_pos, to_pos: print("Match: " + str(from_pos) + "->" + str(to_pos)))


sar.match("hello abc world") # Match: 6->9
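Because to_pos is exclusive, the two positions can be used directly as a Python slice. The snippet below uses plain Python only and reuses the positions printed by the example above:

```python
text = "hello abc world"
from_pos, to_pos = 6, 9  # the positions reported in the example above

matched = text[from_pos:to_pos]  # the character at to_pos is excluded
length = to_pos - from_pos

print(matched, length)  # abc 3
```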

match

Runs the previously added regexps against the given string. Receives 1 parameter:

  • string to be matched

  • NOTE: this is syntactic sugar for match_from(match_str, 0)

match_from

Acts like match, but starts the search from a custom position. Receives 2 parameters:

  • string to be matched with
  • start position of the match

match_at

Looks for a match starting at one specific character position only; it does not continue searching at the following characters.
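The difference between scanning from a position and trying a single position has a rough analogue in the standard `re` module. The sketch below uses `re` only and makes no regexp_sar calls, since the exact parameters of match_at are not listed here; treat it as a comparison in spirit, not as the library's behavior.

```python
import re

pattern = re.compile(r"[0-9]+")
text = "ab 12 cd 34"

# Scan from offset 5 onward (comparable in spirit to match_from).
later = [(m.start(), m.end()) for m in pattern.finditer(text, 5)]

# Try one offset only, without scanning ahead (comparable in spirit to match_at).
at_3 = pattern.match(text, 3)
at_0 = pattern.match(text, 0)

print(later)        # [(9, 11)]
print(at_3.span())  # (3, 5)
print(at_0)         # None
```

In this `re` analogue the reported offsets always refer to the full string, not to the part after the start position.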

continue_from

Can be called only during match or match_from. Matching continues from the given character index.

Receives 1 parameter:

  • position from which the next match attempt starts

stop_match

Can be called only during match or match_from. Stops the match once matching of the current character has finished.
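The effect of continue_from and stop_match can be pictured with a toy scan loop. `ToyScanner` is a hypothetical stand-in written with the standard `re` module; it is not regexp_sar's code. It assumes that, by default, the engine retries at the next character after each position, which is inferred from the descriptions above and may not match the real engine exactly.

```python
import re

class ToyScanner:
    """Hypothetical single-pattern scan loop; not part of regexp_sar."""

    def __init__(self, pattern, callback):
        self.pattern = re.compile(pattern)
        self.callback = callback
        self._next = None
        self._stopped = False

    def continue_from(self, pos):
        self._next = pos

    def stop_match(self):
        self._stopped = True

    def match(self, text):
        pos = 0
        self._stopped = False
        while pos < len(text) and not self._stopped:
            self._next = None
            m = self.pattern.match(text, pos)  # try this position only
            if m and m.end() > pos:
                self.callback(self, m.start(), m.end())
            # Assumed default: retry at the next character. Only jump
            # forward, so the loop always makes progress.
            if self._next is not None and self._next > pos:
                pos = self._next
            else:
                pos += 1

text = "abc de"
plain, skipping, first_only = [], [], []

def record(scanner, a, b):
    plain.append(text[a:b])

def record_and_skip(scanner, a, b):
    skipping.append(text[a:b])
    scanner.continue_from(b)

def record_and_stop(scanner, a, b):
    first_only.append(text[a:b])
    scanner.stop_match()

ToyScanner(r"[a-z]+", record).match(text)
ToyScanner(r"[a-z]+", record_and_skip).match(text)
ToyScanner(r"[a-z]+", record_and_stop).match(text)

print(plain)       # ['abc', 'bc', 'c', 'de', 'e']
print(skipping)    # ['abc', 'de']
print(first_only)  # ['abc']
```

Under that assumption, calling continue_from(to_pos) inside a callback is what keeps a repeated pattern from reporting the tail of a match it has already reported.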

rules

abbreviations

  • . - matches any character
  • \d - matches a digit character (checked by isdigit method)
  • \w - matches an alphanumeric character (checked by isalnum method)
  • \a - matches an alpha character (checked by isalpha method)
  • \s - matches a space character (checked by isspace method)
  • ^ - negates the element that follows it (e.g. \^\d+ will match all non-digit strings)
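To see which characters each abbreviation would accept, the checks named above can be tried with Python's own str methods. This is a sketch under the assumption that the engine's checks behave like those methods for ASCII input; `classify` is a hypothetical helper and makes no regexp_sar calls.

```python
def classify(ch):
    """Return the abbreviations from the list above that would accept `ch`,
    assuming each check behaves like the matching Python str method."""
    checks = [
        (r"\d", str.isdigit),
        (r"\w", str.isalnum),
        (r"\a", str.isalpha),
        (r"\s", str.isspace),
    ]
    return [name for name, check in checks if check(ch)]

print(classify("7"))  # ['\\d', '\\w']
print(classify("x"))  # ['\\w', '\\a']
print(classify(" "))  # ['\\s']
print(classify("_"))  # []
```

If the description is exact, \w here would not accept the underscore, unlike \w in Python's `re` module.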

repetition

  • '?' - matches 1 or 0 times
  • '*' - matches 0 or more times
  • '+' - matches 1 or more times

backslash ('\') character

To match a literal '\' character, write four backslashes in a regular Python string literal. Python turns each pair of backslashes into a single one, so the regexp that reaches the engine contains two backslashes, which stand for one literal backslash.

sar = RegexpSar()
sar.add_regexp('\\\\', lambda from_pos, to_pos: print("Match: " + str(from_pos) + "->" + str(to_pos)))
sar.match('a\\b') # Match: 1->2
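The escaping on the Python side can be checked without the library. The snippet below is plain Python and shows what the string literals in the example above actually contain:

```python
four_typed = '\\\\'  # four backslashes typed in the source
raw_two = r'\\'      # raw literal: two backslashes typed
subject = 'a\\b'     # the three characters a, \, b

print(len(four_typed))        # 2
print(four_typed == raw_two)  # True
print(len(subject), subject.index('\\'))  # 3 1
```

A raw string such as r'\\' is therefore an equivalent way to write the same regexp, and the backslash in the subject string sits at index 1, which agrees with the reported match 1->2.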

examples

Examples may be found in the test_oousage.py file, and in the examples directory

Articles

Unicode support

Unicode is currently not supported. It may be added in a future update.

Author

Noam Nisanov - noam.nisanov@gmail.com

Download files

Source Distribution

  • regexp-sar-0.1.2b5.tar.gz (84.3 kB) - Source

Built Distributions

  • regexp_sar-0.1.2b5-cp38-cp38-win_amd64.whl (17.0 kB) - CPython 3.8, Windows x86-64
  • regexp_sar-0.1.2b5-cp36-abi3-manylinux2010_x86_64.whl (40.9 kB) - CPython 3.6+, manylinux (glibc 2.12+), x86-64
