
Data Lineage for Python



pylineage

This package provides simple tools for parsing and visualizing your .sql scripts.

Installation

The package is distributed through PyPI and can be installed with

pip install pylineage

To create a lineage graph, you also need to install Graphviz and add it to your PATH.
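A quick way to confirm the Graphviz requirement is met is to look for the `dot` command that Graphviz installs. This is a minimal sketch using only the standard library; it assumes that finding `dot` on PATH is what matters for rendering, and `graphviz_on_path` is a hypothetical helper, not part of pylineage.

```python
import shutil


def graphviz_on_path() -> bool:
    """Return True if Graphviz's `dot` executable can be found on PATH."""
    # shutil.which searches the PATH environment variable (and PATHEXT on
    # Windows) the same way the shell does; it returns None if nothing is found.
    return shutil.which("dot") is not None


if not graphviz_on_path():
    print("Graphviz not found: install it and add its bin directory to PATH.")
```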

SQL Parser

Individual basic .sql scripts can be parsed by running

from pylineage import SqlParser

parser = SqlParser()
query = """

SELECT column1
     , column2 AS c2
FROM my_table t
WHERE column1 = 1

"""

parsed = parser.parse(query)

The output looks as follows:

>>> parsed

{
  "select": [{ "content": "column1" }, { "content": "column2", "alias": "c2" }],
  "from": { "content": "my_table", "alias": "t" },
  "where": ["column1 = 1"]
}
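The parsed result is a plain dictionary, so downstream code can read it with ordinary lookups. The sketch below hard-codes the example output above so it runs without pylineage installed; it relies only on the `select`, `from`, `content` and `alias` keys visible in that example (the shape of JOIN entries is not shown here), and `output_columns` / `source_table` are hypothetical helpers.

```python
from typing import List, Optional

# Copied from the documented output of parser.parse(query).
parsed = {
    "select": [{"content": "column1"}, {"content": "column2", "alias": "c2"}],
    "from": {"content": "my_table", "alias": "t"},
    "where": ["column1 = 1"],
}


def output_columns(parsed: dict) -> List[str]:
    """Name each selected column gets in the result set (its alias, if present)."""
    return [item.get("alias", item["content"]) for item in parsed.get("select", [])]


def source_table(parsed: dict) -> Optional[str]:
    """Table named in the FROM clause, or None if there is no FROM entry."""
    source = parsed.get("from")
    return source["content"] if source else None


print(output_columns(parsed), source_table(parsed))
```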

The parser currently supports the following clauses:

Clause
SELECT
FROM
LEFT JOIN
LEFT OUTER JOIN
RIGHT JOIN
RIGHT OUTER JOIN
FULL JOIN
FULL OUTER JOIN
JOIN
INNER JOIN
WHERE
QUALIFY
GROUP BY
ORDER BY
HAVING
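Because only the clauses listed above are supported, it can help to flag scripts that use other common clauses before feeding them to the parser. This is a rough keyword scan, not something pylineage provides: the `UNLISTED` watch list is a hypothetical choice, and keywords inside string literals or comments will produce false positives.

```python
import re
from typing import List

# Hypothetical watch list of common SQL clauses that do NOT appear in the list above.
UNLISTED = ["WITH", "UNION", "INTERSECT", "EXCEPT", "LIMIT", "CROSS JOIN", "WINDOW"]


def unlisted_clauses(query: str) -> List[str]:
    """Return the watch-list keywords that appear anywhere in the query text."""
    found = []
    for keyword in UNLISTED:
        # \s+ lets multi-word keywords span any whitespace;
        # \b keeps e.g. WITH from matching inside an identifier like with_date.
        pattern = r"\b" + r"\s+".join(keyword.split()) + r"\b"
        if re.search(pattern, query, re.IGNORECASE):
            found.append(keyword)
    return found


print(unlisted_clauses("WITH x AS (SELECT 1) SELECT * FROM x LIMIT 5"))
```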

Lineage Graph

Based on the SQL parser, a lineage graph can be constructed. We start off with the main constructor:

from pylineage import LineageGraph

lineage_graph = LineageGraph()

SQL scripts can then be added in two ways: (1) as individual input strings or (2) from a directory.

lineage_graph.extend_graph_from_input_string("CREATE TABLE my_view AS SELECT column1 FROM my_table")

lineage_graph.extend_graph_from_directory("/data")
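The README does not describe how a statement becomes graph edges. As an illustration only, the sketch below turns a `CREATE TABLE|VIEW <target> AS SELECT ... FROM <source>` statement into `(source, target)` pairs with regular expressions. `lineage_edges` is hypothetical and far cruder than a real parser: for instance, it also picks up tables inside WHERE subqueries, which the note further down says pylineage ignores. Requires Python 3.9+.

```python
import re

# Target of the statement: CREATE [OR REPLACE] TABLE|VIEW <name>
CREATE_RE = re.compile(r"\bCREATE\s+(?:OR\s+REPLACE\s+)?(?:TABLE|VIEW)\s+([\w.]+)", re.IGNORECASE)
# Sources: any name following FROM or JOIN (subqueries like "FROM (" are skipped)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def lineage_edges(statement: str) -> list[tuple[str, str]]:
    """Return (source, target) pairs for one CREATE ... AS SELECT statement."""
    target_match = CREATE_RE.search(statement)
    if target_match is None:
        return []  # not a CREATE statement, so there is no target node
    target = target_match.group(1)
    # dict.fromkeys drops duplicate sources while keeping first-seen order
    return [(source, target) for source in dict.fromkeys(SOURCE_RE.findall(statement))]


print(lineage_edges("CREATE TABLE my_view AS SELECT column1 FROM my_table"))
```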

The graph can be cleared at any time by running

lineage_graph.clear_graph()

One purpose of parsing and visualizing scripts is to determine the order in which they must be executed. This order can be obtained by running

lineage_graph.get_execution_order()
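Conceptually, an execution order is a topological sort of the dependency graph: every table must be built after the tables it reads from. The sketch below shows that idea with the standard library's `graphlib` (Python 3.9+); it is not pylineage's implementation, which the README does not describe, and the table names are made up.

```python
from graphlib import TopologicalSorter

# Hypothetical lineage edges (source -> target), hard-coded for illustration.
edges = [
    ("raw_orders", "stg_orders"),
    ("raw_customers", "stg_customers"),
    ("stg_orders", "orders_report"),
    ("stg_customers", "orders_report"),
]


def execution_order(edges):
    """Order nodes so that every source comes before the tables built from it."""
    sorter = TopologicalSorter()
    for source, target in edges:
        # add(node, *predecessors): target depends on source
        sorter.add(target, source)
    # static_order() raises graphlib.CycleError if dependencies are circular
    return list(sorter.static_order())


order = execution_order(edges)
print(order)
```

Independent tables (here the two `raw_*` tables) may appear in either relative order; only the source-before-target constraint is guaranteed.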

NOTE
Any inner query that is not part of a source clause (FROM / JOIN) is not included as a node in the graph. For example, the subquery in the statement below is not taken into account.

...
WHERE column not in (
  SELECT column
  FROM table2
)

Finally, the lineage graph can be accessed directly through

lineage_graph.graph

or by running it in interactive mode:

lineage_graph.serve_graph()

The interactive mode offers convenient highlighting and dragging capabilities.



Download files


Source Distribution

pylineage-0.1.4.tar.gz (10.9 kB)

Built Distribution

pylineage-0.1.4-py3-none-any.whl (11.8 kB)

File details

Details for the file pylineage-0.1.4.tar.gz.

File metadata

  • Download URL: pylineage-0.1.4.tar.gz
  • Upload date:
  • Size: 10.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.5 CPython/3.8.10 Windows/10

File hashes

Hashes for pylineage-0.1.4.tar.gz
Algorithm Hash digest
SHA256 376f03a265da3bed74f5e65193074a6d8de72e42faafd42c5f99a60d747a6e56
MD5 5ee9aa65b354ee0730928de6698a764f
BLAKE2b-256 9352c31ffb0769e892af09551ac14376c5a416754236868472e1d0fc7326790b


File details

Details for the file pylineage-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: pylineage-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 11.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.5 CPython/3.8.10 Windows/10

File hashes

Hashes for pylineage-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 90e7a5a13b21d9042e20635eaa448ef0ad22ca5d048be93767bebde8d713c392
MD5 90bc35a974e0a22fda68f05d93e93829
BLAKE2b-256 0ee71e4f164c32f6d6c043b0590daceba9ca40515779c7dd8a6611579c98d5d2

