Data Lineage for Python
pylineage
This package provides simple tools for parsing and visualizing your .sql scripts.
Installation
The package is distributed through PyPI, and can be installed with
pip install pylineage
To create a lineage graph, you must also install Graphviz and add it to your PATH.
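A quick way to confirm that Graphviz is reachable is to look for its `dot` executable on the PATH. The snippet below is a minimal sketch using only the standard library; the helper name `graphviz_available` is hypothetical and not part of pylineage.

```python
import shutil


def graphviz_available(executable: str = "dot") -> bool:
    """Return True if the given Graphviz executable is found on PATH.

    Hypothetical helper for illustration; pylineage does not provide it.
    """
    return shutil.which(executable) is not None


if graphviz_available():
    print("Graphviz found on PATH")
else:
    print("Graphviz not found; install it and add it to PATH")
```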
SQL Parser
Individual basic .sql scripts can be parsed by running
from pylineage import SqlParser
parser = SqlParser()
query = """
SELECT column1
, column2 AS c2
FROM my_table t
WHERE column1 = 1
"""
parsed = parser.parse(query)
The output looks as follows:
>>> parsed
{
"select": [{ "content": "column1" }, { "content": "column2", "alias": "c2" }],
"from": { "content": "my_table", "alias": "t" },
"where": ["column1 = 1"]
}
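Because the result is a plain dictionary, it can be inspected with ordinary Python. The sketch below works on a literal copy of the output shown above, so it runs without pylineage installed; the helper `output_names` is a hypothetical name used only for illustration.

```python
# Literal copy of the parsed output from the example above.
parsed = {
    "select": [{"content": "column1"}, {"content": "column2", "alias": "c2"}],
    "from": {"content": "my_table", "alias": "t"},
    "where": ["column1 = 1"],
}


def output_names(parsed_query):
    """Return the name each selected column is exposed under.

    Uses the alias when one is present, otherwise the column content.
    """
    return [item.get("alias", item["content"]) for item in parsed_query["select"]]


source = parsed["from"]
print(output_names(parsed))                     # ['column1', 'c2']
print(source["content"], source.get("alias"))   # my_table t
```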
The parser currently supports the following clauses:
| Clause |
|---|
| SELECT |
| FROM |
| LEFT JOIN |
| LEFT OUTER JOIN |
| RIGHT JOIN |
| RIGHT OUTER JOIN |
| FULL JOIN |
| FULL OUTER JOIN |
| JOIN |
| INNER JOIN |
| WHERE |
| QUALIFY |
| GROUP BY |
| ORDER BY |
| HAVING |
Lineage Graph
Based on the SQL parser, a lineage graph can be constructed. We start off with the main constructor:
from pylineage import LineageGraph
lineage_graph = LineageGraph()
There are then two ways to add SQL scripts: (1) as individual input strings or (2) from a directory.
lineage_graph.extend_graph_from_input_string("CREATE TABLE my_view AS SELECT column1 FROM my_table")
lineage_graph.extend_graph_from_directory("/data")
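The description above does not say which files `extend_graph_from_directory` picks up or whether it descends into subdirectories. If you need explicit control over that, one option is to collect the scripts yourself and pass each one to `extend_graph_from_input_string`. The sketch below assumes scripts use the `.sql` extension and are UTF-8 encoded; `read_sql_scripts` is a hypothetical helper, not part of pylineage.

```python
from pathlib import Path


def read_sql_scripts(directory):
    """Return {path: text} for every .sql file under `directory`.

    Hypothetical helper: searches recursively and sorts by path so the
    result is deterministic. Assumes UTF-8 encoded files.
    """
    root = Path(directory)
    return {
        str(path): path.read_text(encoding="utf-8")
        for path in sorted(root.rglob("*.sql"))
    }
```

Each returned text could then be added with `lineage_graph.extend_graph_from_input_string(text)`.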
The graph can be cleared at any time by running
lineage_graph.clear_graph()
One purpose of parsing and visualizing the scripts is to determine the order in which they should be executed. The execution order can be obtained by running
lineage_graph.get_execution_order()
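Conceptually, an execution order is a topological ordering of the dependency graph: every table is built only after the tables it reads from. The sketch below illustrates that idea with the standard library's `graphlib` (Python 3.9+). It is not pylineage's implementation, the format returned by `get_execution_order()` may differ, and the table names `report` and `other_table` are made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependencies: each key is built from the tables in its set.
dependencies = {
    "my_view": {"my_table"},
    "report": {"my_view", "other_table"},
}


def execution_order(deps):
    """Return the nodes so that every source precedes what is built from it."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

The relative position of unrelated tables (here `my_table` and `other_table`) is not fixed; only the source-before-target constraint is guaranteed.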
NOTE
Any inner query that is not a part of a source clause (FROM / JOIN) is not included as a node in the graph. As such, statements like the one below are not taken into account.
...
WHERE column not in (
SELECT column
FROM table2
)
Finally, the lineage graph can be accessed directly through
lineage_graph.graph
or by running it in interactive mode:
lineage_graph.serve_graph()
The interactive mode offers convenient highlighting and dragging capabilities.