CICD tool for testing and deploying to Databricks
Databricks CI/CD
Forked from Manol Manolov's original databricks-cicd repo for use with the tactivos/data-databricks repo.
This is a tool for building CI/CD pipelines for Databricks. It is a Python package that works in conjunction with a custom Git repository (or a simple file structure) to validate and deploy content to Databricks. Currently, it can handle the following content:
- Workspace - a collection of notebooks written in Scala, Python, R or SQL
- Jobs - list of Databricks jobs
- Clusters
- Instance Pools
- DBFS - an arbitrary collection of files that may be deployed on a Databricks workspace
Installation
pip install tactivos-databricks-cicd
Requirements
To use this tool, you need a source directory (preferably a private Git repository) with the following structure:
any_local_folder_or_git_repo/
├── workspace/
│ ├── some_notebooks_subdir
│ │ └── Notebook 1.py
│ ├── Notebook 2.sql
│ ├── Notebook 3.r
│ └── Notebook 4.scala
├── jobs/
│ ├── My first job.json
│ └── Side gig.json
├── clusters/
│ ├── orion.json
│ └── Another cluster.json
├── instance_pools/
│ ├── Pool 1.json
│ └── Pool 2.json
└── dbfs/
    ├── strawbery_jam.jar
    ├── subdir
    │   └── some_other.jar
    ├── some_python.egg
    └── Ice cream.jpeg
Note: The folder names shown are the defaults and can be configured. This is just a sample.
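Before running a deployment, it can help to check that a source directory matches the layout above. The following is a minimal sketch, not part of the tool: the function name `missing_source_folders` is hypothetical, the folder names are the defaults from the sample tree, and whether the tool requires every folder to exist is not stated here.

```python
import tempfile
from pathlib import Path

# Default folder names taken from the sample layout; the tool allows
# renaming them in its configuration, so treat this tuple as an assumption.
DEFAULT_FOLDERS = ("workspace", "jobs", "clusters", "instance_pools", "dbfs")


def missing_source_folders(root, expected=DEFAULT_FOLDERS):
    """Return the expected folder names that are not directories under root."""
    root_path = Path(root)
    return [name for name in expected if not (root_path / name).is_dir()]


if __name__ == "__main__":
    # Build a throwaway directory containing only two of the five folders.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("workspace", "jobs"):
            (Path(tmp) / name).mkdir()
        print(missing_source_folders(tmp))
```

If the folder names are changed in a custom config file, pass the configured names through the `expected` argument.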
Usage
For the latest options and commands run:
cicd -h
A sample command could be:
cicd deploy \
-w sample_12432.7.azuredatabricks.net \
-u john.smith@domain.com \
-t dapi_sample_token_0d5-2 \
-lp '~/git/my-private-repo' \
-tp /blabla \
-c DEV.ini \
--verbose
Note: On Windows, paths need to be enclosed in double quotes.
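When the command is launched from a script instead of a terminal, one way to sidestep shell quoting is to pass the arguments as a list. The sketch below assembles the sample command using only the flags shown above; the helper name `build_deploy_command` and its parameter names are hypothetical, and all values are the placeholders from the sample.

```python
import shlex


def build_deploy_command(workspace, user, token, local_path, target_path,
                         config_file, verbose=False):
    """Build the argument list for `cicd deploy` from the flags in the sample."""
    command = [
        "cicd", "deploy",
        "-w", workspace,
        "-u", user,
        "-t", token,
        "-lp", local_path,   # passed through unchanged, spaces included
        "-tp", target_path,
        "-c", config_file,
    ]
    if verbose:
        command.append("--verbose")
    return command


if __name__ == "__main__":
    argv = build_deploy_command(
        workspace="sample_12432.7.azuredatabricks.net",
        user="john.smith@domain.com",
        token="dapi_sample_token_0d5-2",
        local_path="~/git/my-private-repo",
        target_path="/blabla",
        config_file="DEV.ini",
        verbose=True,
    )
    # Show a copy-pasteable form of the command without executing it.
    print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run(argv, check=True)` would execute the command without a shell, so each path reaches the tool as a single argument.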
The default configuration is defined in default.ini and can be overridden with a custom ini file passed via the -c option, usually one config file per target environment.
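The idea of layering a per-environment file over defaults can be illustrated with Python's standard `configparser`. This is only a sketch of the general pattern: the section name `example_section` and both keys are invented for illustration and are not taken from default.ini, and the tool's own merge logic may differ.

```python
import configparser

# Invented stand-in for default.ini (section and keys are hypothetical).
DEFAULTS = """
[example_section]
folder_name = workspace
skip_existing = true
"""

# Invented stand-in for a per-environment file such as DEV.ini.
DEV_OVERRIDES = """
[example_section]
folder_name = notebooks
"""


def load_layered_config(*ini_texts):
    """Read INI sources in order; later sources override earlier values."""
    parser = configparser.ConfigParser()
    for text in ini_texts:
        parser.read_string(text)
    return parser


if __name__ == "__main__":
    config = load_layered_config(DEFAULTS, DEV_OVERRIDES)
    section = config["example_section"]
    # The override replaces folder_name; skip_existing keeps its default.
    print(section["folder_name"], section.getboolean("skip_existing"))
```

A custom file only needs to list the options that differ from the defaults in this pattern, which keeps one small file per target environment.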
Create content
Notebooks:
- Add a notebook to source
  - In the Databricks UI, go to your notebook.
  - Click on File -> Export -> Source file.
  - Add that file to the workspace folder of this repo without changing the file name.
Jobs:
- Add a job to source
  - Get the source of the job and write it to a file. You need to have the Databricks CLI and jq installed. On Windows, it is easiest to rename jq-win64.exe to jq.exe and place it in the c:\Windows\System32 folder. Then, on Windows/Linux/macOS, run:
    databricks jobs get --job-id 74 | jq .settings > Job_Name.json
    This downloads the source JSON of the job from the Databricks server, extracts only the settings from it, and writes them to a file.
    Note: The file name should be the same as the job name within the JSON file. Please avoid spaces in names.
  - Add that file to the jobs folder.
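Where jq is not available, the same extraction can be sketched in Python. This is an illustrative alternative, not something the tool provides: the helper name `save_job_settings` is hypothetical, and it assumes the job JSON has a top-level `settings` object whose `name` field holds the job name, as the jq filter and the naming note above suggest.

```python
import json
import tempfile
from pathlib import Path


def save_job_settings(job_json_text, jobs_dir):
    """Write the job's settings to <jobs_dir>/<job name>.json and return the path."""
    # Same selection as `jq .settings`.
    settings = json.loads(job_json_text)["settings"]
    # Name the file after the job so the two stay in sync.
    target = Path(jobs_dir) / f"{settings['name']}.json"
    target.write_text(json.dumps(settings, indent=2), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Trimmed, made-up example of the JSON that `databricks jobs get` prints.
    sample = '{"job_id": 74, "settings": {"name": "Job_Name", "max_concurrent_runs": 1}}'
    with tempfile.TemporaryDirectory() as tmp:
        print(save_job_settings(sample, tmp).name)
```

Deriving the file name from the `name` field avoids the mismatch that the note above warns about.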
Clusters:
- Add a cluster to source
  - Get the source of the cluster and write it to a file:
    databricks clusters get --cluster-name orion > orion.json
    Note: The file name should be the same as the cluster name within the JSON file. Please avoid spaces in names.
  - Add that file to the clusters folder.
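The naming rule for cluster files can be checked before deploying. The sketch below is a hypothetical helper, not part of the tool; it assumes each JSON file stores the cluster name under a `cluster_name` key, which is passed in as a parameter so that a different key can be used for other content types.

```python
import json
import tempfile
from pathlib import Path


def mismatched_names(folder, name_key):
    """Return (file name, name inside the JSON) pairs where the two differ."""
    mismatches = []
    for path in sorted(Path(folder).glob("*.json")):
        definition = json.loads(path.read_text(encoding="utf-8"))
        # Compare the file name without its .json suffix to the stored name.
        if definition.get(name_key) != path.stem:
            mismatches.append((path.name, definition.get(name_key)))
    return mismatches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # One file that follows the rule and one that does not.
        (Path(tmp) / "orion.json").write_text('{"cluster_name": "orion"}', encoding="utf-8")
        (Path(tmp) / "old.json").write_text('{"cluster_name": "renamed"}', encoding="utf-8")
        print(mismatched_names(tmp, "cluster_name"))
```

An empty list means every file name in the folder agrees with the name stored inside it.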
Instance pools:
- Add an instance pool to source
  - Similar to clusters, just use instance-pools instead of clusters.
DBFS:
- Add a file to DBFS
  - Just add a file to the dbfs folder.