
A new take on using Snakemake: pass around log-files instead of output-files

Project description

snakemake-jobmonitor package

snakemake-jobmonitor is an alternative take on the regular Snakemake workflow: instead of passing input and output files around, it passes log files around. Each log file contains a pointer to the result files. The advantage is much better progress monitoring, error handling and logging. The JobMonitor and JobResult classes ensure that this can be achieved with minimal code that is easy to read and maintain. snakemake-jobmonitor is a minimal library of about two pages of code, installed with `pip install snakemake-jobmonitor`. It does not modify Snakemake, only the way Snakemake is used.

Regular Snakemake

Snakemake is a powerful workflow-engine that compiles rules into a DAG (Directed Acyclic Graph) and automatically determines a parallel execution strategy. Rules invoke each other via filenames, which typically contain wildcards so that the same rule can be invoked for multiple cases.

Regular workflow example

Example of a Snakefile (examples/regular.snk, run with snakemake -s regular.snk --cores 1 --forceall) that splits a color image into the red, green and blue components for cases '1', '2' and '3'.

import os
from PIL import Image

inputFolder = 'path_to_cases'
outputFolder = '../scratch/regular/path_to_results'
os.makedirs(outputFolder,exist_ok=True)

allCases = ['1','2','3']

def somethingUseful(colorInfile, redOutfile,greenOutfile,blueOutfile):
    im = Image.open(colorInfile)
    r,g,b = im.split()
    r.save(redOutfile); g.save(greenOutfile); b.save(blueOutfile)
    # uncomment this to raise a Division By Zero error:
    #1/0

def createReport(allRed,allGreen,allBlue, reportFile):
    # argument order matches the call sites: R, G, B
    with open(reportFile,'wt') as fp:
        fp.write('Red files:\n' + ',\n'.join(allRed) + '\n\n')
        fp.write('Green files:\n' + ',\n'.join(allGreen) + '\n\n')
        fp.write('Blue files:\n' + ',\n'.join(allBlue) + '\n\n')

rule runSingleCase:
    input:
        color=inputFolder+'/case-{case}_RGB.jpeg'
    output:
        R=outputFolder+'/case-{case}_R.png',
        G=outputFolder+'/case-{case}_G.png',
        B=outputFolder+'/case-{case}_B.png'
    run:
        somethingUseful(input.color, output.R,output.G,output.B)

rule runAllCases:
    input:
        R=[outputFolder+f'/case-{c}_R.png' for c in allCases],
        G=[outputFolder+f'/case-{c}_G.png' for c in allCases],
        B=[outputFolder+f'/case-{c}_B.png' for c in allCases]
    output:
        report=f'{outputFolder}/report.txt'
    default_target:
        True
    run:
        createReport(input.R,input.G,input.B, output.report)

Practical issues

For larger workflows some issues arise:

  1. Snakemake does not come with a good progress monitor. It is possible to use the 'WMS monitoring protocol', but this is cumbersome to set up and is being phased out. Its replacement, 'logger plugins', is still experimental.

  2. If an error occurs, the pipeline stops and produces a very long error trace, most of which is irrelevant to the error. Or one can opt to ignore errors, but this will cause errors down the line that are even more difficult to trace.

  3. Snakemake produces a log-file that contains information about process execution, but does not contain the console-output of the processes called by each rule. This is because a global log file is not suitable to contain logs from different components that may run in parallel.

  4. If a rule has many outputs, and another rule needs these as inputs, the rules become cluttered.

An alternative approach

To solve these issues, snakemake-jobmonitor changes the way rules interact. Every rule produces a log file instead of output files. And instead of rule B requesting the output of rule A, it requests the log file of rule A. Inside that log file there is a pointer to where the rule's results are stored.

Snakemake-jobmonitor is implemented as a class that acts as a context-manager. A typical rule looks as follows:

rule decomposeSingle:
    input:
        color=inputFolder+'/case-{case}_RGB.png'
    log:
        logFolder+'/case-{case}_decompose.log'
    run:
        caseFolder = f'{outputFolder}/case-{wildcards.case}'
        with JobMonitor(log,'Decompose RGB into R,G,B',caseFolder) as job:
            doDecompose(input.color, 
                job.result('R.png'),job.result('G.png'),job.result('B.png'))

The rule has changed in a few places: instead of producing three output files, it produces a log file. In the statement that starts with with JobMonitor, JobMonitor creates the log file and stores in it the path where the rule output will be written, in this case the caseFolder folder. In the last line, job.result('R.png') creates the output folder and returns the full path to the file.

Although the code has become two lines longer, it offers huge advantages:

  1. JobMonitor automatically creates the .log file, but while the rule executes the extension is changed to '.running'. So, at any moment you can see what Snakemake is working on by listing all .running files in the log folder.

  2. If an error occurs within the JobMonitor context, the error is appended to the log file and written to a .error file (with otherwise the same name as the .log file). So, one can easily find all rules that gave errors by listing .error files in the log folder. After fixing the code that produced the error, delete the corresponding log-file before re-running Snakemake.

  3. Naturally, every rule produces its own log. In addition, JobMonitor provides a run method to invoke external software. This method is mostly the same as subprocess.run, but it captures all output to the .log file and sends errors to the .error file.

  4. Rules have inputs that are log files produced by other rules. And a single output: its own log file. The Snakefile is not cluttered by declaring all the output files that may be produced by each rule. Those are accessed indirectly via the result-pointer in its log-file.
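Advantages 1 and 2 can be exploited with any file-listing tool; as a minimal sketch in Python, assuming logs are collected in the single folder used by the example below:

```python
import glob
import os

# Assumed log folder; the example Snakefile below uses this path.
log_folder = '../scratch/jobmon/path_to_logs'

# While a rule executes its log has the extension '.running';
# a failing rule additionally leaves a '.error' file behind.
running = glob.glob(os.path.join(log_folder, '*.running'))
failed = glob.glob(os.path.join(log_folder, '*.error'))

print(f'{len(running)} rule(s) running, {len(failed)} rule(s) failed')
```

Running this in a second terminal while Snakemake executes gives a simple live progress view.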

Full snakemake-jobmonitor example

Here is the full version of the previous example in the snakemake-jobmonitor style (examples/jobmon.snk, run with snakemake -s jobmon.snk --cores 1).

import os
from PIL import Image
from snakemake_jobmonitor import JobMonitor, JobResult

inputFolder = 'path_to_cases'
outputFolder = '../scratch/jobmon/path_to_results'
logFolder = '../scratch/jobmon/path_to_logs'

allCases = ['1','2','3']

def doSomethingUseful(colorInfile, redOutfile,greenOutfile,blueOutfile):
    im = Image.open(colorInfile)
    r,g,b = im.split()
    r.save(redOutfile); g.save(greenOutfile); b.save(blueOutfile)
    # uncomment this to raise a Division By Zero error:
    #1/0

def createReport(allRed,allGreen,allBlue, reportFile):
    # argument order matches the call sites: R, G, B
    with open(reportFile,'wt') as fp:
        fp.write('Red files:\n' + ',\n'.join(allRed) + '\n\n')
        fp.write('Green files:\n' + ',\n'.join(allGreen) + '\n\n')
        fp.write('Blue files:\n' + ',\n'.join(allBlue) + '\n\n')

rule runSingleCase:
    input:
        color=inputFolder+'/case-{case}_RGB.jpeg'
    log:
        logFolder+'/case-{case}_decompose.log'
    run:
        caseFolder = f'{outputFolder}/case-{wildcards.case}'
        with JobMonitor(log,'Decompose RGB into R,G,B',caseFolder) as job:
            doSomethingUseful(
                input.color, 
                job.result('R.png'),job.result('G.png'),job.result('B.png')
            )

rule runAllCases:
    input:
        [logFolder+f'/case-{cs}_decompose.log' for cs in allCases]
    log:
        logFolder+'/decomposeAll.log'
    default_target:
        True
    run:
        with JobMonitor(log,'Decompose All',outputFolder) as job:
            R = [JobResult(f)('R.png') for f in input]
            G = [JobResult(f)('G.png') for f in input]
            B = [JobResult(f)('B.png') for f in input]
            createReport(R,G,B, job.result('report.txt') )

Usage of JobMonitor

Signature: JobMonitor(logFile,description,resultFolder)

The JobMonitor class takes three arguments:

  • logFile: path to the log file. If the file exists, it will be overwritten.

  • description: brief description of what the rule does.

  • resultFolder: path to the result folder. One can also pass a result prefix by adding an asterisk at the end. Examples:

    • /my/results/case-1 will cause results to be written in the case-1 folder.

    • /my/results/case-1_* will cause results to be written in the results folder, and every file therein will start with case-1_.

JobMonitor should be used as a context manager, like

with JobMonitor(logFile,description,resultFolder) as job:
    doSomething()

Inside the context, job can be used for the following tasks:

  1. Create/access the result of this rule via job.result(resultFile)

    This returns a filename that concatenates the previously specified resultFolder with resultFile, and will make sure the folder is created. One can also write results in subfolders, by just adding arguments, like job.result(subFolder,resultFile). Examples:

    • If the resultFolder is specified as /my/results/case-1, then job.result('test','R.png') will return /my/results/case-1/test/R.png.

    • If the resultFolder is specified as /my/results/case-1_*, then job.result('test','R.png') will return /my/results/case-1_test/R.png

    This is all job.result does: it returns filenames and creates folders. It does not create results; that is up to the code inside the rule.

  2. Run an external command via job.run(command,liveUpdates=False).

    Here command must NOT be a string, but a list of strings that follows the exact same rules as subprocess.run. The advantage of using job.run() is that it saves stdout and stderr to the log and error file respectively.

    If liveUpdates is False, the log and error files are updated once the command finishes; if True, they are updated more frequently while the command runs.
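The behavior described for job.run can be approximated with plain subprocess. The sketch below is not the library's implementation; run_and_log and its file arguments are made up for illustration:

```python
import subprocess

def run_and_log(command, log_file, error_file):
    # command is a list of strings, exactly as subprocess.run expects.
    proc = subprocess.run(command, capture_output=True, text=True)
    # Append captured console output to the log file.
    with open(log_file, 'a') as fp:
        fp.write(proc.stdout)
    # Send error output to a separate error file, as job.run is described to do.
    if proc.stderr:
        with open(error_file, 'a') as fp:
            fp.write(proc.stderr)
    return proc.returncode
```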

As a general rule, we recommend that log-files are all stored in the same folder, with hierarchy expressed in the file name. For result-files it can be more natural to use a hierarchical folder structure.
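The path rules of job.result described above, including the trailing-asterisk prefix form, can be sketched in a few lines of plain Python. result_path below is a hypothetical reimplementation for illustration, not the library's code:

```python
import os

def result_path(result_folder, *parts):
    # A trailing '*' marks the last path component as a filename prefix.
    if result_folder.endswith('*'):
        prefix = os.path.basename(result_folder)[:-1]   # e.g. 'case-1_'
        base = os.path.dirname(result_folder)
        *subfolders, filename = parts
        if subfolders:
            # The prefix attaches to the first subfolder name.
            subfolders[0] = prefix + subfolders[0]
            return os.path.join(base, *subfolders, filename)
        return os.path.join(base, prefix + filename)
    return os.path.join(result_folder, *parts)

print(result_path('/my/results/case-1', 'test', 'R.png'))
# /my/results/case-1/test/R.png
print(result_path('/my/results/case-1_*', 'test', 'R.png'))
# /my/results/case-1_test/R.png
```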

Usage of JobResult

Signature: JobResult(logFile)

We already used job.result in the previous chapter to access result files inside the JobMonitor context. The JobResult class is used to access the results of other rules via the log files they produced. It relies on the fact that every log file contains, on the second line, the resultFolder of the rule that created it.

If we start for example with:

result = JobResult('/my/logfolder/case-1_test.log')

then result can be used in the same way as job.result in the previous chapter. For example, result(subFolder,resultFile) will return the concatenation of resultFolder, subFolder and resultFile. It will not create any folders; that only happens in the JobMonitor context.

JobResult has some additional convenience methods:

  • result.file(*args) is the same as result(*args)

  • result.folder(*args) returns the result folder, internally using os.path.dirname(resultFile)

  • result.parseJson(*args) parses the json-formatted result file and returns its content.
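Since the result folder is said to sit on the second line of each log file, the lookup JobResult performs can be sketched as below. read_result_folder is hypothetical, and the real file layout may contain more than the two lines assumed here:

```python
def read_result_folder(log_file):
    # Assumed layout: line 1 holds the rule description,
    # line 2 holds the resultFolder (per the description above).
    with open(log_file) as fp:
        fp.readline()
        return fp.readline().strip()
```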

Project details


Download files

Download the file for your platform.

Source Distribution

snakemake_jobmonitor-0.1.2.tar.gz (49.9 kB)

Uploaded: Source

Built Distribution


snakemake_jobmonitor-0.1.2-py2.py3-none-any.whl (9.7 kB)

Uploaded: Python 2, Python 3

File details

Details for the file snakemake_jobmonitor-0.1.2.tar.gz.

File metadata

  • Download URL: snakemake_jobmonitor-0.1.2.tar.gz
  • Upload date:
  • Size: 49.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for snakemake_jobmonitor-0.1.2.tar.gz
Algorithm Hash digest
SHA256 476710a26aca9d22776ec7059481599093d6729eed7cf9f70d901c439aa2e795
MD5 d8e9c0f0bfffcf1627b3856f25aa6821
BLAKE2b-256 1ee45710ab5625b31de90e830a7da5bc650c5c1833a4eec4c81efacb95ddfb52


File details

Details for the file snakemake_jobmonitor-0.1.2-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for snakemake_jobmonitor-0.1.2-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 7e34d9875e3e335ce36b4d90a09e700883d5c487ac9ce01d42ca575fe185f17a
MD5 e148284c37178f0921e2c6dab5400d46
BLAKE2b-256 1a9ce346361100e5b2bb8863aa34d31187b0146bab31572e275c2110765517f4

