Counterfactual exploration utility

whatifact

A Counterfactual Exploration Tool

Counterfactual explanations are a methodology for investigating a model's predictions. whatifact lets data scientists vary the feature values of records in their data and see how the predictions change.

Specifically, it lets you ask "what-if" questions:

  • What would the model predict if a record belonged to a man, instead of a woman?
  • What would the model predict if my blood glucose levels were slightly higher, or lower?

Note that this tool helps answer causal questions about the model's predictions, not causal questions about the real world. Answering causal questions about the real world requires a dedicated study design rather than a UI tool.
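To make the idea concrete, here is a minimal sketch of a single what-if comparison done by hand, which is the kind of question the tool lets you explore interactively. The toy model, its coefficients, and the what_if helper are all hypothetical and exist only for illustration; they are not part of whatifact and the coefficients are not fitted to any data.

```python
import math

# Hypothetical toy model: hand-picked coefficients, NOT fitted to real data.
COEFFICIENTS = {"is_male": -2.5, "age": -0.03, "fare": 0.01}
INTERCEPT = 1.0


def predict_proba(record):
    """Return the toy model's probability of the positive class."""
    score = INTERCEPT + sum(COEFFICIENTS[k] * record[k] for k in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-score))


def what_if(record, **changes):
    """Compare the prediction for a record with a counterfactual copy of it."""
    counterfactual = {**record, **changes}  # the original record is left untouched
    before = predict_proba(record)
    after = predict_proba(counterfactual)
    return before, after, after - before


passenger = {"is_male": 1, "age": 30.0, "fare": 20.0}
before, after, delta = what_if(passenger, is_male=0)
print(f"original: {before:.3f}, counterfactual: {after:.3f}, change: {delta:+.3f}")
```

The change in probability describes how this particular model reacts to the edited feature. It says nothing about what would happen to a real person, which is the distinction drawn above.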

Example

In the most basic setting, whatifact only requires the data and a classifier. Everything else is selected automatically:

  • Whether a feature is categorical or continuous
  • Whether missing values are allowed
  • How to set up the sliders for continuous features

You will notice that some sliders have a small checkbox on their left. Clearing this checkbox disables the slider (or makes an empty selection in a drop-down widget), and the feature is then treated as a missing value.

from sklearn.datasets import fetch_openml
import lightgbm as lgb

from whatifact import whatifact

# Load Titanic dataset
titanic = fetch_openml("titanic", version=1, as_frame=True)

# Select the feature columns and convert the target to integers
df = titanic.data[['pclass', 'sex', 'age', 'sibsp', 'parch', 'fare', 'embarked']]
labels = titanic.target.astype(int)

# Train an LGBMClassifier
clf = lgb.LGBMClassifier(verbose=0).fit(df, labels)

# Running whatifact
app = whatifact(df=df, clf=clf)

# Output: (Clicking on the http link will open whatifact in the browser)
# INFO:     Started server process [42841]
# INFO:     Waiting for application startup.
# INFO:     Application startup complete.
# INFO:     Uvicorn running on http://<LOCAL_IP>:8000 (Press CTRL+C to quit)

However, when running the code above, you will notice some odd behavior: the sliders for both age and fare start at negative values, which is of course nonsensical; the parch variable has no missing-value checkbox next to it; and sibsp is treated as a continuous feature, although we would rather handle it as categorical.

To change this behavior, pass the feature_settings parameter, for example:

feature_settings = {
    'parch': {'null': True},
    'age': {'min': 0},
    'fare': {'min': 0},
    'sibsp': {'type': 'categorical'}
}

app = whatifact(df=df, clf=clf, feature_settings=feature_settings)

This should fix the issues above. feature_settings is a dictionary with column names as keys and dictionaries as values. Any feature may contain the null or type keys.

  • null: a boolean (True/False) stating whether the feature can have null values.
  • type: 'continuous' or 'categorical' - manually defining the feature type.

Continuous features may also contain the min, max, step, and decimals keys. All other keys are ignored. The decimals parameter is an integer defining the number of decimal digits used for rounding (default: 1).
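Because unrecognized keys are ignored, a misspelled key such as 'mim' would have no effect and no error. The following is a minimal sketch of a pre-flight check that relies only on the keys documented above. check_feature_settings is a hypothetical helper, not part of whatifact, and the max, step, and decimals values shown are illustrative choices.

```python
# Keys documented above; check_feature_settings itself is a hypothetical helper.
COMMON_KEYS = {"null", "type"}
CONTINUOUS_KEYS = {"min", "max", "step", "decimals"}
VALID_TYPES = {"continuous", "categorical"}


def check_feature_settings(feature_settings, columns):
    """Return a list of human-readable problems found in feature_settings."""
    problems = []
    known_keys = COMMON_KEYS | CONTINUOUS_KEYS
    for column, settings in feature_settings.items():
        if column not in columns:
            problems.append(f"{column}: not a column in the data")
        for key, value in settings.items():
            if key not in known_keys:
                problems.append(f"{column}: key {key!r} is not documented and would be ignored")
            elif key == "type" and value not in VALID_TYPES:
                problems.append(f"{column}: type must be 'continuous' or 'categorical'")
            elif key == "null" and not isinstance(value, bool):
                problems.append(f"{column}: null must be True or False")
    return problems


columns = ["pclass", "sex", "age", "sibsp", "parch", "fare", "embarked"]
feature_settings = {
    "age": {"min": 0, "max": 100, "step": 1, "decimals": 0},  # illustrative values
    "fare": {"mim": 0},  # typo on purpose: 'mim' instead of 'min'
    "sibsp": {"type": "categorical"},
}

for problem in check_feature_settings(feature_settings, columns):
    print(problem)
```

An empty list means every key is one of the documented ones, so the dictionary can be passed to whatifact as feature_settings.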

The last two parameters in whatifact are sample_id and run_application.

  • sample_id is the name of a column in df that will be used in the sample selector at the top of the app. If it remains None, the index column will be used as sample_id.
  • run_application is a boolean (True/False, defaults to True) that controls whether the web service hosting the Shiny app is started. If set to False, an App object is returned but not run, and running the app requires shiny run my_file.py.
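As a sketch of the run_application=False workflow, the file below builds the app without starting the server. The build_app wrapper and the lazy import are illustrative choices rather than anything whatifact requires; only the df, clf, and run_application parameters come from the description above. It also assumes that shiny run looks for a module-level variable named app, which is Shiny for Python's default.

```python
# my_file.py
def build_app(df, clf):
    """Build the whatifact app without starting the web server."""
    # Imported lazily so that importing this module has no side effects.
    from whatifact import whatifact

    return whatifact(
        df=df,
        clf=clf,
        run_application=False,  # return the App object instead of running it
    )


# At module level, after loading df and training clf as in the example above:
# app = build_app(df, clf)
```

With the last line uncommented and df and clf defined, shiny run my_file.py should serve the app.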

Limitations

whatifact currently works with binary prediction models only, and should support LogisticRegression, XGBoost, and LGBMClassifier.
