Function
propensity_score
def propensity_score(cohort_1: sql.dataframe.DataFrame, cohort_2: sql.dataframe.DataFrame, features: sql.dataframe.DataFrame, cohort_names: list = None) ‑> sql.dataframe.DataFrame
Description
The propensity_score function calculates a propensity score for all patients in two cohorts. The propensity scores can then be used to match patients to create two balanced cohorts for an analysis.
Inputs
- cohort_1 - user-defined table with 2 mandatory columns:
  - patient_id
  - feature - a column, named after the feature, that will be used as a covariate to calculate propensity scores
  - feature (optional) - one column for each additional feature to use as a covariate
- cohort_2 - user-defined table with 2 mandatory columns:
  - patient_id
  - feature - a column, named after the feature, that will be used as a covariate to calculate propensity scores
  - feature (optional) - one column for each additional feature to use as a covariate
- features - user-defined table with 2 mandatory columns and 1 optional column:
  - feature
  - feature_type - the type of feature:
    - present (int, 0 or 1)
    - continuous (double)
    - categorical (string)
    - binned (double)
  - bin (optional)
    - the user can set a bin for numeric features
    - a bin must be present for binned feature types
    - syntax for setting bins:
      - '<=X': less than or equal to X
      - '<X': less than X
      - '>=X': greater than or equal to X
      - '>X': greater than X
      - 'X:Y': between X and Y
      - '>X:<Y': greater than X to less than Y
      - 'X:<Y': X to less than Y
      - '>X:Y': greater than X to Y
- optional arguments:
  - cohort_names - list of exactly 2 names; letters, numbers, and underscores only, no spaces
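The bin syntax can be read as a set of interval tests. The following is a minimal sketch of that reading, not the function's own parser: `parse_bin` is a hypothetical helper, and it assumes that a bare bound in the range forms is inclusive (so 'X:Y' includes both X and Y), which the documentation does not state explicitly.

```python
import operator
import re
from typing import Callable

_NUM = r"-?\d+(?:\.\d+)?"
# Single-bound forms: '<=X', '<X', '>=X', '>X'
_SINGLE = re.compile(rf"^(<=|>=|<|>)({_NUM})$")
# Range forms: 'X:Y', '>X:<Y', 'X:<Y', '>X:Y'
_RANGE = re.compile(rf"^(>?)({_NUM}):(<?)({_NUM})$")
_OPS = {"<=": operator.le, "<": operator.lt, ">=": operator.ge, ">": operator.gt}


def parse_bin(spec: str) -> Callable[[float], bool]:
    """Return a predicate that is True when a value falls inside the bin."""
    spec = spec.strip()
    single = _SINGLE.match(spec)
    if single:
        op, bound = _OPS[single.group(1)], float(single.group(2))
        return lambda value: op(value, bound)
    rng = _RANGE.match(spec)
    if rng:
        # Assumption: a bound without '>' or '<' is inclusive.
        low_op = operator.gt if rng.group(1) else operator.ge
        high_op = operator.lt if rng.group(3) else operator.le
        low, high = float(rng.group(2)), float(rng.group(4))
        return lambda value: low_op(value, low) and high_op(value, high)
    raise ValueError(f"unrecognised bin specification: {spec!r}")


in_bin = parse_bin(">40:<65")
print(in_bin(50), in_bin(40), in_bin(65))
```

Under these assumptions, a binned age feature with bin '>40:<65' would include a 50-year-old patient but exclude patients aged exactly 40 or 65.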
Returns
- a table with the following columns:
  - patient_id
  - cohort
  - propensity_score
- Running the propensity_score function calculates a propensity score for each patient, indicating the propensity that the patient is in cohort 1. Each patient's score is calculated using a logistic regression model.
- Due to variations in the statistical packages, R and Python will not return exactly the same numbers unless the results are rounded.
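To make the logistic regression step concrete, here is a minimal, dependency-free sketch that fits a model on the example cohorts and reads off the probability of belonging to cohort 1. The encoding (standardized age, a single Male indicator for sex), the gradient-descent solver, and the helper names are all assumptions made for illustration. The source does not describe the function's actual encoding or solver, so the numbers this sketch produces should not be expected to match the function's output.

```python
import math

# Rows from the example cohorts: (age, sex, diabetes).
cohort_1 = [(51, "Male", 1), (45, "Male", 0), (67, "Female", 0),
            (32, "Male", 1), (53, "Female", 0), (46, "Female", 1)]
cohort_2 = [(19, "Male", 1), (62, "Female", 1), (55, "Female", 0),
            (46, "Male", 0), (59, "Male", 0), (33, "Female", 1)]


def encode(rows, mean_age, sd_age):
    """Turn raw rows into numeric covariates with a leading intercept term."""
    return [[1.0, (age - mean_age) / sd_age,
             1.0 if sex == "Male" else 0.0, float(diabetes)]
            for age, sex, diabetes in rows]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, steps=5000):
    """Fit weights by batch gradient descent on the logistic log-loss."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


rows = cohort_1 + cohort_2
ages = [r[0] for r in rows]
mean_age = sum(ages) / len(ages)
sd_age = (sum((a - mean_age) ** 2 for a in ages) / len(ages)) ** 0.5

X = encode(rows, mean_age, sd_age)
y = [1] * len(cohort_1) + [0] * len(cohort_2)  # 1 = member of cohort 1
weights = fit_logistic(X, y)

# Propensity score = modelled probability of belonging to cohort 1.
scores = [sigmoid(sum(w * x for w, x in zip(weights, xi))) for xi in X]
for (age, sex, diabetes), score in zip(rows, scores):
    print(age, sex, diabetes, round(score, 2))
```

Rounding the printed scores mirrors the note above: different solvers converge to slightly different coefficients, so comparisons across implementations are only meaningful after rounding.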
Example
cohort_1:

| patient_id | age | sex    | diabetes |
|------------|-----|--------|----------|
| 1          | 51  | Male   | 1        |
| 2          | 45  | Male   | 0        |
| 3          | 67  | Female | 0        |
| 4          | 32  | Male   | 1        |
| 5          | 53  | Female | 0        |
| 6          | 46  | Female | 1        |

cohort_2:

| patient_id | age | sex    | diabetes |
|------------|-----|--------|----------|
| 1          | 19  | Male   | 1        |
| 2          | 62  | Female | 1        |
| 3          | 55  | Female | 0        |
| 4          | 46  | Male   | 0        |
| 5          | 59  | Male   | 0        |
| 6          | 33  | Female | 1        |

features:

| feature  | feature_type | bin |
|----------|--------------|-----|
| age      | continuous   |     |
| sex      | categorical  |     |
| diabetes | present      |     |

    scores = propensity_score(cohort_1, cohort_2, features)
    display(scores)

| patient_id | cohort | propensity_score |
|------------|--------|------------------|
| 1          | 1      | 0.30             |
| 2          | 1      | 0.55             |
| 3          | 1      | 0.92             |
| 1          | 2      | 0.89             |
| 2          | 2      | 0.74             |
| 3          | 2      | 0.21             |
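The description notes that the scores can be used to match patients into balanced cohorts. One common way to do that is greedy 1:1 nearest-neighbour matching with a caliper, sketched below using the scores from the example output. This is an illustration only: `greedy_match` and the caliper value of 0.2 are assumptions, and the source does not say which matching method is intended.

```python
def greedy_match(scores_1, scores_2, caliper=0.2):
    """Pair each cohort-1 patient with the closest unused cohort-2 patient.

    A pair is kept only if the two scores differ by at most `caliper`.
    """
    available = dict(scores_2)  # cohort-2 patients not yet matched
    pairs = []
    # Visit cohort-1 patients in order of increasing score.
    for pid_1, s1 in sorted(scores_1.items(), key=lambda kv: kv[1]):
        if not available:
            break
        pid_2 = min(available, key=lambda pid: abs(available[pid] - s1))
        if abs(available[pid_2] - s1) <= caliper:
            pairs.append((pid_1, pid_2))
            del available[pid_2]
    return pairs


# patient_id -> propensity_score, taken from the example output table.
cohort_1_scores = {1: 0.30, 2: 0.55, 3: 0.92}
cohort_2_scores = {1: 0.89, 2: 0.74, 3: 0.21}

pairs = greedy_match(cohort_1_scores, cohort_2_scores)
print(pairs)  # [(1, 3), (2, 2), (3, 1)] as (cohort_1 id, cohort_2 id)
```

Greedy matching is simple but order-dependent. A tighter caliper yields closer pairs at the cost of leaving more patients unmatched.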