Function `propensity_score`

def propensity_score(cohort_1: sql.dataframe.DataFrame, cohort_2: sql.dataframe.DataFrame, features: sql.dataframe.DataFrame, cohort_names: list = None) -> sql.dataframe.DataFrame

Description

The propensity_score function calculates a propensity score for every patient in two cohorts. The scores can then be used to match patients across the cohorts, producing two balanced cohorts for an analysis.

Inputs

cohort_1 - user-defined table with 2 mandatory columns:
- patient_id
- feature - a column named after the feature that will be used as a covariate to calculate propensity scores
- feature (optional) - one further column for each additional feature to use as a covariate
cohort_2 - user-defined table with 2 mandatory columns:
- patient_id
- feature - a column named after the feature that will be used as a covariate to calculate propensity scores
- feature (optional) - one further column for each additional feature to use as a covariate
features - user-defined table with 2 mandatory columns:
- feature
- feature_type - the type of feature
  - present (int, 0 or 1)
  - continuous (double)
  - categorical (string)
  - binned (double)
- bin (optional)
  - the user can set a bin for numeric features
  - a bin is required when feature_type is binned
  - syntax for setting bins:
    - '<=X': less than or equal to X
    - '<X': less than X
    - '>=X': greater than or equal to X
    - '>X': greater than X
    - 'X:Y': between X and Y
    - '>X:<Y': greater than X to less than Y
    - 'X:<Y': X to less than Y
    - '>X:Y': greater than X to Y
optional arguments:
- cohort_names - list of exactly 2 names (letters, numbers, and underscores only; no spaces)
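
The bin syntax listed above can be read as a pair of bounds, each either inclusive or exclusive. The following is a minimal sketch of that reading in plain Python. `Bin` and `parse_bin` are hypothetical helpers written for illustration and are not part of this package. The sketch assumes that a bound written without `<` or `>` (as in 'X:Y') is inclusive.

```python
import math
import re
from typing import NamedTuple


class Bin(NamedTuple):
    """A numeric interval with independently inclusive or exclusive ends."""
    lower: float
    lower_inclusive: bool
    upper: float
    upper_inclusive: bool

    def contains(self, value: float) -> bool:
        above = value >= self.lower if self.lower_inclusive else value > self.lower
        below = value <= self.upper if self.upper_inclusive else value < self.upper
        return above and below


_NUMBER = r"-?\d+(?:\.\d+)?"
# One-sided forms: '<=X', '<X', '>=X', '>X'
_ONE_SIDED = re.compile(rf"^(<=|<|>=|>)({_NUMBER})$")
# Two-sided forms: 'X:Y', '>X:<Y', 'X:<Y', '>X:Y'
_RANGE = re.compile(rf"^(>?)({_NUMBER}):(<?)({_NUMBER})$")


def parse_bin(spec: str) -> Bin:
    """Parse one bin specification (hypothetical helper, not the package's parser)."""
    spec = spec.strip()
    match = _ONE_SIDED.match(spec)
    if match:
        op, bound = match.group(1), float(match.group(2))
        if op.startswith("<"):
            return Bin(-math.inf, False, bound, op == "<=")
        return Bin(bound, op == ">=", math.inf, False)
    match = _RANGE.match(spec)
    if match:
        lower_excl, lower, upper_excl, upper = match.groups()
        # Assumption: a bound with no '<' or '>' prefix is inclusive.
        return Bin(float(lower), lower_excl == "", float(upper), upper_excl == "")
    raise ValueError(f"Unrecognised bin specification: {spec!r}")
```

Under these assumptions, `parse_bin("18:<40").contains(39.5)` is `True` and `parse_bin("18:<40").contains(40)` is `False`.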

Returns

A table with the following columns:
- patient_id
- cohort
- propensity_score

Running the propensity_score function calculates, for each patient, the propensity (estimated probability) that the patient belongs to cohort 1. The scores are calculated with a logistic regression model.
Because the underlying statistical packages differ, the R and Python versions will not return exactly the same numbers unless the results are rounded.
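
To illustrate what a logistic-regression propensity score is, the sketch below fits a small model by gradient descent in plain Python, using the rows of the Example section as data. It is not the implementation behind propensity_score. The encoding (standardised age, a Male indicator for sex), the learning rate, and the number of steps are all assumptions made for the illustration, so the resulting numbers should not be expected to match the function's output.

```python
import math

# Rows from the example tables: (patient_id, cohort, age, sex, diabetes).
rows = [
    (1, 1, 51, "Male", 1), (2, 1, 45, "Male", 0), (3, 1, 67, "Female", 0),
    (4, 1, 32, "Male", 1), (5, 1, 53, "Female", 0), (6, 1, 46, "Female", 1),
    (1, 2, 19, "Male", 1), (2, 2, 62, "Female", 1), (3, 2, 55, "Female", 0),
    (4, 2, 46, "Male", 0), (5, 2, 59, "Male", 0), (6, 2, 33, "Female", 1),
]

# Standardise the continuous feature so gradient descent behaves well.
ages = [r[2] for r in rows]
mean_age = sum(ages) / len(ages)
sd_age = math.sqrt(sum((a - mean_age) ** 2 for a in ages) / len(ages))


def encode(row):
    """Assumed encoding: intercept, standardised age, Male indicator, diabetes flag."""
    _, _, age, sex, diabetes = row
    return [1.0, (age - mean_age) / sd_age, 1.0 if sex == "Male" else 0.0, float(diabetes)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    """Estimated probability of belonging to cohort 1."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)))


def fit(X, y, learning_rate=0.1, steps=5000):
    """Batch gradient descent on the mean logistic loss."""
    weights = [0.0] * len(X[0])
    for _ in range(steps):
        errors = [predict(weights, x) - target for x, target in zip(X, y)]
        for j in range(len(weights)):
            gradient = sum(e * x[j] for e, x in zip(errors, X)) / len(X)
            weights[j] -= learning_rate * gradient
    return weights


X = [encode(r) for r in rows]
y = [1.0 if r[1] == 1 else 0.0 for r in rows]  # label: 1 = cohort 1, 0 = cohort 2
weights = fit(X, y)

# (patient_id, cohort, propensity_score), mirroring the columns listed under Returns.
scores = [(r[0], r[1], predict(weights, x)) for r, x in zip(rows, X)]
```

Each score lies strictly between 0 and 1. Different optimisers and default settings reach slightly different coefficients, which is one reason two statistical packages can disagree in the later decimal places.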

Example

cohort_1:

| patient_id  | age   | sex    | diabetes |
|-------------|-------|--------|----------|
| 1           | 51    | Male   | 1        |
| 2           | 45    | Male   | 0        |
| 3           | 67    | Female | 0        |
| 4           | 32    | Male   | 1        |
| 5           | 53    | Female | 0        |
| 6           | 46    | Female | 1        |


cohort_2:

| patient_id  | age   | sex    | diabetes |
|-------------|-------|--------|----------|
| 1           | 19    | Male   | 1        |
| 2           | 62    | Female | 1        |
| 3           | 55    | Female | 0        |
| 4           | 46    | Male   | 0        |
| 5           | 59    | Male   | 0        |
| 6           | 33    | Female | 1        |


features:

| feature  | feature_type  | bin    |
|----------|---------------|--------|
| age      | continuous    |        |
| sex      | categorical   |        |
| diabetes | present       |        |


scores = propensity_score(cohort_1, cohort_2, features)

display(scores)


| patient_id  | cohort   | propensity_score  |
|-------------|----------|-------------------|
| 1           | 1        | 0.30              |
| 2           | 1        | 0.55              |
| 3           | 1        | 0.92              |
| 1           | 2        | 0.89              |
| 2           | 2        | 0.74              |
| 3           | 2        | 0.21              |
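
As the Description notes, the scores can then be used to match patients. Matching is not performed by propensity_score itself. The sketch below shows one common approach, greedy 1:1 nearest-neighbour matching without replacement, applied to the scores in the table above. `greedy_match` and the caliper value of 0.2 are hypothetical choices made for this illustration.

```python
def greedy_match(scores_1, scores_2, caliper=0.2):
    """Pair each cohort 1 patient with the closest unused cohort 2 patient.

    scores_1 and scores_2 map patient_id to propensity score. A pair is kept
    only if the two scores differ by at most `caliper` (an assumed threshold).
    """
    available = dict(scores_2)  # cohort 2 patients not yet matched
    pairs = []
    for pid_1, score_1 in scores_1.items():
        if not available:
            break
        pid_2 = min(available, key=lambda pid: abs(available[pid] - score_1))
        if abs(available[pid_2] - score_1) <= caliper:
            pairs.append((pid_1, pid_2))
            del available[pid_2]
    return pairs


# Scores copied from the example output table.
cohort_1_scores = {1: 0.30, 2: 0.55, 3: 0.92}
cohort_2_scores = {1: 0.89, 2: 0.74, 3: 0.21}

pairs = greedy_match(cohort_1_scores, cohort_2_scores)
# pairs == [(1, 3), (2, 2), (3, 1)] as (cohort 1 patient_id, cohort 2 patient_id)
```

Greedy matching depends on the order in which patients are processed, and a tighter caliper leaves more patients unmatched, so both are choices to make deliberately for a real analysis.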

R functions