Function propensity_score — propensity

The propensity_score function calculates a propensity score for each patient in two cohorts. The scores can then be used to match patients across the cohorts and create two balanced cohorts for an analysis.

Usage

propensity_score(cohort_1, cohort_2, features, cohort_names = vector())

Arguments

cohort_1

user-defined table with 2 mandatory columns:
- patient_id
- feature - a column named after the feature to use as a covariate when calculating propensity scores
- feature (optional) - one column for each additional feature to use as a covariate

cohort_2

user-defined table with 2 mandatory columns:
- patient_id
- feature - a column named after the feature to use as a covariate when calculating propensity scores
- feature (optional) - one column for each additional feature to use as a covariate

features

user defined table with 2 mandatory columns:
- feature
- feature_type - the type of feature
  - present (int, 0 or 1)
  - continuous (double)
  - categorical (string)
  - binned (double)
- bin (optional)
  - user can set a bin for numeric features
  - a bin must be present for binned feature types
  - syntax for setting bins
    - <=X: less than or equal to X
    - <X: less than X
    - >=X: greater than or equal to X
    - >X: greater than X
    - X:Y: between X and Y
    - >X:<Y: greater than X to less than Y
    - X:<Y: X to less than Y
    - >X:Y: greater than X to Y
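
The bin syntax above can be read as a pair of bounds, each either inclusive or exclusive. The following is a minimal sketch of that reading, not the function's actual parser: `parse_bin` and `in_bin` are hypothetical helper names, and the sketch assumes that a bare number in `X:Y` is an inclusive bound and that X and Y are plain decimal numbers.

```python
import math
import re

# A bound is an optional comparison operator followed by a decimal number.
_NUM = r"-?\d+(?:\.\d+)?"
_BOUND = re.compile(rf"^(<=|>=|<|>)?({_NUM})$")


def parse_bin(spec):
    """Return (low, low_inclusive, high, high_inclusive) for a bin string."""
    spec = spec.replace(" ", "")
    if ":" in spec:
        # Range forms: X:Y, >X:<Y, X:<Y, >X:Y
        left, right = spec.split(":", 1)
        lo = _BOUND.match(left)
        hi = _BOUND.match(right)
        if not lo or not hi:
            raise ValueError(f"unrecognized bin: {spec!r}")
        if lo.group(1) not in (None, ">") or hi.group(1) not in (None, "<"):
            raise ValueError(f"unrecognized bin: {spec!r}")
        # Assumption: a bound without an operator is inclusive.
        return (float(lo.group(2)), lo.group(1) is None,
                float(hi.group(2)), hi.group(1) is None)
    # One-sided forms: <=X, <X, >=X, >X
    m = _BOUND.match(spec)
    if not m or m.group(1) is None:
        raise ValueError(f"unrecognized bin: {spec!r}")
    op, value = m.group(1), float(m.group(2))
    if op in ("<=", "<"):
        return (-math.inf, False, value, op == "<=")
    return (value, op == ">=", math.inf, False)


def in_bin(x, spec):
    """Check whether the number x falls inside the bin described by spec."""
    low, low_inc, high, high_inc = parse_bin(spec)
    above = x >= low if low_inc else x > low
    below = x <= high if high_inc else x < high
    return above and below
```

For example, under these assumptions `in_bin(65, "40:<65")` is false because the upper bound is exclusive, while `in_bin(65, ">40:65")` is true.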

cohort_names

(optional) - exactly 2 names; letters, numbers, underscores only; no spaces
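
The naming rules above can be checked before calling the function. This is a small sketch of such a check; `is_valid_cohort_names` is a hypothetical helper, not part of the documented API, and it treats an empty list as "argument omitted".

```python
import re


def is_valid_cohort_names(names):
    """Return True if names is empty (omitted) or holds exactly 2 valid names."""
    if len(names) == 0:
        return True  # the argument is optional
    if len(names) != 2:
        return False
    # Letters, numbers, and underscores only; this also rules out spaces.
    return all(re.fullmatch(r"[A-Za-z0-9_]+", name) for name in names)
```

With this check, `["treated", "control_2"]` passes, while `["treated"]` and `["my cohort", "control"]` do not.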

Value

a table with the following columns

patient_id
cohort
propensity_score

Upon running the propensity_score function, a propensity score is calculated for each patient, indicating the propensity that the patient is in cohort 1. Each score is calculated with a logistic regression model. Because the underlying statistical packages differ, R and Python will not return exactly the same numbers unless the results are rounded.
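
To illustrate the idea of a logistic-regression propensity score, here is a self-contained sketch that fits a model with cohort membership as the outcome (1 for cohort 1, 0 for cohort 2) and reads the fitted probability as the score. It is not the function's implementation: the gradient-ascent fitting, the `encode` helper, the age scaling, and the Male/Female indicator coding are all assumptions made for illustration, so the numbers it produces will not match the function's output.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    """Fitted probability of cohort 1 membership; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def fit_logistic(rows, labels, steps=2000, lr=0.1):
    """Fit logistic regression by batch gradient ascent on the log-likelihood."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = y - predict(weights, x)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def encode(age, sex, diabetes):
    # Assumed encoding: age rescaled for stable fitting, sex as a 0/1 indicator.
    return [age / 100.0, 1.0 if sex == "Male" else 0.0, float(diabetes)]


# (age, sex, diabetes) per patient
cohort_1 = [(51, "Male", 1), (45, "Male", 0), (67, "Female", 0),
            (32, "Male", 1), (53, "Female", 0), (46, "Female", 1)]
cohort_2 = [(19, "Male", 1), (62, "Female", 1), (55, "Female", 0),
            (46, "Male", 0), (59, "Male", 0), (33, "Female", 1)]

rows = [encode(*p) for p in cohort_1 + cohort_2]
labels = [1] * len(cohort_1) + [0] * len(cohort_2)  # 1 = patient is in cohort 1

weights = fit_logistic(rows, labels)
scores = [predict(weights, x) for x in rows]  # one score per patient
```

A hand-rolled optimizer like this one stops at a slightly different point than a library solver would, which is the same kind of discrepancy the note about R and Python describes.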

Details

cohort_1:

| patient_id  | age   | sex    | diabetes |
|-------------|-------|--------|----------|
| 1           | 51    | Male   | 1        |
| 2           | 45    | Male   | 0        |
| 3           | 67    | Female | 0        |
| 4           | 32    | Male   | 1        |
| 5           | 53    | Female | 0        |
| 6           | 46    | Female | 1        |

cohort_2:

| patient_id  | age   | sex    | diabetes |
|-------------|-------|--------|----------|
| 1           | 19    | Male   | 1        |
| 2           | 62    | Female | 1        |
| 3           | 55    | Female | 0        |
| 4           | 46    | Male   | 0        |
| 5           | 59    | Male   | 0        |
| 6           | 33    | Female | 1        |

features:

| feature  | feature_type  | bin    |
|----------|---------------|--------|
| age      | continuous    |        |
| sex      | categorical   |        |
| diabetes | present       |        |

scores = propensity_score(cohort_1, cohort_2, features)

display(scores)

| patient_id  | cohort   | propensity_score  |
|-------------|----------|-------------------|
| 1           | 1        | 0.30              |
| 2           | 1        | 0.55              |
| 3           | 1        | 0.92              |
| 1           | 2        | 0.89              |
| 2           | 2        | 0.74              |
| 3           | 2        | 0.21              |
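
Once scores are available, patients can be paired across cohorts. The sketch below shows one common approach, greedy 1:1 nearest-neighbor matching with a caliper, applied to the scores in the table above. It is an illustration only: `greedy_match` and the default caliper of 0.2 are assumptions, and the documentation does not state which matching method should follow this function.

```python
def greedy_match(scores_1, scores_2, caliper=0.2):
    """Pair each cohort 1 patient with the closest unmatched cohort 2 patient.

    scores_1 and scores_2 map patient_id to propensity score. A pair is kept
    only if the two scores differ by no more than the caliper.
    """
    unmatched = dict(scores_2)
    pairs = []
    for pid_1, s1 in scores_1.items():
        if not unmatched:
            break
        # Closest remaining cohort 2 patient by absolute score difference.
        pid_2 = min(unmatched, key=lambda pid: abs(unmatched[pid] - s1))
        if abs(unmatched[pid_2] - s1) <= caliper:
            pairs.append((pid_1, pid_2))
            del unmatched[pid_2]
    return pairs


# Scores taken from the example output table.
scores_1 = {1: 0.30, 2: 0.55, 3: 0.92}
scores_2 = {1: 0.89, 2: 0.74, 3: 0.21}

pairs = greedy_match(scores_1, scores_2)  # list of (cohort 1 id, cohort 2 id)
```

Greedy matching depends on the order in which patients are visited; a tighter caliper leaves more patients unmatched but yields closer pairs.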