
The propensity_score function calculates a propensity score for all patients in two cohorts. The propensity scores can then be used to match patients to create two balanced cohorts for an analysis.

Usage

propensity_score(cohort_1, cohort_2, features, cohort_names = vector())

Arguments

cohort_1
  • user-defined table with 2 mandatory columns:

    • patient_id

    • feature - name of the feature that will be used as a covariate to calculate propensity scores

    • feature (optional) - a column for each additional feature to use as a covariate

cohort_2
  • user-defined table with 2 mandatory columns:

    • patient_id

    • feature - name of the feature that will be used as a covariate to calculate propensity scores

    • feature (optional) - a column for each additional feature to use as a covariate

features
  • user-defined table with 2 mandatory columns:

    • feature

    • feature_type - the type of feature

      • present (int, 0 or 1)

      • continuous (double)

      • categorical (string)

      • binned (double)

    • bin (optional)

      • user can set a bin for numeric features

      • a bin must be present for binned feature types

      • syntax for setting bins

        • <=X: less than or equal to X

        • <X: less than X

        • >=X: greater than or equal to X

        • >X: greater than X

        • X:Y: between X and Y

        • >X:<Y: greater than X to less than Y

        • X:<Y: X to less than Y

        • >X:Y: greater than X to Y
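The bin syntax above can be read as an interval with an optional open or closed bound on each side. The following is a minimal sketch of one way to interpret it, not the package's own parser: `parse_bin` and `in_bin` are hypothetical helper names, and treating a bare endpoint (as in `X:Y`) as inclusive is an assumption based on the descriptions in the list.

```python
import re

# Hypothetical helpers for illustration; not part of the documented package.
_NUM = r"-?\d+(?:\.\d+)?"
_SINGLE = re.compile(rf"^(<=|>=|<|>)({_NUM})$")
_RANGE = re.compile(rf"^(>?)({_NUM}):(<?)({_NUM})$")


def parse_bin(spec):
    """Return (low, low_inclusive, high, high_inclusive); None means unbounded."""
    spec = spec.strip()
    m = _SINGLE.match(spec)
    if m:
        op, x = m.group(1), float(m.group(2))
        if op.startswith("<"):
            return (None, False, x, op == "<=")
        return (x, op == ">=", None, False)
    m = _RANGE.match(spec)
    if m:
        low, high = float(m.group(2)), float(m.group(4))
        # Assumption: an endpoint without ">" or "<" is inclusive.
        return (low, m.group(1) != ">", high, m.group(3) != "<")
    raise ValueError(f"unrecognized bin spec: {spec!r}")


def in_bin(value, spec):
    """Check whether a numeric value falls inside the bin described by spec."""
    low, low_inc, high, high_inc = parse_bin(spec)
    if low is not None and (value < low or (value == low and not low_inc)):
        return False
    if high is not None and (value > high or (value == high and not high_inc)):
        return False
    return True
```

For example, under these assumptions `in_bin(65, "18:<65")` is false because the upper bound is exclusive, while `in_bin(65, ">18:65")` is true.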

cohort_names

(optional) - exactly 2 names; letters, numbers, underscores only; no spaces
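The naming rules can be checked before calling the function. The sketch below is an illustration only: `validate_cohort_names` is a hypothetical helper, and treating an empty value as "argument not supplied" is an assumption based on the empty default shown in Usage.

```python
import re

# Letters, numbers, and underscores only (no spaces), per the rule above.
_NAME = re.compile(r"[A-Za-z0-9_]+")


def validate_cohort_names(cohort_names):
    """Return the names as a list, or raise ValueError if they break the rules."""
    names = list(cohort_names)
    if not names:
        return names  # assumption: empty means the optional argument was omitted
    if len(names) != 2:
        raise ValueError("cohort_names must contain exactly 2 names")
    for name in names:
        if not _NAME.fullmatch(name):
            raise ValueError(f"invalid cohort name: {name!r}")
    return names
```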

Value

A table with the following columns:

  • patient_id

  • cohort

  • propensity_score

The propensity_score function calculates, for each patient, the propensity (estimated probability) that the patient is in cohort 1. Scores are calculated with a logistic regression model. Because the underlying statistical packages differ, the R and Python versions may return slightly different unrounded values.
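To make the idea of a logistic-regression propensity score concrete, here is a minimal standard-library sketch. It is not the package's implementation: the gradient-descent fit, the learning rate, the number of epochs, the standardization of age, and the encoding of sex as a 0/1 indicator are all assumptions chosen for illustration, and the data are the example cohorts from the Details section.

```python
import math


def predict(w, xi):
    """Logistic model: probability of cohort 1 given covariates xi."""
    z = w[0] + sum(wj * xij for wj, xij in zip(w[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (intercept first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


# Example cohorts from the Details section: (age, sex, diabetes).
cohort_1 = [(51, "Male", 1), (45, "Male", 0), (67, "Female", 0),
            (32, "Male", 1), (53, "Female", 0), (46, "Female", 1)]
cohort_2 = [(19, "Male", 1), (62, "Female", 1), (55, "Female", 0),
            (46, "Male", 0), (59, "Male", 0), (33, "Female", 1)]

rows = cohort_1 + cohort_2
y = [1] * len(cohort_1) + [0] * len(cohort_2)  # 1 = patient is in cohort 1

# Standardize the continuous feature; encode the categorical one as 0/1.
ages = [r[0] for r in rows]
mean = sum(ages) / len(ages)
sd = (sum((a - mean) ** 2 for a in ages) / len(ages)) ** 0.5
X = [[(age - mean) / sd, 1.0 if sex == "Male" else 0.0, float(diab)]
     for age, sex, diab in rows]

w = fit_logistic(X, y)
scores = [predict(w, xi) for xi in X]  # one propensity score per patient
```

Each entry of `scores` lies strictly between 0 and 1. A different optimizer or feature encoding will generally give slightly different unrounded values, which is the kind of cross-implementation difference described above.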

Details

cohort_1:

| patient_id  | age   | sex    | diabetes |
|-------------|-------|--------|----------|
| 1           | 51    | Male   | 1        |
| 2           | 45    | Male   | 0        |
| 3           | 67    | Female | 0        |
| 4           | 32    | Male   | 1        |
| 5           | 53    | Female | 0        |
| 6           | 46    | Female | 1        |

cohort_2:

| patient_id  | age   | sex    | diabetes |
|-------------|-------|--------|----------|
| 1           | 19    | Male   | 1        |
| 2           | 62    | Female | 1        |
| 3           | 55    | Female | 0        |
| 4           | 46    | Male   | 0        |
| 5           | 59    | Male   | 0        |
| 6           | 33    | Female | 1        |

features:

| feature  | feature_type  | bin    |
|----------|---------------|--------|
| age      | continuous    |        |
| sex      | categorical   |        |
| diabetes | present       |        |

scores = propensity_score(cohort_1, cohort_2, features)

display(scores)

| patient_id  | cohort   | propensity_score  |
|-------------|----------|-------------------|
| 1           | 1        | 0.30              |
| 2           | 1        | 0.55              |
| 3           | 1        | 0.92              |
| 1           | 2        | 0.89              |
| 2           | 2        | 0.74              |
| 3           | 2        | 0.21              |
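
Once scores are available, patients can be matched across cohorts. The matching procedure itself is not described on this page, so the following is only a sketch of one common approach, greedy 1:1 nearest-neighbor matching without replacement. `greedy_match` is a hypothetical helper and the caliper of 0.2 is an arbitrary value chosen for illustration. The inputs are the scores from the table above.

```python
def greedy_match(scores_1, scores_2, caliper=0.2):
    """Pair each cohort 1 patient with the closest unused cohort 2 patient.

    scores_1, scores_2: dict mapping patient_id to propensity score.
    Returns a list of (cohort_1_id, cohort_2_id) pairs; a patient with no
    available partner within the caliper is left unmatched.
    """
    available = dict(scores_2)
    pairs = []
    # Process cohort 1 in order of increasing score.
    for pid_1, s1 in sorted(scores_1.items(), key=lambda kv: kv[1]):
        if not available:
            break
        pid_2 = min(available, key=lambda pid: abs(available[pid] - s1))
        if abs(available[pid_2] - s1) <= caliper:
            pairs.append((pid_1, pid_2))
            del available[pid_2]  # match without replacement
    return pairs


scores_1 = {1: 0.30, 2: 0.55, 3: 0.92}
scores_2 = {1: 0.89, 2: 0.74, 3: 0.21}
pairs = greedy_match(scores_1, scores_2)
# pairs == [(1, 3), (2, 2), (3, 1)]
```

With a tighter caliper of 0.1, cohort 1 patient 2 has no partner close enough and is left unmatched.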