
trinetx.propensity_score API documentation

Function propensity_score

def propensity_score(cohort_1: sql.dataframe.DataFrame, cohort_2: sql.dataframe.DataFrame, features: sql.dataframe.DataFrame, cohort_names: list = None) -> sql.dataframe.DataFrame

Description


The propensity_score function calculates a propensity score for all patients in two cohorts. The propensity scores can then be used to match patients to create two balanced cohorts for an analysis.

Inputs


  • cohort_1 - user-defined table with 2 mandatory columns:

    • patient_id

    • feature - a column named after a feature (for example, age) whose values are used as a covariate to calculate propensity scores

    • feature (optional) - a column for each additional feature to use as a covariate

  • cohort_2 - user-defined table with 2 mandatory columns:

    • patient_id

    • feature - a column named after a feature (for example, age) whose values are used as a covariate to calculate propensity scores

    • feature (optional) - a column for each additional feature to use as a covariate

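  • features - user-defined table with 2 mandatory columns:

    • feature - the name of a feature; it must match a column name in cohort_1 and cohort_2

    • feature_type - the type of feature, with the expected data type of its values: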

      • present (int, 0 or 1)

      • continuous (double)

      • categorical (string)

      • binned (double)

    • bin (optional)

      • sets a bin for a numeric feature

      • required when feature_type is binned

      • syntax for setting bins

        • '<=X': less than or equal to X

        • '<X': less than X

        • '>=X': greater than or equal to X

        • '>X': greater than X

        • 'X:Y': between X and Y

        • '>X:<Y': greater than X to less than Y

        • 'X:<Y': X to less than Y

        • '>X:Y': greater than X to Y

  • optional arguments:

    • cohort_names - a list of exactly 2 names containing only letters, numbers, and underscores (no spaces)
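Each bin string above describes an interval whose lower and upper bounds may each be inclusive or exclusive. The Python sketch below shows one way to interpret that syntax. The names Bin, parse_bin, and in_bin are hypothetical and not part of the trinetx package, and the sketch assumes that a bound written without < or > (as in 'X:Y') is inclusive.

```python
import math
from typing import NamedTuple


class Bin(NamedTuple):
    """Hypothetical interval representation of a bin string."""
    low: float
    low_inclusive: bool
    high: float
    high_inclusive: bool


def _parse_lower(text):
    # '>X' is an exclusive lower bound; a bare 'X' is assumed inclusive.
    if text.startswith(">"):
        return float(text[1:]), False
    return float(text), True


def _parse_upper(text):
    # '<Y' is an exclusive upper bound; a bare 'Y' is assumed inclusive.
    if text.startswith("<"):
        return float(text[1:]), False
    return float(text), True


def parse_bin(spec):
    spec = spec.strip()
    if ":" in spec:
        # Range forms: 'X:Y', '>X:<Y', 'X:<Y', '>X:Y'
        lower, upper = spec.split(":", 1)
        low, low_inc = _parse_lower(lower)
        high, high_inc = _parse_upper(upper)
        return Bin(low, low_inc, high, high_inc)
    # One-sided forms; two-character operators are checked first.
    if spec.startswith("<="):
        return Bin(-math.inf, False, float(spec[2:]), True)
    if spec.startswith(">="):
        return Bin(float(spec[2:]), True, math.inf, False)
    if spec.startswith("<"):
        return Bin(-math.inf, False, float(spec[1:]), False)
    if spec.startswith(">"):
        return Bin(float(spec[1:]), False, math.inf, False)
    raise ValueError(f"unrecognized bin syntax: {spec!r}")


def in_bin(value, spec):
    """Return True if value falls inside the interval described by spec."""
    b = parse_bin(spec)
    above = value >= b.low if b.low_inclusive else value > b.low
    below = value <= b.high if b.high_inclusive else value < b.high
    return above and below
```

For example, in_bin(45, '>40:<50') is True, while in_bin(50, '>40:<50') is False because '<50' excludes 50.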

Returns


  • a table with the following columns:

    • patient_id

    • cohort

    • propensity_score

  • The propensity score for each patient is the estimated probability that the patient belongs to cohort 1, calculated with a logistic regression model.

  • Because the R and Python versions use different statistical packages, their unrounded scores will not be exactly identical.
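The scores come from a logistic regression that predicts membership in cohort 1. As a rough illustration of that idea, the pure-Python sketch below fits an unregularized logistic regression by batch gradient descent. It is not the TriNetX implementation: the packages the R and Python versions actually use are not named here, so its numbers will not match the example output below. The helper propensity_scores, its learning rate and iteration count, and the [age, sex is Male, diabetes] encoding of the example rows are all assumptions made for illustration.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function 1 / (1 + e^-z).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _standardize(rows):
    # Scale every column to mean 0 and standard deviation 1 so that one
    # learning rate suits age-like and 0/1 columns alike; constant columns
    # are only centered.
    n = len(rows)
    k = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
        for j in range(k)
    ]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]


def propensity_scores(cohort_1_rows, cohort_2_rows, lr=0.1, n_iter=5000):
    """Estimate P(cohort 1 | features) for each row, cohort 1 rows first.

    Hypothetical helper: unregularized logistic regression fitted by batch
    gradient descent on the mean log-loss. Rows must already be numeric.
    """
    x = _standardize(cohort_1_rows + cohort_2_rows)
    y = [1.0] * len(cohort_1_rows) + [0.0] * len(cohort_2_rows)
    n, k = len(x), len(x[0])
    w, b = [0.0] * k, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(x, y):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return [_sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in x]


# Example rows as [age, sex is Male (1/0), diabetes], an assumed encoding.
cohort_1_rows = [[51, 1, 1], [45, 1, 0], [67, 0, 0], [32, 1, 1], [53, 0, 0], [46, 0, 1]]
cohort_2_rows = [[19, 1, 1], [62, 0, 1], [55, 0, 0], [46, 1, 0], [59, 1, 0], [33, 0, 1]]
scores = propensity_scores(cohort_1_rows, cohort_2_rows)
```

Standardizing the columns changes only the parameterization. At the fitted optimum, the predicted probabilities are the same as those from a fit on the raw values.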

Example


cohort_1:

| patient_id  | age   | sex    | diabetes |
|-------------|-------|--------|----------|
| 1           | 51    | Male   | 1        |
| 2           | 45    | Male   | 0        |
| 3           | 67    | Female | 0        |
| 4           | 32    | Male   | 1        |
| 5           | 53    | Female | 0        |
| 6           | 46    | Female | 1        |


cohort_2:

| patient_id  | age   | sex    | diabetes |
|-------------|-------|--------|----------|
| 1           | 19    | Male   | 1        |
| 2           | 62    | Female | 1        |
| 3           | 55    | Female | 0        |
| 4           | 46    | Male   | 0        |
| 5           | 59    | Male   | 0        |
| 6           | 33    | Female | 1        |


features:

| feature  | feature_type  | bin    |
|----------|---------------|--------|
| age      | continuous    |        |
| sex      | categorical   |        |
| diabetes | present       |        |


scores = propensity_score(cohort_1, cohort_2, features)

display(scores)


| patient_id  | cohort   | propensity_score  |
|-------------|----------|-------------------|
| 1           | 1        | 0.30              |
| 2           | 1        | 0.55              |
| 3           | 1        | 0.92              |
| 1           | 2        | 0.89              |
| 2           | 2        | 0.74              |
| 3           | 2        | 0.21              |
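The description says the scores can be used to match patients into two balanced cohorts, but this page does not specify a matching method. The sketch below shows one common option: 1:1 greedy nearest-neighbor matching without replacement, restricted to pairs whose scores differ by at most a caliper. The helper greedy_match, the descending-score processing order, and the 0.1 caliper default are hypothetical choices for illustration and do not describe TriNetX behavior. The sketch also assumes that the cohort column holds 1 and 2, as in the example output.

```python
def greedy_match(scores, caliper=0.1, cohort_a=1, cohort_b=2):
    """1:1 greedy nearest-neighbor matching without replacement.

    Hypothetical helper. scores: iterable of (patient_id, cohort,
    propensity_score) rows. Returns (cohort_a_id, cohort_b_id) pairs.
    """
    rows = list(scores)  # allow generators, which we iterate twice
    a = [(pid, s) for pid, cohort, s in rows if cohort == cohort_a]
    b = {pid: s for pid, cohort, s in rows if cohort == cohort_b}
    pairs = []
    # Visit cohort_a patients from highest to lowest score.
    for pid_a, s_a in sorted(a, key=lambda t: t[1], reverse=True):
        if not b:
            break
        # Closest remaining cohort_b patient by absolute score difference.
        pid_b = min(b, key=lambda p: abs(b[p] - s_a))
        if abs(b[pid_b] - s_a) <= caliper:
            pairs.append((pid_a, pid_b))
            del b[pid_b]  # without replacement
    return pairs


# The example output above, as (patient_id, cohort, propensity_score) rows.
rows = [(1, 1, 0.30), (2, 1, 0.55), (3, 1, 0.92),
        (1, 2, 0.89), (2, 2, 0.74), (3, 2, 0.21)]
pairs = greedy_match(rows)  # → [(3, 1), (1, 3)]
```

With the example scores, cohort 1 patient 2 (0.55) stays unmatched because the closest remaining cohort 2 score (0.74) is farther away than the caliper.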