Function propensity_score
The propensity_score function calculates a propensity score for every patient in two cohorts. The scores can then be used to match patients and create two balanced cohorts for analysis.
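propensity_score only returns scores, so matching is a separate step. As a rough illustration, the minimal sketch below pairs patients with greedy 1:1 nearest-neighbor matching without replacement. The function name greedy_match, the dict-based inputs, and the optional caliper are hypothetical choices for this example and are not part of this package. The usage line reuses the illustrative scores from the Details example.

```python
def greedy_match(scores_1, scores_2, caliper=None):
    """Pair each cohort-1 patient with the closest unused cohort-2 patient.

    scores_1, scores_2: dicts mapping patient_id -> propensity score.
    caliper: optional maximum allowed score difference within a pair.
    Returns a list of (cohort_1_id, cohort_2_id) pairs.
    """
    available = dict(scores_2)  # cohort-2 patients not yet matched
    pairs = []
    # Visit cohort-1 patients in ascending id order so results are deterministic
    for pid_1 in sorted(scores_1):
        if not available:
            break
        s1 = scores_1[pid_1]
        # Closest remaining cohort-2 patient; ties go to the smaller id
        pid_2 = min(available, key=lambda pid: (abs(available[pid] - s1), pid))
        if caliper is not None and abs(available[pid_2] - s1) > caliper:
            continue  # no acceptable partner, so this patient stays unmatched
        pairs.append((pid_1, pid_2))
        del available[pid_2]  # matching without replacement
    return pairs


pairs = greedy_match({1: 0.30, 2: 0.55, 3: 0.92}, {1: 0.89, 2: 0.74, 3: 0.21})
print(pairs)  # [(1, 3), (2, 2), (3, 1)]
```

Greedy matching depends on the order in which cohort-1 patients are visited; optimal matching methods avoid that order dependence at extra computational cost.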
Usage
propensity_score(cohort_1, cohort_2, features, cohort_names = vector())
Arguments
- cohort_1
  user-defined table with 2 mandatory columns:
  - patient_id
  - feature - a column, named after the feature, whose values are used as a covariate when calculating propensity scores
  - feature (optional) - one additional column for each further feature to use as a covariate
- cohort_2
  user-defined table with 2 mandatory columns:
  - patient_id
  - feature - a column, named after the feature, whose values are used as a covariate when calculating propensity scores
  - feature (optional) - one additional column for each further feature to use as a covariate
- features
  user-defined table with 2 mandatory columns:
  - feature - the name of the feature, matching a column in cohort_1 and cohort_2
  - feature_type - the type of feature, one of:
    - present (int, 0 or 1)
    - continuous (double)
    - categorical (string)
    - binned (double)
  - bin (optional) - a bin for a numeric feature; a bin must be set for binned feature types. Bins use the following syntax:
    - `<=X`: less than or equal to X
    - `<X`: less than X
    - `>=X`: greater than or equal to X
    - `>X`: greater than X
    - `X:Y`: between X and Y
    - `>X:<Y`: greater than X and less than Y
    - `X:<Y`: from X up to, but not including, Y
    - `>X:Y`: greater than X, up to Y
- cohort_names
  (optional) exactly 2 names, one per cohort; names may contain only letters, numbers, and underscores (no spaces)
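The bin syntax for the features argument is compact enough to parse with a few string checks. The minimal sketch below turns a bin string into interval bounds and tests a value against it. parse_bin and in_bin are hypothetical helpers, not functions from this package. The sketch assumes that `X:Y` includes both endpoints (the syntax list only says "between X and Y") and that X and Y are plain numeric literals.

```python
def parse_bin(spec):
    """Parse a bin string into (low, low_inclusive, high, high_inclusive).

    A bound of None means that side is unbounded.
    """
    spec = spec.strip()
    if ":" in spec:
        # Two-sided forms: X:Y, >X:<Y, X:<Y, >X:Y
        # Assumption: a bare X or Y is an inclusive bound
        left, right = spec.split(":", 1)
        low_incl = not left.startswith(">")
        low = float(left if low_incl else left[1:])
        high_incl = not right.startswith("<")
        high = float(right if high_incl else right[1:])
        return (low, low_incl, high, high_incl)
    # One-sided forms; "<=" and ">=" must be checked before "<" and ">"
    for prefix, is_lower_bound, inclusive in (
        ("<=", False, True),
        (">=", True, True),
        ("<", False, False),
        (">", True, False),
    ):
        if spec.startswith(prefix):
            x = float(spec[len(prefix):])
            if is_lower_bound:
                return (x, inclusive, None, False)
            return (None, False, x, inclusive)
    raise ValueError(f"unrecognized bin syntax: {spec!r}")


def in_bin(value, spec):
    """Return True if value falls inside the bin described by spec."""
    low, low_incl, high, high_incl = parse_bin(spec)
    if low is not None and (value < low or (value == low and not low_incl)):
        return False
    if high is not None and (value > high or (value == high and not high_incl)):
        return False
    return True


print(in_bin(65, "18:<65"))  # False: the upper bound is excluded
print(in_bin(65, ">18:65"))  # True: the upper bound is included
```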
Value
A table with the following columns:
- patient_id
- cohort
- propensity_score

Each patient's propensity score is the estimated probability that the patient belongs to cohort 1, given the covariates listed in features. Scores are calculated with a logistic regression model of cohort membership. Because the R and Python versions rely on different statistical packages, their unrounded results will not be exactly the same.
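To show the underlying technique, here is a from-scratch sketch of logistic-regression propensity scores, fitted by plain batch gradient ascent on the log-likelihood with cohort 1 as the outcome. It is not the package's implementation: logistic_propensity is a hypothetical name, it expects covariates that are already numeric, and a real statistical package would use a different solver and convergence rule, which is one reason results from different implementations can differ slightly.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _linear(w, row):
    # Intercept w[0] plus the weighted covariates
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))


def logistic_propensity(rows_1, rows_2, lr=0.1, n_iter=5000):
    """Estimate P(cohort 1 | covariates) for every patient.

    rows_1, rows_2: lists of numeric covariate rows for cohort 1 and cohort 2.
    Returns scores for rows_1 followed by rows_2.
    """
    X = rows_1 + rows_2
    y = [1] * len(rows_1) + [0] * len(rows_2)  # outcome: membership in cohort 1
    w = [0.0] * (len(X[0]) + 1)  # intercept plus one weight per covariate
    for _ in range(n_iter):
        # Accumulate the log-likelihood gradient: sum over patients of (y - p) * x
        grad = [0.0] * len(w)
        for row, label in zip(X, y):
            err = label - _sigmoid(_linear(w, row))
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        # Step uphill using the mean gradient
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return [_sigmoid(_linear(w, row)) for row in X]


scores = logistic_propensity([[2.0], [4.0], [5.0]], [[1.0], [3.0], [4.0]])
print([round(s, 2) for s in scores])
```

When both cohorts have identical covariate distributions, every score in this sketch stays at exactly 0.5, which is a quick sanity check that the model is not inventing a difference.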
Details
cohort_1:
| patient_id | age | sex | diabetes |
|-------------|-------|--------|----------|
| 1 | 51 | Male | 1 |
| 2 | 45 | Male | 0 |
| 3 | 67 | Female | 0 |
| 4 | 32 | Male | 1 |
| 5 | 53 | Female | 0 |
| 6 | 46 | Female | 1 |
cohort_2:
| patient_id | age | sex | diabetes |
|-------------|-------|--------|----------|
| 1 | 19 | Male | 1 |
| 2 | 62 | Female | 1 |
| 3 | 55 | Female | 0 |
| 4 | 46 | Male | 0 |
| 5 | 59 | Male | 0 |
| 6 | 33 | Female | 1 |
features:
| feature | feature_type | bin |
|----------|---------------|--------|
| age | continuous | |
| sex | categorical | |
| diabetes | present | |
scores = propensity_score(cohort_1, cohort_2, features)
display(scores)
| patient_id | cohort | propensity_score |
|-------------|----------|-------------------|
| 1 | 1 | 0.30 |
| 2 | 1 | 0.55 |
| 3 | 1 | 0.92 |
| 1 | 2 | 0.89 |
| 2 | 2 | 0.74 |
| 3 | 2 | 0.21 |
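The features table in this example mixes continuous, categorical, and present features, and all of them have to become numeric covariates before a logistic regression can use them. The minimal sketch below shows one common encoding. encode_features is a hypothetical helper; the dummy-coding scheme (levels sorted alphabetically, first level used as the reference) is an assumption rather than documented behavior of propensity_score; and binned features are left out because this page does not say how a bin becomes a covariate.

```python
def encode_features(rows, features):
    """Turn patient rows into a numeric covariate matrix.

    rows: list of dicts, one per patient, pooled across both cohorts.
    features: list of (feature, feature_type) pairs.
    Returns (column_names, matrix).
    """
    columns = []  # (feature, level); level is None for numeric columns
    for name, ftype in features:
        if ftype in ("present", "continuous"):
            columns.append((name, None))
        elif ftype == "categorical":
            # Assumed dummy coding: one 0/1 column per level, dropping the
            # alphabetically first level as the reference category
            levels = sorted({row[name] for row in rows})
            columns.extend((name, level) for level in levels[1:])
        else:
            raise ValueError(f"feature_type not handled in this sketch: {ftype}")
    matrix = [
        [float(row[name]) if level is None else float(row[name] == level)
         for name, level in columns]
        for row in rows
    ]
    names = [name if level is None else f"{name}={level}" for name, level in columns]
    return names, matrix


rows = [
    {"patient_id": 1, "age": 51, "sex": "Male", "diabetes": 1},
    {"patient_id": 3, "age": 67, "sex": "Female", "diabetes": 0},
]
features = [("age", "continuous"), ("sex", "categorical"), ("diabetes", "present")]
names, matrix = encode_features(rows, features)
print(names)   # ['age', 'sex=Male', 'diabetes']
print(matrix)  # [[51.0, 1.0, 1.0], [67.0, 0.0, 0.0]]
```

Because levels are taken from the data itself, a misspelled category value (for example "Femal" alongside "Female") silently becomes a separate level, so categorical columns are worth checking before scoring.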