Function propensity_score
propensity_score.Rd
The propensity_score function calculates a propensity score for all patients in two cohorts. The propensity scores can then be used to match patients to create two balanced cohorts for an analysis.
Usage
propensity_score(cohort_1, cohort_2, features, cohort_names = vector())
Arguments
- cohort_1
  user defined table with 2 mandatory columns:
  - patient_id
  - feature - name of the feature that will be used as a covariate to calculate propensity scores
  - feature (optional) - a column for each additional feature to use as a covariate
- cohort_2
  user defined table with 2 mandatory columns:
  - patient_id
  - feature - name of the feature that will be used as a covariate to calculate propensity scores
  - feature (optional) - a column for each additional feature to use as a covariate
- features
  user defined table with 2 mandatory columns:
  - feature
  - feature_type - the type of feature
    - present (int, 0 or 1)
    - continuous (double)
    - categorical (string)
    - binned (double)
  - bin (optional)
    - user can set a bin for numeric features
    - a bin must be present for binned feature types
    - syntax for setting bins
      - <=X : less than or equal to X
      - <X : less than X
      - >=X : greater than or equal to X
      - >X : greater than X
      - X:Y : between X and Y
      - >X:<Y : greater than X to less than Y
      - X:<Y : X to less than Y
      - >X:Y : greater than X to Y
- cohort_names
(optional) - exactly 2 names; letters, numbers, underscores only; no spaces
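The bin syntax and the cohort_names rule listed above can be checked mechanically. The following Python sketch is illustrative only and is not part of the package: `parse_bin` and `valid_cohort_names` are hypothetical helper names, and the sketch assumes that a bound without an operator (as in `X:Y`) is inclusive.

```python
import re

# Hypothetical helpers for illustration; they are not part of the package.
_BOUND = re.compile(r"^(<=|>=|<|>)?\s*(-?\d+(?:\.\d+)?)$")


def parse_bin(spec):
    """Turn a bin string such as '>=18', '18:65' or '>18:<65' into a predicate."""
    parts = [p.strip() for p in spec.split(":")]
    if len(parts) == 1:
        m = _BOUND.match(parts[0])
        if not m or m.group(1) is None:
            raise ValueError(f"invalid bin: {spec!r}")
        op, x = m.group(1), float(m.group(2))
        return {
            "<=": lambda v: v <= x,
            "<": lambda v: v < x,
            ">=": lambda v: v >= x,
            ">": lambda v: v > x,
        }[op]
    if len(parts) == 2:
        lo, hi = _BOUND.match(parts[0]), _BOUND.match(parts[1])
        # Only '>' may prefix the lower bound and only '<' the upper bound.
        if (not lo or not hi or lo.group(1) not in (None, ">")
                or hi.group(1) not in (None, "<")):
            raise ValueError(f"invalid bin: {spec!r}")
        lo_val, hi_val = float(lo.group(2)), float(hi.group(2))
        lo_strict, hi_strict = lo.group(1) == ">", hi.group(1) == "<"

        def in_bin(v):
            # Assumption: a bound with no operator is inclusive.
            above = v > lo_val if lo_strict else v >= lo_val
            below = v < hi_val if hi_strict else v <= hi_val
            return above and below

        return in_bin
    raise ValueError(f"invalid bin: {spec!r}")


def valid_cohort_names(names):
    """Exactly 2 names; letters, numbers and underscores only."""
    return len(names) == 2 and all(
        re.fullmatch(r"[A-Za-z0-9_]+", n) for n in names
    )


print(parse_bin("18:<65")(18), parse_bin("18:<65")(65))  # True False
print(valid_cohort_names(["treated", "control_2"]))      # True
```

Returning a predicate keeps the bin definition separate from the data, so the same parsed bin can be applied to any numeric column.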
Value
a table with the following columns:
- patient_id
- cohort
- propensity_score
When propensity_score runs, it calculates a score for each patient that estimates the probability that the patient belongs to cohort 1. The scores come from a logistic regression model. Because the underlying statistical packages differ, the R and Python versions may not return exactly the same numbers unless the scores are rounded.
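To make the logistic regression step concrete, here is a minimal Python sketch that fits a logistic regression by plain gradient descent, with cohort 1 membership as the target. It is not the package's implementation: the feature encoding (standardised age, a Male indicator, diabetes as 0/1), the learning rate, and the helper names `encode`, `sigmoid`, and `fit_logistic` are all assumptions made for illustration, so the resulting scores need not agree with the function's output.

```python
import math
import statistics

# Example cohorts from this page: (patient_id, age, sex, diabetes)
cohort_1 = [(1, 51, "Male", 1), (2, 45, "Male", 0), (3, 67, "Female", 0),
            (4, 32, "Male", 1), (5, 53, "Female", 0), (6, 46, "Female", 1)]
cohort_2 = [(1, 19, "Male", 1), (2, 62, "Female", 1), (3, 55, "Female", 0),
            (4, 46, "Male", 0), (5, 59, "Male", 0), (6, 33, "Female", 1)]

rows = [(1, r) for r in cohort_1] + [(2, r) for r in cohort_2]
ages = [r[1] for _, r in rows]
mu, sd = statistics.mean(ages), statistics.pstdev(ages)


def encode(r):
    # Assumed encoding: standardised age, Male indicator, diabetes as 0/1.
    return [(r[1] - mu) / sd, 1.0 if r[2] == "Male" else 0.0, float(r[3])]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Batch gradient descent on the logistic log-loss; w[0] is the intercept."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


X = [encode(r) for _, r in rows]
y = [1 if cohort == 1 else 0 for cohort, _ in rows]  # target: in cohort 1
w = fit_logistic(X, y)

# Round the scores so small numerical differences between packages disappear.
scores = [
    (r[0], cohort,
     round(sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))), 2))
    for (cohort, r), x in zip(rows, X)
]
for row in scores:
    print(row)  # (patient_id, cohort, propensity_score)
```

Rounding at the end mirrors the note above: two correct solvers can differ in the trailing digits, and comparing rounded scores avoids treating that as a disagreement.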
Details
cohort_1:
| patient_id | age | sex | diabetes |
|-------------|-------|--------|----------|
| 1 | 51 | Male | 1 |
| 2 | 45 | Male | 0 |
| 3 | 67 | Female | 0 |
| 4 | 32 | Male | 1 |
| 5 | 53 | Female | 0 |
| 6 | 46 | Female | 1 |
cohort_2:
| patient_id | age | sex | diabetes |
|-------------|-------|--------|----------|
| 1 | 19 | Male | 1 |
| 2 | 62 | Female | 1 |
| 3 | 55 | Female | 0 |
| 4 | 46 | Male | 0 |
| 5 | 59 | Male | 0 |
| 6 | 33 | Female | 1 |
features:
| feature | feature_type | bin |
|----------|---------------|--------|
| age | continuous | |
| sex | categorical | |
| diabetes | present | |
scores = propensity_score(cohort_1, cohort_2, features)
display(scores)
| patient_id | cohort | propensity_score |
|-------------|----------|-------------------|
| 1 | 1 | 0.30 |
| 2 | 1 | 0.55 |
| 3 | 1 | 0.92 |
| 1 | 2 | 0.89 |
| 2 | 2 | 0.74 |
| 3 | 2 | 0.21 |
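The description mentions using the scores to match patients into balanced cohorts, but this page does not say how matching is done. As one possible approach, the Python sketch below applies greedy 1:1 nearest-neighbour matching with a caliper to the scores in the output table above. The function name `greedy_match` and the caliper value of 0.2 are assumptions for illustration, not part of the package.

```python
# Scores from the example output above: (patient_id, propensity_score)
cohort_1_scores = [(1, 0.30), (2, 0.55), (3, 0.92)]
cohort_2_scores = [(1, 0.89), (2, 0.74), (3, 0.21)]


def greedy_match(scores_1, scores_2, caliper=0.2):
    """Pair each cohort-1 patient with the closest unused cohort-2 patient.

    Patients are processed in input order; a pair is kept only when the
    score difference is within the caliper.
    """
    available = dict(scores_2)  # patient_id -> score, shrinks as pairs form
    pairs = []
    for pid_1, s1 in scores_1:
        if not available:
            break
        pid_2 = min(available, key=lambda pid: abs(available[pid] - s1))
        if abs(available[pid_2] - s1) <= caliper:
            pairs.append((pid_1, pid_2))
            del available[pid_2]
    return pairs


print(greedy_match(cohort_1_scores, cohort_2_scores))
# [(1, 3), (2, 2), (3, 1)]
```

Greedy matching depends on the order in which patients are processed, and a tighter caliper leaves more patients unmatched. With a caliper of 0.1, cohort 1 patient 2 would have no match in this example.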