Function aggregate_table
aggregate_table.Rd
The aggregate_table function converts a patient-level table into an aggregated cohort table covering demographics, comorbidities, medications, labs, and more. Supported aggregations are sum, value_count, mean, median, range, and missing.
Arguments
- table
extract Spark dataframe - a patient-level table with patient_ids in one column, fields generated from find_presence and dstats, fields joined from the patient table, and any other fields present in the table
- feature_function
2-column Spark dataframe:
feature - name of the column in the extract
func - calculation to perform on the column
supported functions:
sum - sums the column values
mean - finds the mean of all column values
median - finds the median of all column values
value_count - finds the unique values in the column, makes a row for each, and counts the occurrences of each unique value
range - outputs the min and max values in the column
missing - counts the null values in the column
Value
returns a 5-column dataframe:
characteristic
if func = sum, mean, median, or range (min and max): the feature name, as shown in the feature column
if func = value_count: the unique values in the extract column, with a new row for each unique value
feature - the feature from the input file the characteristic was generated from
func - the function type from the input file that was used on the feature
value_1
the output of the function requested in func
if func = range: min
if func = missing: count of null values
value_2
if func = sum or value_count: value_1/(count of unique patient_ids in the extract table input)
if func = mean: standard deviation
if func = median: IQR (0.25 to 0.75 quartile range)
if func = range: max
if func = missing: value_1/(count of unique patient_ids in the extract table input)
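The value_1 / value_2 pairing described above can be illustrated with a small base R sketch. This is an assumption-laden illustration, not the package's implementation: `toy_extract` and `summarise_feature` are hypothetical names, the data are made up, and the code runs on an ordinary local data.frame rather than a Spark dataframe. Nulls are dropped before aggregating (an assumption), the IQR is reported as a single width using R's default quantile method, and value_count rows are not covered.

```r
# Toy patient-level extract (made-up values, one row per patient).
toy_extract <- data.frame(
  patient_id = c("p1", "p2", "p3", "p4"),
  age        = c(30, 40, 50, NA),
  diabetes   = c(1, 0, 1, 0)
)

# Hypothetical helper mirroring the value_1 / value_2 rules listed above.
summarise_feature <- function(df, feature, func) {
  x <- df[[feature]]
  # Denominator: count of unique patient ids in the extract.
  n_patients <- length(unique(df$patient_id))
  switch(func,
    sum     = c(value_1 = sum(x, na.rm = TRUE),
                value_2 = sum(x, na.rm = TRUE) / n_patients),
    mean    = c(value_1 = mean(x, na.rm = TRUE),
                value_2 = sd(x, na.rm = TRUE)),
    median  = c(value_1 = median(x, na.rm = TRUE),
                value_2 = IQR(x, na.rm = TRUE)),
    range   = c(value_1 = min(x, na.rm = TRUE),
                value_2 = max(x, na.rm = TRUE)),
    missing = c(value_1 = sum(is.na(x)),
                value_2 = sum(is.na(x)) / n_patients),
    stop("unsupported func: ", func)
  )
}

summarise_feature(toy_extract, "age", "mean")      # value_1 = 40, value_2 = 10
summarise_feature(toy_extract, "age", "missing")   # value_1 = 1,  value_2 = 0.25
summarise_feature(toy_extract, "diabetes", "sum")  # value_1 = 2,  value_2 = 0.5
```

For sum and missing, value_2 is a proportion of patients, so it only reads as a prevalence when the summed column is a 0/1 flag with one row per patient.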
Details
feature_function input table example:
| feature | func |
|---------------|-------------|
| age | mean |
| age | median |
| age | range |
| weight | mean |
| weight | median |
| race | value_count |
| sex | value_count |
| depression | sum |
| diabetes | sum |
| dvt | sum |
| hyptertension | sum |
| asthma | sum |
| sleep apnea | sum |
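A table like the one above could be assembled in R along the following lines. This is a sketch under assumptions: it builds a local data.frame, whereas the function expects a Spark dataframe, so the result would still need to be copied to Spark (for instance with `copy_to()` if sparklyr is the Spark interface in use). The `supported` vector and the early check are illustrative and not part of the package; feature names are spelled exactly as in the example table.

```r
# Local sketch of the feature_function input; copy to Spark before use.
feature_functions <- data.frame(
  feature = c("age", "age", "age", "weight", "weight", "race", "sex",
              "depression", "diabetes", "dvt", "hyptertension",
              "asthma", "sleep apnea"),
  func = c("mean", "median", "range", "mean", "median",
           "value_count", "value_count",
           "sum", "sum", "sum", "sum", "sum", "sum"),
  stringsAsFactors = FALSE
)

# Functions listed as supported in the Arguments section.
supported <- c("sum", "mean", "median", "value_count", "range", "missing")

# Fail early if any row requests a function outside the supported set.
unsupported <- setdiff(feature_functions$func, supported)
stopifnot(length(unsupported) == 0)
```

Checking the func column before calling the function keeps a misspelled aggregation name from surfacing later as a Spark-side error.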
table_1 = aggregate_table(table=extract, feature_function=feature_functions)
head(table_1)
| characteristic | feature | func | value_1 | value_2 |
|---------------------------------------------|---------------|-------------|---------|---------|
| age | age | mean | 50 | 15 |
| age | age | median | 52 | 10 |
| age | age | range | 30 | 80 |
| weight | weight | mean | 100 | 10 |
| weight | weight | median | 130 | 40 |
| white | race | value_count | 1000 | 0.18 |
| black or african american | race | value_count | 200 | 0.05 |
| asian | race | value_count | 600 | 0.1 |
| american indian or alaska native | race | value_count | 200 | 0.05 |
| native hawaiian or other pacific islander | race | value_count | 200 | 0.05 |
| male | sex | value_count | 3000 | 0.5 |
| female | sex | value_count | 3000 | 0.5 |
| depression | depression | sum | 100 | 0.05 |
| diabetes | diabetes | sum | 200 | 0.07 |
| dvt | dvt | sum | 300 | 0.09 |
| hyptertension | hyptertension | sum | 400 | 0.1 |
| asthma | asthma | sum | 500 | 0.11 |
| sleep apnea | sleep apnea | sum | 600 | 0.12 |