Function aggregate_table: aggregate a patient-level table into a cohort table

The aggregate_table function converts a patient-level table into an aggregated cohort table that summarizes demographics, comorbidities, medications, labs, and other fields. Supported aggregations are sum, value_count, mean, median, range, and missing.

Usage

aggregate_table(table, feature_function)

Arguments

table

A Spark DataFrame (the extract) with patient_ids in one column, plus the fields generated by find_presence and dstats, the fields joined from the patient table, and any other fields present in the table.

feature_function

A two-column Spark DataFrame that specifies which calculation to apply to which column of the extract:

feature - name of a column in the extract
func - calculation to perform on that column
- supported functions:
  - sum - sums the column values
  - mean - computes the mean of the column values
  - median - computes the median of the column values
  - value_count - finds the unique values in the column, creates a row for each, and counts the occurrences of each unique value
  - range - outputs the minimum and maximum values in the column
  - missing - counts the null values in the column
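To make the semantics of each supported func concrete, here is a minimal plain-Python sketch that applies one func to a single column's values. It is not the package's Spark implementation: apply_func is a hypothetical name, a column is assumed to arrive as a Python list with None marking nulls, and every func except missing is assumed to skip nulls (as Spark's built-in sum and avg aggregates do).

```python
import statistics
from collections import Counter


def apply_func(values, func):
    """Apply one supported aggregation to a column's values (hypothetical sketch).

    Assumption: `values` is a list in which None represents a null.
    """
    present = [v for v in values if v is not None]  # nulls skipped (assumption)
    if func == "sum":
        return sum(present)
    if func == "mean":
        return statistics.mean(present)
    if func == "median":
        return statistics.median(present)
    if func == "value_count":
        # one entry per unique value, mapped to its number of occurrences
        return dict(Counter(present))
    if func == "range":
        # (minimum, maximum)
        return (min(present), max(present))
    if func == "missing":
        # number of nulls in the column
        return sum(1 for v in values if v is None)
    raise ValueError(f"unsupported func: {func!r}")
```

For example, apply_func(["male", "female", "male"], "value_count") returns {"male": 2, "female": 1}, and apply_func([30, 50, None, 80], "range") returns (30, 80).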

Value

Returns a five-column dataframe:

characteristic
- if func = sum, mean, median, or range: the feature name (the same value as in the feature column)
- if func = value_count: one of the unique values of the feature column in the extract, with a new row for each unique value
feature - the feature from the feature_function input that the characteristic was generated from
func - the function from the feature_function input that was applied to the feature
value_1
- the output of the function specified for the feature (for value_count, the count of the unique value in that row)
- if func = range: the minimum
- if func = missing: the number of null values
value_2
- if func = sum or value_count: value_1 divided by the number of unique patient_ids in the extract table input
- if func = mean: the standard deviation
- if func = median: the interquartile range (IQR), i.e. the 0.75 quantile minus the 0.25 quantile
- if func = range: the maximum
- if func = missing: value_1 divided by the number of unique patient_ids in the extract table input
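The mapping from one feature/func pair to output rows described in this section can be sketched in plain Python. This is a hypothetical illustration, not the package's code: output_rows is an invented name, the extract is assumed to be a list of dicts with a patient_id key (None marks a null), the standard deviation is assumed to be the sample standard deviation, and quartiles use the default method of statistics.quantiles, which may differ from how the Spark implementation computes them. Using the feature name as the characteristic for missing is also an assumption.

```python
import statistics
from collections import Counter


def output_rows(extract, feature, func):
    """Return (characteristic, feature, func, value_1, value_2) rows for one pair.

    Hypothetical sketch: `extract` is a list of dicts, each with a
    "patient_id" key; None marks a null value.
    """
    values = [row.get(feature) for row in extract]
    present = [v for v in values if v is not None]  # nulls skipped (assumption)
    # Denominator for proportions: number of unique patient_ids in the extract.
    n_patients = len({row["patient_id"] for row in extract})

    if func == "sum":
        total = sum(present)
        return [(feature, feature, func, total, total / n_patients)]
    if func == "mean":
        # value_2 = standard deviation (sample standard deviation assumed)
        return [(feature, feature, func,
                 statistics.mean(present), statistics.stdev(present))]
    if func == "median":
        # value_2 = IQR = Q3 - Q1 (quartile method is an assumption)
        q1, _, q3 = statistics.quantiles(present, n=4)
        return [(feature, feature, func, statistics.median(present), q3 - q1)]
    if func == "range":
        # value_1 = minimum, value_2 = maximum
        return [(feature, feature, func, min(present), max(present))]
    if func == "missing":
        # characteristic = feature name here is an assumption
        n_null = len(values) - len(present)
        return [(feature, feature, func, n_null, n_null / n_patients)]
    if func == "value_count":
        # one row per unique value; the value itself becomes the characteristic
        counts = Counter(present)
        return [(value, feature, func, count, count / n_patients)
                for value, count in counts.items()]
    raise ValueError(f"unsupported func: {func!r}")
```

With four patients whose sex values are male, female, male, female, output_rows(extract, "sex", "value_count") yields one row per value: ("male", "sex", "value_count", 2, 0.5) and ("female", "sex", "value_count", 2, 0.5).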

Details

feature_function input table example:

| feature       | func        |
|---------------|-------------|
| age           | mean        |
| age           | median      |
| age           | range       |
| weight        | mean        |
| weight        | median      |
| race          | value_count |
| sex           | value_count |
| depression    | sum         |
| diabetes      | sum         |
| dvt           | sum         |
| hypertension  | sum         |
| asthma        | sum         |
| sleep apnea   | sum         |

table_1 = aggregate_table(table=extract, feature_function=feature_functions)

head(table_1)
| characteristic                              | feature       | func        | value_1 | value_2 |
|---------------------------------------------|---------------|-------------|---------|---------|
| age                                         | age           | mean        | 50      | 15      |
| age                                         | age           | median      | 52      | 10      |
| age                                         | age           | range       | 30      | 80      |
| weight                                      | weight        | mean        | 100     | 10      |
| weight                                      | weight        | median      | 130     | 40      |
| white                                       | race          | value_count | 1000    | 0.18    |
| black or african american                   | race          | value_count | 200     | 0.05    |
| asian                                       | race          | value_count | 600     | 0.1     |
| american indian or alaska native            | race          | value_count | 200     | 0.05    |
| native hawaiian or other pacific islander   | race          | value_count | 200     | 0.05    |
| male                                        | sex           | value_count | 3000    | 0.5     |
| female                                      | sex           | value_count | 3000    | 0.5     |
| depression                                  | depression    | sum         | 100     | 0.05    |
| diabetes                                    | diabetes      | sum         | 200     | 0.07    |
| dvt                                         | dvt           | sum         | 300     | 0.09    |
| hypertension                                | hypertension  | sum         | 400     | 0.1     |
| asthma                                      | asthma        | sum         | 500     | 0.11    |
| sleep apnea                                 | sleep apnea   | sum         | 600     | 0.12    |
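Each feature in a feature_function table such as the example above must name a column in the extract, and each func must be one of the supported functions, so the table can be checked before calling aggregate_table. A minimal sketch, assuming the spec is available as a list of (feature, func) pairs and the extract's column names as a list of strings; validate_feature_function is a hypothetical helper, not part of the package.

```python
# Supported func values, as listed under Arguments.
SUPPORTED_FUNCS = {"sum", "value_count", "mean", "median", "range", "missing"}


def validate_feature_function(spec, extract_columns):
    """Return a list of problems in a (feature, func) spec; empty if none.

    Hypothetical pre-flight check: every feature must be a column of the
    extract and every func must be a supported aggregation.
    """
    columns = set(extract_columns)
    problems = []
    for feature, func in spec:
        if feature not in columns:
            problems.append(f"feature {feature!r} is not a column in the extract")
        if func not in SUPPORTED_FUNCS:
            problems.append(f"func {func!r} for feature {feature!r} is not supported")
    return problems
```

For instance, checking [("weight", "mode")] against the columns ["patient_id", "age"] reports two problems: weight is not a column, and mode is not a supported func.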