Function `aggregate_table`

def aggregate_table(table=None, feature_function=None)

Description

The aggregate_table function converts a patient-level table into an aggregated cohort table covering demographics, comorbidities, medications, labs, and any other fields in the table. Supported aggregations are sum, value_count, mean, median, range, and missing.

Inputs

table: the extract, a Spark dataframe with patient_ids in one column, fields generated by find_presence and dstats, fields joined from the patient table, and any other fields present in the table
feature_function: 2-column Spark dataframe
- feature - name of the column in the extract
- function - calculation to perform on the column
  - supported functions:
    - sum - sums the column values
    - mean - finds the mean of all column values
    - median - finds the median of all column values
    - value_count - finds the unique values in the column, makes a row for each, and counts the occurrences of each unique value
    - range - outputs the min and max values in the column
    - missing - counts the null values in the column
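
The feature_function specification above can be sketched as plain (feature, function) pairs that are checked against the supported function names before being converted to a Spark dataframe. This is a minimal sketch: the helper `validate_feature_function` is hypothetical and not part of the library, and the commented `createDataFrame` line assumes an active SparkSession named `spark`.

```python
# Supported function names, as listed above.
SUPPORTED_FUNCTIONS = {"sum", "mean", "median", "value_count", "range", "missing"}


def validate_feature_function(rows):
    """Return rows unchanged if every function name is supported.

    Hypothetical helper for illustration; not part of the library.
    """
    unsupported = sorted({fn for _, fn in rows if fn not in SUPPORTED_FUNCTIONS})
    if unsupported:
        raise ValueError(f"unsupported function(s): {unsupported}")
    return rows


rows = validate_feature_function([
    ("age", "mean"),
    ("age", "range"),
    ("race", "value_count"),
    ("diabetes", "sum"),
])

# With an active SparkSession named `spark` (an assumption), the pairs become
# the 2-column dataframe described above:
# feature_functions = spark.createDataFrame(rows, ["feature", "function"])
```

Validating the pairs up front surfaces a misspelled function name before any Spark job runs.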

Returns

Returns a 5-column dataframe:
- characteristic
  - if function = sum, mean, median, or range: the feature name, as given in the feature column
  - if function = value_count: each unique value found in the extract column, one row per unique value
- feature - the feature from the input file the characteristic was generated from
- function - the function type from the input file that was used on the feature
- value_1
  - the output of the function requested by the user
  - if function = range: min
  - if function = missing: the count of null values
- value_2
  - if function = sum or value_count: value_1 / (count of unique patient_ids in the extract table input)
  - if function = mean: standard deviation
  - if function = median: IQR (0.25 to 0.75 quartile range)
  - if function = range: max
  - if function = missing: value_1 / (count of unique patient_ids in the extract table input)
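
The value_1 and value_2 rules above can be sketched in plain Python for a single column. This is an illustration of the described outputs, not the library's implementation. The function name `summarize` is hypothetical, `n_patients` stands in for the count of unique patient_ids, the sample standard deviation is an assumed estimator, and the IQR is assumed to be reported as a single width (third quartile minus first quartile).

```python
import statistics
from collections import Counter


def summarize(feature, values, function, n_patients):
    """Return (characteristic, value_1, value_2) rows for one feature/function pair.

    Hypothetical sketch; None stands in for a null value.
    """
    present = [v for v in values if v is not None]
    if function == "sum":
        total = sum(present)
        return [(feature, total, total / n_patients)]
    if function == "mean":
        # Assumption: sample standard deviation; the library may differ.
        return [(feature, statistics.mean(present), statistics.stdev(present))]
    if function == "median":
        # Assumption: IQR reported as a width (Q3 - Q1).
        q1, _, q3 = statistics.quantiles(present, n=4)
        return [(feature, statistics.median(present), q3 - q1)]
    if function == "range":
        return [(feature, min(present), max(present))]
    if function == "missing":
        n_null = len(values) - len(present)
        return [(feature, n_null, n_null / n_patients)]
    if function == "value_count":
        # One row per unique value; the value itself is the characteristic.
        return [(v, c, c / n_patients) for v, c in Counter(present).items()]
    raise ValueError(f"unsupported function: {function}")


print(summarize("age", [30, 40, 50, 60, 80], "range", n_patients=5))
print(summarize("diabetes", [1, 0, 0, 1, None], "sum", n_patients=5))
```

For sum, value_count, and missing, value_2 is a proportion of the cohort, so it depends on `n_patients` rather than on the number of non-null rows.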

Example

feature_function input table example:

| feature       | function    |
|---------------|-------------|
| age           | mean        |
| age           | median      |
| age           | range       |
| weight        | mean        |
| weight        | median      |
| race          | value_count |
| sex           | value_count |
| depression    | sum         |
| diabetes      | sum         |
| dvt           | sum         |
| hypertension  | sum         |
| asthma        | sum         |
| sleep apnea   | sum         |

table_1 = aggregate_table(table=extract, feature_function=feature_functions)
table_1.head()

aggregate_table output example:

| characteristic                              | feature       | function    | value_1 | value_2 |
|---------------------------------------------|---------------|-------------|---------|---------|
| age                                         | age           | mean        | 50      | 15      |
| age                                         | age           | median      | 52      | 10      |
| age                                         | age           | range       | 30      | 80      |
| weight                                      | weight        | mean        | 100     | 10      |
| weight                                      | weight        | median      | 130     | 40      |
| white                                       | race          | value_count | 1000    | 0.18    |
| black or african american                   | race          | value_count | 200     | 0.05    |
| asian                                       | race          | value_count | 600     | 0.1     |
| american indian or alaska native            | race          | value_count | 200     | 0.05    |
| native hawaiian or other pacific islander   | race          | value_count | 200     | 0.05    |
| male                                        | sex           | value_count | 3000    | 0.5     |
| female                                      | sex           | value_count | 3000    | 0.5     |
| depression                                  | depression    | sum         | 100     | 0.05    |
| diabetes                                    | diabetes      | sum         | 200     | 0.07    |
| dvt                                         | dvt           | sum         | 300     | 0.09    |
| hypertension                                | hypertension  | sum         | 400     | 0.1     |
| asthma                                      | asthma        | sum         | 500     | 0.11    |
| sleep apnea                                 | sleep apnea   | sum         | 600     | 0.12    |

R functions