The aggregate_table function converts a patient-level table into an aggregated cohort table covering demographics, comorbidities, medications, labs, and more. Supported aggregations are sum, value_count, mean, median, range, and missing.

Usage

aggregate_table(table, feature_function)

Arguments

table

Spark dataframe (the extract) - a table with patient_ids in one column, fields generated by find_presence and dstats, fields joined from the patient table, and any other fields present in the table

feature_function

Two-column Spark dataframe with the columns:

  • feature - name of the column in the extract

  • func - calculation to perform on the column

    • supported functions:

      • sum - sums the column values

      • mean - finds the mean of all column values

      • median - finds the median of all column values

      • value_count - finds the unique values in the column, makes a row for each, and counts the occurrences of each unique value

      • range - outputs the min and max values in the column

      • missing - counts the null values in the column

Value

returns a 5 column dataframe:

  • characteristic

    • if func = sum, mean, median, or range (min and max), then display the feature name from the feature column

    • if func = value_count, then display the unique values in the extract column, creating a new row for each unique value

  • feature - the feature from the input file the characteristic was generated from

  • func - the function type from the input file that was used on the feature

  • value_1

    • the value of the function output indicated by the user

    • if func = range: min

    • if func = missing: the number of null values

  • value_2

    • if func = sum or value_count: value/(count of unique patient_ids in the extract table input)

    • if func = mean: standard deviation

    • if func = median: IQR (interquartile range, between the 0.25 and 0.75 quantiles)

    • if func = range: max

    • if func = missing: value/(count of unique patient_ids in the extract table input)
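
The value_1 and value_2 rules above can be illustrated on a small local data frame. This is only a sketch in base R, not the package's Spark implementation: the toy `extract` data, the variable names, the choice to drop nulls with `na.rm = TRUE`, and the reading of IQR as a single difference (`stats::IQR`) are all assumptions made for illustration.

```r
# Toy patient-level extract (hypothetical data, local data frame rather than Spark)
extract <- data.frame(
  patient_id = 1:8,
  age        = c(30, 45, 52, 60, 38, 71, 80, NA),
  sex        = c("male", "female", "female", "male",
                 "female", "male", "female", "male"),
  diabetes   = c(1, 0, 0, 1, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Denominator used by sum, value_count, and missing
n_patients <- length(unique(extract$patient_id))

# func = sum: value_1 is the column total, value_2 is its share of patients
sum_v1 <- sum(extract$diabetes)
sum_v2 <- sum_v1 / n_patients

# func = mean: value_1 is the mean, value_2 is the standard deviation
# (assumption: null values are ignored)
mean_v1 <- mean(extract$age, na.rm = TRUE)
mean_v2 <- sd(extract$age, na.rm = TRUE)

# func = median: value_1 is the median, value_2 is the IQR
# (assumption: IQR reported as the 0.75 minus 0.25 quantile)
median_v1 <- median(extract$age, na.rm = TRUE)
median_v2 <- IQR(extract$age, na.rm = TRUE)

# func = range: value_1 is the min, value_2 is the max
age_range <- range(extract$age, na.rm = TRUE)
range_v1  <- age_range[1]
range_v2  <- age_range[2]

# func = missing: value_1 is the null count, value_2 is its share of patients
missing_v1 <- sum(is.na(extract$age))
missing_v2 <- missing_v1 / n_patients

# func = value_count: one row per unique value, with its count and share
sex_counts <- table(extract$sex)
value_count_rows <- data.frame(
  characteristic = names(sex_counts),
  feature        = "sex",
  func           = "value_count",
  value_1        = as.vector(sex_counts),
  value_2        = as.vector(sex_counts) / n_patients,
  stringsAsFactors = FALSE
)
```

With 8 patients, 3 of whom have diabetes, the sum row would carry 3 in value_1 and 0.375 in value_2, and the single null age gives 1 and 0.125 for missing.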

Details

feature_function input table example:

| feature       | func        |
|---------------|-------------|
| age           | mean        |
| age           | median      |
| age           | range       |
| weight        | mean        |
| weight        | median      |
| race          | value_count |
| sex           | value_count |
| depression    | sum         |
| diabetes      | sum         |
| dvt           | sum         |
| hyptertension | sum         |
| asthma        | sum         |
| sleep apnea   | sum         |
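
A table like the one above could be assembled as a local data frame before being handed to Spark. The sketch below uses base R only and covers a subset of the rows; the `supported` vector and the validation step are illustrative additions, not part of the package, and the data frame would still need to be copied to Spark with whichever Spark interface the session uses.

```r
# feature/func pairs as a local data frame (a subset of the example rows)
feature_functions <- data.frame(
  feature = c("age", "age", "age", "weight", "weight",
              "race", "sex", "depression", "diabetes"),
  func    = c("mean", "median", "range", "mean", "median",
              "value_count", "value_count", "sum", "sum"),
  stringsAsFactors = FALSE
)

# Aggregations listed in the Arguments section
supported <- c("sum", "mean", "median", "value_count", "range", "missing")

# Hypothetical sanity check: catch misspelled func values before aggregating
unsupported <- setdiff(feature_functions$func, supported)
if (length(unsupported) > 0) {
  stop("Unsupported func values: ", paste(unsupported, collapse = ", "))
}
```

Listing a feature more than once, as with age here, requests several aggregations of the same column.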

table_1 = aggregate_table(table=extract, feature_function=feature_functions)

head(table_1)
| characteristic                              | feature       | func        | value_1 | value_2 |
|---------------------------------------------|---------------|-------------|---------|---------|
| age                                         | age           | mean        | 50      | 15      |
| age                                         | age           | median      | 52      | 10      |
| age                                         | age           | range       | 30      | 80      |
| weight                                      | weight        | mean        | 100     | 10      |
| weight                                      | weight        | median      | 130     | 40      |
| white                                       | race          | value_count | 1000    | 0.18    |
| black or african american                   | race          | value_count | 200     | 0.05    |
| asian                                       | race          | value_count | 600     | 0.1     |
| american indian or alaska native            | race          | value_count | 200     | 0.05    |
| native hawaiian or other pacific islander   | race          | value_count | 200     | 0.05    |
| male                                        | sex           | value_count | 3000    | 0.5     |
| female                                      | sex           | value_count | 3000    | 0.5     |
| depression                                  | depression    | sum         | 100     | 0.05    |
| diabetes                                    | diabetes      | sum         | 200     | 0.07    |
| dvt                                         | dvt           | sum         | 300     | 0.09    |
| hyptertension                               | hyptertension | sum         | 400     | 0.1     |
| asthma                                      | asthma        | sum         | 500     | 0.11    |
| sleep apnea                                 | sleep apnea   | sum         | 600     | 0.12    |