
trinetx.aggregate_table API documentation

Function aggregate_table

def aggregate_table(table=None, feature_function=None)

Description


The aggregate_table function converts a patient-level table into an aggregated cohort table covering demographics, comorbidities, medications, labs, and more. Supported aggregations are sum, value_count, mean, median, range, and missing.

Inputs


  • table: the extract, a Spark DataFrame with patient_ids in one column, fields generated from find_presence and dstats, fields joined from the patient table, and any other fields present in the table

  • feature_function: 2-column Spark DataFrame

    • feature - name of the column in the extract

    • function - calculation to perform on the column

      • supported functions:

        • sum - sums the column values

        • mean - finds the mean of all column values

        • median - finds the median of all column values

        • value_count - finds the unique values in the column, makes a row for each, and counts the occurrences of each unique value

        • range - outputs the min and max values in the column

        • missing - counts the null values in the column
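Because each row of feature_function pairs a column name with one of the six supported function names, it can help to check the pairs before calling aggregate_table. The following is a minimal sketch of such a check in plain Python. The helper name `check_feature_function` and its error messages are hypothetical and are not part of the trinetx library; how aggregate_table itself reacts to an unsupported function or an unknown feature is not stated in this documentation.

```python
# Function names taken from the "supported functions" list above.
SUPPORTED_FUNCTIONS = {"sum", "mean", "median", "value_count", "range", "missing"}


def check_feature_function(pairs, table_columns):
    """Return a list of problems found in (feature, function) pairs.

    Hypothetical helper: `pairs` is an iterable of (feature, function)
    tuples and `table_columns` is the list of column names in the extract.
    """
    problems = []
    for feature, function in pairs:
        if function not in SUPPORTED_FUNCTIONS:
            problems.append(f"{feature}: unsupported function '{function}'")
        if feature not in table_columns:
            problems.append(f"{feature}: not a column in the extract")
    return problems


# Example: "mode" is not a supported function, and "bmi" is not in the extract.
example_problems = check_feature_function(
    [("age", "mean"), ("age", "mode"), ("bmi", "sum")],
    ["patient_id", "age"],
)
```

An empty list means every pair names a supported function and an existing column.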

Returns


  • returns a 5 column dataframe:

    • characteristic

      • if function = sum, mean, median, or range, then displays the feature name from the feature column

      • if function = value_count, then display the unique values in the extract column, creating a new row for each unique value

    • feature - the feature from the input file the characteristic was generated from

    • function - the function type from the input file that was used on the feature

    • value_1

      • the output of the function specified by the user (for example, the sum, mean, median, or count)

      • if function = range: min

      • if function = missing: count of null values

    • value_2

      • if function = sum or value_count: value/(count of unique patient_ids in extract table input)

      • if function = mean: standard deviation

      • if function = median: IQR (0.25 to 0.75 quartile range)

      • if function = range: max

      • if function = missing: value/(count of unique patient_ids in extract table input)
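The row layout described above can be illustrated with a small pure-Python sketch that works on a list of dictionaries instead of a Spark DataFrame. This is not the library's implementation. Several choices here are assumptions: nulls (None) are skipped by every function except missing, the standard deviation is the sample standard deviation, the IQR is reported as a single number (Q3 minus Q1, using Python's default quantile method), and the characteristic for a missing row is the feature name. The names `aggregate_rows`, `PATIENTS`, and `SPEC` are hypothetical.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev


def aggregate_rows(table, feature_function):
    """Return (characteristic, feature, function, value_1, value_2) tuples."""
    # Denominator for proportions: count of unique patient_ids in the extract.
    n_patients = len({row["patient_id"] for row in table})
    out = []
    for feature, function in feature_function:
        column = [row[feature] for row in table]
        present = [v for v in column if v is not None]  # assumption: skip nulls
        if function == "sum":
            total = sum(present)
            out.append((feature, feature, function, total, total / n_patients))
        elif function == "value_count":
            # One output row per unique value; characteristic is the value itself.
            for value, count in sorted(Counter(present).items()):
                out.append((value, feature, function, count, count / n_patients))
        elif function == "mean":
            out.append((feature, feature, function, mean(present), stdev(present)))
        elif function == "median":
            q1, _, q3 = quantiles(present, n=4)  # assumption: default method
            out.append((feature, feature, function, median(present), q3 - q1))
        elif function == "range":
            out.append((feature, feature, function, min(present), max(present)))
        elif function == "missing":
            n_null = len(column) - len(present)
            out.append((feature, feature, function, n_null, n_null / n_patients))
        else:
            raise ValueError(f"unsupported function: {function}")
    return out


# Made-up illustrative data; one patient has a null age.
PATIENTS = [
    {"patient_id": 1, "age": 30, "sex": "male", "diabetes": 1},
    {"patient_id": 2, "age": 40, "sex": "female", "diabetes": 0},
    {"patient_id": 3, "age": 60, "sex": "female", "diabetes": 1},
    {"patient_id": 4, "age": None, "sex": "male", "diabetes": 0},
    {"patient_id": 5, "age": 80, "sex": "female", "diabetes": 0},
]
SPEC = [
    ("age", "mean"), ("age", "median"), ("age", "range"), ("age", "missing"),
    ("sex", "value_count"), ("diabetes", "sum"),
]

for row in aggregate_rows(PATIENTS, SPEC):
    print(row)
```

Each printed tuple corresponds to one row of the 5-column output, with value_count producing one row per unique value.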

Example


feature_function input table example:

| feature       | function    |
|---------------|-------------|
| age           | mean        |
| age           | median      |
| age           | range       |
| weight        | mean        |
| weight        | median      |
| race          | value_count |
| sex           | value_count |
| depression    | sum         |
| diabetes      | sum         |
| dvt           | sum         |
| hypertension  | sum         |
| asthma        | sum         |
| sleep apnea   | sum         |
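One way to build the feature_function input shown in the table above is from a list of (feature, function) tuples. The sketch below assumes an existing SparkSession is passed in as `spark`; the helper name `build_feature_functions` and the constant `FEATURE_FUNCTION_ROWS` are hypothetical, and the documentation does not say how the input is expected to be created.

```python
# (feature, function) pairs copied from the input table above.
FEATURE_FUNCTION_ROWS = [
    ("age", "mean"),
    ("age", "median"),
    ("age", "range"),
    ("weight", "mean"),
    ("weight", "median"),
    ("race", "value_count"),
    ("sex", "value_count"),
    ("depression", "sum"),
    ("diabetes", "sum"),
    ("dvt", "sum"),
    ("hypertension", "sum"),
    ("asthma", "sum"),
    ("sleep apnea", "sum"),
]


def build_feature_functions(spark):
    """Create the 2-column Spark DataFrame from an existing SparkSession.

    Hypothetical helper: `spark` is assumed to be a pyspark SparkSession.
    """
    return spark.createDataFrame(
        FEATURE_FUNCTION_ROWS, schema=["feature", "function"]
    )
```

In a notebook where `spark` is already defined, `feature_functions = build_feature_functions(spark)` would produce the DataFrame passed to aggregate_table.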

table_1 = aggregate_table(table=extract, feature_function=feature_functions)
table_1.head()

aggregated output table example:

| characteristic                              | feature       | function    | value_1 | value_2 |
|---------------------------------------------|---------------|-------------|---------|---------|
| age                                         | age           | mean        | 50      | 15      |
| age                                         | age           | median      | 52      | 10      |
| age                                         | age           | range       | 30      | 80      |
| weight                                      | weight        | mean        | 100     | 10      |
| weight                                      | weight        | median      | 130     | 40      |
| white                                       | race          | value_count | 1000    | 0.18    |
| black or african american                   | race          | value_count | 200     | 0.05    |
| asian                                       | race          | value_count | 600     | 0.1     |
| american indian or alaska native            | race          | value_count | 200     | 0.05    |
| native hawaiian or other pacific islander   | race          | value_count | 200     | 0.05    |
| male                                        | sex           | value_count | 3000    | 0.5     |
| female                                      | sex           | value_count | 3000    | 0.5     |
| depression                                  | depression    | sum         | 100     | 0.05    |
| diabetes                                    | diabetes      | sum         | 200     | 0.07    |
| dvt                                         | dvt           | sum         | 300     | 0.09    |
| hypertension                                | hypertension  | sum         | 400     | 0.1     |
| asthma                                      | asthma        | sum         | 500     | 0.11    |
| sleep apnea                                 | sleep apnea   | sum         | 600     | 0.12    |