Function
aggregate_table
def aggregate_table(table=None, feature_function=None)
-
Description
The aggregate_table function converts a patient-level table into an aggregated cohort table that summarizes fields such as demographics, comorbidities, medications, and labs. Supported aggregations are sum, value_count, mean, median, range, and missing.
Inputs
- table: extract Spark dataframe - a table with patient_ids in one column, plus the fields generated by find_presence and dstats, the fields joined from the patient table, and any other fields present in the table
- feature_function: 2 column Spark dataframe
  - feature - name of the column in the extract
  - function - calculation to perform on the column
    - supported functions:
      - sum - sums the column values
      - mean - finds the mean of all column values
      - median - finds the median of all column values
      - value_count - finds the unique values in the column, makes a row for each, and counts the occurrences of each unique value
      - range - outputs the min and max values in the column
      - missing - counts the null values in the column
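As a minimal sketch of the feature_function input, the snippet below writes the (feature, function) pairs in plain Python and checks each function name against the supported list above. The helper name `check_feature_function` is hypothetical and not part of the library. In practice the rows would still need to be turned into a 2 column Spark dataframe, for example with `spark.createDataFrame(rows, ["feature", "function"])`, before being passed to aggregate_table.

```python
# Supported function names, taken from the list above.
SUPPORTED_FUNCTIONS = {"sum", "mean", "median", "value_count", "range", "missing"}


def check_feature_function(rows):
    """Return the (feature, function) pairs whose function is not supported.

    Hypothetical helper for illustration; not part of aggregate_table.
    """
    return [
        (feature, function)
        for feature, function in rows
        if function not in SUPPORTED_FUNCTIONS
    ]


# One row per (feature, function) pair; a feature may appear more than once.
rows = [
    ("age", "mean"),
    ("age", "range"),
    ("sex", "value_count"),
    ("diabetes", "sum"),
    ("weight", "missing"),
]

unsupported = check_feature_function(rows)
print(unsupported)  # an empty list means every function name is supported
```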
Returns
- returns a 5 column dataframe:
  - characteristic
    - if function = sum, mean, median, or range (min and max): the feature name, as shown in the feature column
    - if function = value_count: the unique values found in the extract column, with a new row for each unique value
  - feature - the feature from the feature_function input that the characteristic was generated from
  - function - the function from the feature_function input that was applied to the feature
  - value_1
    - the output of the function requested by the user
    - if function = range: min
    - if function = missing: count of null values
  - value_2
    - if function = sum or value_count: value_1/(count of unique patient_ids in extract table input)
    - if function = mean: standard deviation
    - if function = median: IQR (0.25 to 0.75 quartile range)
    - if function = range: max
    - if function = missing: value_1/(count of unique patient_ids in extract table input)
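The value_1 and value_2 rules above can be illustrated with a small plain-Python sketch. It is not the library's Spark implementation: the helper `summarize` is hypothetical, and several details are assumptions the documentation does not state, namely that nulls are skipped before aggregating, that the standard deviation is the sample standard deviation, that the IQR is reported as the single width Q3 - Q1, and that quartiles use the "inclusive" method.

```python
import statistics


def summarize(values, function, n_patients):
    """Return (value_1, value_2) for one feature, following the rules above.

    Hypothetical helper for illustration. `values` is the column as a list
    (None marks a null); `n_patients` is the count of unique patient_ids.
    """
    present = [v for v in values if v is not None]  # assumption: nulls skipped
    if function == "sum":
        total = sum(present)
        return total, total / n_patients
    if function == "mean":
        # Assumption: sample standard deviation.
        return statistics.mean(present), statistics.stdev(present)
    if function == "median":
        # Assumption: IQR reported as a single width, Q3 - Q1.
        q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
        return statistics.median(present), q3 - q1
    if function == "range":
        return min(present), max(present)
    if function == "missing":
        n_null = len(values) - len(present)
        return n_null, n_null / n_patients
    raise ValueError(f"unsupported function: {function}")


ages = [30, 40, 50, 60, 80, None]  # made-up values, one per patient
flags = [1, 0, 1, 0, 0, 1]         # made-up 0/1 comorbidity flag

print(summarize(flags, "sum", 6))    # (3, 0.5)
print(summarize(ages, "range", 6))   # (30, 80)
print(summarize(ages, "median", 6))
print(summarize(ages, "missing", 6))
```

The value_count function is left out here because it produces one row per unique value rather than a single pair of values.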
Example
feature_function input table example:

| feature      | function    |
|--------------|-------------|
| age          | mean        |
| age          | median      |
| age          | range       |
| weight       | mean        |
| weight       | median      |
| race         | value_count |
| sex          | value_count |
| depression   | sum         |
| diabetes     | sum         |
| dvt          | sum         |
| hypertension | sum         |
| asthma       | sum         |
| sleep apnea  | sum         |

    table_1 = aggregate_table(table=extract, feature_function=feature_functions)
    table_1.head()

| characteristic                            | feature      | function    | value_1 | value_2 |
|-------------------------------------------|--------------|-------------|---------|---------|
| age                                       | age          | mean        | 50      | 15      |
| age                                       | age          | median      | 52      | 10      |
| age                                       | age          | range       | 30      | 80      |
| weight                                    | weight       | mean        | 100     | 10      |
| weight                                    | weight       | median      | 130     | 40      |
| white                                     | race         | value_count | 1000    | 0.18    |
| black or african american                 | race         | value_count | 200     | 0.05    |
| asian                                     | race         | value_count | 600     | 0.1     |
| american indian or alaska native          | race         | value_count | 200     | 0.05    |
| native hawaiian or other pacific islander | race         | value_count | 200     | 0.05    |
| male                                      | sex          | value_count | 3000    | 0.5     |
| female                                    | sex          | value_count | 3000    | 0.5     |
| depression                                | depression   | sum         | 100     | 0.05    |
| diabetes                                  | diabetes     | sum         | 200     | 0.07    |
| dvt                                       | dvt          | sum         | 300     | 0.09    |
| hypertension                              | hypertension | sum         | 400     | 0.1     |
| asthma                                    | asthma       | sum         | 500     | 0.11    |
| sleep apnea                               | sleep apnea  | sum         | 600     | 0.12    |
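To show how value_count rows such as the sex rows in the example could be assembled, here is a plain-Python sketch. The helper `value_count_rows` is hypothetical and is not the library's Spark code. It assumes one row per patient, so the patient count equals the column length, and it does not treat nulls specially.

```python
from collections import Counter


def value_count_rows(feature, values, n_patients):
    """Build one output row per unique value of a categorical column.

    Hypothetical helper for illustration. Rows follow the 5 column layout:
    (characteristic, feature, function, value_1, value_2).
    """
    counts = Counter(values)
    return [
        (value, feature, "value_count", count, count / n_patients)
        for value, count in counts.most_common()
    ]


# Made-up column matching the sex rows of the example: 3000 of each value.
sex = ["male"] * 3000 + ["female"] * 3000

for row in value_count_rows("sex", sex, n_patients=len(sex)):
    print(row)
```

Each unique value becomes its own characteristic, with the count in value_1 and the share of patients in value_2.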