Function
dstats
def dstats(database: str, tables: list, code_list: sql.dataframe.DataFrame, index_date: sql.dataframe.DataFrame, index_days_start: int, index_days_end: int, allow_subset=False)
-
Description
The dstats function is used to find various patient lab value aggregations and calculations around an index date.
Inputs
-
database - database name to use
-
tables - names of tables to use in a dataset that is a list ["lab_result"] or ["lab_result", "vitals_signs"]]
- supported tables: lab_result and vitals_signs
-
code_list - user defined table with 3 mandatory columns:
-
mandatory columns
-
feature - the feature the exact code will roll up to and name of the column in the output (letters, numbers, underscores only, no spaces, not case sensitive)
-
code - exact code
-
code_system - RxNorm, LOINC, etc
-
-
optional columns
-
qualifier_num - looks across lab_results and vital_signs num value fields at the same time
-
if a user only passes in lab_results or vitals_signs to dstats, the function checks only that table
-
users can create as many additional columns as they want, but column names must be unique
-
syntax for qualifying lab numeric values:
-
'<=X': less than or equal to X
-
'<X': less than X
-
'>=X': greater than or equal to X
-
'>X': greater than X
-
'~=X': not equal to X
-
'X:Y': between X and Y
-
-
if cell is left blank, system skips and assumes no qualification for that code
-
one qualification of a code does not apply to the entire feature; in the case there is more than one code mapped to the feature - every code must be qualified
-
-
-
index_date - user defined table with two columns
-
patient_id
-
index_date
-
-
index_days_start - start period relative to index date in days
-
None means anytime
-
negative means before the index date
-
positive means after
-
-
index_days_end - end period relative to index date in days
-
None means anytime
-
negative means before the index date
-
positive means after
-
-
allow_subset (optional argument) - checks whether the number of patients in the index event input table is the same as the dataset cohort number - would default to false, but if set to true would warn and continue
- if True and there are fewer patients in the index_date_df table, only returns the patient_ids present in index_date_df
Returns
Returns a table with the following columns (unique key of the table is patient_id + feature)
-
patient_id
-
feature - each unique feature value per patient ID
-
min - the minimum lab value that appeared in time window defined by the user in relation to the index event
-
max - the maximum lab value that appeared in time window defined by the user in relation to the index event
-
median - the median of all lab values that appeared in time window defined by the user in relation to the index event.
- If it is even values, take average of the 2.
-
mean - the mean of all lab values that appeared in time window defined by the user in relation to the index event
-
first - the first lab value that appeared in time window defined by the user in relation to the index event
- If multiple values same day, system displays median
-
last - the last lab value that appeared in time window defined by the user in relation to the index event
- If multiple values same day, system displays median
Example
>>>dstats_table = dstats(database='covid_db', tables=['procedure','diagnosis'], code_list=code_list, index_date=sg_index_date, index_days_start=-365, index_days_end=-1) >>>dstats_table.head() | patient id | feature | min | max | median | mean | first | last | |------------|----------------|-----|-----|--------|------|-------|-------| | 1 | alkaline_phos | 50 | 30 | 33 | 40 | 45 | 50 | | 1 | aspartate_amin | 50 | 30 | 33 | 40 | 45 | 50 | | 2 | alkaline_phos | 50 | 30 | 33 | 40 | 45 | 50 | | 2 | aspartate_amin | 50 | 30 | 33 | 40 | 45 | 50 | | 3 | alkaline_phos | 50 | 30 | 33 | 40 | 45 | 50 | | 3 | aspartate_amin | 50 | 30 | 33 | 40 | 45 | 50 |
-