Function
find_presence
def find_presence(database: str, tables: list, code_list: sql.dataframe.DataFrame, index_date: sql.dataframe.DataFrame, function: str, index_days_start: int, index_days_end: int, allow_subset: bool = False) ‑> sql.dataframe.DataFrame
-
Description
The find_presence function is used to find the presence or absence of a specific feature around a defined index date of a patient's record. It can be used to find presence/absence, number of occurrences, date closest to the index date, or date closest to an index date in days.
Inputs
-
database - database name to use
-
tables - names of tables to use in a dataset that is a list ['diagnosis'], ['diagnosis','procedure']
- supported tables: encounter, diagnosis, procedure, medication_ingredient, medication_drug, lab_result, vitals_signs
-
code_list - user defined table with 3 mandatory columns:
-
mandatory columns
-
feature - the feature the exact code will roll up to and name of the matrix column in the output (letters, numbers, underscores only, no spaces, not case sensitive)
-
code - exact code
-
code_system - RxNorm, LOINC, etc
-
if a user is using the encounter table, code_system must equal "Encounter Type" and code maps to the "type" field in the encounter table
-
if a user enters the value month_year_death in code column and patient in code_system column, the function will use the value in the patient.month_year_death column for all calculations
-
-
-
optional columns
-
Supported columns are:
-
qualifier_num - looks across lab_result and vitals_signs num value fields at the same time
-
qualifier_text - looks across lab_result and vitals_signs text value fields at the same time
-
-
if a user only passes in lab_result or vitals_signs to find_presence, the function checks only that table
-
users can create as many additional columns as they want, but column names must be unique and match the supported column names
-
syntax for qualifying lab numeric values:
-
'<=X': less than or equal to X
-
'<X': less than X
-
'>=X': greater than or equal to X
-
'>X': greater than X
-
'~=X': not equal to X
-
'X:Y': between X and Y
-
-
syntax for qualifying lab categorical values
-
the user can enter any string they want - exact match
- if a user wants to use multiple values for a categorical lab, repeat the row with the same code but different qualifier value
-
-
if cell is left blank, system skips and assumes no qualification for that code
-
one qualification of a code does not apply to the entire feature; in the case there is more than one code mapped to the feature - every code must be qualified
-
-
-
index_date - user defined table with two columns
-
patient_id
-
index_date
-
-
function
-
date - outputs the closest date (as a date type) to the index date of the present code in the feature (Null if no presence)
- in the case of a tie, uses before the index date
-
boolean - outputs 1 for presence and 0 for absence
-
relative - outputs an integer value of how many days before or after the index date the closest presence code appeared, negative if before index, positive if after, Null if absent
- in case of a tie, uses the before the index date
-
count - number of unique dates a feature occurred in time window defined by the user in relation to the index event
- start_date is used for encounter
-
-
index_days_start - start period relative to index date in days
-
None means anytime
-
negative means before the index date
-
positive means after
-
-
index_days_end - end period relative to index date in days
-
None means anytime
-
negative means before the index date
-
positive means after
-
-
allow_subset (optional argument) - checks whether the number of patients in the index event input table is the same as the dataset cohort number - would default to false, but if set to true would warn and continue
- if True and there are fewer patients in the index_date table, only returns the patient_ids present in index_date_df
Returns
-
table with patient_id as the first column and each unique feature value in code list input as subsequent columns
-
each feature finds the function value the user chooses and populates the cells of the table
Examples
code list without qualifiers: | feature | code | code_system | |-----------------|----------|-------------| | lung_transplant | T86.81 | ICD-10-CM | | lung_transplant | Z48.24 | ICD-10-CM | | lung_transplant | Z94.2 | ICD-10-CM | | lung_transplant | T86.81 | ICD-10-CM | | lung_transplant | 1006036 | CPT | | lung_transplant | 88039007 | CPT | code list with qualifiers: | feature | code | code_system | qualifier_num | |--------------------|---------|-------------|---------------| | blood_pressure_sys | 8460-8 | LOINC | 100:160 | | blood_pressure_sys | 8461-6 | LOINC | 100:160 | | blood_pressure_sys | 8459-0 | LOINC | 100:160 | | blood_pressure_sys | 8450-9 | LOINC | 100:160 | | blood_pressure_sys | 87741-5 | LOINC | 100:160 | | blood_pressure_sys | 8480-6 | LOINC | 100:160 | present_absent_table = find_presence(database='covid_db', tables=['procedure','diagnosis'], code_list=code_list, index_date=sg_index_date, function='boolean', index_days_start=-365, index_days_end=-1) present_absent_table.head() | patient_id | lung_transplant | |------------|-----------------| | 1 | 0 | | 2 | 0 | | 3 | 0 | | 4 | 1 | | 5 | 1 |
-