Function
find_date
def find_date(database: str, tables: list, code_list: sql.dataframe.DataFrame, function: str, begin: str = None, end: str = None, patient_table: sql.dataframe.DataFrame = None) ‑> sql.dataframe.DataFrame
-
Description
The find_date function is used to find a patient’s date occurrence of a specific feature to help create an index date. It can be used to find the first, last, or random date of occurrence of a specific feature in a patient’s record.
Inputs
-
database - database name to use
-
tables - names of tables to use in a dataset that is a list [“diagnosis”], [“diagnosis”, “procedure”]
- supported tables: encounter, diagnosis, procedure, medication_ingredient, medication_drug, lab_result, vitals_signs
-
code_list - user defined table with 3 mandatory columns:
-
mandatory columns
-
feature - the feature the exact code will roll up to and name of the matrix column in the output (letters, numbers, underscores only, no spaces, not case sensitive)
-
code - exact code
-
code_system - RxNorm, LOINC, etc
- if a user is using the encounter table, code_system must equal "Encounter Type" and code maps to the "type" field in the encounter table
-
-
optional columns
-
Supported columns are:
-
qualifier_num - looks across lab_result and vitals_signs num value fields at the same time
-
qualifier_text - looks across lab_result and vitals_signs text value fields at the same time
-
-
if a user only passes in lab_result or vitals_signs to find_date, the function checks only that table
-
users can create as many additional columns as they want, but column names must be unique and match the supported column names
-
syntax for qualifying lab numeric values:
-
'<=X': less than or equal to X
-
'<X': less than X
-
'>=X': greater than or equal to X
-
'>X': greater than X
-
'~=X': not equal to X
-
'X:Y': between X and Y
-
-
syntax for qualifying lab categorical values
-
the user can enter any string they want - exact match
- if a user wants to use multiple values for a categorical lab, repeat the row with the same code but different qualifier value
-
-
if cell is left blank, system skips and assumes no qualification for that code
-
one qualification of a code does not apply to the entire feature; in the case there is more than one code mapped to the feature - every code must be qualified
-
-
-
function - first, last, random
-
first - each feature finds the first occurrence of any of the codes in the feature, as a date
- works for labs, returns the date of the lab
-
last - each feature finds the last occurrence of any of the codes in the feature, as a date
- works for labs, returns the date of the lab
-
random - Fortran seed approach to pull a random date based on all the occurrences of the feature within the relative time from index
-
Null if there are no codes present within a feature for a patient
-
-
begin (optional argument) - earliest date to look in a patient's record (YYYY-MM-DD)
-
end (optional argument) - latest date to look in a patient's record (YYYY-MM-DD)
-
patient_table (optional argument)
-
a dataframe with a single column called patient_id
-
if this argument is present, use this, and not the patient dataset table
-
this will supersede using the the patient table dataset
-
Returns
A dataframe with a patient_id column and a column for each unique feature value in the code_list input.
Examples
first_code_table = find_date(database='covid_db', tables=['procedure','diagnosis'], code_list=code_list, function='first') first_code_table.head() | patient_id | lung_transplant | |------------|-----------------| | 1 | 11/20/20 | | 2 | 11/21/20 | | 3 | 11/22/20 | | 4 | 11/23/20 | | 5 | 11/24/20 |
-