Function `find_date`

def find_date(database: str, tables: list, code_list: sql.dataframe.DataFrame, function: str, begin: str = None, end: str = None, patient_table: sql.dataframe.DataFrame = None) ‑> sql.dataframe.DataFrame

Description

The find_date function is used to find a patient’s date occurrence of a specific feature to help create an index date. It can be used to find the first, last, or random date of occurrence of a specific feature in a patient’s record.

Inputs

database - database name to use
tables - names of tables to use in a dataset that is a list [“diagnosis”], [“diagnosis”, “procedure”]
- supported tables: encounter, diagnosis, procedure, medication_ingredient, medication_drug, lab_result, vitals_signs
code_list - user defined table with 3 mandatory columns:
- mandatory columns
  - feature - the feature the exact code will roll up to and name of the matrix column in the output (letters, numbers, underscores only, no spaces, not case sensitive)
  - code - exact code
  - code_system - RxNorm, LOINC, etc
    - if a user is using the encounter table, code_system must equal "Encounter Type" and code maps to the "type" field in the encounter table
- optional columns
  - Supported columns are:
    - qualifier_num - looks across lab_result and vitals_signs num value fields at the same time
    - qualifier_text - looks across lab_result and vitals_signs text value fields at the same time
  - if a user only passes in lab_result or vitals_signs to find_date, the function checks only that table
  - users can create as many additional columns as they want, but column names must be unique and match the supported column names
  - syntax for qualifying lab numeric values:
    - '<=X': less than or equal to X
    - '<X': less than X
    - '>=X': greater than or equal to X
    - '>X': greater than X
    - '~=X': not equal to X
    - 'X:Y': between X and Y
  - syntax for qualifying lab categorical values
    - the user can enter any string they want - exact match
      - if a user wants to use multiple values for a categorical lab, repeat the row with the same code but different qualifier value
  - if cell is left blank, system skips and assumes no qualification for that code
  - one qualification of a code does not apply to the entire feature; in the case there is more than one code mapped to the feature - every code must be qualified
function - first, last, random
- first - each feature finds the first occurrence of any of the codes in the feature, as a date
  - works for labs, returns the date of the lab
- last - each feature finds the last occurrence of any of the codes in the feature, as a date
  - works for labs, returns the date of the lab
- random - Fortran seed approach to pull a random date based on all the occurrences of the feature within the relative time from index
- Null if there are no codes present within a feature for a patient
begin (optional argument) - earliest date to look in a patient's record (YYYY-MM-DD)
end (optional argument) - latest date to look in a patient's record (YYYY-MM-DD)
patient_table (optional argument)
- a dataframe with a single column called patient_id
- if this argument is present, use this, and not the patient dataset table
- this will supersede using the the patient table dataset

Returns

A dataframe with a patient_id column and a column for each unique feature value in the code_list input.

Examples

first_code_table = find_date(database='covid_db', tables=['procedure','diagnosis'], code_list=code_list, function='first')

first_code_table.head()

| patient_id | lung_transplant |
|------------|-----------------|
|          1 | 11/20/20        |
|          2 | 11/21/20        |
|          3 | 11/22/20        |
|          4 | 11/23/20        |
|          5 | 11/24/20        |

R functions