Function `find_presence`

def find_presence(database: str, tables: list, code_list: sql.dataframe.DataFrame, index_date: sql.dataframe.DataFrame, function: str, index_days_start: int, index_days_end: int, allow_subset: bool = False) ‑> sql.dataframe.DataFrame

Description

The find_presence function is used to find the presence or absence of a specific feature around a defined index date of a patient's record. It can be used to find presence/absence, number of occurrences, date closest to the index date, or date closest to an index date in days.

Inputs

database - database name to use
tables - names of tables to use in a dataset that is a list ['diagnosis'], ['diagnosis','procedure']
- supported tables: encounter, diagnosis, procedure, medication_ingredient, medication_drug, lab_result, vitals_signs
code_list - user defined table with 3 mandatory columns:
- mandatory columns
  - feature - the feature the exact code will roll up to and name of the matrix column in the output (letters, numbers, underscores only, no spaces, not case sensitive)
  - code - exact code
  - code_system - RxNorm, LOINC, etc
    - if a user is using the encounter table, code_system must equal "Encounter Type" and code maps to the "type" field in the encounter table
    - if a user enters the value month_year_death in code column and patient in code_system column, the function will use the value in the patient.month_year_death column for all calculations
- optional columns
  - Supported columns are:
    - qualifier_num - looks across lab_result and vitals_signs num value fields at the same time
    - qualifier_text - looks across lab_result and vitals_signs text value fields at the same time
  - if a user only passes in lab_result or vitals_signs to find_presence, the function checks only that table
  - users can create as many additional columns as they want, but column names must be unique and match the supported column names
  - syntax for qualifying lab numeric values:
    - '<=X': less than or equal to X
    - '<X': less than X
    - '>=X': greater than or equal to X
    - '>X': greater than X
    - '~=X': not equal to X
    - 'X:Y': between X and Y
  - syntax for qualifying lab categorical values
    - the user can enter any string they want - exact match
      - if a user wants to use multiple values for a categorical lab, repeat the row with the same code but different qualifier value
  - if cell is left blank, system skips and assumes no qualification for that code
  - one qualification of a code does not apply to the entire feature; in the case there is more than one code mapped to the feature - every code must be qualified
index_date - user defined table with two columns
- patient_id
- index_date
function
- date - outputs the closest date (as a date type) to the index date of the present code in the feature (Null if no presence)
  - in the case of a tie, uses before the index date
- boolean - outputs 1 for presence and 0 for absence
- relative - outputs an integer value of how many days before or after the index date the closest presence code appeared, negative if before index, positive if after, Null if absent
  - in case of a tie, uses the before the index date
- count - number of unique dates a feature occurred in time window defined by the user in relation to the index event
  - start_date is used for encounter
index_days_start - start period relative to index date in days
- None means anytime
- negative means before the index date
- positive means after
index_days_end - end period relative to index date in days
- None means anytime
- negative means before the index date
- positive means after
allow_subset (optional argument) - checks whether the number of patients in the index event input table is the same as the dataset cohort number - would default to false, but if set to true would warn and continue
- if True and there are fewer patients in the index_date table, only returns the patient_ids present in index_date_df

Returns

table with patient_id as the first column and each unique feature value in code list input as subsequent columns
each feature finds the function value the user chooses and populates the cells of the table

Examples

code list without qualifiers:

|     feature     |   code   | code_system |
|-----------------|----------|-------------|
| lung_transplant | T86.81   | ICD-10-CM   |
| lung_transplant | Z48.24   | ICD-10-CM   |
| lung_transplant | Z94.2    | ICD-10-CM   |
| lung_transplant | T86.81   | ICD-10-CM   |
| lung_transplant | 1006036  | CPT         |
| lung_transplant | 88039007 | CPT         |

code list with qualifiers:

|      feature       |  code   | code_system | qualifier_num |
|--------------------|---------|-------------|---------------|
| blood_pressure_sys | 8460-8  | LOINC       | 100:160       |
| blood_pressure_sys | 8461-6  | LOINC       | 100:160       |
| blood_pressure_sys | 8459-0  | LOINC       | 100:160       |
| blood_pressure_sys | 8450-9  | LOINC       | 100:160       |
| blood_pressure_sys | 87741-5 | LOINC       | 100:160       |
| blood_pressure_sys | 8480-6  | LOINC       | 100:160       |


present_absent_table = find_presence(database='covid_db', tables=['procedure','diagnosis'], code_list=code_list, index_date=sg_index_date, function='boolean', index_days_start=-365, index_days_end=-1)
present_absent_table.head()

| patient_id | lung_transplant |
|------------|-----------------|
|          1 |               0 |
|          2 |               0 |
|          3 |               0 |
|          4 |               1 |
|          5 |               1 |

R functions