The find_date function finds the date on which a specific feature occurs in a patient's record, which helps when creating an index date. It can return the first, last, or a random date of occurrence of the feature.

Usage

find_date(
  database,
  tables,
  code_list,
  func,
  begin = NA,
  end = NA,
  patient_table = NULL
)

Arguments

database

database name to use

tables

names of the tables in the dataset to search, given as a vector, e.g. c("diagnosis") or c("diagnosis", "procedure")

  • supported tables: encounter, diagnosis, procedure, medication_ingredient, medication_drug, lab_result, vitals_signs

code_list

user defined table with 3 mandatory columns:

  • mandatory columns

    • feature - the feature the exact code will roll up to and name of the matrix column in the output (letters, numbers, underscores only, no spaces, not case sensitive)

    • code - exact code

    • code_system - RxNorm, LOINC, etc

      • if a user is using the encounter table, code_system must equal "Encounter Type" and code maps to the "type" field in the encounter table

  • optional columns

    • Supported columns are:

      • qualifier_num - looks across lab_result and vitals_signs num value fields at the same time

      • qualifier_text - looks across lab_result and vitals_signs text value fields at the same time

    • if a user only passes in lab_result or vitals_signs to find_date, the function checks only that table

    • users can create as many additional columns as they want, but column names must be unique and match the supported column names

    • syntax for qualifying lab numeric values:

      • '<=X': less than or equal to X

      • '<X': less than X

      • '>=X': greater than or equal to X

      • '>X': greater than X

      • '~=X': not equal to X

      • 'X:Y': between X and Y

    • syntax for qualifying lab categorical values

      • the user can enter any string they want - exact match

        • if a user wants to use multiple values for a categorical lab, repeat the row with the same code but different qualifier value

    • if a cell is left blank, the function skips it and applies no qualification to that code

    • a qualifier applies only to the code on its own row, not to the entire feature; when more than one code is mapped to a feature, every code must be qualified
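
The numeric qualifier syntax listed above can be illustrated with a short sketch. The helper `matches_qualifier` below is hypothetical and is not part of the package; it assumes that 'X:Y' is inclusive at both ends and that a blank cell means no qualification, which may differ from what find_date does internally.

```r
# Hypothetical helper (not part of the package): tests numeric values
# against a single qualifier_num string such as ">=6.5" or "4:6".
matches_qualifier <- function(value, qualifier) {
  qualifier <- trimws(qualifier)
  # Blank or missing cell: no qualification, so every value passes
  if (is.na(qualifier) || qualifier == "") {
    return(rep(TRUE, length(value)))
  }
  # 'X:Y' range (assumed inclusive at both ends)
  if (grepl(":", qualifier, fixed = TRUE)) {
    bounds <- as.numeric(strsplit(qualifier, ":", fixed = TRUE)[[1]])
    return(value >= bounds[1] & value <= bounds[2])
  }
  # Operator followed by a number, e.g. "<=10" or "~=4"
  pattern <- "^(<=|>=|~=|<|>)\\s*(-?[0-9.]+)$"
  m <- regmatches(qualifier, regexec(pattern, qualifier))[[1]]
  if (length(m) != 3) {
    stop("unrecognised qualifier: ", qualifier)
  }
  x <- as.numeric(m[3])
  switch(m[2],
    "<=" = value <= x,
    "<"  = value < x,
    ">=" = value >= x,
    ">"  = value > x,
    "~=" = value != x
  )
}

# Example: which of these lab values satisfy ">=6.5"?
matches_qualifier(c(5.9, 6.5, 7.2), ">=6.5")
```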

func

first, last, random

  • first - for each feature, returns the date of the first occurrence of any of the feature's codes

    • works for labs, returns the date of the lab

  • last - for each feature, returns the date of the last occurrence of any of the feature's codes

    • works for labs, returns the date of the lab

  • random - Fortran seed approach to pull a random date based on all the occurrences of the feature within the relative time from index

  • NULL if there are no codes present within a feature for a patient
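
As a rough illustration of the three func options, the sketch below applies them to a toy table of events in base R. The helper `pick_date` and the `events` data are hypothetical; the sketch uses R's built-in random number generator rather than the seed approach described above, and returns NA where find_date is documented to return NULL.

```r
# Hypothetical helper (not part of the package): pick one date from a
# vector of occurrence dates for a single patient and feature.
pick_date <- function(dates, func = c("first", "last", "random")) {
  func <- match.arg(func)
  if (length(dates) == 0) {
    return(as.Date(NA))  # no codes present for this patient
  }
  switch(func,
    first  = min(dates),
    last   = max(dates),
    # sample.int avoids the sample() pitfall with length-one inputs
    random = dates[sample.int(length(dates), 1)]
  )
}

# Toy event data: one row per occurrence of a feature
events <- data.frame(
  patient_id = c(1, 1, 1, 2),
  date = as.Date(c("2020-11-22", "2020-11-20", "2020-12-01", "2020-11-21"))
)

# Apply the chosen rule per patient
by_patient <- split(events$date, events$patient_id)
first_dates <- do.call(c, lapply(by_patient, pick_date, func = "first"))
last_dates  <- do.call(c, lapply(by_patient, pick_date, func = "last"))

data.frame(
  patient_id = names(by_patient),
  first = first_dates,
  last = last_dates,
  row.names = NULL
)
```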

begin

(optional argument) - earliest date to look in a patient's record (YYYY-MM-DD)

end

(optional argument) - latest date to look in a patient's record (YYYY-MM-DD)

patient_table

(optional argument)

  • a dataframe with a single column called patient_id

  • if this argument is supplied, it is used in place of the dataset's patient table

Value

A dataframe with a patient_id column and a column for each unique feature value in the code_list input.

Details

first_code_table <- find_date(
  database = 'covid_db',
  tables = c('procedure', 'diagnosis'),
  code_list = code_list,
  func = 'first'
)
head(first_code_table)

| patient_id | lung_transplant |
|------------|-----------------|
|          1 | 11/20/20        |
|          2 | 11/21/20        |
|          3 | 11/22/20        |
|          4 | 11/23/20        |
|          5 | 11/24/20        |
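
The call above assumes that a code_list already exists. A minimal sketch of how one might be built, and of a call that also uses the optional arguments, is shown here. The code values are placeholders rather than real codes, the code_system strings other than "LOINC" are assumptions about what the package accepts, the `cohort` dataframe and `low_oxygen` feature are hypothetical, and passing begin and end as character strings is also an assumption.

```r
# Placeholder code_list with the 3 mandatory columns plus qualifier_num.
# Code values are illustrative placeholders, not real clinical codes.
code_list <- data.frame(
  feature = c("lung_transplant", "lung_transplant", "low_oxygen"),
  code = c("PLACEHOLDER_PROC_CODE", "PLACEHOLDER_DX_CODE", "PLACEHOLDER_LAB_CODE"),
  code_system = c("CPT", "ICD-10-CM", "LOINC"),  # assumed spellings
  qualifier_num = c("", "", "<92"),              # blank = no qualification
  stringsAsFactors = FALSE
)

# Hypothetical cohort restricting the search to selected patients
cohort <- data.frame(patient_id = c(1, 2, 3))

# Only run the call when the package providing find_date is loaded
if (exists("find_date")) {
  last_code_table <- find_date(
    database = "covid_db",
    tables = c("procedure", "diagnosis", "lab_result"),
    code_list = code_list,
    func = "last",
    begin = "2020-01-01",  # assumed to be passed as a YYYY-MM-DD string
    end = "2020-12-31",
    patient_table = cohort
  )
  head(last_code_table)
}
```

Under the Value description above, the result would have a patient_id column plus one column per unique feature, here lung_transplant and low_oxygen.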