trinetx.profile_coverage API documentation

Function profile_coverage

def profile_coverage(database=None)

Description


The profile_coverage function uses the Python DataPrep package (https://pypi.org/project/dataprep/) to generate a patient-level data coverage report for a cohort. The report lets you quickly review patient demographics, patient record lengths, medication and lab result coverage, and more.

Inputs


  • database: name of the dataset database to use (defaults to None)

Returns


Produces both a report built with the create_report function of the DataPrep package and a pandas DataFrame of the generated patient coverage table.

The patient coverage table consists of the following fields:

  • if patient table is present, create the following fields where available:

    • patient_id

    • sex

    • race

    • ethnicity

    • marital_status

    • year_of_birth

    • month_year_death

    • age: the age of the patient at the time of the dataset download (date_created from dataset_details)

    • patient_regional_location

  • if encounter table is present, create the following fields:

    • encounter_count: the total number of encounters each patient has in the dataset

    • first_encounter_date: the date of the earliest encounter of each patient in the dataset

    • last_encounter_date: the date of the latest encounter of each patient in the dataset

    • encounter_date_range_months: the difference in months between the first and last encounter dates of a patient, rounded to the nearest month (datediff, set to months)

  • if diagnosis table is present, create the following fields:

    • diagnosis_count: the total number of diagnoses each patient has in the dataset

    • diagnosis_unique_code_count: the number of unique diagnosis codes each patient has in the dataset

    • first_diagnosis_date: the date of the earliest diagnosis of each patient in the dataset

    • last_diagnosis_date: the date of the latest diagnosis of each patient in the dataset

    • diagnosis_date_range_months: the difference in months between the first and last diagnosis dates of a patient

  • if procedure table is present, create the following fields:

    • procedure_count: the total number of procedures each patient has in the dataset

    • procedure_unique_code_count: the number of unique procedure codes each patient has in the dataset

    • first_procedure_date: the date of the earliest procedure of each patient in the dataset

    • last_procedure_date: the date of the latest procedure of each patient in the dataset

    • procedure_date_range_months: the difference in months between the first and last procedure dates of a patient

  • if medication_ingredient table is present, create the following fields:

    • medication_ingredient_count: the total number of medication ingredients each patient has in the dataset

    • medication_ingredient_unique_code_count: the number of unique medication ingredient codes each patient has in the dataset

    • first_medication_ingredient_date: the date of the earliest medication ingredient of each patient in the dataset

    • last_medication_ingredient_date: the date of the latest medication ingredient of each patient in the dataset

    • medication_ingredient_date_range_months: the difference in months between the first and last medication ingredient dates of a patient

  • if medication_drug table is present, create the following fields:

    • medication_drug_count: the total number of medication drugs each patient has in the dataset

    • medication_drug_unique_code_count: the number of unique medication drug codes each patient has in the dataset

    • first_medication_drug_date: the date of the earliest medication drug of each patient in the dataset

    • last_medication_drug_date: the date of the latest medication drug of each patient in the dataset

    • medication_drug_date_range_months: the difference in months between the first and last medication drug dates of a patient

  • if lab_result table is present, create the following fields:

    • lab_result_count: the total number of lab results each patient has in the dataset

    • lab_result_unique_code_count: the number of unique lab result codes each patient has in the dataset

    • first_lab_result_date: the date of the earliest lab result of each patient in the dataset

    • last_lab_result_date: the date of the latest lab result of each patient in the dataset

    • lab_result_date_range_months: the difference in months between the first and last lab result dates of a patient

  • if vitals_signs table is present, create the following fields:

    • vitals_signs_count: the total number of vital signs each patient has in the dataset

    • vitals_signs_unique_code_count: the number of unique vital signs codes each patient has in the dataset

    • first_vitals_signs_date: the date of the earliest vital signs of each patient in the dataset

    • last_vitals_signs_date: the date of the latest vital signs of each patient in the dataset

    • vitals_signs_date_range_months: the difference in months between the first and last vital signs dates of a patient

  • if tumor table is present, create the following fields:

    • tumor_count: the total number of rows in the tumor table each patient has in the dataset

    • tumor_site_code_count: the number of tumor site codes each patient has in the dataset

    • morphology_code_count: the number of morphology codes each patient has in the dataset

    • tumor_site_unique_code_count: the number of unique tumor site codes each patient has in the dataset

    • morphology_unique_code_count: the number of unique morphology codes each patient has in the dataset

    • first_tumor_diagnosis_date: the date of the earliest tumor diagnosis of each patient in the dataset

    • last_tumor_diagnosis_date: the date of the latest tumor diagnosis of each patient in the dataset

    • tumor_diagnosis_date_range_months: the difference in months between the first and last tumor diagnosis dates of a patient

    • first_observation_date: the date of the earliest observation of each patient in the dataset

    • last_observation_date: the date of the latest observation of each patient in the dataset

    • observation_date_range_months: the difference in months between the first and last observation dates of a patient

    • stage_code_count: the number of stage codes each patient has in the dataset

    • stage_code_unique_count: the number of unique stage codes each patient has in the dataset

  • if tumor_properties table is present, create the following fields:

    • tumor_properties_count: the total number of rows in the tumor_properties table each patient has in the dataset

    • tumor_properties_unique_code_count: the number of unique tumor property codes each patient has in the dataset

  • if oncology_treatment table is present, create the following fields:

    • oncology_treatment_count: the total number of rows in the oncology_treatment table each patient has in the dataset

    • oncology_treatment_unique_code_count: the number of unique oncology treatment codes each patient has in the dataset

    • first_oncology_treatment_start_date: the date of the earliest oncology_treatment_start_date of each patient in the dataset

    • last_oncology_treatment_start_date: the date of the latest oncology_treatment_start_date of each patient in the dataset

    • oncology_treatment_start_date_range_months: the difference in months between the first and last oncology_treatment_start_date for each patient in the dataset

  • if the genomic table is present, create the following fields:

    • genomic_count: the total number of genomic records each patient has in the dataset

    • genomic_unique_code_count: the number of unique genomic codes each patient has in the dataset

    • first_genomic_date: the date of the earliest genomic record of each patient in the dataset

    • last_genomic_date: the date of the latest genomic record of each patient in the dataset

    • genomic_date_range_months: the difference in months between the first and last genomic record dates of a patient

  • if the member_enrollment table is present, create the following fields:

    • member_enrollment_count: the total number of member enrollment rows each patient has in the dataset

    • first_effective_date: the date of the earliest effective_date of each patient in the dataset

    • last_effective_date: the date of the latest effective_date of each patient in the dataset

    • effective_date_range_months: the difference in months between the first and last effective_date values of a patient, rounded to the nearest month (datediff, set to months)

    • first_termination_date: the date of the earliest termination_date of each patient in the dataset

    • last_termination_date: the date of the latest termination_date of each patient in the dataset

    • termination_date_range_months: the difference in months between the first and last termination_date values of a patient, rounded to the nearest month (datediff, set to months)

  • if the claim_header table is present, create the following fields:

    • claim_header_count: the total number of claim headers each patient has in the dataset

    • first_service_from_date: the date of the earliest service_from_date of each patient in the dataset

    • last_service_from_date: the date of the latest service_from_date of each patient in the dataset

    • service_from_date_range_months (numerical): the difference in months between the first and last service_from_date values of a patient, rounded to the nearest month (datediff, set to months)

    • total_proxy_cost: the sum of the total_proxy_cost of each patient in the dataset

  • if the claim_line table is present, create the following fields:

    • claim_line_count: the total number of claim lines each patient has in the dataset

    • first_service_date: the date of the earliest service_date of each patient in the dataset

    • last_service_date: the date of the latest service_date of each patient in the dataset

    • service_date_range_months: the difference in months between the first and last service_date values of a patient, rounded to the nearest month (datediff, set to months)
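
Most of the per-table fields above follow one pattern: a row count, a unique code count, a first and last date, and a date range in months. The sketch below shows one way that pattern could be computed with pandas. It is illustrative only and is not the implementation used by profile_coverage. The helper names (months_between, summarize_table), the column names code and date, and the day-based month approximation are all assumptions; a datediff in months, as the field descriptions mention, may count month boundaries differently.

```python
import pandas as pd


def months_between(start: pd.Series, end: pd.Series) -> pd.Series:
    """Approximate month difference, rounded to the nearest month.

    Assumption: uses an average month length of 30.4375 days, which
    may differ from a SQL-style datediff in months.
    """
    days = (end - start).dt.days
    return (days / 30.4375).round().astype("Int64")


def summarize_table(df: pd.DataFrame, prefix: str,
                    date_col: str = "date",
                    code_col: str = "code") -> pd.DataFrame:
    """Build per-patient coverage fields for one table (hypothetical helper)."""
    out = df.groupby("patient_id").agg(
        **{
            f"{prefix}_count": (code_col, "size"),
            f"{prefix}_unique_code_count": (code_col, "nunique"),
            f"first_{prefix}_date": (date_col, "min"),
            f"last_{prefix}_date": (date_col, "max"),
        }
    )
    out[f"{prefix}_date_range_months"] = months_between(
        out[f"first_{prefix}_date"], out[f"last_{prefix}_date"]
    )
    return out.reset_index()


# Toy diagnosis table with made-up values, for illustration only
diagnosis = pd.DataFrame({
    "patient_id": ["p1", "p1", "p1", "p2"],
    "code": ["E11.9", "E11.9", "I10", "J45"],
    "date": pd.to_datetime(
        ["2020-01-15", "2020-07-15", "2021-01-15", "2019-03-01"]
    ),
})

summary = summarize_table(diagnosis, "diagnosis")
print(summary)
```

Passing the table name as a prefix keeps the generated column names in line with the field names listed above, for example diagnosis_count and first_diagnosis_date.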

Example


dataworks_df = profile_coverage(database='database')

  • displays profile report

  • returns dataworks_df dataframe for use later
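
Once the dataframe is returned, it can be filtered like any other pandas DataFrame. The sketch below uses a small hand-built stand-in, since calling profile_coverage requires access to a dataset database. The values are made up, and the thresholds (12 months of encounter history, at least one lab result) are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the dataframe returned by profile_coverage;
# column names follow the field list above, values are invented.
coverage_df = pd.DataFrame({
    "patient_id": ["p1", "p2", "p3"],
    "encounter_count": [14, 2, 7],
    "encounter_date_range_months": [36, 1, 18],
    "lab_result_count": [52, 0, 9],
})

# Keep patients with at least 12 months between first and last encounter
# and at least one lab result.
well_covered = coverage_df[
    (coverage_df["encounter_date_range_months"] >= 12)
    & (coverage_df["lab_result_count"] > 0)
]

# Share of the cohort that meets both criteria
share = len(well_covered) / len(coverage_df)
print(well_covered["patient_id"].tolist(), round(share, 2))
```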