Function `profile_coverage`

def profile_coverage(database=None)

Description

The profile_coverage function uses the Python DataPrep package (https://pypi.org/project/dataprep/) to generate a patient-level data coverage report for a cohort. The report lets you quickly review patient demographics, patient record lengths, medication and lab result coverage, and more.

Inputs

database - dataset database name to use

Returns

Displays the report generated by the create_report function of the DataPrep package and returns a pandas DataFrame of the patient coverage table.

Patient coverage table consists of the following fields:

if patient table is present, create the following fields if present:
- patient_id
- sex
- race
- ethnicity
- marital_status
- year_of_birth
- month_year_death
- age: the age of the patient at the time of the dataset download
  - date_created from dataset_details
- patient_regional_location
if encounter table is present, create the following fields:
- encounter_count: the total number of encounters each patient has in the dataset
- first_encounter_date: the date of the earliest encounter of each patient in the dataset
- last_encounter_date: the date of the latest encounter of each patient in the dataset
- encounter_date_range_months: the difference in months between the first and last encounter dates of a patient - round to the nearest month (datediff, set to months)
if diagnosis table is present, create the following fields:
- diagnosis_count: the total number of diagnoses each patient has in the dataset
- diagnosis_unique_code_count: the number of unique diagnosis codes each patient has in the dataset
- first_diagnosis_date: the date of the earliest diagnosis of each patient in the dataset
- last_diagnosis_date: the date of the latest diagnosis of each patient in the dataset
- diagnosis_date_range_months: the difference in months between the first and last diagnosis dates of a patient
if procedure table is present, create the following fields:
- procedure_count: the total number of procedures each patient has in the dataset
- procedure_unique_code_count: the number of unique procedure codes each patient has in the dataset
- first_procedure_date: the date of the earliest procedure of each patient in the dataset
- last_procedure_date: the date of the latest procedure of each patient in the dataset
- procedure_date_range_months: the difference in months between the first and last procedure dates of a patient
if medication_ingredient table is present, create the following fields:
- medication_ingredient_count: the total number of medication ingredients each patient has in the dataset
- medication_ingredient_unique_code_count: the number of unique medication ingredient codes each patient has in the dataset
- first_medication_ingredient_date: the date of the earliest medication ingredient of each patient in the dataset
- last_medication_ingredient_date: the date of the latest medication ingredient of each patient in the dataset
- medication_ingredient_date_range_months: the difference in months between the first and last medication ingredient dates of a patient
if medication_drug table is present, create the following fields:
- medication_drug_count: the total number of medication drugs each patient has in the dataset
- medication_drug_unique_code_count: the number of unique medication drug codes each patient has in the dataset
- first_medication_drug_date: the date of the earliest medication drug of each patient in the dataset
- last_medication_drug_date: the date of the latest medication drug of each patient in the dataset
- medication_drug_date_range_months: the difference in months between the first and last medication drug dates of a patient
if lab_result table is present, create the following fields:
- lab_result_count: the total number of lab results each patient has in the dataset
- lab_result_unique_code_count: the number of unique lab result codes each patient has in the dataset
- first_lab_result_date: the date of the earliest lab result of each patient in the dataset
- last_lab_result_date: the date of the latest lab result of each patient in the dataset
- lab_result_date_range_months: the difference in months between the first and last lab result dates of a patient
if vitals_signs table is present, create the following fields:
- vitals_signs_count: the total number of vital signs each patient has in the dataset
- vitals_signs_unique_code_count: the number of unique vital signs codes each patient has in the dataset
- first_vitals_signs_date: the date of the earliest vital signs of each patient in the dataset
- last_vitals_signs_date: the date of the latest vital signs of each patient in the dataset
- vitals_signs_date_range_months: the difference in months between the first and last vital signs dates of a patient
if tumor table is present, create the following fields:
- tumor_count: the total number of rows in the tumor table each patient has in the dataset
- tumor_site_code_count: the number of tumor site codes each patient has in the dataset
- morphology_code_count: the number of morphology codes each patient has in the dataset
- tumor_site_unique_code_count: the number of unique tumor site codes each patient has in the dataset
- morphology_unique_code_count: the number of unique morphology codes each patient has in the dataset
- first_tumor_diagnosis_date: the date of the earliest tumor diagnosis of each patient in the dataset
- last_tumor_diagnosis_date: the date of the latest tumor diagnosis of each patient in the dataset
- tumor_diagnosis_date_range_months: the difference in months between the first and last tumor diagnosis dates of a patient
- first_observation_date: the date of the earliest observation of each patient in the dataset
- last_observation_date: the date of the latest observation of each patient in the dataset
- observation_date_range_months: the difference in months between the first and last observation dates of a patient
- stage_code_count: the number of stage codes each patient has in the dataset
- stage_code_unique_count: the number of unique stage codes each patient has in the dataset
if tumor_properties table is present, create the following fields:
- tumor_properties_count: the total number of rows in the tumor_properties table each patient has in the dataset
- tumor_properties_unique_code_count: the number of unique tumor property codes each patient has in the dataset
if oncology_treatment table is present, create the following fields:
- oncology_treatment_count: the total number of rows in the oncology_treatment table each patient has in the dataset
- oncology_treatment_unique_code_count: the number of unique oncology treatment codes each patient has in the dataset
- first_oncology_treatment_start_date: the date of the earliest oncology_treatment_start_date of each patient in the dataset
- last_oncology_treatment_start_date: the date of the latest oncology_treatment_start_date of each patient in the dataset
- oncology_treatment_start_date_range_months: the difference in months between the first and last oncology_treatment_start_date for each patient in the dataset
if the genomic table is present, create the following fields:
- genomic_count: the total number of genomics each patient has in the dataset
- genomic_unique_code_count: the number of unique genomics codes each patient has in the dataset
- first_genomic_date: the date of the earliest genomics of each patient in the dataset
- last_genomic_date: the date of the latest genomics of each patient in the dataset
- genomic_date_range_months: the difference in months between the first and last genomics dates of a patient
if the member_enrollment table is present, create the following fields:
- member_enrollment_count: the total number of member enrollment rows each patient has in the dataset
- first_effective_date: the date of the earliest effective_date of each patient in the dataset
- last_effective_date: the date of the latest effective_date of each patient in the dataset
- effective_date_range_months: the difference in months between the first and last effective_date dates of a patient - round to the nearest month (datediff, set to months)
- first_termination_date: the date of the earliest termination_date of each patient in the dataset
- last_termination_date: the date of the latest termination_date of each patient in the dataset
- termination_date_range_months: the difference in months between the first and last termination_date dates of a patient - round to the nearest month (datediff, set to months)
if the claim_header table is present, create the following fields:
- claim_header_count: the total number of claim headers each patient has in the dataset
- first_service_from_date: the date of the earliest service_from_date of each patient in the dataset
- last_service_from_date: the date of the latest service_from_date of each patient in the dataset
- service_from_date_range_months (numerical): the difference in months between the first and last service_from_date dates of a patient - round to the nearest month (datediff, set to months)
- total_proxy_cost: the sum of the total_proxy_cost of each patient in the dataset
if the claim_line table is present, create the following fields:
- claim_line_count: the total number of claim lines each patient has in the dataset
- first_service_date: the date of the earliest service_date of each patient in the dataset
- last_service_date: the date of the latest service_date of each patient in the dataset
- service_date_range_months: the difference in months between the first and last service_date dates of a patient - round to the nearest month (datediff, set to months)
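
Most of the table groups above follow the same pattern: a row count, a unique code count, first and last dates, and a date range in months, added only when the source table is present. The following is a minimal pandas sketch of that pattern, not the actual implementation of profile_coverage. The column names `code` and `date`, the helper names `months_between` and `summarize_events`, and the month rounding rule (days divided by an average month length) are all assumptions made for illustration; the real function may compute the month difference differently.

```python
import pandas as pd

# Assumption: average Gregorian month length, used to convert days to months.
DAYS_PER_MONTH = 30.4375


def months_between(first: pd.Series, last: pd.Series) -> pd.Series:
    """Difference between two date columns, rounded to the nearest month."""
    days = (last - first).dt.days
    return (days / DAYS_PER_MONTH).round().astype("Int64")


def summarize_events(events: pd.DataFrame, prefix: str) -> pd.DataFrame:
    """Build one row per patient with count, unique-code, and date fields.

    Hypothetical helper: expects columns patient_id, code, and date.
    """
    summary = (
        events.groupby("patient_id")
        .agg(
            **{
                f"{prefix}_count": ("code", "size"),
                f"{prefix}_unique_code_count": ("code", "nunique"),
                f"first_{prefix}_date": ("date", "min"),
                f"last_{prefix}_date": ("date", "max"),
            }
        )
        .reset_index()
    )
    summary[f"{prefix}_date_range_months"] = months_between(
        summary[f"first_{prefix}_date"], summary[f"last_{prefix}_date"]
    )
    return summary


# Hypothetical input: only the tables that exist in the dataset are included.
tables = {
    "patient": pd.DataFrame({"patient_id": [1, 2, 3], "sex": ["F", "M", "F"]}),
    "diagnosis": pd.DataFrame(
        {
            "patient_id": [1, 1, 1, 2],
            "code": ["A", "A", "B", "C"],
            "date": pd.to_datetime(
                ["2020-01-01", "2020-03-01", "2021-01-01", "2019-06-15"]
            ),
        }
    ),
}

# Start from the patient table and left-join each summary that can be built.
coverage = tables["patient"].copy()
for name in ("encounter", "diagnosis", "procedure"):
    if name in tables:  # skip tables that are not present
        coverage = coverage.merge(
            summarize_events(tables[name], name), on="patient_id", how="left"
        )

print(coverage)
```

In this sketch a left join keeps every patient in the coverage table, so patients with no rows in a given table get missing values for that table's fields rather than being dropped.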

Example

dataworks_df = profile_coverage(database='database')

displays profile report
returns dataworks_df dataframe for use later

R functions