Function
profile_coverage
def profile_coverage(database=None)
Description
The profile_coverage function leverages the Python DataPrep package (https://pypi.org/project/dataprep/) to generate a patient-level data coverage report for a cohort. The report lets you quickly review patient demographics, patient record lengths, medication and lab result coverage, and more.
Inputs
- database - name of the dataset database to use (default: None)
Returns
Returns both the report produced by the create_report function of the DataPrep package and a pandas DataFrame of the generated patient coverage table.
Patient coverage table consists of the following fields:
- if patient table is present, create the following fields, where present:
  - patient_id
  - sex
  - race
  - ethnicity
  - marital_status
  - year_of_birth
  - month_year_death
  - age: the age of the patient at the time of the dataset download (date_created from dataset_details)
  - patient_regional_location
- if encounter table is present, create the following fields:
  - encounter_count: the total number of encounters each patient has in the dataset
  - first_encounter_date: the date of the earliest encounter of each patient in the dataset
  - last_encounter_date: the date of the latest encounter of each patient in the dataset
  - encounter_date_range_months: the difference in months between the first and last encounter dates of a patient, rounded to the nearest month (datediff, set to months)
- if diagnosis table is present, create the following fields:
  - diagnosis_count: the total number of diagnoses each patient has in the dataset
  - diagnosis_unique_code_count: the number of unique diagnosis codes each patient has in the dataset
  - first_diagnosis_date: the date of the earliest diagnosis of each patient in the dataset
  - last_diagnosis_date: the date of the latest diagnosis of each patient in the dataset
  - diagnosis_date_range_months: the difference in months between the first and last diagnosis dates of a patient
- if procedure table is present, create the following fields:
  - procedure_count: the total number of procedures each patient has in the dataset
  - procedure_unique_code_count: the number of unique procedure codes each patient has in the dataset
  - first_procedure_date: the date of the earliest procedure of each patient in the dataset
  - last_procedure_date: the date of the latest procedure of each patient in the dataset
  - procedure_date_range_months: the difference in months between the first and last procedure dates of a patient
- if medication_ingredient table is present, create the following fields:
  - medication_ingredient_count: the total number of medication ingredients each patient has in the dataset
  - medication_ingredient_unique_code_count: the number of unique medication ingredient codes each patient has in the dataset
  - first_medication_ingredient_date: the date of the earliest medication ingredient of each patient in the dataset
  - last_medication_ingredient_date: the date of the latest medication ingredient of each patient in the dataset
  - medication_ingredient_date_range_months: the difference in months between the first and last medication ingredient dates of a patient
- if medication_drug table is present, create the following fields:
  - medication_drug_count: the total number of medication drugs each patient has in the dataset
  - medication_drug_unique_code_count: the number of unique medication drug codes each patient has in the dataset
  - first_medication_drug_date: the date of the earliest medication drug of each patient in the dataset
  - last_medication_drug_date: the date of the latest medication drug of each patient in the dataset
  - medication_drug_date_range_months: the difference in months between the first and last medication drug dates of a patient
- if lab_result table is present, create the following fields:
  - lab_result_count: the total number of lab results each patient has in the dataset
  - lab_result_unique_code_count: the number of unique lab result codes each patient has in the dataset
  - first_lab_result_date: the date of the earliest lab result of each patient in the dataset
  - last_lab_result_date: the date of the latest lab result of each patient in the dataset
  - lab_result_date_range_months: the difference in months between the first and last lab result dates of a patient
- if vitals_signs table is present, create the following fields:
  - vitals_signs_count: the total number of vital signs each patient has in the dataset
  - vitals_signs_unique_code_count: the number of unique vital sign codes each patient has in the dataset
  - first_vitals_signs_date: the date of the earliest vital signs of each patient in the dataset
  - last_vitals_signs_date: the date of the latest vital signs of each patient in the dataset
  - vitals_signs_date_range_months: the difference in months between the first and last vital signs dates of a patient
- if tumor table is present, create the following fields:
  - tumor_count: the total number of rows in the tumor table each patient has in the dataset
  - tumor_site_code_count: the number of tumor site codes each patient has in the dataset
  - morphology_code_count: the number of morphology codes each patient has in the dataset
  - tumor_site_unique_code_count: the number of unique tumor site codes each patient has in the dataset
  - morphology_unique_code_count: the number of unique morphology codes each patient has in the dataset
  - first_tumor_diagnosis_date: the date of the earliest tumor diagnosis of each patient in the dataset
  - last_tumor_diagnosis_date: the date of the latest tumor diagnosis of each patient in the dataset
  - tumor_diagnosis_date_range_months: the difference in months between the first and last tumor diagnosis dates of a patient
  - first_observation_date: the date of the earliest observation of each patient in the dataset
  - last_observation_date: the date of the latest observation of each patient in the dataset
  - observation_date_range_months: the difference in months between the first and last observation dates of a patient
  - stage_code_count: the number of stage codes each patient has in the dataset
  - stage_code_unique_count: the number of unique stage codes each patient has in the dataset
- if tumor_properties table is present, create the following fields:
  - tumor_properties_count: the total number of rows in the tumor_properties table each patient has in the dataset
  - tumor_properties_unique_code_count: the number of unique tumor property codes each patient has in the dataset
- if oncology_treatment table is present, create the following fields:
  - oncology_treatment_count: the total number of rows in the oncology_treatment table each patient has in the dataset
  - oncology_treatment_unique_code_count: the number of unique oncology treatment codes each patient has in the dataset
  - first_oncology_treatment_start_date: the date of the earliest oncology_treatment_start_date of each patient in the dataset
  - last_oncology_treatment_start_date: the date of the latest oncology_treatment_start_date of each patient in the dataset
  - oncology_treatment_start_date_range_months: the difference in months between the first and last oncology_treatment_start_date for each patient in the dataset
- if the genomic table is present, create the following fields:
  - genomic_count: the total number of genomic records each patient has in the dataset
  - genomic_unique_code_count: the number of unique genomic codes each patient has in the dataset
  - first_genomic_date: the date of the earliest genomic record of each patient in the dataset
  - last_genomic_date: the date of the latest genomic record of each patient in the dataset
  - genomic_date_range_months: the difference in months between the first and last genomic dates of a patient
- if the member_enrollment table is present, create the following fields:
  - member_enrollment_count: the total number of member enrollment rows each patient has in the dataset
  - first_effective_date: the date of the earliest effective_date of each patient in the dataset
  - last_effective_date: the date of the latest effective_date of each patient in the dataset
  - effective_date_range_months: the difference in months between the first and last effective_date of a patient, rounded to the nearest month (datediff, set to months)
  - first_temination_date: the date of the earliest termination_date of each patient in the dataset
  - last_termination_date: the date of the latest termination_date of each patient in the dataset
  - termination_date_range_months: the difference in months between the first and last termination_date of a patient, rounded to the nearest month (datediff, set to months)
- if the claim_header table is present, create the following fields:
  - claim_header_count: the total number of claim headers each patient has in the dataset
  - first_service_from_date: the date of the earliest service_from_date of each patient in the dataset
  - last_service_from_date: the date of the latest service_from_date of each patient in the dataset
  - service_from_date_range_months (numerical): the difference in months between the first and last service_from_date of a patient, rounded to the nearest month (datediff, set to months)
  - total_proxy_cost: the sum of the total_proxy_cost of each patient in the dataset
- if the claim_line table is present, create the following fields:
  - claim_line_count: the total number of claim lines each patient has in the dataset
  - first_service_date: the date of the earliest service_date of each patient in the dataset
  - last_service_date: the date of the latest service_date of each patient in the dataset
  - service_date_range_months: the difference in months between the first and last service_date of a patient, rounded to the nearest month (datediff, set to months)
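The per-table fields listed above follow one pattern: a row count, an optional unique code count, first and last dates, and a month range, added only when the source table is present. The sketch below shows one way this could be done in pandas. It is not the package's actual implementation: the summarize_table helper, the column names (date, code) and the 30.4375-days-per-month rounding rule are all assumptions made for illustration.

```python
import pandas as pd

def summarize_table(df, prefix, date_col, code_col=None):
    """Hypothetical helper: per-patient count, first/last date and month range."""
    grouped = df.assign(_date=pd.to_datetime(df[date_col])).groupby("patient_id")
    out = grouped.agg(
        **{
            f"{prefix}_count": ("_date", "size"),
            f"first_{prefix}_date": ("_date", "min"),
            f"last_{prefix}_date": ("_date", "max"),
        }
    )
    if code_col is not None:
        out[f"{prefix}_unique_code_count"] = grouped[code_col].nunique()
    # Assumption: a month is approximated as 30.4375 days, then rounded.
    # Assumes every patient has at least one non-missing date.
    span_days = (out[f"last_{prefix}_date"] - out[f"first_{prefix}_date"]).dt.days
    out[f"{prefix}_date_range_months"] = (span_days / 30.4375).round().astype(int)
    return out.reset_index()

# Invented sample data; the real tables come from the dataset database.
patient = pd.DataFrame({"patient_id": ["p1", "p2"]})
tables = {
    "encounter": pd.DataFrame({
        "patient_id": ["p1", "p1", "p2"],
        "date": ["2020-01-01", "2020-03-31", "2021-05-10"],
    }),
    "diagnosis": pd.DataFrame({
        "patient_id": ["p1", "p1", "p1"],
        "code": ["A", "A", "B"],
        "date": ["2020-01-05", "2020-02-05", "2020-06-05"],
    }),
}

# table name -> (date column, code column); the column names are assumed
specs = {
    "encounter": ("date", None),
    "diagnosis": ("date", "code"),
    "procedure": ("date", "code"),  # not in `tables`, so it is skipped
}

coverage = patient[["patient_id"]]
for name, (date_col, code_col) in specs.items():
    if name in tables:  # only tables that are present contribute fields
        summary = summarize_table(tables[name], name, date_col, code_col)
        coverage = coverage.merge(summary, on="patient_id", how="left")
```

A left join keeps every patient in the result, so a patient with no rows in a given table ends up with missing values in that table's fields rather than being dropped.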
Example
dataworks_df = profile_coverage(database='database')
- displays the profile report
- returns the dataworks_df dataframe for later use
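Because the coverage table is a pandas DataFrame, it can be filtered like any other. The sketch below uses a small stand-in DataFrame with a few of the documented columns, since profile_coverage itself needs a live dataset database. The values and the 12-month threshold are invented for illustration.

```python
import pandas as pd

# Stand-in for the DataFrame returned by profile_coverage (values are invented).
dataworks_df = pd.DataFrame({
    "patient_id": ["p1", "p2", "p3"],
    "encounter_count": [14, 2, 7],
    "encounter_date_range_months": [36, 1, 12],
})

# Keep patients with at least 12 months between first and last encounter.
min_months = 12  # hypothetical threshold
long_history = dataworks_df[dataworks_df["encounter_date_range_months"] >= min_months]

# Share of the cohort that meets the threshold.
share = len(long_history) / len(dataworks_df)
```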