In this exercise, you will download two years of data from HRSA. Each year has a different set of variables with some overlap. The exercise requires that you combine the two input datasets into one long output dataset with the common columns aligned. This video shows you one way to complete Assignment 1.1. The demonstrated approach is not the only way to accomplish this task.
Part 1 Introduction by Friday 11:59 pm (estimate 30 min)
Review the Course Project Summary in your Blackboard Learning Modules link.
Spend approximately 15 minutes exploring the data tables and visualizations at https://data.hrsa.gov/
Part 2 Data Wrangling by Monday 11:59 pm (estimate 1-2 hours)
A large percentage of a data analyst's time may be spent getting data into a format that a computer can analyze. This process is called data wrangling. You’ll start by downloading two files and then try to combine them into one useful file for a future analysis project. This task simulates a common process called ETL (Extract, Transform, and Load) that runs nightly for large electronic health record systems.
EXTRACT: Download the HRSA-2019.xls and HRSA-2020.xls attachments to your local download folder. You may want to move these into a new folder called HCIN547 to make them easier to find later. Open both datasets in Excel, and notice that there are significant differences.
TRANSFORM: Create a new Excel file called HCIN547_Lastname_FirstInitial_HRSA2019_20.xls
Do your best to combine the datasets from two different years into a single dataset that you can analyze. You will create a “long” dataset, meaning that you must align the data in rows for both years under the same set of columns.
The combined dataset should have columns (A-H):
From the 2019 Age and Race-Ethnicity table, add the following columns:
(A) Reporting Year (manually add the value 2019 to all of the rows)
(B) Health Center Name
(C) City
(D) State
(E) Total Patients
From the 2019 Clinical Data, include:
(F) Hypertension (Percent of patients with HTN)
(G) Calculate a new variable: “Patients with HTN” = Total Patients * Hypertension (if Hypertension is stored as a percentage rather than a fraction, also divide by 100)
From the HRSA-2020 table, align the following columns:
(A) Reporting Year
(B) Health Center Name
(C) City
(D) State
(E) Total Patients (you will need to calculate)
(G) Patients with HTN (given in the 2020 table)
(H) Urban Rural Flag
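For readers who would rather script the transform than build it by hand in Excel, the column alignment listed above can be sketched in Python. This is only an illustration under stated assumptions, not the assignment's required method: the sample rows, the dictionary keys, and the helper names `align_2019` and `align_2020` are hypothetical; the 2019 row is assumed to already combine the Age and Race-Ethnicity and Clinical Data values for one health center; and Hypertension is assumed to be stored as a percentage (25.0 meaning 25%). Column F is left empty for 2020 because no 2020 source is listed for it, and the 2020 Total Patients value is passed in because how it is calculated depends on the columns in the 2020 file.

```python
# Common column order (A-H) for the combined "long" dataset.
COLUMNS = [
    "Reporting Year", "Health Center Name", "City", "State",
    "Total Patients", "Hypertension", "Patients with HTN", "Urban Rural Flag",
]

def align_2019(row):
    """Map one 2019 record (hypothetical keys) onto the common columns."""
    total = row["Total Patients"]
    pct = row["Hypertension"]  # ASSUMPTION: a percentage such as 25.0
    return {
        "Reporting Year": 2019,  # added manually to every 2019 row
        "Health Center Name": row["Health Center Name"],
        "City": row["City"],
        "State": row["State"],
        "Total Patients": total,
        "Hypertension": pct,
        "Patients with HTN": round(total * pct / 100),
        "Urban Rural Flag": None,  # not listed for 2019
    }

def align_2020(row, total_patients):
    """Map one 2020 record; total_patients is calculated by the caller."""
    return {
        "Reporting Year": row["Reporting Year"],
        "Health Center Name": row["Health Center Name"],
        "City": row["City"],
        "State": row["State"],
        "Total Patients": total_patients,
        "Hypertension": None,  # no 2020 source listed for column F
        "Patients with HTN": row["Patients with HTN"],
        "Urban Rural Flag": row["Urban Rural Flag"],
    }

# Made-up sample rows, for illustration only (not real HRSA data).
rows_2019 = [{
    "Health Center Name": "Example Center A", "City": "Springfield",
    "State": "IL", "Total Patients": 1000, "Hypertension": 25.0,
}]
rows_2020 = [{
    "Reporting Year": 2020, "Health Center Name": "Example Center A",
    "City": "Springfield", "State": "IL",
    "Patients with HTN": 270, "Urban Rural Flag": "Urban",
}]

# Stack the 2019 rows on top of the 2020 rows under the same columns.
combined = [align_2019(r) for r in rows_2019]
combined += [align_2020(r, total_patients=1100) for r in rows_2020]
```

Each year gets its own mapping function so that a column missing in one year shows up as an explicit empty value instead of shifting the remaining columns out of alignment.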
Check that the combined dataset has 8 columns and that its number of rows equals the sum of the row counts for 2019 and 2020. This may be challenging, but stay calm. The exercise is meant to expose you to some of the issues that analysts face every day when combining data from different sources.
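The same check can be expressed as a small script. The sketch below is one possible way to do it, not part of the assignment: the function name `check_combined` and the sample rows are hypothetical, and the row counts are assumed to exclude the header row.

```python
def check_combined(rows, n_2019, n_2020, expected_columns=8):
    """Return a list of problems; an empty list means both checks pass.

    rows: the data rows of the combined sheet (header row excluded).
    n_2019, n_2020: data-row counts of the two source datasets.
    """
    problems = []
    expected_rows = n_2019 + n_2020
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    for i, row in enumerate(rows, start=1):
        if len(row) != expected_columns:
            problems.append(f"row {i} has {len(row)} columns")
    return problems

# Made-up rows for illustration: one from 2019 and one from 2020.
sample = [
    (2019, "Example Center A", "Springfield", "IL", 1000, 25.0, 250, None),
    (2020, "Example Center A", "Springfield", "IL", 1100, None, 270, "Urban"),
]
issues = check_combined(sample, n_2019=1, n_2020=1)
print(issues or "row and column counts look right")
```

Returning a list of problems rather than stopping at the first one makes it easier to see every misaligned row in a single pass.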
Part 3 Submit Blackboard Assignment 1.1 for Grading by Monday 11:59 pm (estimate 20 min)
LOAD: Attach HCIN547_Lastname_FirstInitial_HRSA2019_20.xls
Describe one or two of the challenges you faced and how you overcame them. (Approx. 100 words)
To understand how your work will be assessed, view the Assignment Rubric.
Click the assignment link above to submit your assignment.
