QSO 570 SNHU Module Two Memo – Description
Overview
You are a data and business intelligence analyst working for a hospital system that operates a network of hospitals in rural districts. The system's chief of behavioral health services (CBHS) would like to use existing research to create a model that classifies new patients as at-risk or not at-risk for clinical depression. Such predictions would enable the hospitals to provide early mental health interventions to at-risk individuals. The CBHS has tasked your team with creating the predictive model.
The CBHS’s team has researched case histories of patients with and without diagnosed clinical depression in rural districts using non-personally identifiable data. The data include sociodemographic and related factors such as age, marital status, patient medical history, number of children, financial information, job status, farm income, expenses, patient survey results, and so on.
To make the clinical depression data set more suitable for the predictive model, your team needs to identify and address issues in the data, including multicollinear variables. A multicollinear variable is an independent variable that has a strong relationship with one or more other independent variables. Multicollinear variables provide largely redundant information about the dependent variable, and keeping them in a predictive model can make its estimates unstable and increase its error rate. Your first task is to identify and remove the multicollinear variables from the data set.
To identify and remove the multicollinear variables, you will create a correlation matrix using Rattle, a graphical user interface package for R that you run from RStudio within the VDI.
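As a rough illustration of what a Pearson correlation matrix contains, the following Python sketch computes one for a tiny invented data set. It is not the Rattle workflow the assignment requires. The column names (farm_income, total_income, num_children) and all values are hypothetical and are not taken from the clinical depression data set.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict {name: {other_name: r}} covering every pair of columns."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


# Hypothetical toy data, invented purely for illustration.
toy = {
    "farm_income": [40, 55, 62, 48, 70, 35],
    "total_income": [45, 61, 66, 53, 77, 39],  # tracks farm_income very closely
    "num_children": [2, 0, 3, 1, 2, 4],
}

matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

In this toy example, farm_income and total_income have a coefficient close to 1, which is the kind of pair a correlation matrix is meant to expose. The diagonal is always 1 because every variable correlates perfectly with itself.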
Directions
Create a memo to the data team that presents your correlation matrix analysis. Include the relevant screenshots of your correlation matrix in the memo.
Data Type: Identify the type of data in the given data set.
What type of data are you working with? Explain.
Correlation Matrix: Create a correlation matrix to identify multicollinear variables.
Use the Pearson correlation-coefficient method to create a correlation matrix of the rural clinical depression data.
Explain how this method will help you identify and remove unnecessary variables from the data set.
Multicollinear Variables: Analyze the correlation matrix to identify and remove potential multicollinear variables.
What variables can be removed from the clinical depression data set, and why?
Remove these potentially multicollinear variables.
Rationale: Explain how the correlation analysis helped improve the data quality and how it will affect the predictive model.
How did removing the multicollinear variables from the clinical depression data set improve the data quality?
How will the correlation analysis help you to build a better model for identifying patients at risk for depression?
What other elements of correlation analysis can you apply to this data set?
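The identify-and-remove step described in the directions can be sketched in Python, outside of Rattle. This minimal example keeps an independent variable only if its absolute Pearson correlation with every variable already kept is below a cutoff. The 0.8 cutoff, the function name drop_multicollinear, and the toy column names and values are assumptions for illustration. The assignment does not specify a threshold, and which variable of a correlated pair to keep remains a judgment call.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_multicollinear(columns, threshold=0.8):
    """Greedy filter over independent variables (threshold is an assumed cutoff).

    Walks the columns in order and drops any column whose |r| with an
    already-kept column meets or exceeds the threshold.
    Returns (kept_names, dropped_names).
    """
    kept, dropped = [], []
    for name, values in columns.items():
        if any(abs(pearson(values, columns[k])) >= threshold for k in kept):
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped


# Hypothetical independent variables; the target (depression status) is excluded.
toy = {
    "farm_income": [40, 55, 62, 48, 70, 35],
    "total_income": [45, 61, 66, 53, 77, 39],  # nearly redundant with farm_income
    "num_children": [2, 0, 3, 1, 2, 4],
}

kept, dropped = drop_multicollinear(toy)
print("kept:", kept)        # kept: ['farm_income', 'num_children']
print("dropped:", dropped)  # dropped: ['total_income']
```

Because the filter is greedy, the result depends on column order: the first variable of a correlated pair is the one that survives. In practice, the choice would more likely rest on which variable is more meaningful or more reliably measured.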