MATH325 Lab 2
Measures of Central Tendency and Variability
The steps required for completing the deliverables for this assignment, including screenshots that correspond to these instructions, are outlined below. Complete the questions below and paste the answers from Excel below each question (type your answers to the questions where noted). Your response to the lab will therefore be this ONE submitted document.
Context: Remember that statistics are far more than numbers or values – you need to know the context to perform a good analysis!
Central Tendency: Defined as statistics that describe the location of the distribution. This includes the mean, median, mode, and sum of all the values.
Mean. A measure of central tendency. The arithmetic average, the sum divided by the number of cases.
Median. The value above and below which half of the cases fall, the 50th percentile. If there is an even number of cases, the median is the average of the two middle cases when they are sorted in ascending or descending order. The median is a measure of central tendency not sensitive to outlying values (unlike the mean, which can be affected by a few extremely high or low values).
Mode. The most frequently occurring value. If several values share the greatest frequency of occurrence, each of them is a mode. The Frequencies procedure reports only the smallest of such multiple modes.
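The three definitions above can be checked by hand on a small example. The sketch below uses Python's standard `statistics` module on a made-up list of hospital stays (the numbers are hypothetical and are not taken from HealthCareData.xlsx). It shows how one extreme value pulls the mean upward while the median barely moves, and how two values can tie as modes.

```python
import statistics

# Hypothetical lengths of stay in days (illustration only, not the lab data).
stays = [2, 3, 3, 4, 5, 5, 7, 30]

# Mean: the sum divided by the number of cases.
mean_stay = statistics.mean(stays)

# Median: with an even number of cases, the average of the two middle values.
median_stay = statistics.median(stays)

# Mode: every value that shares the greatest frequency of occurrence.
modes = statistics.multimode(stays)

# A procedure that reports only the smallest of several modes would show this.
smallest_mode = min(modes)

print("mean:", mean_stay)
print("median:", median_stay)
print("modes:", modes, "smallest mode:", smallest_mode)
```

Here the single 30-day stay raises the mean to 7.375, while the median stays at 4.5.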
Maximum. The largest value in the dataset.
Minimum. The smallest value in the dataset.
Percentile Values (Q1, Median, Q3). Values of a quantitative variable that divide the ordered data into groups so that a certain percentage is above and another percentage is below. Quartiles (the 25th, 50th, and 75th percentiles) divide the observations into four groups of equal size (the default in Minitab). If you want an equal number of groups other than four, select Cut points for equal groups. You can also specify individual percentiles (for example, the 95th percentile, the value below which 95% of the observations fall).
IQR (Interquartile Range). The spread of the middle 50% of the data, calculated as Q3 – Q1.
Range. The difference between the largest and smallest values of a numeric variable, the maximum minus the minimum.
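As a rough illustration of quartiles and the IQR, the sketch below uses `statistics.quantiles` from Python's standard library on a hypothetical list of eleven values. Software packages interpolate quartiles in slightly different ways, so the two conventions shown here (`exclusive` and `inclusive`) are examples only; which one a given package uses is not something this lab specifies.

```python
import statistics

# Hypothetical ordered data (illustration only, not the lab data).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 asks for the three cut points that split the data into four groups.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1  # spread of the middle 50% of the data

# A second interpolation convention gives slightly different quartiles.
q1_inc, q2_inc, q3_inc = statistics.quantiles(data, n=4, method="inclusive")
iqr_inc = q3_inc - q1_inc

print("exclusive:", q1, q2, q3, "IQR =", iqr)
print("inclusive:", q1_inc, q2_inc, q3_inc, "IQR =", iqr_inc)
```

The median (Q2) is the same under both conventions; only Q1 and Q3, and therefore the IQR, shift.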
S.E. Mean. A measure of how much the value of the mean may vary from sample to sample taken from the same distribution. It can be used to roughly compare the observed mean to a hypothesized value (that is, you can conclude the two values are different if the ratio of the difference to the standard error is less than –2 or greater than +2).
StDev. A measure of dispersion around the mean. In a normal distribution, 68% of cases fall within one standard deviation of the mean and 95% of cases fall within two standard deviations. For example, if the mean age is 45, with a standard deviation of 10, 95% of the cases would be between 25 and 65 in a normal distribution.
Variance. A measure of dispersion around the mean, equal to the sum of squared deviations from the mean divided by one less than the number of cases. The variance is measured in units that are the square of those of the variable itself.
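The variance, standard deviation, and standard error of the mean build on one another, which a short worked example may make clearer. The sketch below uses hypothetical values and assumes the usual formula for the standard error, the standard deviation divided by the square root of the number of cases. The hypothesized mean of 3 is also made up for illustration.

```python
import math
import statistics

# Hypothetical sample (illustration only, not the lab data).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)
mean = sum(sample) / n

# Variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in sample) / (n - 1)

# Standard deviation: square root of the variance, in the variable's own units.
stdev = math.sqrt(variance)

# Standard error of the mean (assumed formula: s / sqrt(n)).
se_mean = stdev / math.sqrt(n)

# Rough comparison with a hypothesized mean: ratio of difference to S.E.
hypothesized = 3  # hypothetical value for illustration
ratio = (mean - hypothesized) / se_mean

# Range expected to hold about 95% of cases in a normal distribution.
low, high = mean - 2 * stdev, mean + 2 * stdev

print("variance:", round(variance, 3), "stdev:", round(stdev, 3))
print("S.E. mean:", round(se_mean, 3), "ratio:", round(ratio, 2))
```

Because the ratio here is greater than +2, the rough rule described under S.E. Mean would treat the observed mean of 5 as different from the hypothesized value of 3.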
1. Open Microsoft Excel.
2. Open the HealthCareData.xlsx file using Microsoft Excel.
3. Click on the Data tab (1), then Data Analysis (2), then click Descriptive Statistics (3), and click OK.
a. We are going to look at Hosp_Stay and Hosp_Satisfaction. In the Descriptive Statistics window that pops up, make sure that Summary statistics (1) is checked, and click on the button next to the input range to select the data in the first two columns (do not include the column name):
b. Once the Input Range field is populated, click OK.
c. The Summary statistics will be output on a new worksheet. Highlight the output and use Ctrl + C to copy and then Ctrl + V to paste the summary into this document.
4. To create a histogram of this data, select the dataset in Column A (Hosp_Stay):
5. Click the Insert tab, and in the Charts area, click on the Insert Statistics Chart.
6. Click on the first histogram graph option:
7. Right-click on the horizontal axis and select Format Axis from the drop-down menu.
8. From the menu that pops up, set Bin Width to be 1.0:
9. Click on the graph and use Ctrl+C to copy and use Ctrl+V to paste it into the box below.
10. Play around with changing the Bin width. How does this affect the graph? How about Number of bins?
11. Repeat Steps 4-10 above for Column B (Hosp_Satisfaction). Copy and paste the graph below:
12. Think about it: Why are so many items listed as missing? What can be done if you want to run the analysis and not include all those missing data values that occur at the end of the data set? Why does the satisfaction data have missing information before the end of the data set? Do we want to include the fact that those data values are missing in our analysis?
13. Explorations – suggested activities but not required (optional): Use the table on page 99 (Chapter 4, Exhibit 4-5) as a reference point for analysis or do Class Activity # 3 on page 121 of your book.
14. Deliverable: Save this document and submit it to Assignments, Week 2: Lab.
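For the Bin width and Number of bins question in Step 10, it may help to see how the two settings relate. The sketch below is a simplified model, not Excel's exact binning rule: it assumes equal-width bins covering the span from the minimum to the maximum, and it drops blank cells first so they are not counted as observations. The column values and the helper names `bin_count` and `bin_width` are hypothetical.

```python
import math

# Hypothetical column; None stands in for a blank (missing) cell.
column = [1, 2, 2, 3, 4, 4, 5, 8, None, None]

# Keep only the cells that actually hold a value.
values = [v for v in column if v is not None]

def bin_count(data, width):
    """Equal-width bins needed to cover min..max (simplified model)."""
    return max(1, math.ceil((max(data) - min(data)) / width))

def bin_width(data, bins):
    """Width of each bin when min..max is split into `bins` equal parts."""
    return (max(data) - min(data)) / bins

print("valid cases:", len(values), "missing:", len(column) - len(values))
print("width 1.0 ->", bin_count(values, 1.0), "bins")
print("width 2.0 ->", bin_count(values, 2.0), "bins")
print("7 bins -> width", bin_width(values, 7))
```

A wider bin gives fewer, coarser bars. Setting the number of bins instead fixes the count and lets the width follow from the data's range.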