Central Tendency and Variation
Measures of Central Tendency
- The three main measures of central tendency are the mean, median, and mode.
- The mean is calculated by adding all the values in a data set and then dividing by the number of values.
- The median is the middle value in a data set once it's been arranged in ascending order. For an even number of values, it's the mean of the middle two.
- The mode refers to the value(s) that appear most frequently in a dataset.
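As a minimal sketch of the three measures above, the following uses Python's standard `statistics` module. The data set is hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

# Mean: sum of the values divided by the number of values.
mean = statistics.mean(data)

# Median: the data has an even number of values, so this is the
# mean of the two middle values (3 and 5) after sorting.
median = statistics.median(data)

# Mode(s): multimode returns a list, because a data set can have
# more than one most frequent value.
modes = statistics.multimode(data)

print("mean:", mean)
print("median:", median)
print("mode(s):", modes)
```

Using `multimode` rather than `mode` keeps the result consistent with the definition above, which allows for more than one mode.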
Measures of Variation
- Variability in a dataset can be represented through the range, interquartile range, variance and standard deviation.
- The range is the simplest measure of variation, calculated as the difference between the maximum and minimum values in a data set.
- The interquartile range (IQR) is the difference between the upper and lower quartiles (Q3 - Q1), so it covers the middle 50% of the data. It is more resistant to outliers than the range because it excludes the lowest 25% and highest 25% of data values.
- Variance is the average of the squared differences from the mean (dividing by n for a whole population, or by n - 1 when estimating from a sample). High variance indicates values spread far from the mean, and low variance indicates values close to the mean.
- The standard deviation is the square root of the variance. It is used more commonly than variance because it is expressed in the original units of measurement, which makes the variability easier to interpret.
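The measures of variation above can be sketched with the standard `statistics` module. The data set is hypothetical. Quartiles can be computed under several conventions, and the `"inclusive"` method used here is only one of them, so an IQR worked out by hand with a different rule may differ slightly.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: maximum minus minimum.
data_range = max(data) - min(data)

# Quartiles: n=4 splits the data into four parts and returns Q1, Q2, Q3.
# The "inclusive" method is an assumption; other conventions exist.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Population variance and standard deviation (divide by n).
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Sample variance (divide by n - 1), used when the data is a sample
# from a larger population.
sample_var = statistics.variance(data)

print("range:", data_range)
print("IQR:", iqr)
print("population variance:", pop_var)
print("population standard deviation:", pop_sd)
print("sample variance:", sample_var)
```

Note that the standard deviation comes out in the same units as the data, while the variance is in squared units.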
Understanding Distributions
- When plotting frequency distributions, a histogram is useful for continuous data, while a bar chart is appropriate for categorical data.
- A normal distribution is characterised by a symmetric, bell-shaped curve, with the mean, median and mode all at the centre.
- Skewness refers to the degree of asymmetry in the distribution: positive skewness means the right tail is longer, and negative skewness means the left tail is longer.
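- Kurtosis is often described as the sharpness of the peak of the frequency-distribution curve, although strictly it reflects how heavy the tails are: leptokurtic distributions have a sharper peak and heavier tails than the normal distribution, and platykurtic distributions are flatter with lighter tails.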
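Skewness and kurtosis can be illustrated with moment-based formulas. This is a minimal sketch using one common definition (population moments, with kurtosis reported as "excess" kurtosis so that a normal distribution scores 0). The helper names and data sets are hypothetical, and statistical packages may apply small-sample corrections that give slightly different values.

```python
def central_moment(values, k):
    """Average of (value - mean) ** k over the data."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Third central moment scaled by the standard deviation cubed."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth central moment scaled by the variance squared, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical data sets, chosen only for illustration.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 10]   # long right tail
left_skewed = [-v for v in right_skewed]      # mirror image: long left tail

print("symmetric skewness:", skewness(symmetric))
print("right-skewed skewness:", skewness(right_skewed))
print("left-skewed skewness:", skewness(left_skewed))
print("symmetric excess kurtosis:", excess_kurtosis(symmetric))
```

A positive skewness value corresponds to a longer right tail and a negative value to a longer left tail. A negative excess kurtosis indicates a platykurtic shape and a positive one a leptokurtic shape.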
Making Predictions
- Linear regression models the relationship between two continuous variables by fitting a straight line through the data.
- The correlation coefficient quantifies the direction and strength of the linear relationship between two variables. It ranges from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive).
- The coefficient of determination (R-squared) is the proportion of the variance in the dependent variable that is explained by the regression model.
- In hypothesis testing, the null hypothesis states that there is no difference or relationship, while the alternative hypothesis states that there is one. The test assesses whether the data provide enough evidence to reject the null hypothesis.
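The regression and correlation ideas above can be sketched with a least-squares line fitted by hand. This is an illustrative example on hypothetical data, not a full modelling workflow. The variable names are assumptions, and hypothesis testing is not covered here.

```python
import math

# Hypothetical paired observations, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products about the means.
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

# Least-squares regression line: y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Pearson correlation coefficient: direction and strength of the
# linear relationship, between -1 and +1.
r = sxy / math.sqrt(sxx * syy)

# Coefficient of determination: proportion of the variance in y
# explained by the fitted line.
predicted = [intercept + slope * a for a in x]
ss_res = sum((b - p) ** 2 for b, p in zip(y, predicted))
r_squared = 1 - ss_res / syy

print("slope:", slope)
print("intercept:", intercept)
print("correlation coefficient r:", r)
print("R-squared:", r_squared)
```

For simple linear regression with one predictor, R-squared equals the square of the correlation coefficient, which the two results printed above can be used to check.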
Remember that good data interpretation accounts for both central tendency and variability, since together they describe the general trend and the spread of data points around it. Measures of central tendency summarise where the data are centred, while measures of variation tell you how far individual data points typically lie from that centre.