# Software Tutorials

## Univariate Statistics

Univariate statistical analyses are data analysis procedures using only one variable. A variable measures a single attribute of an entity or individual (e.g. height) and may take different values from one individual to another. Univariate statistical analyses may consist of descriptive or inferential procedures. Descriptive procedures typically describe the distribution of a variable using statistics or graphical representations. Inferential procedures test hypotheses about a variable and aim to estimate the population values (parameters) of descriptive measures such as the mean, median, or standard deviation. Examples of univariate inferential procedures are estimating a confidence interval for the population mean GRE score and testing whether a population mean test score is significantly different from a given value.

## Bivariate Statistics

Bivariate statistical analyses are data analysis procedures using two variables (e.g. self-efficacy and academic performance). Bivariate analyses can be descriptive (e.g. a scatterplot), but the goal is typically to compare two variables or examine the relationship between them. For instance, researchers may examine whether student self-efficacy in mathematics is a significant predictor of mathematics standardized test scores. Another example is comparing academic performance across groups of students receiving different modes of instruction (e.g. face-to-face versus online), in which case the two variables are test scores and group membership (e.g. face-to-face cohort or online cohort).

Descriptive statistics enable researchers to analyze and describe their data prior to running any statistical tests. Measures of central tendency and variability summarize a data set so that researchers do not have to examine every value in the set. This topic will explain how to find measures of central tendency, including mean, median, and mode, and measures of spread, including quartiles and standard deviation. Descriptive analyses are typically the first step in analyzing data.
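The measures named above can be illustrated with a minimal Python sketch. It uses the standard-library `statistics` module rather than SPSS, and the scores are invented for illustration only.

```python
import statistics

# Hypothetical test scores, invented for illustration only
scores = [72, 85, 85, 90, 64, 78, 85, 91, 70, 80]

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of spread
q1, q2, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
sd = statistics.stdev(scores)                   # sample standard deviation (divides by n - 1)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"Q1={q1}, Q2={q2}, Q3={q3}, SD={sd:.2f}")
```

Quartile conventions differ between software packages, so the quartiles printed here may not match the output of other tools exactly.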

- Diana Mindrila, Ph.D.
- Phoebe Balentyne, M.Ed.

A z-score measures the relative location of a score within a distribution, expressed as the number of standard deviations between the score and the mean. When the original scores are normally distributed, their z-scores follow the standard normal distribution, which has a mean of 0 and a standard deviation of 1. The z-score is the simplest statistic that can be used to test hypotheses about a mean.
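As a small illustration, a z-score can be computed by subtracting the mean from a score and dividing by the standard deviation. The function name `z_score` and the example values below are hypothetical.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical example: a score of 650 on a scale with mean 500 and SD 100
z = z_score(650, mean=500, sd=100)

# Proportion of scores falling below z; meaningful only if scores are normally distributed
proportion_below = NormalDist().cdf(z)

print(f"z = {z:.2f}, proportion below = {proportion_below:.4f}")
```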

The resources in this section focus exclusively on the z-statistic and the statistical procedures for which it can be used.

A confidence interval is a range of plausible values for a population parameter, reported together with a confidence level such as 95%. This range is estimated using information collected from a sample, such as the mean, the degree to which values vary across individuals, and the sample size. For instance, a researcher may be interested in estimating the achievement motivation of first-year college students. The researcher must select a random sample of students, administer a motivation scale, and then compute an average score for the entire sample. Results from the sample can then be used to make an inference about the motivation of the entire population of first-year college students.

The narrated presentation below provides an introduction to the topic of confidence intervals and demonstrates how to estimate the population mean of a normally distributed variable after computing the mean for a specific sample. The software tutorial shows how to calculate confidence intervals using SPSS.
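For readers who want to see the arithmetic, the following Python sketch computes a z-based confidence interval for a mean. It assumes the population standard deviation is known and the variable is normally distributed. The function name and the motivation-scale numbers are hypothetical, and the sketch is not the SPSS procedure from the tutorial.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    # Critical value z* that leaves (1 - confidence) / 2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical motivation scores: sample mean 70, known sigma 10, n = 100
low, high = z_confidence_interval(sample_mean=70, sigma=10, n=100)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

A larger sample or a lower confidence level narrows the interval, because both reduce the margin of error.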

Confidence Intervals Notes (PDF)

External resource video: What We Learned from 5 Million Books

A hypothesis is a statement about a parameter such as the population proportion or the population mean. To evaluate whether the data support this statement, researchers use tests of significance to compare the observed value of a statistic to the hypothesized value of the parameter. Results from such tests show whether the difference between the sample statistic and the hypothesized parameter is statistically significant. The results of a significance test are expressed in terms of a probability that indicates the extent to which data from the sample and the hypothesis agree. The narrated presentation provides an introduction to this topic. It demonstrates how to formulate hypotheses and how to conduct a test of significance for a population mean using the properties of the normal distribution.
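A minimal sketch of a two-sided z test for a population mean is shown here in Python. It assumes a known population standard deviation. The function name `z_test` and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n):
    """Two-sided z test of H0: population mean equals mu0 (sigma known)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Probability of a z at least this extreme, in either direction, if H0 is true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical example: sample mean 103, hypothesized mean 100, sigma 15, n = 100
z, p_value = z_test(sample_mean=103, mu0=100, sigma=15, n=100)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value indicates that the sample data and the hypothesized mean agree poorly. Whether it counts as statistically significant depends on the significance level chosen beforehand, commonly 0.05.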

- Diana Mindrila, Ph.D.
- Phoebe Balentyne, M.Ed.

z-Procedures: Testing a Hypothesis About the Population Mean (PDF)

To be able to use the z procedures, certain assumptions must be met. First, data should have a normal distribution. Second, the sample must have an adequate size, and individuals must be randomly selected. However, these conditions are often difficult to meet in practice. The following narrated presentation describes the necessary conditions for making inferences based on the z procedures, demonstrates how to determine the sample size needed for a certain level of error, and discusses the notions of Type I and Type II error and the power of a significance test.
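The sample size needed for a given margin of error can be sketched with the formula n = (z* · σ / m)², rounded up to the next whole number. The Python below assumes the population standard deviation is known; the function name and the example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_margin(sigma, margin, confidence=0.95):
    """Smallest n for which the z-based margin of error is at most `margin`."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star * sigma / margin) ** 2)

# Hypothetical example: sigma = 15, desired margin of error = 3 points
n_needed = sample_size_for_margin(sigma=15, margin=3)
print(f"required sample size: {n_needed}")
```

Halving the margin of error roughly quadruples the required sample size, since the margin enters the formula squared.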

- Diana Mindrila, Ph.D.
- Phoebe Balentyne, M.Ed.

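Similar to a z score, a t score measures the relative location of a value within a distribution. The t statistic used in the procedures described here follows the t distribution, which is centered at 0. It should not be confused with the standardized T score used in testing, which is scaled to have a mean of 50 and a standard deviation of 10.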

The resources in this section focus exclusively on the t statistic and the statistical procedures for which it can be used.

The t procedures are very similar to the z procedures and are used when the distribution of the data is not perfectly normal and when the population standard deviation of a variable is unknown. The interpretation of t scores is similar to the interpretation of z scores. However, the t distribution has a slightly different shape. It is symmetric, but not normal. It has a single peak and a mean of 0, which is the center of the distribution, but its tails are higher and fatter, and the distribution has more spread. Further, the shape of the t distribution depends on the sample size through its degrees of freedom. The following presentation describes the properties of the t distribution and demonstrates how to use t scores to make inferences about a population mean and to compare matched samples. The software tutorials show how to conduct these procedures in SPSS.
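The sketch below shows the one-sample t calculation in Python, using the sample standard deviation in place of the unknown population value. The scores and the hypothesized mean are invented. The critical value 2.262 is the two-sided 95% value for 9 degrees of freedom taken from a standard t table, since the standard library has no t distribution.

```python
import statistics
from math import sqrt

# Hypothetical sample of 10 scores, invented for illustration
sample = [72, 85, 85, 90, 64, 78, 85, 91, 70, 80]
mu0 = 75            # hypothesized population mean (assumed for the example)
t_critical = 2.262  # two-sided 95% critical value for df = 9, from a t table

n = len(sample)
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / sqrt(n)  # uses the sample SD

t_stat = (sample_mean - mu0) / standard_error
df = n - 1

# 95% confidence interval for the population mean
ci = (sample_mean - t_critical * standard_error,
      sample_mean + t_critical * standard_error)

print(f"t = {t_stat:.3f}, df = {df}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

With a different sample size, the degrees of freedom change, and so does the critical value that must be looked up.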

- Diana Mindrila, Ph.D.
- Phoebe Balentyne, M.Ed.

Inferences About the Population Mean Notes (PDF)

The t procedures can be used to compare variables across two independent samples. For instance, researchers may want to know whether the average performance on a certain achievement test differs significantly between males and females. The following narrated presentation demonstrates how to estimate a confidence interval for the mean difference between two populations and how to test whether this difference is statistically significant. The presentation also discusses the assumptions that must be met and the robustness of t procedures to the violation of these assumptions. The software tutorial demonstrates how to perform an independent samples t test using SPSS.
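One way to sketch the two-sample comparison in Python is Welch's version of the t statistic, which does not assume equal population variances. This is an illustrative choice and may differ from the variant emphasized in the presentation. The function name and the two groups of scores are hypothetical.

```python
import statistics
from math import sqrt

def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample
    v_a = statistics.variance(sample_a) / n_a
    v_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / sqrt(v_a + v_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v_a + v_b) ** 2 / (v_a ** 2 / (n_a - 1) + v_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical achievement scores for two independent groups
group_a = [82, 75, 90, 68, 77]
group_b = [70, 65, 80, 72, 63]

t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

The resulting t statistic is compared against a t distribution with the computed degrees of freedom to obtain a p-value, a step that statistical software performs automatically.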

- Diana Mindrila, Ph.D.
- Phoebe Balentyne, M.Ed.

A paired samples design is a type of research design in which there is a dependency between two sets of data, either because both are collected from a single sample or because individuals are matched across groups. In a within-subjects approach, paired samples data often take the form of pre- and post-test data; in a between-subjects approach, paired samples often take the form of matched groups. Statistical approaches have been developed to handle paired samples data measured on an interval/ratio scale (paired samples t test) as well as on an ordinal scale (Wilcoxon Signed Ranks Test). This presentation provides more information on paired samples research designs, presents example research questions for which this approach might be used, and addresses the task of interpreting results.
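A paired samples t test reduces to a one-sample t test on the differences within each pair. The Python sketch below illustrates this with invented pre- and post-test scores; the function name is hypothetical.

```python
import statistics
from math import sqrt

def paired_t(before, after):
    """t statistic and degrees of freedom for paired measurements."""
    # Work with the difference within each pair, not the raw scores
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

# Hypothetical pre- and post-test scores for the same five students
pre = [10, 12, 9, 14, 11]
post = [12, 15, 10, 15, 13]

t_stat, df = paired_t(pre, post)
print(f"t = {t_stat:.3f}, df = {df}")
```

Because each student serves as their own comparison, variation between students drops out of the differences, which is why the paired test can detect changes that an independent samples test would miss.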

Frances Chumney, Ph.D.

Scatterplots and correlation coefficients are tools for describing the association between two quantitative variables. For instance, researchers may want to know whether an increased amount of time spent on homework is associated with higher scores on a standardized test. Scatterplots provide descriptive information on the direction, form, and strength of the relationship between the two variables by representing individuals as points on a two-dimensional graph. These points may aggregate to describe a linear relationship, a curvilinear relationship, or no relationship. Scatterplots may also indicate whether there is a positive or negative association between variables and suggest the strength of their relationship. A positive association means that high values in one variable are associated with high values in the other variable, whereas a negative association means that high values in one variable are associated with low values in the other variable (e.g. the relationship between poverty and student achievement).

The Pearson product moment correlation coefficient can be calculated to quantify a linear relationship between two quantitative variables. This coefficient takes values between -1 and 1, where values closer to 0 indicate weak relationships, and values closer to 1 or -1 indicate stronger relationships. Positive values show a positive association, whereas negative values show a negative association between the two variables. The narrated presentation provides more details on the interpretation of scatterplots and correlation coefficients. The software tutorials demonstrate how to generate scatterplots and to compute the Pearson product moment coefficient in SPSS.
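To make the calculation concrete, the following Python sketch computes the Pearson coefficient directly from its definition. The function name `pearson_r` and the homework data are invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: weekly homework hours and standardized test scores
hours = [1, 2, 3, 4, 5]
test_scores = [52, 60, 63, 71, 74]

r = pearson_r(hours, test_scores)
print(f"r = {r:.3f}")
```

The coefficient only captures linear association, so a scatterplot should still be inspected for curvilinear patterns or outliers.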

- Diana Mindrila, Ph.D.
- Phoebe Balentyne, M.Ed.

Scatterplots and Pearson Correlation Coefficients (SPSS) (PDF)

External Video: The Big Idea My Brother Inspired

Simple linear regression allows researchers to predict or explain the variance of a response variable using a predictor variable. For instance, simple linear regression may be used in educational research to predict college GPA based on SAT scores. The narrated presentation below provides an introduction to the topic of simple linear regression. It discusses basic concepts related to simple linear regression, the assumptions on which this procedure is based, and how to interpret and use the regression equation. The software tutorial demonstrates how to conduct a simple linear regression in SPSS.
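A least-squares regression line can be sketched in a few lines of Python. The function name, the variable names, and the data are hypothetical and stand in for a real predictor and response such as SAT scores and GPA.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through the two means
    return slope, intercept

# Hypothetical predictor (x) and response (y) values
x = [1, 2, 3, 4, 5]
y = [52, 60, 63, 71, 74]

slope, intercept = fit_line(x, y)
predicted = intercept + slope * 6  # predicted response for x = 6

print(f"y-hat = {intercept:.1f} + {slope:.1f} * x")
print(f"prediction at x = 6: {predicted:.1f}")
```

The slope is the predicted change in the response for a one-unit increase in the predictor. Predictions far outside the observed range of the predictor should be treated with caution.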

- Diana Mindrila, Ph.D.
- Phoebe Balentyne, M.Ed.

The chi-square test is used to determine whether there is a statistically significant association between two or more categorical variables. For instance, educational researchers may want to determine whether the proportions of students preferring online instruction and face-to-face instruction differ significantly between undergraduate and graduate students. This procedure allows researchers to compare categorical variables across more than two groups and uses the chi-square statistic to determine statistical significance. The following narrated presentation describes the properties of the chi-square distribution and explains how to conduct and interpret the results of chi-square tests. The software tutorial demonstrates how to conduct this procedure in SPSS.
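The chi-square statistic compares the observed counts in a two-way table with the counts expected if the two variables were unrelated. The Python sketch below uses an invented table of instruction preferences; the function name is hypothetical.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical counts: rows = undergraduate, graduate; columns = online, face-to-face
observed = [[30, 20],
            [20, 30]]

statistic, df = chi_square(observed)
print(f"chi-square = {statistic:.2f}, df = {df}")
```

For 1 degree of freedom, the critical value at the 0.05 level is 3.841, so a statistic of 4.00 in this invented example would be judged statistically significant at that level.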

- Diana Mindrila, Ph.D.
- Phoebe Balentyne, M.Ed.

Analysis of variance (ANOVA) is used to compare means across groups of similar individuals. ANOVA compares the variation among the sample means to the variation of scores within each sample. It allows researchers to compare more than two groups and uses the F statistic to determine statistical significance. The narrated presentation describes the F distribution and discusses ANOVA and its assumptions in more detail. The software tutorial demonstrates how to conduct a one-way ANOVA in SPSS.
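The F statistic is the ratio of the between-group mean square to the within-group mean square. A minimal Python sketch follows; the function name is hypothetical and the three small groups are invented for illustration.

```python
def one_way_anova_f(groups):
    """F statistic with between-group and within-group degrees of freedom."""
    all_scores = [x for group in groups for x in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the scores around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical scores for three instructional groups
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]

f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F indicates that the group means differ more than would be expected from the variation within groups. In this invented example, F is 12 with 2 and 6 degrees of freedom, which exceeds the tabled 0.05 critical value of about 5.14.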

- Diana Mindrila, Ph.D.
- Phoebe Balentyne, M.Ed.