This document contains abridged sections from Discovering Statistics Using R and RStudio by Andy Field, so there are some copyright considerations. You can use this material for teaching and non-profit activities, but please do not meddle with it or claim it as your own work. See the full license terms at the bottom of the page.

Using the notebook.csv data from Chapter 5, create and interpret a Q-Q plot for the two films (ignore sex).

Load the data directly from the discovr package:

    # Dataset name assumed: the right-hand side of this assignment was lost in extraction
    notebook_tib <- discovr::notebook

Then plot the quantiles for each film:

    library(tidyverse)  # supplies %>%, aes(), labs() and facet_wrap()

    notebook_tib %>%
      ggplot2::ggplot(., aes(sample = arousal)) +
      qqplotr::stat_qq_point(alpha = 0.2, size = 1) +
      labs(x = "Theoretical quantiles", y = "Sample quantiles") +
      facet_wrap(~film, ncol = 1, scales = "free")

For both films, the observed quantiles are, on the whole, close to those that would be expected from a normal distribution (i.e. the dots fall within the confidence interval for the line).

The file r_exam.csv contains data on students’ performance on an SPSS exam. Four variables were measured: exam (first-year SPSS exam scores as a percentage), computer (a measure of computer literacy as a percentage), lecture (percentage of statistics lectures attended) and numeracy (a measure of numerical ability out of 15). There is a variable called uni indicating whether the student attended Sussex University (where I work) or Duncetown University.

Compute and interpret summary statistics for exam, computer, lecture and numeracy for the sample as a whole.

    # Only these arguments of the summary call survive; the start of the call is missing
    ci_lower = ggplot2::mean_cl_normal(score)$ymin,
    ci_upper = ggplot2::mean_cl_normal(score)$ymax,
    skew = moments::skewness(score, na.rm = TRUE),
    kurtosis = moments::kurtosis(score, na.rm = TRUE)

The output shows the table of descriptive statistics for the four variables in this example. From this table, we can see that, on average, students attended nearly 60% of lectures, obtained 58% in their exam, scored only 51% on the computer literacy test, and only 5 out of 15 on the numeracy test. In addition, the confidence interval for computer literacy was relatively narrow compared to those for the percentage of lectures attended and exam scores.

Produce histograms for each of the four measures in the previous task and interpret them.

    ggplot2::ggplot(rexam_tidy_tib, aes(score)) +
      # intermediate layers are missing from the source
      theme(plot.title = element_text(hjust = 0.5))

The exam scores are very interesting because this distribution is quite clearly not normal; in fact, it looks suspiciously bimodal (there are two peaks, indicative of two modes). It looks as though computer literacy is fairly normally distributed (a few people are very good with computers and a few are very bad, but the majority of people have a similar degree of knowledge), as is lecture attendance. Finally, the numeracy test has produced very positively skewed data (the majority of people did very badly on this test and only a few did well).

Descriptive statistics and histograms are a good way of getting an instant picture of the distribution of your data. This snapshot can be very useful: for example, the bimodal distribution of exam scores instantly suggests that students are typically either very good at statistics or struggle with it (relatively few fall between these extremes).

Statistics, Chapter 9: The linear model (regression)

The linear model with one predictor

$$\text{outcome}_i = (b_0 + b_1 X_i) + \text{error}_i$$

This model uses an unstandardised measure of the relationship ($b_1$), and consequently we include a parameter $b_0$ that tells us the value of the outcome when the predictor is zero.

Any straight line can be defined by two things:

- the slope of the line (usually denoted by $b_1$)
- the point at which the line crosses the vertical axis of the graph (the intercept of the line, $b_0$)

These parameters are regression coefficients.

The linear model with several predictors

The linear model expands to include as many predictor variables as you like. An additional predictor can be placed in the model and given a $b$ to estimate its relationship to the outcome:

$$Y_i = (b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_n X_{ni}) + \varepsilon_i$$

Here $b_n$ is the coefficient of the $n$th predictor ($X_{ni}$).

Regression analysis is a term for fitting a linear model to data and using it to predict values of an outcome variable from one or more predictor variables.

- Simple regression: with one predictor variable
- Multiple regression: with several predictors

Estimating the model

No matter how many predictors there are, the model can be described entirely by a constant ($b_0$) and by parameters associated with each predictor ($b$s). To estimate these parameters we use the method of least squares. We could assess the fit of a model by looking at the deviations between the model and the data collected.

Residuals: the differences between what the model predicts and the…
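The method of least squares mentioned in the linear model notes above can be written out in a few lines. This is a sketch under the standard textbook definitions, which are assumed here and not taken from the source: the residual $e_i$ is the observed outcome minus the model's prediction $\hat{Y}_i$, $SS_R$ is the residual sum of squares, and $\bar{X}$ and $\bar{Y}$ are sample means. The closed-form estimates in the last line apply to the one-predictor model only.

```latex
\begin{align*}
\hat{Y}_i &= b_0 + b_1 X_i \\
e_i &= Y_i - \hat{Y}_i \\
SS_R &= \sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} \bigl(Y_i - b_0 - b_1 X_i\bigr)^2 \\
b_1 &= \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2},
\qquad b_0 = \bar{Y} - b_1 \bar{X}
\end{align*}
```

Least squares chooses the values of $b_0$ and $b_1$ that make $SS_R$ as small as possible, so a smaller $SS_R$ indicates a model that sits closer to the data.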