STATISTICAL LABORATORY

Academic Year 2024/2025 - Lecturer: ANTONIO PUNZO

Expected Learning Outcomes

1. Knowledge and understanding. The course introduces the R language for statistical data analysis, with a special focus on descriptive statistics, probability distributions, statistical inference, and statistical modeling.

2. Applying knowledge and understanding. After completing the course, the student will be able to use R to: i) carry out basic statistical analyses of data; ii) simulate data from given probability distributions; and iii) apply the main methods of statistical inference.

3. Making judgements. After completing the course, the student will be able to extract insights from data through statistical analyses carried out in R.

4. Communication skills. After completing the course, the student will be able to communicate effectively the results of statistical analyses carried out with the R statistical software.

5. Learning skills. After completing the course, the student will have acquired the skills needed to use the R statistical software for basic data analysis and statistical modeling.

Course Structure

The course will include lectures delivered through slides and R code demonstrations. We will use the freely available R statistical software extensively. Practical activities and data analysis sessions in R will also be organized.

Required Prerequisites

Basic notions in statistics, linear algebra, and computing.

Attendance of Lessons

Highly recommended.

Detailed Course Content

Getting started with R and RStudio

Descriptive statistics. Univariate statistical distributions. Data tables. Frequency distributions. Main summary statistics: arithmetic mean, geometric mean, harmonic mean. Median and percentiles. Variance, standard deviation, relative variability (coefficient of variation). Graphical representations. Multivariate statistical distributions. Contingency tables. Joint, marginal, and conditional distributions. Covariance and correlation.
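
As an illustration of how the summary statistics listed above map onto base R, here is a minimal sketch on a small made-up data set. The variable names (x, y, g, h) and their values are hypothetical. Base R has no dedicated functions for the geometric and harmonic means, so they are computed here from their definitions.

```r
# Hypothetical sample of positive values (made up for illustration)
x <- c(2, 4, 4, 5, 8, 10)

arith_mean <- mean(x)
geom_mean  <- exp(mean(log(x)))   # geometric mean; requires x > 0
harm_mean  <- 1 / mean(1 / x)     # harmonic mean; requires x != 0

med <- median(x)
pct <- quantile(x, probs = c(0.25, 0.75))  # quartiles (R's default quantile type)
s2  <- var(x)                     # sample variance, divisor n - 1
s   <- sd(x)
cv  <- s / arith_mean             # coefficient of variation

# Frequency distribution of a hypothetical categorical variable
g <- c("a", "b", "a", "c", "a", "b")
freq     <- table(g)
rel_freq <- prop.table(freq)

# Contingency table with a second hypothetical categorical variable
h  <- c("u", "v", "u", "u", "v", "v")
ct <- table(g, h)                   # joint absolute frequencies
marginal_g     <- margin.table(ct, 1)  # marginal distribution of g
cond_h_given_g <- prop.table(ct, 1)    # conditional distributions of h given g (rows)

# Covariance and correlation with a second hypothetical numeric variable
y <- c(1, 3, 2, 5, 7, 9)
cov_xy <- cov(x, y)
cor_xy <- cor(x, y)
```

For positive data the three means always satisfy harmonic <= geometric <= arithmetic, which is a quick sanity check on hand-written implementations.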

Probability. Random number generation and data modeling according to different probability distributions: uniform, binomial, Poisson, and Gaussian.
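
A minimal sketch of random number generation in base R for the four distributions named above. The seed, sample size, and parameter values are arbitrary choices for illustration. Each distribution also has d (density or mass), p (CDF), and q (quantile) counterparts that follow the same naming scheme.

```r
set.seed(123)  # arbitrary seed, for reproducibility
n <- 10000

u <- runif(n, min = 0, max = 1)         # uniform on [0, 1]
b <- rbinom(n, size = 10, prob = 0.3)   # binomial(10, 0.3)
p <- rpois(n, lambda = 4)               # Poisson(4)
z <- rnorm(n, mean = 0, sd = 1)         # standard Gaussian

# Sample means versus theoretical means:
# uniform 0.5, binomial 10 * 0.3 = 3, Poisson 4, Gaussian 0
round(c(unif = mean(u), binom = mean(b), pois = mean(p), norm = mean(z)), 2)

# Density/mass, CDF, and quantile counterparts
dbinom(3, size = 10, prob = 0.3)   # P(B = 3)
ppois(2, lambda = 4)               # P(P <= 2)
qnorm(0.975)                       # 97.5% quantile of N(0, 1)
```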

Statistical inference. Sampling distributions: Student's t, chi-square. Interval estimation. Confidence level. Confidence intervals for means, variances, and proportions. Hypothesis testing. Null and alternative hypotheses. p-values. Statistical tests for means, variances, and proportions, and for comparing two means and two proportions.
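
A minimal sketch of the inference tools listed above, using base R functions on simulated data; all sample sizes, parameters, and counts are made up for illustration. Base R provides t.test(), var.test(), and prop.test(), but no dedicated function for a one-sample confidence interval for a variance, so that interval is computed here from its chi-square formula, (n - 1) s^2 / chi2_{1 - alpha/2} to (n - 1) s^2 / chi2_{alpha/2}.

```r
set.seed(1)
x <- rnorm(30, mean = 10, sd = 2)   # simulated sample
y <- rnorm(30, mean = 11, sd = 2)   # second simulated sample

# One-sample t test of H0: mu = 10, with a 95% confidence interval for the mean
tt <- t.test(x, mu = 10, conf.level = 0.95)
tt$conf.int
tt$p.value

# 95% confidence interval for the variance, based on the chi-square distribution
n <- length(x)
alpha <- 0.05
var_ci <- (n - 1) * var(x) / qchisq(c(1 - alpha / 2, alpha / 2), df = n - 1)

# Comparison of two means (Welch test by default)
t.test(x, y)

# Comparison of two variances (F test)
var.test(x, y)

# Test and confidence interval for one proportion: 27 successes in 60 trials (made-up counts)
prop.test(27, 60, p = 0.5)

# Comparison of two proportions (made-up counts)
prop.test(x = c(27, 35), n = c(60, 60))
```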

Statistical models. The simple linear regression model. Goodness of fit. Residual analysis. Inference on the parameters of a linear regression model.
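
A minimal sketch of the simple linear regression workflow in base R, on simulated data whose true intercept (2) and slope (0.5) are arbitrary choices. It covers estimation with lm(), inference on the parameters through summary() and confint(), goodness of fit through R-squared, and a basic graphical residual analysis.

```r
set.seed(42)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)   # simulated data: intercept 2, slope 0.5

fit <- lm(y ~ x)
summary(fit)                # estimates, standard errors, t tests, R-squared
confint(fit, level = 0.95)  # confidence intervals for intercept and slope

r2  <- summary(fit)$r.squared  # goodness of fit
res <- residuals(fit)

# Residual analysis: residuals vs fitted values, and a normal Q-Q plot
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(res)
qqline(res)
```

Because the model includes an intercept, the least-squares residuals sum to zero by construction; patterns in the residuals-vs-fitted plot, rather than their mean, are what signal misspecification.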

Textbook Information

·         Dalgaard, P. (2008). Introductory Statistics with R. New York: Springer.

·         Venables, W. N., Smith, D. M. (2009). An Introduction to R: A Programming Environment for Data Analysis and Graphics. United Kingdom: Network Theory.

·         Verzani, J. (2018). Using R for Introductory Statistics. United States: CRC Press.

Course Planning

No. | Subjects | Text References
1 | Syllabus: illustration and explanation. Getting started with R and RStudio. Why use R? How to install R. | Slide
2 | RStudio. RStudio orientation. Console. R script. Source and Run buttons. Environment/History/Connections panes. Files/Plots/Packages/Help/Viewer panes. | Slide
3 | R packages (CRAN packages and GitHub packages). Using packages. | Slide
4 | Projects in RStudio. Directory structure. File names. R style guide. Citing R. | Slide
5 | Some R basics. Objects in R. Errors and warnings. Naming objects. | Slide
6 | Using the working directory. Getting help. Setting the number of digits to display. | Slide
7 | Operators in R. Using functions in R. Assigning objects. | Slide
8 | Vectors. Different ways to create vectors. Extracting elements from a vector. Replacing elements. Searching for elements within a vector. | Slide
9 | Workspace content and manipulation. Saving in R. Data types. Missing data. | Slide
10 | Matrices and algebraic operations. Reserved words. Arrays. | Slide
11 | Lists. Data frames. attach() and detach(). | Slide
12 | Frequency distributions. Contingency tables. Box plots. | Slide
13 | Graphical representations. Empirical distribution function. Basic statistics. Concentration index and Lorenz curve. | Slide
14 | Sampling and ad hoc generators of discrete random variables. Q-Q plots. | Slide
15 | Univariate constrained optimization with optimize(). Multivariate unconstrained optimization with optim(). Maximum likelihood estimation. | Slide
16 | Chi-square goodness-of-fit test. Kolmogorov-Smirnov test (goodness of fit and distributional comparison of two samples). Chi-square test of independence. | Slide
17 | Univariate and multivariate linear regression models. Nonparametric regression. Changes in scale. | Slide
18 | Generalized linear models. Logistic regression. Poisson regression. Regression models with qualitative covariates. One-way ANOVA. | Slide
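
As an illustration of the tests in lesson 16, here is a minimal sketch using base R's chisq.test() and ks.test() on simulated data. The die-rolling example, the normal and exponential samples, and the 2 x 2 counts are all hypothetical.

```r
set.seed(7)

# Chi-square goodness-of-fit test: are 120 simulated die rolls consistent with a fair die?
rolls <- sample(1:6, size = 120, replace = TRUE)
obs <- table(factor(rolls, levels = 1:6))   # keep all six faces, even if one has count 0
gof <- chisq.test(obs, p = rep(1 / 6, 6))

# Kolmogorov-Smirnov tests
x <- rnorm(100)
y <- rexp(100)
ks1 <- ks.test(x, "pnorm", mean = 0, sd = 1)  # one-sample: goodness of fit to N(0, 1)
ks2 <- ks.test(x, y)                          # two-sample: same distribution?

# Chi-square test of independence on a 2 x 2 contingency table (made-up counts)
tab <- matrix(c(30, 10, 20, 40), nrow = 2,
              dimnames = list(group = c("A", "B"), outcome = c("yes", "no")))
indep <- chisq.test(tab)  # Yates' continuity correction is applied by default for 2 x 2 tables
```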

Learning Assessment

Learning Assessment Procedures

The exam assesses whether the learning objectives have been achieved. It consists of a practical test in which the student writes suitable R code to solve a statistical problem and interprets the output produced by standard R functions.

Examples of frequently asked questions and/or exercises

·       Writing R code to find the maximum likelihood estimates of the parameters of the log-normal distribution

·      Writing R code to find the maximum likelihood estimates of the parameters of a linear model with covariates on both the mean and the variance of the normal error distribution

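
For the first exercise type, a minimal sketch (not an official exam solution) of numerical maximum likelihood for the log-normal distribution with optim(), on simulated data with arbitrary true values meanlog = 1 and sdlog = 0.5. Optimizing over log(sigma) instead of sigma is a design choice that keeps the problem unconstrained. Because the log-normal MLEs also have a closed form (the mean of log x, and the standard deviation of log x with divisor n), the numerical result can be checked against it.

```r
set.seed(2024)
x <- rlnorm(200, meanlog = 1, sdlog = 0.5)   # simulated data

# Negative log-likelihood; theta = (mu, log(sigma)) so optim() can search without constraints
negloglik <- function(theta, x) {
  mu <- theta[1]
  sigma <- exp(theta[2])
  -sum(dlnorm(x, meanlog = mu, sdlog = sigma, log = TRUE))
}

fit <- optim(par = c(0, 0), fn = negloglik, x = x, method = "BFGS")
mu_hat    <- fit$par[1]
sigma_hat <- exp(fit$par[2])

# Closed-form MLEs for comparison (divisor n, not n - 1)
mu_cf    <- mean(log(x))
sigma_cf <- sqrt(mean((log(x) - mu_cf)^2))
```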

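
For the second exercise type, a minimal sketch (not an official exam solution) under stated assumptions: the response is normal with mean X %*% beta and standard deviation exp(Z %*% gamma), that is, a log link on the standard deviation, and the same simulated covariate enters both parts. The log link, the simulated coefficients, and the names negloglik, X, and Z are illustrative choices. Standard errors are approximated from the numerically computed Hessian of the negative log-likelihood.

```r
set.seed(99)
n <- 300
x <- runif(n, 0, 5)
# Simulated data: mean 1 + 2x, log standard deviation -0.5 + 0.3x (arbitrary values)
y <- rnorm(n, mean = 1 + 2 * x, sd = exp(-0.5 + 0.3 * x))

X <- cbind(1, x)   # design matrix for the mean
Z <- cbind(1, x)   # design matrix for the log standard deviation

negloglik <- function(theta, y, X, Z) {
  p <- ncol(X)
  beta  <- theta[1:p]
  gamma <- theta[-(1:p)]
  mu    <- drop(X %*% beta)
  sigma <- exp(drop(Z %*% gamma))   # log link keeps sigma > 0
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

# Start from OLS estimates for beta and a constant log standard deviation
ols   <- lm(y ~ x)
start <- c(coef(ols), log(sd(residuals(ols))), 0)
fit <- optim(start, negloglik, y = y, X = X, Z = Z, method = "BFGS",
             hessian = TRUE, control = list(maxit = 500))

est <- fit$par                          # (beta0, beta1, gamma0, gamma1)
se  <- sqrt(diag(solve(fit$hessian)))   # approximate standard errors
```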