DATA BASE AND BIG DATA ANALYTICS
Module BIG DATA ANALYTICS
Academic Year 2024/2025 - Teacher: GIOVANNI MORANA
Expected Learning Outcomes
General Objectives.
Big Data Fundamentals and Pipeline Design:
Understand the core principles of Big Data and master the design and implementation of data pipelines. These pipelines are the backbone for extracting, transforming, and loading data into repositories, ensuring its readiness for analytical processing.
Data Modeling Techniques:
Explore a variety of data modeling methodologies for organizing and managing data in data warehouses and data lakes, with the aim of efficient data storage and retrieval.
OLAP Concepts and Techniques:
Develop a comprehensive understanding of Online Analytical Processing (OLAP), encompassing its principles, methodologies, and advanced techniques.
Data Visualization and Dashboard Design:
Acquire expertise in data visualization techniques and tools, and learn how to design informative and interactive dashboards, empowering decision-makers to interpret and utilize complex insights derived from big data analytics with clarity and precision.
Synthetic General Description.
The module is designed to provide a comprehensive understanding of big data concepts and techniques for processing, storing, and analyzing large datasets. It covers fundamental concepts related to Big Data and its management, focusing on ETL and ELT data pipelines, principles and methodologies for designing Data Warehouses and Data Lakes, Online Analytical Processing (OLAP) and BI tools, and data visualization and dashboard design principles.
Expected Learning Results
The expected learning results are described schematically below, according to the main Dublin descriptors.
- Knowledge and understanding:
  - To understand the most important methodologies and techniques used by industries to analyse data to support the decision process.
  - To understand the main methodologies for designing data warehouses and data lakes.
  - To understand the main methodologies to transform data into sources of knowledge through visual representation.
- Applying knowledge and understanding:
  - To be able to apply methodologies and techniques to analyse data.
  - To be able to design a data warehouse or a data lake.
  - To be able to build reports and data analyses and organize them into interactive dashboards.
- Making judgments:
  - To evaluate the different alternatives and techniques when analyzing data with different characteristics.
  - To evaluate the different alternatives and techniques when defining and designing a data repository.
Course Structure
Required Prerequisites
Basic knowledge of database systems;
Basic knowledge of SQL;
Basic knowledge of Python;
Attendance of Lessons
Attendance is not mandatory but strongly encouraged.
Detailed Course Content
Introduction to Business Intelligence and Big Data Analytics (2 CFU)
- Goal and rationale of BI systems
- The value of data-driven decision making
- The structure and evolution of BI and Big Data analytics systems
- OLAP vs OLTP
- Data Pipelines
- Advanced and innovative tools for data preparation: the Tableau Prep solution
- Data Warehouses and Data Lakes
Data models (1 CFU)
- Conceptual modeling
- Dimensions and facts
- Multi-dimensional data model
- Conceptual, logical and physical design
BI Architecture (1 CFU)
- Extract, Transform and Load functionalities
- OLAP analysis
- OLAP query
- Reporting and Interactive Dashboard
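
As a purely illustrative sketch of how the topics in this unit fit together (it is not part of the official course material), the following Python snippet runs a minimal Extract, Transform and Load step and then an OLAP-style roll-up query. The sample records, the table name `fact_sales`, and all function names are hypothetical, and a single denormalized fact table in an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from an operational (OLTP) source.
RAW_SALES = [
    {"date": "2024-01-15", "product": "laptop", "region": "north", "amount": "1200.00"},
    {"date": "2024-01-20", "product": "laptop", "region": "south", "amount": "1100.00"},
    {"date": "2024-02-03", "product": "phone", "region": "north", "amount": "650.50"},
    {"date": "2024-02-11", "product": "phone", "region": "north", "amount": "700.00"},
]


def extract():
    """Extract: return a copy of the raw source records."""
    return list(RAW_SALES)


def transform(rows):
    """Transform: derive the month from the date and cast the amount to a float."""
    return [
        (r["date"][:7], r["product"], r["region"], float(r["amount"]))
        for r in rows
    ]


def load(conn, rows):
    """Load: store the cleaned rows in a simplified fact table."""
    conn.execute(
        "CREATE TABLE fact_sales (month TEXT, product TEXT, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def rollup_by_month(conn):
    """OLAP-style roll-up: aggregate the amount measure along the month dimension."""
    cur = conn.execute(
        "SELECT month, SUM(amount) FROM fact_sales GROUP BY month ORDER BY month"
    )
    return cur.fetchall()


def run_pipeline():
    """Run extract, transform and load, then query the result."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    return rollup_by_month(conn)


if __name__ == "__main__":
    for month, total in run_pipeline():
        print(month, total)
```

In a real BI architecture the month, product and region columns would typically reference separate dimension tables, and the resulting aggregates would feed reports and interactive dashboards.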
Data Visualization (2 CFU)
- Introduction to Visualization
- Data Visualization fundamentals
- Charts and standard views: relevance, appropriateness, and best practices
- Dashboard Design
- Advanced and innovative tools for data visualization: the Tableau Desktop solution
Textbook Information
Golfarelli, Rizzi. Data Warehouse Design: Modern Principles and Methodologies, McGraw Hill
Course Planning
| | Subjects | Text References |
|---|---|---|
| 1 | Introduction to Business Intelligence and Big Data Analytics | |
| 2 | Data models | |
| 3 | BI Architecture | |
| 4 | Data Visualization | |
Learning Assessment
Learning Assessment Procedures
The final exam consists of:
- a project work aiming at assessing the capabilities in developing a BI system, including the analysis and the visualization of relevant information,
- an oral exam about the project work.
Assessment criteria include the depth of analysis; the adequacy, quality, and correctness of the solutions proposed in the project work; the ability to justify and critically evaluate the adopted solutions; and clarity of presentation. The grade for the Big Data Analytics module accounts for 50% of the total grade for the entire course.
The exam is structured so that each student is given a grade according to the following scheme:
- Not approved: the student has not acquired the basic concepts and is not able to answer at least 60% of the questions or carry out the exercises.
- 18-23: the student demonstrates minimal mastery of the basic concepts, shows modest ability to connect the contents, and is able to solve simple exercises.
- 24-27: the student demonstrates good mastery of the course contents, shows good ability to connect the contents, and solves the exercises with few errors.
- 28-30 cum laude (distinction): the student has acquired all the contents of the course, is able to master them completely and connect them with a critical spirit, and solves the exercises completely and without errors.
Students with disabilities and/or specific learning disorders (DSA) must contact the teacher and the DMI CInAP contact person well in advance of the exam date to communicate that they intend to take the exam with the appropriate compensatory measures.
Examples of frequently asked questions and/or exercises
- Create a dashboard that compares data trends with actionable insights for decision-makers.
- How do Big Data tools solve challenges like data volume, variety, and speed, and what strategies help overcome them?
- What are the main differences between structured, semi-structured, and unstructured data, and how are they managed in Big Data?