Data Preparation and Analysis (Coursera)

Offered by Illinois Tech,
Data Preparation and Analysis (Coursera)

This course introduces the necessary concepts and common techniques for analyzing data. The primary emphasis is on the process of data analysis, including data preparation, descriptive analytics, model training, and result interpretation.

Class Deals by MOOC List - Click here and see Coursera's Active Discounts, Deals, and Promo Codes.

The process starts with removing distractions and anomalies, followed by discovering insights, formulating propositions, validating evidence, and finally building professional-grade solutions. Following the process properly, regularly, and transparently brings credibility and increases the impact of the results.
This course will cover topics including Exploratory Data Analysis, Feature Screening, Segmentation, Association Rules, Nearest Neighbors, Clustering, Decision Tree, Linear Regression, Logistic Regression, and Performance Evaluation. Besides, this course will review statistical theory, matrix algebra, and computational techniques as necessary.
This course prepares students ready for and capable of the data preparation and analysis process. Besides developing Python codes for carrying out the process, students will learn to tune the software tools for the most efficient implementation and optimal performance. At the end of this course, students will have built their inventory of data analysis codes and their confidence in advocating their propositions to the business stakeholders.
Required Textbook: This course does not mandate any textbooks because the lecture notes are self-contained.
Optional Materials: A Practitioner's Guide to Machine Learning (abbreviated PGML for Reading)
Software Requirements: Python version 3.11 or above with the latest compatible versions of NumPy, SciPy, Pandas, Scikit-learn, and Statsmodels libraries.

What you'll learn

  1. Apply appropriate techniques for generating insights from data.
  2. Present actionable solutions with confidence to the business stakeholders.

Syllabus

Module 1: Process of Preparing and Analyzing Data
Welcome to Data Preparation and Analysis! Module 1 guides students through the art of crafting informative and visually appealing histograms, a fundamental aspect of data visualization. Students will learn techniques for measuring the location and scale of data, understanding the origins and impacts of noise and missing values in datasets. This module also introduces the CRISP-DM Process, a structured approach to data mining, along with Gartner's Analytics Ascendancy Model for advanced data analysis. Additionally, students will explore the distinction between raw data and processed information, a key concept for effective data interpretation and decision-making.

Module 2: Measure and Visualize Correlation
Module 2 delves into the intricacies of statistical analysis, beginning with a thorough understanding of the p-value concept and its significance as a Type I Error indicator. Students will learn to apply statistical tests in Python to identify significantly correlated features, exploring various correlation metrics tailored for categorical, mixed-type, and continuous features. This module emphasizes practical application, equipping students with the skills to calculate and interpret these metrics using Python, thereby enhancing their ability to conduct sophisticated data analysis and draw meaningful conclusions from complex datasets.

Module 3: Market Basket Analysis
Module 3 offers a deep dive into the world of Association Rules, teaching students how to improvise these rules for identifying valuable feature combinations that generate specific label values. Learners will master setting appropriate thresholds for Support and Confidence and gain a comprehensive understanding of the Apriori Algorithm and the significance of Frequent Itemsets within it. This module covers the calculation of common metrics for Association Rules, familiarizing students with the relevant terminology. Additionally, learners will explore the practical application of Association Rules in Market Basket Analysis, including strategies for cross-selling, up-selling, and product bundling, equipping them with valuable skills for advanced data-driven decision making in business contexts.

Module 4: Partitioning, Segmenting, and Clustering of Observations
In Module 4, students will learn how to describe and interpret profiles of clusters, gaining proficiency in deploying the K-Means and K-Modes clustering algorithms. They will explore the application of Recency, Frequency, and Monetary (RFM) Analysis to identify the most valuable customers in retail business settings. The module also covers the technique of Simple Random Sampling with the option of incorporating stratification variables, enhancing the precision of data analysis. Furthermore, it emphasizes the importance of objectively validating models using a testing partition, ensuring the reliability and effectiveness of the analytical models in real-world scenarios.

Module 5: Linear Regression
This module delves into feature importance analysis in machine learning, covering Shapley Values, feature selection methods, statistical evaluation, feature interaction, aliasing, and the Least Squares Algorithm. Students will be able to master these concepts to build robust and interpretable models. included

Module 6: Binary Logistic Regression
In Module 6, students will master the art of feature selection in machine learning by exploring the Forward and Backward Selection Method, the All-Possible Subsets Method, and the concept of complete and quasi-complete separation. Students will also discover association rules for identifying separations, interpret model parameters and predicted probabilities, and delve into the concepts of maximum likelihood estimation, odds, and odds ratios.

Module 7: Decision Trees - The CART Algorithm
Module 7 will equip students wth the ability to harness the power of tree-based models to uncover hidden patterns in your data. Students will be able to describe clusters effectively, intelligently set algorithm parameters, construct business rules from tree results, and utilize variance metrics, entropy values, and Gini indices for optimal tree construction.

Module 8: Evaluating the Performance of Models
Module 8 delves into the realm of evaluation metrics for machine learning models. Students will master the concepts of precision and recall curves, lift curves, and receiver operating characteristics (ROC) curves. Additionally, students will obtain the ability to discover methods for calculating probability thresholds using Kolmogorov-Smirnov statistics and F1 scores. They will be able to explore metrics like misclassification rate, area under the curve (AUC), and root mean squared error (RMSE), along with techniques for computing RMSE and detecting severely misfitted observations using model-specific residuals.

Summative Course Assessment
This module contains the summative course assessment that has been designed to evaluate your understanding of the course material and assess your ability to apply the knowledge you have acquired throughout the course. Be sure to review the course material thoroughly before taking the assessment.

Go to Class
MOOC List is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

Related Courses

Machine Learning: Clustering & Retrieval (Coursera) Coursera
University of Washington

Machine Learning: Clustering & Retrieval (Coursera)

Case Studies: Finding Similar Documents. A reader is interested in a specific news article and you want to find similar articles to recommend. What is the right notion of similarity? Moreover, what if there are millions of other documents? Each time you want to a retrieve a new document, do you need to search through all other documents? How do you group similar documents together? How do you discover new, emerging topics that the documents cover?

Jun 8th 2026
5-12 Weeks
Infonomics II: Business Information Management and Measurement (Coursera) Coursera
University of Illinois at Urbana-Champaign

Infonomics II: Business Information Management and Measurement (Coursera)

Even decades into the Information Age, accounting practices yet fail to recognize the financial value of information. Moreover, traditional asset management practices fail to recognize information as an asset to be managed with earnest discipline. This has led to a business culture of complacence, and the inability for most organizations to fully leverage available information assets. This second course in the two-part Infonomics series explores how and why to adapt well-honed asset management principles and practices to information, and how to apply accepted and new valuation models to gauge information’s potential and realized economic benefits.

Jun 10th 2026
4 Weeks
Customer Analytics (Coursera) Coursera
University of Pennsylvania

Customer Analytics (Coursera)

Data about our browsing and buying patterns are everywhere. From credit card transactions and online shopping carts, to customer loyalty programs and user-generated ratings/reviews, there is a staggering amount of data that can be used to describe our past buying behaviors, predict future ones, and prescribe new ways to influence future purchasing decisions. In this brand new course, four of Wharton’s top marketing professors will dive deeper into the key areas of customer analytics: descriptive analytics, predictive analytics, prescriptive analytics, and their application to real-world business practices including Amazon, Google, and Starbucks to name a few.

Jun 8th 2026
5-12 Weeks
Machine Learning: Regression (Coursera) Coursera
University of Washington

Machine Learning: Regression (Coursera)

Case Study - Predicting Housing Prices. In our first case study, predicting house prices, you will create models that predict a continuous value (price) from input features (square footage, number of bedrooms and bathrooms,...). This is just one of the many places where regression can be applied. Other applications range from predicting health outcomes in medicine, stock prices in finance, and power usage in high-performance computing, to analyzing which regulators are important for gene expression.

Jun 8th 2026
5-12 Weeks
Leadership Through Marketing (Coursera) Coursera
Northwestern University

Leadership Through Marketing (Coursera)

The success of every organization depends on attracting and retaining customers. Although the marketing concepts for doing so are well established, digital technology has empowered customers, while producing massive amounts of data, revolutionizing the processes through which organizations attract and retain customers. In this course, students will learn how to identify new opportunities to create value for empowered consumers, develop strategies that yield an advantage over rivals, and develop the data science skills to lead more effectively, allocate resources, and to confront this very challenging environment with confidence.

Jun 14th 2026
4 Weeks
Market Research and Consumer Behavior (Coursera) Coursera
IE Business School

Market Research and Consumer Behavior (Coursera)

Your marketing quest begins here! The first course in this specialization lays the neccessary groundwork for an overall successful marketing strategy. It is separated into two sections: Market Research and Consumer Behavior. Gain the tools and techniques to translate a decision problem into a research question in the Market Research module. Learn how to design a research plan, analyze the data gathered and accurately interpret and communicate survey reports, translating the results into practical recommendations.

Jun 8th 2026
4 Weeks
Framework for Data Collection and Analysis (Coursera) Coursera
University of Maryland, College Park

Framework for Data Collection and Analysis (Coursera)

This course will provide you with an overview over existing data products and a good understanding of the data collection landscape. With the help of various examples you will learn how to identify which data sources likely matches your research question, how to turn your research question into measurable pieces, and how to think about an analysis plan.

Jun 8th 2026
4 Weeks
Pattern Discovery in Data Mining (Coursera) Coursera
University of Illinois at Urbana-Champaign

Pattern Discovery in Data Mining (Coursera)

Learn the general concepts of data mining along with basic methodologies and applications. Then dive into one subfield in data mining: pattern discovery. Learn in-depth concepts, methods, and applications of pattern discovery in data mining. We will also introduce methods for data-driven phrase mining and some interesting applications of pattern discovery. This course provides you the opportunity to learn skills and content to practice and engage in scalable pattern discovery methods on massive transactional data, discuss pattern evaluation measures, and study methods for mining diverse kinds of patterns, sequential patterns, and sub-graph patterns.

Jun 8th 2026
4 Weeks
Introduction to Data Analysis Using Excel (Coursera) Coursera
Rice University

Introduction to Data Analysis Using Excel (Coursera)

The use of Excel is widespread in the industry. It is a very powerful data analysis tool and almost all big and small businesses use Excel in their day to day functioning. This course is designed to give you a working knowledge of Excel with the aim of getting to use it for more advance topics in Business Statistics. The course is designed keeping in mind two kinds of learners - those who have very little functional knowledge of Excel and those who use Excel regularly but at a peripheral level and wish to enhance their skills.

Jun 8th 2026
4 Weeks
People Analytics (Coursera) Coursera
University of Pennsylvania

People Analytics (Coursera)

People analytics is a data-driven approach to managing people at work. For the first time in history, business leaders can make decisions about their people based on deep analysis of data rather than the traditional methods of personal relationships, decision making based on experience, and risk avoidance. In this brand new course, three of Wharton’s top professors, all pioneers in the field of people analytics, will explore the state-of-the-art techniques used to recruit and retain great people, and demonstrate how these techniques are used at cutting-edge companies. They’ll explain how data and sophisticated analysis is brought to bear on people-related issues, such as recruiting, performance evaluation, leadership, hiring and promotion, job design, compensation, and collaboration.

Jun 8th 2026
4 Weeks
Practical Predictive Analytics: Models and Methods (Coursera) Coursera
University of Washington

Practical Predictive Analytics: Models and Methods (Coursera)

Statistical experiment design and analytics are at the heart of data science. In this course you will design statistical experiments and analyze the results using modern methods. You will also explore the common pitfalls in interpreting statistical arguments, especially those associated with big data. Collectively, this course will help you internalize a core set of practical and effective machine learning methods and concepts, and apply them to solve some real world problems.

Jun 8th 2026
4 Weeks
Introduction to Probability and Data with R (Coursera) Coursera
Duke University

Introduction to Probability and Data with R (Coursera)

This course introduces you to sampling and exploring data, as well as basic probability theory and Bayes' rule. You will examine various types of sampling methods, and discuss how such methods can impact the scope of inference. A variety of exploratory data analysis techniques will be covered, including numeric summary statistics and basic data visualization.

Jun 8th 2026
5-12 Weeks