MOOC List is learner-supported. When you buy through links on our site, we may earn an affiliate commission.
The course is taught using the R programming language, and starts with a brief introduction to the language itself (and RStudio, the primary IDE used for R programming), together with a short introduction to Tidyverse, a commonly used set of R libraries. Then, text preprocessing techniques and supervised learning methods will be introduced. The final part of the course covers various unsupervised learning methods that can be used for analysis of textual data.
Students are required to complete quizzes (1-2 in each of the first 4 weeks) and a final project that uses open data and the knowledge gained during the course.
Course 3 of 4 in the Network Analytics for Business Specialization
Syllabus
WEEK 1
R and RStudio Basics
In this module, you will learn how to work with R and RStudio, how to use RMarkdown for literate programming, and how to work with data using basic R data types and structures.
WEEK 2
Working with Tidyverse
In this module, you will learn how to work with data using the Tidyverse set of packages. You will learn how to use tibbles (a Tidyverse alternative to data.frames), the pipe operator from the magrittr package, and how to clean and transform data using the powerful dplyr package. You will also learn how to efficiently work with strings using the stringr package.
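To give a rough idea of what this style of code looks like, here is a minimal sketch that combines a tibble, the pipe operator, dplyr verbs, and stringr helpers. The data and the column names (`title`, `n_words`) are made up for illustration and are not taken from the course materials; the sketch assumes the dplyr and stringr packages are installed.

```r
library(dplyr)    # also provides tibble() and the %>% pipe
library(stringr)

# Hypothetical toy data, invented for this example only.
books <- tibble(
  title   = c("  first text ", "SECOND TEXT", "Third Text"),
  n_words = c(1200, 300, 4500)
)

cleaned <- books %>%
  # Trim and collapse whitespace, then normalize capitalization.
  mutate(title = str_to_title(str_squish(title))) %>%
  # Keep only texts with at least 1000 words.
  filter(n_words >= 1000) %>%
  # Longest texts first.
  arrange(desc(n_words))

print(cleaned)
```

Each step receives the result of the previous one, so the pipeline reads top to bottom in the order the operations are applied.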
WEEK 3
Supervised machine learning with the bag-of-words approach
In this module, you will learn how to obtain text data from Project Gutenberg and how to prepare it for analysis. You will also learn how to use TF-IDF to find the most distinctive words in a corpus of texts and how to build, interpret, and evaluate supervised learning models for textual data.
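As a rough illustration of the TF-IDF idea mentioned above, the following sketch computes it in base R on a tiny invented corpus. It uses one common variant (term frequency as a share of the document's tokens, inverse document frequency as the natural log of the number of documents divided by the number of documents containing the term). The course itself may rely on a dedicated package and a different variant; the document names and texts here are assumptions made only for this example.

```r
# Hypothetical toy corpus: three short "documents", invented for illustration.
docs <- c(
  austen   = "the ball the ball the letter",
  melville = "the whale the whale the sea",
  doyle    = "the clue the clue the letter"
)

# Bag of words: split on whitespace and count every term in every document.
tokens <- strsplit(tolower(docs), "\\s+")
vocab  <- sort(unique(unlist(tokens)))
counts <- vapply(
  tokens,
  function(tok) as.numeric(table(factor(tok, levels = vocab))),
  numeric(length(vocab))
)
rownames(counts) <- vocab   # rows are terms, columns are documents

# Term frequency: share of each document's tokens taken up by a term.
tf <- sweep(counts, 2, colSums(counts), "/")

# Inverse document frequency: terms found in fewer documents score higher.
idf <- log(ncol(counts) / rowSums(counts > 0))

# TF-IDF: each row (term) of tf is scaled by that term's idf.
tf_idf <- tf * idf

# Most distinctive term per document (highest TF-IDF score).
top <- vocab[apply(tf_idf, 2, which.max)]
names(top) <- colnames(tf_idf)
print(top)
```

A word such as "the" appears in every document, so its inverse document frequency is log(1) = 0 and it never ranks as distinctive, however often it occurs.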
WEEK 4
Unsupervised machine learning
In this module, you will learn how to preprocess text data using the preText package, which can compare many types of preprocessing for a particular corpus. You will also learn how to train, interpret, and compare topic models.
WEEK 5
Final Project
This module in its entirety is dedicated to the final project of the course, in which you will apply all the knowledge you've gained in this course to do a real analysis of real texts all on your own. You will have to download data from the Project Gutenberg database, explore it, and then apply both supervised and unsupervised machine learning techniques. You will then have to review and grade the work of your peers.