Introduction to Text Mining with R (Coursera)


MOOC List is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

This course gives you access to the text mining techniques that are used by top data scientists from all over the world. Since most information available online is in the form of text, knowing when and how to use these techniques, algorithms and models will not only give you an edge over your competition in the job market, but also allow you to see the world around you from a completely new perspective. The course covers everything from the very basics of working with text programmatically to advanced unsupervised learning methods.


The course is taught using the R programming language, and starts with a brief introduction to the language itself (and RStudio, the primary IDE used for R programming), together with a short introduction to Tidyverse, a commonly used set of R libraries. Then, text preprocessing techniques and supervised learning methods will be introduced. The final part of the course covers various unsupervised learning methods that can be used for analysis of textual data.

Students are required to complete quizzes (1-2 for each of the 4 weeks) and to complete a final project using open data and the knowledge they gained during the course.


Course 3 of 4 in the Network Analytics for Business Specialization


Syllabus


WEEK 1

R and RStudio Basics

In this module, you will learn how to work with R and RStudio, how to use RMarkdown for literate programming, and how to work with data using basic R data types and structures.


WEEK 2

Working with Tidyverse

In this module, you will learn how to work with data using the Tidyverse set of packages. You will learn how to use tibbles (a Tidyverse alternative to data.frames), the pipe operator from the magrittr package, and how to clean and transform data using the powerful dplyr package. You will also learn how to efficiently work with strings using the stringr package.
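To give a feel for the tools named in this module, here is a minimal sketch that combines a tibble, the pipe operator, dplyr verbs and stringr helpers. It is not taken from the course materials: the `books` data and the column names are made up for illustration, and it assumes the tibble, dplyr and stringr packages are installed.

```r
# Illustrative sketch only; the data and column names are hypothetical.
library(tibble)
library(dplyr)
library(stringr)

# A tibble is the Tidyverse counterpart of a data.frame.
books <- tibble(
  title = c("  Moby Dick ", "Pride and Prejudice", "Dracula  "),
  year  = c(1851, 1813, 1897)
)

# The pipe (%>%) passes the result of each step on to the next function.
cleaned <- books %>%
  mutate(
    title   = str_to_lower(str_trim(title)),  # stringr: trim whitespace, lower-case
    n_words = str_count(title, "\\S+")        # stringr: count whitespace-separated words
  ) %>%
  filter(year > 1820) %>%                     # dplyr: keep rows matching a condition
  arrange(year)                               # dplyr: sort rows by year

print(cleaned)
```

Each step in the chain reads top to bottom, which is the main reason the pipe is popular for data cleaning code.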


WEEK 3

Supervised machine learning with the bag-of-words approach

In this module, you will learn how to obtain text data from Project Gutenberg and how to prepare it for analysis. You will also learn how to use TF-IDF to find the most distinctive words in a corpus of texts and how to build, interpret and evaluate supervised learning models for textual data.
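The idea behind TF-IDF can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the course's own code: the three toy documents are invented, and the weighting used here (term frequency as a share of the document's tokens, IDF as the natural log of the number of documents divided by the number of documents containing the word) is one common variant among several.

```r
# Toy corpus; the documents are hypothetical and only serve the illustration.
docs <- c(
  a = "the whale saw the ship and the whale dived",
  b = "the captain steered the ship while the captain sang",
  c = "the count left the castle and the count returned"
)

# Tokenise: lower-case, then split on anything that is not a letter.
tokens <- lapply(docs, function(d) {
  words <- strsplit(tolower(d), "[^a-z]+")[[1]]
  words[words != ""]
})

vocab <- sort(unique(unlist(tokens)))

# Term frequency: share of each document's tokens taken up by each word.
# Result is a matrix with one row per word and one column per document.
tf <- sapply(tokens, function(w) table(factor(w, levels = vocab)) / length(w))

# Inverse document frequency: ln(number of documents / documents containing the word).
idf <- log(length(docs) / rowSums(tf > 0))

# Scale every row (word) by its IDF.
tf_idf <- tf * idf

# The highest-scoring word in each document is its most distinctive one.
top_words <- apply(tf_idf, 2, function(col) names(which.max(col)))
print(top_words)
```

A word such as "the", which appears in every document, gets an IDF of zero and therefore a TF-IDF score of zero, while a word that is frequent in one document and absent from the others scores highest.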


WEEK 4

Unsupervised machine learning

In this module, you will learn how to preprocess text data using the preText package, which can compare many types of preprocessing for a particular corpus. You will also learn how to train, interpret and compare topic models.


WEEK 5

Final Project

This module in its entirety is dedicated to the final project of the course, in which you will apply all the knowledge you've gained in this course to do a real analysis of real texts all on your own. You will have to download data from the Project Gutenberg database, explore it, and then apply both supervised and unsupervised machine learning techniques. You will then have to review and grade the work of your peers.




Course Auditing
41.00 EUR/month
No specific background required
