Data Wrangling with MongoDB (Udacity)

In this course, we will explore how to wrangle data from diverse sources and shape it to enable data-driven applications. Some data scientists spend the bulk of their time doing this! Students will learn how to gather and extract data from widely used data formats. They will learn how to assess the quality of data and explore best practices for data cleaning. We will also introduce students to MongoDB, covering the essentials of storing data and the MongoDB query language together with exploratory analysis using the MongoDB aggregation framework.

This is a great course for those interested in entry-level data science positions, for current business and data analysts looking to add big data to their repertoire, and for managers who work with data professionals or want to leverage big data.
This course is also a part of our Data Analyst Nanodegree.

What You Will Learn

Lesson 1
Data Extraction Fundamentals

  • Assessing the Quality of Data
  • Intro to Tabular Formats
  • Parsing CSV
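As an illustration of the CSV parsing topic, here is a minimal sketch using Python's standard csv module. The sample data and the parse_csv helper are hypothetical and not taken from the course materials.

```python
import csv
import io

# Hypothetical sample data; a real file would be opened with open(path, newline="").
SAMPLE_CSV = "city,country,population\nParis,France,2148000\nLyon,France,\n"


def parse_csv(text):
    """Return each data row as a dict keyed by the names in the header line."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


rows = parse_csv(SAMPLE_CSV)
for row in rows:
    print(row["city"], repr(row["population"]))
```

Note that every value comes back as a string, and a missing value is an empty string rather than None, which is one reason parsed data usually needs auditing before use.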

Lesson 2
Data in More Complex Formats

  • XML Design Principles
  • Parsing XML
  • Web Scraping

Lesson 3
Data Quality

  • Sources of Dirty Data
  • A Blueprint for Cleaning
  • Auditing Data
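To make the idea of auditing data concrete, the sketch below tallies how many values of one field are valid, missing, or invalid. The field name, the sample rows, and the validity rule (digits only) are assumptions chosen for illustration, not the course's own blueprint.

```python
from collections import Counter


def audit_population(rows):
    """Classify each row's 'population' value as valid, missing, or invalid."""
    report = Counter()
    for row in rows:
        value = (row.get("population") or "").strip()
        if not value:
            report["missing"] += 1
        elif value.isdigit():
            report["valid"] += 1
        else:
            report["invalid"] += 1
    return report


# Hypothetical rows, shaped like the output of a CSV parser.
sample_rows = [
    {"city": "Paris", "population": "2148000"},
    {"city": "Lyon", "population": ""},
    {"city": "Nice", "population": "about 340k"},
]

report = audit_population(sample_rows)
print(dict(report))
```

A report like this only measures the problem; deciding whether to drop, correct, or keep the flagged rows is a separate cleaning step.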

Lesson 4
Working with MongoDB

  • Data Modelling in MongoDB
  • Introduction to PyMongo
  • Field Queries

Lesson 5
Analyzing Data

  • Examples of Aggregation Framework
  • The Aggregation Pipeline
  • Aggregation Operators: $match, $project, $unwind, $group
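The four operators named in this lesson can be combined into a single pipeline. The sketch below only builds the pipeline as plain Python data, so it runs without a database. The field names (country, tags) and the database and collection names in the comment are hypothetical.

```python
def build_pipeline(country):
    """Build an aggregation pipeline that counts tag usage for one country."""
    return [
        # Keep only documents for the requested country.
        {"$match": {"country": country}},
        # Emit one document per element of the 'tags' array.
        {"$unwind": "$tags"},
        # Count how many documents carry each tag.
        {"$group": {"_id": "$tags", "count": {"$sum": 1}}},
        # Rename the grouping key to 'tag' and hide the raw _id.
        {"$project": {"_id": 0, "tag": "$_id", "count": 1}},
    ]


pipeline = build_pipeline("France")

# With PyMongo installed and a MongoDB server running, the pipeline could be
# executed along these lines (database and collection names are assumptions):
#   from pymongo import MongoClient
#   results = list(MongoClient()["examples"]["cities"].aggregate(pipeline))
```

Stages run in order, each consuming the output of the previous one, which is why $match is placed first to shrink the working set early.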

Lesson 6
Case Study - OpenStreetMap Data

  • Using iterative parsing for large data files
  • OpenStreetMap XML Overview
  • Exercises around OpenStreetMap data
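Iterative parsing processes an XML file element by element instead of loading the whole tree into memory. A minimal sketch using the standard library's iterparse follows. The tiny inline document stands in for a real OpenStreetMap extract, and count_tags is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature OSM-style document; real extracts can be gigabytes.
SAMPLE_OSM = b"""<osm>
  <node id="1" lat="48.85" lon="2.35">
    <tag k="name" v="Paris"/>
  </node>
  <node id="2" lat="45.76" lon="4.84"/>
  <way id="3">
    <nd ref="1"/>
    <nd ref="2"/>
  </way>
</osm>"""


def count_tags(source):
    """Count element names in an XML stream without building the full tree."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        # Release the element's contents once it has been counted.
        elem.clear()
    return counts


tag_counts = count_tags(io.BytesIO(SAMPLE_OSM))
print(dict(tag_counts))
```

For a file on disk, a path can be passed to count_tags in place of the in-memory stream.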

Prerequisites and Requirements
The ideal student should have the following skills:

  • Programming experience in Python, or a willingness to read a little documentation to understand the examples and exercises throughout the course.
  • The ability to perform rudimentary system administration on Windows or Unix.

Some experience using a Unix shell or Windows PowerShell will be helpful but is not required. No prior experience with databases is needed.

Why Take This Course
At the end of the class, students should be able to:

  • Programmatically extract data stored in common formats such as CSV, Microsoft Excel, JSON, and XML, and scrape websites to parse data from HTML.
  • Audit data for quality (validity, accuracy, completeness, consistency, and uniformity) and critically assess options for cleaning data in different contexts.
  • Store, retrieve, and analyze data using MongoDB.

This course concludes with a final project where students incorporate what they have learned to address a real-world data analysis problem.
