Udacity

Data Wrangling with MongoDB (Udacity)

Offered by Udacity and MongoDB University

In this course, we will explore how to wrangle data from diverse sources and shape it to enable data-driven applications. Some data scientists spend the bulk of their time doing this! Students will learn how to gather and extract data from widely used data formats. They will learn how to assess the quality of data and explore best practices for data cleaning. We will also introduce students to MongoDB, covering the essentials of storing data and the MongoDB query language together with exploratory analysis using the MongoDB aggregation framework.

This is a great course for those interested in entry-level data science positions as well as current business/data analysts looking to add big data to their repertoire, and managers working with data professionals or looking to leverage big data.
This course is also a part of our Data Analyst Nanodegree.

What You Will Learn

Lesson 1
Data Extraction Fundamentals

Assessing the Quality of Data
Intro to Tabular Formats
Parsing CSV

Lesson 2
Data in More Complex Formats

XML Design Principles
Parsing XML
Web Scraping

Lesson 3
Data Quality

Sources of Dirty Data
A Blueprint for Cleaning
Auditing Data

Lesson 4
Working with MongoDB

Data Modelling in MongoDB
Introduction to PyMongo
Field Queries

Lesson 5
Analyzing Data

Examples of Aggregation Framework
The Aggregation Pipeline
Aggregation Operators: $match, $project, $unwind, $group
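
To give a feel for how the four operators named in this lesson fit together, here is a minimal sketch of an aggregation pipeline written as a plain Python list, the form PyMongo's `Collection.aggregate` accepts. The field names (`source`, `tags`) and the helper `stage_names` are hypothetical and chosen only for illustration; they are not taken from the course materials.

```python
# A sketch only: "source" and "tags" are assumed field names, not course data.
pipeline = [
    # $match: keep only documents that have a "source" field
    {"$match": {"source": {"$exists": True}}},
    # $project: pass along just the fields the later stages need
    {"$project": {"_id": 0, "source": 1, "tags": 1}},
    # $unwind: emit one document per element of the "tags" array
    {"$unwind": "$tags"},
    # $group: count how many documents carry each tag
    {"$group": {"_id": "$tags", "count": {"$sum": 1}}},
]


def stage_names(stages):
    """Return the operator name of each stage, in pipeline order."""
    return [next(iter(stage)) for stage in stages]


print(stage_names(pipeline))

# With a running MongoDB server and PyMongo installed, the pipeline could be
# executed roughly like this (the database and collection names are assumptions):
#   from pymongo import MongoClient
#   results = MongoClient().examples.places.aggregate(pipeline)
```

The stages run in order, so filtering with $match first reduces the number of documents that the later stages have to process.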

Lesson 6
Case Study - OpenStreetMap Data

Using iterative parsing for large data files
OpenStreetMap XML Overview
Exercises around OpenStreetMap data
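
As a rough illustration of iterative parsing, the sketch below counts element tags in an OSM-style XML document with the standard library's `xml.etree.ElementTree.iterparse`, which yields elements one at a time instead of loading the whole file into memory. The sample document and the function name `count_tags` are made up for this example and are not taken from the course exercises.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(source):
    """Count how often each element tag occurs, parsing incrementally."""
    counts = Counter()
    # "end" events fire once an element has been read completely.
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        # Drop the element's attributes and children to keep memory use low.
        elem.clear()
    return counts


# A tiny, invented stand-in for a real OpenStreetMap extract.
SAMPLE = b"""<osm>
  <node id="1" lat="0.0" lon="0.0"><tag k="name" v="A"/></node>
  <node id="2" lat="1.0" lon="1.0"/>
  <way id="3"><nd ref="1"/><nd ref="2"/></way>
</osm>"""

tag_counts = count_tags(io.BytesIO(SAMPLE))
print(dict(tag_counts))
```

For a real extract, a file path or an open file object would take the place of the in-memory `io.BytesIO` buffer.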

Prerequisites and Requirements
The ideal student should have the following skills:
Programming experience in Python or a willingness to read a little documentation to understand examples and exercises throughout the course.
The ability to perform rudimentary system administration on Windows or Unix.
At least some experience using a Unix shell or Windows PowerShell will be helpful, but is not required. No prior experience with databases is needed.

Why Take This Course
At the end of the class, students should be able to:

Programmatically extract data stored in common formats such as CSV, Microsoft Excel, JSON, and XML, and scrape websites to parse data from HTML.
Audit data for quality (validity, accuracy, completeness, consistency, and uniformity) and critically assess options for cleaning data in different contexts.
Store, retrieve, and analyze data using MongoDB.

This course concludes with a final project where students incorporate what they have learned to address a real-world data analysis problem.

Related Courses

Udacity

Udacity, Google

Kotlin Bootcamp for Programmers (Udacity)

CS: Programming

Language fundamentals for developers. Learn the essentials of the Kotlin programming language from Kotlin experts at Google. Kotlin is a modern and concise JVM language that supports functional programming paradigms. Whether you’re a Java developer or a programmer in another object-oriented language, this course will teach you the essential language features that have made Kotlin so popular with developers.

Self-Paced

Programming Kotlin Kotlin Programming

Udacity

Model Building and Validation (Udacity)

Statistics & Data Analysis Data Science

Advanced Techniques for Analyzing Data. This course will teach you how to start from scratch in answering questions about the real world using data. Machine learning happens to be a small part of this process. The model building process involves setting up ways of collecting data, understanding and paying attention to what is important in the data to answer the questions you are asking, finding a statistical, mathematical or a simulation model to gain understanding and make predictions.

Self-Paced

Machine Learning Modeling Data Analysis

Udacity

Java Programming Basics (Udacity)

CS: Software Engineering CS: Programming

Take your first steps towards becoming a Java developer! Learn Java syntax and create conditional statements, loops, and functions. Taking this course will provide you with a basic foundation in Java syntax, which is the first step towards becoming a successful Java developer. You’ll learn how computers make decisions and how Java keeps track of information through variables and data types.

Self-Paced

Programming Java Debugging

Udacity

Udacity, Insight

Spark (Udacity)

Data Science

Master how to work with big data and build machine learning models at scale using Spark! In this course, you’ll learn how to use Spark to work with big data and build machine learning models at scale, including how to wrangle and model massive datasets with PySpark, the Python library for interacting with Spark. In the first lesson, you will learn about big data and how Spark fits into the big data ecosystem. In lesson two, you will be practicing processing and cleaning datasets to get comfortable with Spark’s SQL and dataframe APIs. In the third lesson, you will debug and optimize your Spark code when running on a cluster. In lesson four, you will use Spark’s Machine Learning Library to train machine learning models at scale.

Self-Paced

Python Debugging Machine Learning

Udacity

Udacity, Google

JavaScript Promises (Udacity)

CS: Programming Computer Science

Async Work Made Easy. Learn how to handle asynchronous work with ease! In this course, you'll use Native JavaScript Promises to write asynchronous code that is easy to read, easy to write and easy to debug. Along the way, you'll be using Promises to make a webapp come to life!

Self-Paced

Programming Javascript Asynchronous Programming

Udacity

HTML5 Canvas (Udacity)

CS: Software Engineering CS: Programming

From Pixels to Animation! Canvas is an HTML5 element that gives you a drawable surface inside your web pages, which you can control with JavaScript. It is powerful enough to use for compositing images and even creating games. In this course, through several sample projects, you’ll learn how to use the canvas; how to make compositions using shapes, images, and text; how to create effects and filters on images; and how to create animations.

Self-Paced

Game Programming HTML

Udacity

Swift for Beginners (Udacity)

CS: Programming

Your First Programming Language. In this course, you’ll begin learning Swift, Apple's programming language for building iOS applications. You'll start with fundamentals and work your way towards understanding all the core principles necessary to get started creating your first app. At the end of the course, you'll complete a problem set of exercises designed to challenge your understanding of Swift and give you the opportunity to apply what you've learned.

Self-Paced

Programming Functions Strings

Udacity

Kotlin for Android Developers (Udacity)

CS: Software Engineering Computer Science

Convert an Android app from Java to Kotlin. In this course, Aaron Sarazan, Lead Software Engineer at Capital One and a leading advocate for Kotlin, demonstrates how to take a basic Android app in Java and convert it to Kotlin, teaching you key features of the Kotlin programming language along the way. This is an efficient, fast-paced introduction to Kotlin for experienced Java programmers.

Self-Paced

Programming Android Android Apps

Udacity

Objective-C for Swift Developers (Udacity)

CS: Programming Computer Science

This course is designed to teach students how to understand and identify the differences between the Objective-C and Swift programming languages, and especially, how to rewrite from the former to the latter. Understanding communications between the two languages—called "interoperability"—is becoming more and more important for developers, particularly as we prepare for the arrival of Swift 3.0.

Self-Paced

Programming Swift iOS App

Udacity

Udacity, Facebook

Data Analysis with R (Udacity)

Statistics & Data Analysis

Visually Analyze and Summarize Data Sets. Exploratory data analysis is an approach for summarizing and visualizing the important characteristics of a data set. Promoted by John Tukey, exploratory data analysis focuses on exploring data to understand the data’s underlying structure and variables, to develop intuition about the data set, to consider how that data set came into existence, and to decide how it can be investigated with more formal statistical methods.

Self-Paced

Statistics Data Analysis EDA

Udacity

Object Oriented Programming in Java (Udacity)

CS: Programming

Build Interactive Java Programs. This course will introduce you to some of the most powerful programming concepts in Java, including: objects, inheritance and collections. You will learn how to use these object-oriented programming concepts in code examples, discover how these concepts are used in applications that require user input, and understand the benefits of mastering these concepts in Java.

Self-Paced

Programming Java Object-Oriented Programming

Udacity

Georgia Institute of Technology, Udacity

Data Analysis and Visualization (Udacity)

Statistics & Data Analysis Data Science

Data and visual analytics is an emerging field concerned with analyzing, modeling, and visualizing complex high dimensional data. This course will introduce students to the field by covering state-of-the-art modeling, analysis and visualization techniques. It will emphasize practical challenges involving complex real world data and include several case studies and hands-on work with the R programming language.

Self-Paced

Data Structures Regression Data Analysis