AI Skills for Engineers: Data Engineering and Data Pipelines (edX)

Good data is central to effective AI applications. This course teaches the basics of data for AI, covering what data is needed, how to extract data from existing databases and basic data skills including setup of a Python notebook environment, basic data exploration and simple data visualizations.

Artificial Intelligence and Machine Learning have become central techniques for most services and products, ranging from web-based systems to medical procedures, self-driving cars – even intelligent coffee makers.
Alongside algorithms, data is central to AI applications. Without solid data management, AI projects typically underperform or even fail. Unfortunately, the relevance and complexity of handling data are frequently underestimated.
That’s why we developed this course. It covers foundational questions like “Why is data important to AI?” and “What data does AI need?” as well as more application-oriented topics and skills, such as how to extract, load and query data using an SQL pipeline.
In the second part of the course, you will learn basic data engineering skills, including how to set up your Python notebook environment, explore data with advanced pandas functions, and create simple and clear data visualizations.
This introductory course is targeted at learners with little experience in data management or Python-based data handling who want to develop Python-based AI applications in the future. The course covers a brief introduction to data management for AI, relational data management (e.g., SQL), and practical data handling skills in Python, pandas, and Jupyter.
This allows you to build a foundation to prepare for future AI and Machine Learning development with Python.

What you'll learn

  • Why Data Management is central to AI applications
  • What kind of data these applications need
  • How to obtain data for AI applications
  • How to extract and query data from existing databases using SQL
  • How to set up your Python notebooks
  • How to use the pandas library to work with tabular data
  • How to visualize data using the Seaborn library

Syllabus

Week 1:
We ask why we should care about data management for Artificial Intelligence and Machine Learning (ML) systems.
We examine which data are needed in the ML lifecycle and what properties that data should have.
We discuss the effort and time needed for data management activities, and look at possible data sources.

Week 2:
The key concepts of data management, such as databases, data models and data schemas, are introduced.
The Relational Data Model is explained and contrasted with the Single-Table Model (like CSV and Excel) and Document Models.
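
As a rough illustration of the contrast between the relational model and a single-table layout, the sketch below stores data in two related tables and then joins them into one wide table of the kind a CSV or Excel sheet would hold. The table and column names (customers, orders, amount) are hypothetical and are not taken from the course material.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            amount      REAL NOT NULL
        );
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
        """
    )
    return conn


def flatten(conn):
    """Join both relations into one wide table, as a single-table export would store it."""
    return conn.execute(
        """
        SELECT o.order_id, c.name, o.amount
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        ORDER BY o.order_id
        """
    ).fetchall()


conn = build_demo_db()
rows = flatten(conn)
for row in rows:
    print(row)
```

In the relational form each customer name is stored once; in the flattened result it repeats on every order row, which is the usual trade-off of the single-table layout.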

Week 3:
We show how to extract data from existing relational databases using SQL queries and how to convert the query results into CSV files for further processing with pandas in Python notebooks.
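
A minimal sketch of this kind of extraction step, assuming a SQLite database and using only the Python standard library. The helper query_to_csv, the measurements table and its columns are hypothetical names chosen for illustration; the course may use a different database system and tooling.

```python
import csv
import io
import sqlite3


def query_to_csv(conn, sql, out_file):
    """Run a SQL query and write the result, with a header row, as CSV.

    Returns the number of data rows written.
    """
    cursor = conn.execute(sql)
    writer = csv.writer(out_file, lineterminator="\n")
    # cursor.description holds one tuple per result column; item 0 is the name.
    writer.writerow([column[0] for column in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


# Hypothetical demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (sensor TEXT, reading REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.5), ("b", 2.0), ("a", 3.5)],
)

# An in-memory buffer stands in for a file opened with open(path, "w", newline="").
buffer = io.StringIO()
row_count = query_to_csv(
    conn,
    "SELECT sensor, reading FROM measurements WHERE sensor = 'a' ORDER BY reading",
    buffer,
)
csv_text = buffer.getvalue()
print(csv_text)
```

When written to a real file, the result can be loaded in a notebook with pandas.read_csv for further processing.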

Week 4:
The different ways of setting up and running Python notebooks are covered, including cloud-based notebooks and local notebooks.
We will take you step by step through the process of setting up your conda environment and installing the Jupyter and pandas libraries.
You will learn how to run notebooks in VS Code.

Week 5:
Become a pandas expert.
Explore the essential functionalities of pandas and, most importantly, write elegant and efficient Python pandas code to process and engineer tabular data.
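
To give a flavour of what processing tabular data with pandas can look like, here is a small sketch that cleans and summarises a table in one method chain. The data and column names (city, temp_c) are invented for illustration and do not come from the course.

```python
import pandas as pd

# Hypothetical readings; one value is missing on purpose.
df = pd.DataFrame(
    {
        "city": ["Berlin", "Berlin", "Delft", "Delft", "Delft"],
        "temp_c": [21.0, 19.0, 18.0, None, 20.0],
    }
)

summary = (
    df.dropna(subset=["temp_c"])        # drop rows without a reading
      .groupby("city", as_index=False)  # one output row per city
      .agg(
          mean_temp_c=("temp_c", "mean"),   # named aggregation: new_name=(column, function)
          n_readings=("temp_c", "count"),
      )
)
print(summary)
```

Chaining the steps keeps intermediate variables out of the notebook, and named aggregation gives the result columns explicit, readable names.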

Week 6:
You will learn how to make simple and clear scientific figures in Python using the Seaborn library.
Use the core functions provided by Seaborn to make beautiful statistical plots.


Related Courses

Robotic process and intelligent automation for finance (edX)
ACCA

In this course we explain how automation can play a key role in delivering the requirement to have robust processes and clean data. By using automation tools and machine learning, finance leaders can identify, implement and configure the right solutions for their organisation. It also shows how tools, such as Python, can be applied to finance processes and the benefits this will bring.

Self-Paced
Introduction to Computer Science and Programming (edX)
Tokyo Institute of Technology, TokyoTechX

The term “Computation” refers to the action performed by a computer. A computation can be a basic operation, or it can be a sophisticated computer simulation requiring a large amount of data and substantial resources. This course aims to introduce learners with no prior knowledge to the basics and key concepts of computer science. By following the lectures and exercises of this course, you will gain an understanding of algorithms and real programming experience using the Ruby language.

Self-Paced
Python for Data Engineering Project (edX)
IBM

An opportunity to apply your foundational Python skills via a project, using various techniques to collect and work with data. Journey into the realm of becoming a Data Engineer and apply your basic Python knowledge of working with data. You will exercise various techniques in Python to extract data in multiple file formats from different sources, transform it into specific datatypes, and then prepare it for loading into a database.

Self-Paced
Understanding Artificial Intelligence through Algorithmic Information Theory (edX)
Institut Mines-Telecom, IMTx

Can we characterize intelligent behavior? Are there theoretical foundations on which Artificial Intelligence can be grounded? This course on Algorithmic Information will offer you such a theoretical framework. You will be able to see machine learning, reasoning, mathematics, and even human intelligence as abstract computations aiming at compressing information. This new power of yours will not only help you understand what AI does (or can’t do!) but also serve as a guide to design AI systems.

Self-Paced
Excel for Everyone: Data Management (edX)
The University of British Columbia, UBCx

Further your Excel skills to manage larger datasets and more complex data wrangling, management and modelling. This intermediate Excel course builds on the teachings of the introductory Core Foundations course, teaching you to leverage the power of data calculations and reports to make informed personal or organizational decisions.

Self-Paced
Computer Vision and Image Processing Fundamentals (edX)
IBM

Learn about computer vision, one of the most exciting fields in machine learning, artificial intelligence and computer science. It has applications in many industries, such as self-driving cars, robotics, augmented reality, and face detection in law enforcement agencies.

Self-Paced
CS50's Introduction to Artificial Intelligence with Python (edX)
HarvardX, Harvard University

Learn to use machine learning in Python in this introductory course on artificial intelligence. AI is transforming how we live, work, and play. By enabling new technologies like self-driving cars and recommendation systems or improving old ones like medical diagnostics and search engines, the demand for expertise in AI and machine learning is growing rapidly. This course will enable you to take the first step toward solving important real-world problems and future-proofing your career.

Self-Paced