Coursera

Introduction to Parallel Programming with CUDA (Coursera)

Offered by Johns Hopkins University

This course prepares students to develop code that processes large amounts of data in parallel on Graphics Processing Units (GPUs). Students will learn how to implement software that solves complex problems on leading consumer to enterprise-grade GPUs using Nvidia CUDA. The course focuses on hardware and software capabilities, including the use of hundreds to thousands of threads and various forms of memory.

Course 2 of 4 in the GPU Programming Specialization.

What You Will Learn

Students will learn how to utilize the CUDA framework to write C/C++ software that runs on CPUs and Nvidia GPUs.
Students will transform sequential CPU algorithms and programs into CUDA kernels that execute 100s to 1000s of times simultaneously on GPU hardware.

Syllabus

WEEK 1
Course Overview
The purpose of this module is for students to understand how the course will be run, which topics it covers, how they will be assessed, and what is expected of them.

WEEK 2
Threads, Blocks and Grids
The single most important concept for using GPUs to solve complex, large-scale problems is the management of threads. CUDA provides two- and three-dimensional logical abstractions of threads, blocks, and grids. Students will develop programs that use threads, blocks, and grids to process large two- and three-dimensional data sets.
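
As a rough illustration of the thread, block, and grid abstraction, the sketch below maps one thread to each element of a two-dimensional matrix. The names `scale2D` and `scaleOnGpu`, the 16x16 block size, and the scaling operation itself are assumptions chosen for this example and are not taken from the course material; error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread handles one (row, col) element of a width x height matrix.
__global__ void scale2D(float* data, int width, int height, float factor) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // global x index
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // global y index
    // The grid is rounded up, so some threads fall outside the matrix.
    if (col < width && row < height) {
        data[row * width + col] *= factor;
    }
}

// Hypothetical host helper: copies the matrix to the GPU, launches a 2D grid
// of 2D blocks, and copies the result back.
std::vector<float> scaleOnGpu(const std::vector<float>& host,
                              int width, int height, float factor) {
    std::vector<float> result(host.size());
    size_t bytes = host.size() * sizeof(float);
    float* dev = nullptr;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // 256 threads per block (an assumed size)
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    scale2D<<<grid, block>>>(dev, width, height, factor);

    // This copy waits for the kernel to finish before reading the data.
    cudaMemcpy(result.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return result;
}
```

Calling `scaleOnGpu(matrix, 100, 37, 3.0f)` on a 100x37 matrix launches a 7x3 grid of blocks; the bounds check in the kernel keeps the surplus threads from writing out of range.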

WEEK 3
Host and Global Memory
To manage the access and modification of data in physical memory effectively, students will need to load data into CPU (host) and GPU (global) general-purpose memory. Students will create software that allocates host memory and transfers it into global memory for use by threads. Students will also learn the capabilities and speeds of these types of memories.
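
The host-to-global-memory round trip described here might look like the following sketch. The helper `roundTrip` and the kernel `addOne` are hypothetical names used only for illustration; the CUDA runtime calls (`cudaMalloc`, `cudaMemcpy`, `cudaGetLastError`, `cudaFree`) are the standard ones, and the block size of 256 is an assumption.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Adds 1 to every element of an array that lives in global memory.
__global__ void addOne(int* values, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) values[i] += 1;
}

// Hypothetical helper: moves n ints from host memory to global memory,
// modifies them on the GPU, and copies them back in place.
// Returns true only if every CUDA call succeeded.
bool roundTrip(int* hostValues, int n) {
    int* deviceValues = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(int);

    // 1. Allocate global (device) memory.
    if (cudaMalloc(&deviceValues, bytes) != cudaSuccess) return false;

    // 2. Transfer host data into global memory.
    bool ok = cudaMemcpy(deviceValues, hostValues, bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // 3. Launch enough threads to cover all n elements.
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        addOne<<<blocks, threads>>>(deviceValues, n);

        // 4. Check the launch, then copy the results back to the host.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(hostValues, deviceValues, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(deviceValues);
    return ok;
}
```

The host buffer in this sketch can come from plain `malloc` or `new`; whether pinned host memory is worth using instead depends on the transfer sizes involved.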

WEEK 4
Shared and Constant Memory
To improve performance in GPU software, students will need to use mutable (shared) and static (constant) memory. They will use these to apply masks to all items of a data set, to manage communication between threads, and to cache data in complex programs.
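
One way to picture the two memory types working together is a small 1D smoothing filter: the mask sits in constant memory, and each block caches its slice of the input in shared memory. This is a sketch under stated assumptions, not course code; `d_mask`, `smooth1D`, `smoothOnGpu`, the three-element mask values, and the block size are all made up for the example, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kRadius = 1;    // mask reaches one neighbour on each side
constexpr int kBlock  = 256;  // threads per block (assumed)

// Read-only mask, stored in constant memory and shared by all threads.
__constant__ float d_mask[2 * kRadius + 1];

__global__ void smooth1D(const float* in, float* out, int n) {
    // Per-block cache of the inputs this block needs, plus a halo.
    __shared__ float tile[kBlock + 2 * kRadius];

    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int l = threadIdx.x + kRadius;                  // index inside the tile

    tile[l] = (g < n) ? in[g] : 0.0f;
    if (threadIdx.x < kRadius) {
        // The first kRadius threads also load the left and right halos.
        int left  = g - kRadius;
        int right = g + blockDim.x;
        tile[l - kRadius]    = (left >= 0 && left < n) ? in[left] : 0.0f;
        tile[l + blockDim.x] = (right < n) ? in[right] : 0.0f;
    }
    __syncthreads();  // every load must finish before any thread reads

    if (g < n) {
        float sum = 0.0f;
        for (int k = -kRadius; k <= kRadius; ++k) {
            sum += d_mask[k + kRadius] * tile[l + k];
        }
        out[g] = sum;
    }
}

// Hypothetical host helper; values outside the input are treated as zero.
std::vector<float> smoothOnGpu(const std::vector<float>& host) {
    const float hostMask[2 * kRadius + 1] = {0.25f, 0.5f, 0.25f};
    cudaMemcpyToSymbol(d_mask, hostMask, sizeof(hostMask));

    int n = static_cast<int>(host.size());
    size_t bytes = host.size() * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice);

    int blocks = (n + kBlock - 1) / kBlock;
    smooth1D<<<blocks, kBlock>>>(dIn, dOut, n);

    std::vector<float> result(host.size());
    cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

The `__syncthreads()` barrier is the thread-communication step: without it, a thread could read a tile entry that a neighbouring thread has not yet written. The kernel must be launched with exactly `kBlock` threads per block, since the tile size depends on it.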

WEEK 5
Register Memory
In this module, students will learn the benefits and constraints of the GPU's most localized memory: registers. While using this type of memory will come naturally to students, gaining the largest performance boost from it, as with all forms of memory, requires thoughtful software design. Students will implement algorithms using each type of memory and produce a performance analysis.
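
A minimal sketch of register use, assuming the compiler keeps a thread-local scalar in a register (which it normally does for simple locals, though this is not guaranteed): each thread accumulates a running total privately and writes to global memory only once. The names `partialSums` and `sumOnGpu` and the launch configuration of 4 blocks of 128 threads are assumptions for this example.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread sums a strided slice of the input. The accumulator `acc` is a
// thread-local scalar, so the loop does not touch global memory for writes.
__global__ void partialSums(const float* in, float* perThread, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    float acc = 0.0f;
    for (int i = tid; i < n; i += stride) {
        acc += in[i];
    }
    perThread[tid] = acc;  // single global write per thread
}

// Hypothetical host helper: launches the kernel and adds up the per-thread
// partial sums on the CPU. Error checking is omitted for brevity.
float sumOnGpu(const std::vector<float>& host) {
    const int threads = 128;
    const int blocks = 4;
    const int total = threads * blocks;
    const int n = static_cast<int>(host.size());

    float* dIn = nullptr;
    float* dPartial = nullptr;
    cudaMalloc(&dIn, host.size() * sizeof(float));
    cudaMalloc(&dPartial, total * sizeof(float));
    cudaMemcpy(dIn, host.data(), host.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    partialSums<<<blocks, threads>>>(dIn, dPartial, n);

    std::vector<float> partial(total);
    cudaMemcpy(partial.data(), dPartial, total * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dPartial);

    float sum = 0.0f;
    for (float p : partial) sum += p;
    return sum;
}
```

Registers are a limited per-thread resource, so kernels that keep many values live at once can spill to slower memory or reduce how many threads run concurrently; that trade-off is the kind of design question the module description points to.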

Related Courses

Coursera

Johns Hopkins University

CUDA at Scale for the Enterprise (Coursera)

CS: Software Engineering

This course helps students learn concepts that scale the use of GPUs, and of the CPUs that manage them, beyond the most common consumer-grade GPU installations. Students will learn how to manage asynchronous workflows, sending and receiving events to encapsulate data transfers and control signals. They will also walk through applying GPUs to sorting data and processing images, implementing their own software using these techniques and libraries.

Aug 3rd 2026

5-12 Weeks

GPU Image Processing CPU

FutureLearn

Partnership for Advanced Computing in Europe - PRACE

MPI: A Short Introduction to One-sided Communication (FutureLearn)

Computer Science

Learn the details of one-sided communication in MPI programming. Discover the advantages to one-sided communication in parallel programming. Message Passing Interface (MPI) is a key standard for parallel computing architectures. On this course, you’ll learn the essential concepts of one-sided communication in MPI, as well as the advantages of the MPI communication model.

No sessions available

2 Weeks

MPI Parallel Programming One-sided Communication

Coursera

École Polytechnique Fédérale de Lausanne

Initiation à la programmation (en Java) (Coursera)

CS: Software Engineering CS: Programming

This course introduces the basics of programming using the Java language: variables, loops, functions, and so on. It draws on a range of teaching materials: subtitled videos, in-video and standalone quizzes, exercises, automatically graded assignments, and course notes.

Aug 24th 2026

5-12 Weeks

Programming Java Object-Oriented

OpenHPI

Hasso-Plattner-Institut

Parallel Programming Concepts (openHPI)

CS: Software Engineering

The openHPI online course “Parallel Programming Concepts” presents relevant theoretical and practical foundations for parallel programming. We show crucial theoretical ideas such as semaphores and actors, the architecture of modern parallel hardware, different programming models such as task parallelism, message passing and functional programming, and several patterns and best practices.

Self Paced

Self-Paced

Parallel Programming

Udacity

Georgia Institute of Technology, Udacity

High Performance Computing (Udacity)

Computer Science

The goal of this course is to give you solid foundations for developing, analyzing, and implementing parallel and locality-efficient algorithms. This course focuses on theoretical underpinnings. To give a practical feeling for how algorithms map to and behave on real systems, we will supplement algorithmic theory with hands-on exercises on modern HPC systems, such as Cilk Plus or OpenMP on shared memory nodes, CUDA for graphics co-processors (GPUs), and MPI and PGAS models for distributed memory systems.

Self Paced

Self-Paced

Algorithms OpenMP MPI

EdX

IBM

Using GPUs to Scale and Speed-up Deep Learning (edX)

Statistics & Data Analysis Data Science

Training complex deep learning models with large datasets takes a long time. In this course, you will learn how to use accelerated GPU hardware to overcome the scalability problem in deep learning. Training a complex deep learning model with a very large dataset can take hours, days and occasionally weeks to train. So, what is the solution? Accelerated hardware.

No sessions available

5-12 Weeks

GPU TensorFlow Deep Learning

EdX

IIT Bombay, IITBombayX

Shell Programming - A necessity for all Programmers (edX)

Computer Science

Unleash your Linux scripting skills and amaze others with your productivity level. Various programming languages have gained popularity since 1970. Starting with Assembly, C, and C++, moving on to Java and Python, and finally to backend and frontend frameworks, all of these became popular and were, or are being, replaced by some other language or framework. Shell programming (scripting) is the only programming language that has remained popular and the choice of programmers, testers, system administrators, and others from 1970 to date.

Self Paced

Self-Paced

Arithmetic Linux Java Programming

Coursera

JetBrains

Kotlin for Java Developers (Coursera)

CS: Software Engineering Computer Science

The Kotlin programming language is a modern language that gives you more power for your everyday tasks. Kotlin is concise, safe, pragmatic, and focused on interoperability with Java code. It can be used almost everywhere Java is used today: for server-side development, Android apps, and much more. This course aims to share with you the power and the beauty of Kotlin.

Aug 17th 2026

5-12 Weeks

Programming Java Functional Programming

Coursera

University of Melbourne, The Chinese University of Hong Kong

Advanced Modeling for Discrete Optimization (Coursera)

CS: Software Engineering

Optimization is a common form of decision making, and is ubiquitous in our society. Its applications range from solving Sudoku puzzles to arranging seating in a wedding banquet. The same technology can schedule planes and their crews, coordinate the production of steel, and organize the transportation of iron ore from the mines to the ports. Good decisions in manpower and material resources management also allow corporations to improve profit by millions of dollars.

Aug 17th 2026

5-12 Weeks

Debugging Discrete Optimization Modeling

Coursera

Johns Hopkins University

Introduction to Concurrent Programming with GPUs (Coursera)

CS: Software Engineering CS: Programming

This course will help prepare students for developing code that can process large amounts of data in parallel. It will focus on foundational aspects of concurrent programming, such as CPU/GPU architectures, multithreaded programming in C and Python, and an introduction to CUDA software/hardware.

Aug 3rd 2026

4 Weeks

Programming Python Parallel Programming

Coursera

University of Melbourne, The Chinese University of Hong Kong

Basic Modeling for Discrete Optimization (Coursera)

CS: Software Engineering

Aug 17th 2026

4 Weeks

Programming Software Discrete Optimization

Coursera

École Polytechnique Fédérale de Lausanne

Introduction à la programmation orientée objet (en C++) (Coursera)

CS: Software Engineering CS: Programming

This course introduces object-oriented programming (encapsulation, abstraction, inheritance, polymorphism), illustrated in C++. It assumes knowledge of programming basics (variables, types, loops, functions, ...). It is designed as the follow-up to the course "Initiation à la programmation (en C++)".

Aug 17th 2026

5-12 Weeks

C++ Encapsulation Heritage