Introduction to Parallel Programming with CUDA (Coursera)


MOOC List is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

This course prepares students to develop code that can process large amounts of data in parallel on Graphics Processing Units (GPUs). Students will learn how to implement software that solves complex problems on leading consumer- to enterprise-grade GPUs using Nvidia CUDA. The course focuses on hardware and software capabilities, including the use of 100s to 1000s of threads and various forms of memory.


Course 2 of 4 in the GPU Programming Specialization.


What You Will Learn

- Students will learn how to use the CUDA framework to write C/C++ software that runs on CPUs and Nvidia GPUs.

- Students will transform sequential CPU algorithms and programs into CUDA kernels that execute 100s to 1000s of times simultaneously on GPU hardware.
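
As an illustration of the second point, the sketch below shows one common way a sequential loop can be turned into a kernel: the loop body becomes the kernel body, and the loop index is derived from the thread's position in the launch. The names `add_cpu`, `add_kernel`, and `add_gpu` are hypothetical and do not come from the course; error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Sequential CPU version: one loop iteration per element.
void add_cpu(const float* a, const float* b, float* out, int n) {
    for (int i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// CUDA version: each thread handles the index that the loop would have visited.
__global__ void add_kernel(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];  // guard: the last block may be partly idle
}

// Hypothetical host helper: copy inputs to the GPU, launch, copy the result back.
std::vector<float> add_gpu(const std::vector<float>& a, const std::vector<float>& b) {
    int n = static_cast<int>(a.size());
    if (n == 0) return {};
    size_t bytes = n * sizeof(float);

    float *d_a = nullptr, *d_b = nullptr, *d_out = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;                         // threads per block (an assumed, typical value)
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover all n elements
    add_kernel<<<blocks, threads>>>(d_a, d_b, d_out, n);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);  // also waits for the kernel
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    return out;
}
```

Calling `add_gpu` with two equally sized vectors should give the same result as running `add_cpu` over the same data.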


Syllabus


WEEK 1

Course Overview

The purpose of this module is for students to understand how the course will be run, which topics it covers, how they will be assessed, and what is expected of them.


WEEK 2

Threads, Blocks and Grids

The single most important concept for using GPUs to solve complex and large-scale problems is the management of threads. CUDA provides two- and three-dimensional logical abstractions of threads, blocks, and grids. Students will develop programs that use threads, blocks, and grids to process large two- and three-dimensional data sets.
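
A minimal sketch of two-dimensional indexing, assuming a row-major array of floats: each thread derives its row and column from the built-in `blockIdx`, `blockDim`, and `threadIdx` variables, and the grid is sized with `dim3` so that it covers the whole data set. The names `scale_2d` and `scale_on_gpu` and the 16x16 block size are illustrative choices, not taken from the course.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread computes its own (row, col) from the 2D block and thread indices.
__global__ void scale_2d(float* data, int width, int height, float factor) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < height && col < width) {   // threads past the edge do nothing
        data[row * width + col] *= factor;
    }
}

// Hypothetical host helper: multiplies every element of a width x height array by factor.
std::vector<float> scale_on_gpu(std::vector<float> host, int width, int height, float factor) {
    if (host.empty()) return host;
    size_t bytes = host.size() * sizeof(float);
    float* d_data = nullptr;
    cudaMalloc(&d_data, bytes);
    cudaMemcpy(d_data, host.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);                              // 256 threads per block
    dim3 grid((width + block.x - 1) / block.x,       // blocks needed along x
              (height + block.y - 1) / block.y);     // blocks needed along y
    scale_2d<<<grid, block>>>(d_data, width, height, factor);

    cudaMemcpy(host.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return host;
}
```

The same pattern extends to three dimensions by adding the `.z` components of the index variables and a third `dim3` argument.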


WEEK 3

Host and Global Memory

To manage the access and modification of data in physical memory effectively, students will need to load data into CPU (host) and GPU (global) general-purpose memory. Students will create software that allocates host memory and transfers its contents into global memory for use by threads. Students will also learn the capabilities and speeds of these types of memory.
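
The allocate-and-transfer cycle described above might look like the following sketch, which uses `malloc` for host memory and the CUDA runtime calls `cudaMalloc`, `cudaMemcpy`, and `cudaFree` for global memory. The helper `round_trip` and the kernel `double_values` are hypothetical names chosen for this example.

```cuda
#include <cuda_runtime.h>
#include <cstdlib>

// Doubles every element of an array that lives in GPU global memory.
__global__ void double_values(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

// Hypothetical helper: returns true when every element came back doubled. Assumes n > 0.
bool round_trip(int n) {
    size_t bytes = n * sizeof(int);

    // 1. Allocate and fill host (CPU) memory.
    int* h_data = static_cast<int*>(malloc(bytes));
    if (h_data == nullptr) return false;
    for (int i = 0; i < n; ++i) h_data[i] = i;

    // 2. Allocate global (GPU) memory.
    int* d_data = nullptr;
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) {
        free(h_data);
        return false;
    }

    // 3. Host -> global, run the kernel, global -> host.
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    double_values<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaError_t err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    // 4. Verify on the host, then release both allocations.
    bool ok = (err == cudaSuccess);
    for (int i = 0; ok && i < n; ++i) ok = (h_data[i] == 2 * i);
    cudaFree(d_data);
    free(h_data);
    return ok;
}
```

Transfers between host and global memory cross the bus between CPU and GPU, which is why programs usually try to copy data once and do as much work as possible before copying results back.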


WEEK 4

Shared and Constant Memory

To improve performance in GPU software, students will need to use mutable (shared) and static (constant) memory. They will use these memory types to apply masks to all items of a data set, to manage communication between threads, and to cache data in complex programs.
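
One way these two memory types can work together is sketched below, under assumptions that are not from the course: a three-element smoothing mask is stored in `__constant__` memory, and each block caches its slice of the input (plus one neighbour on each side) in `__shared__` memory before applying the mask. The names `c_mask`, `apply_mask`, and `smooth_on_gpu`, the mask values, and the block size are all illustrative.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define RADIUS 1     // neighbours on each side of an element
#define BLOCK 128    // threads per block; the kernel must be launched with this size

// Read-only mask, written once from the host and read by every thread.
__constant__ float c_mask[2 * RADIUS + 1];

__global__ void apply_mask(const float* in, float* out, int n) {
    // Block-local cache: BLOCK elements plus a halo of RADIUS on each side.
    __shared__ float tile[BLOCK + 2 * RADIUS];
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int l = threadIdx.x + RADIUS;                   // index inside the tile

    tile[l] = (g < n) ? in[g] : 0.0f;
    if (threadIdx.x < RADIUS) {                     // first threads also load the halo
        int left = g - RADIUS;
        int right = g + BLOCK;
        tile[l - RADIUS] = (left >= 0) ? in[left] : 0.0f;
        tile[l + BLOCK] = (right < n) ? in[right] : 0.0f;
    }
    __syncthreads();  // all loads must finish before any thread reads its neighbours

    if (g < n) {
        float sum = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k) sum += c_mask[k + RADIUS] * tile[l + k];
        out[g] = sum;
    }
}

// Hypothetical host helper: applies the mask {0.25, 0.5, 0.25}, treating out-of-range values as 0.
std::vector<float> smooth_on_gpu(const std::vector<float>& in) {
    int n = static_cast<int>(in.size());
    if (n == 0) return {};
    const float mask[2 * RADIUS + 1] = {0.25f, 0.5f, 0.25f};
    cudaMemcpyToSymbol(c_mask, mask, sizeof(mask));  // host -> constant memory

    size_t bytes = n * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);
    apply_mask<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_in, d_out, n);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

The `__syncthreads()` call is the thread-communication step: without it, a thread could read a neighbour's slot in `tile` before that neighbour has written it.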


WEEK 5

Register Memory

In this module, students will learn the benefits and constraints of registers, the most localized memory on a GPU. While using this type of memory will come naturally to students, gaining the largest performance boost from it, as with all forms of memory, requires thoughtful software design. Students will implement algorithms using each type of memory and analyze their performance.
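
Registers are not requested explicitly; the compiler normally places a thread's local scalar variables in them. The sketch below, with hypothetical names `row_sums` and `row_sums_on_gpu`, accumulates each row's total in a local variable and writes to global memory only once per thread, rather than updating the output array on every loop iteration.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per row of a row-major rows x cols matrix.
__global__ void row_sums(const float* matrix, float* sums, int rows, int cols) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;

    float acc = 0.0f;  // local scalar: the compiler will usually keep this in a register
    for (int c = 0; c < cols; ++c) {
        acc += matrix[row * cols + c];
    }
    sums[row] = acc;   // single write to global memory per thread
}

// Hypothetical host helper: returns the sum of each row.
std::vector<float> row_sums_on_gpu(const std::vector<float>& matrix, int rows, int cols) {
    if (rows == 0 || cols == 0) return std::vector<float>(rows, 0.0f);
    size_t in_bytes = matrix.size() * sizeof(float);
    size_t out_bytes = rows * sizeof(float);

    float *d_matrix = nullptr, *d_sums = nullptr;
    cudaMalloc(&d_matrix, in_bytes);
    cudaMalloc(&d_sums, out_bytes);
    cudaMemcpy(d_matrix, matrix.data(), in_bytes, cudaMemcpyHostToDevice);

    int threads = 128;
    row_sums<<<(rows + threads - 1) / threads, threads>>>(d_matrix, d_sums, rows, cols);

    std::vector<float> sums(rows);
    cudaMemcpy(sums.data(), d_sums, out_bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_matrix);
    cudaFree(d_sums);
    return sums;
}
```

Whether a variable actually ends up in a register is the compiler's decision; compiling with `nvcc -Xptxas -v` reports how many registers each kernel uses, which can help when checking that a design behaves as intended.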




Course Auditing
44.00 EUR/month
Some experience in C/C++ programming.
