Want to learn the basics of large-scale data processing? Need to make predictive models but don’t know the right tools? This course will introduce you to open source tools you can use for parallel, distributed and scalable machine learning.
Are you ready for big data science? In this course, you will learn how to implement predictive analytics and machine learning solutions for big data using Apache Spark, one of the Hadoop technologies available in Microsoft Azure HDInsight. You will work with Scala or Python to cleanse and transform data, build machine learning models with Spark MLlib (the machine learning library in Spark), and create real-time machine learning solutions using Spark Streaming. You will also find out how to use R Server on Spark to work with data at scale in the R language.
Note: To complete the hands-on elements in this course, you will need an Azure subscription and a Windows client computer. You can sign up for a free Azure trial subscription (a valid credit card is required for verification, but you will not be charged for Azure services). The free trial is not available in all regions. You can complete the course and earn a certificate without completing the hands-on exercises.
What you'll learn:
- Preprocess data in Apache Spark
- Use Spark to build a machine learning solution
- Explore real-time machine learning solutions in Spark
- Use R at scale with R Server on Spark
Prerequisites:
- Familiarity with Hadoop clusters in HDInsight
- Familiarity with database concepts and basic SQL query syntax
- Familiarity with basic programming constructs (for example, variables, loops, conditional logic)
- A basic knowledge of mathematics, including linear equations and functions
- A willingness to learn actively and to persevere when troubleshooting technical problems