You’ll process a dataset with 10 million instances, mine a 250,000-word text dataset, and analyze a supermarket dataset representing 5,000 shopping baskets. You’ll learn about filters for preprocessing data, attribute selection, classification, clustering, association rules, and cost-sensitive evaluation. You’ll also meet learning curves and learn how to optimize learning parameters automatically. Weka originated at the University of Waikato in New Zealand, and Ian Witten has authored a leading book on data mining.
What topics will you cover?
- Running large-scale data mining experiments
- Constructing and executing knowledge flows
- Processing very large datasets
- Analyzing collections of textual documents
- Mining association rules
- Preprocessing data using a range of filters
- Automatic methods of attribute selection
- Clustering data
- Taking account of different decision costs
- Producing learning curves
- Optimizing learning parameters in data mining
What will you achieve?
- Compare the performance of different mining methods on a wide range of datasets
- Demonstrate how to set up learning tasks as a knowledge flow
- Solve data mining problems on huge datasets
- Apply equal-width and equal-frequency binning for discretizing numeric attributes
- Identify the advantages of supervised vs unsupervised discretization
- Evaluate different trade-offs between error rates in 2-class classification
- Classify documents using various techniques
- Debate the correspondence between decision trees and decision rules
- Explain how association rules can be generated and used
- Discuss techniques for representing, generating, and evaluating clusters
- Perform attribute selection by wrapping a classifier inside a cross-validation loop
- Describe different techniques for searching through subsets of attributes
- Develop effective sets of attributes for text classification problems
- Explain cost-sensitive evaluation, cost-sensitive classification, and cost-sensitive learning
- Design and evaluate multi-layer neural networks
- Assess the volume of training data needed for mining tasks
- Calculate optimal parameter values for a given learning system
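The course itself involves no programming, but the difference between the equal-width and equal-frequency binning mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Weka's implementation: the function names `equal_width_bins` and `equal_frequency_bins` and the sample values are hypothetical, and ties are handled naively.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k bins of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        # All values are identical, so everything falls in the first bin.
        return [0] * len(values)
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts.

    Simplification: bins are assigned by rank, so tied values may be
    split across neighbouring bins.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


# Hypothetical sample with one outlier to show how the two schemes differ.
values = [1, 2, 3, 4, 5, 100]
width_bins = equal_width_bins(values, 3)
freq_bins = equal_frequency_bins(values, 3)
print(width_bins)
print(freq_bins)
```

With a skewed sample like this one, equal-width binning crowds most values into a single bin, while equal-frequency binning spreads them evenly across bins.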
Who is the course for?
This course is aimed at anyone who works with data. It follows on from Data Mining with Weka, and you should have completed that course first (or have otherwise acquired a rudimentary knowledge of Weka). As with the previous course, it involves no computer programming, although you need some experience of using computers for everyday tasks. High-school maths is more than enough; some elementary statistical concepts (means and variances) are assumed.
What software or tools do you need?
Before the course starts, download the free Weka software. It runs on Windows, Linux, and Mac computers. It has been downloaded millions of times and is used all around the world.
(Note: Depending on your computer and system version, you may need admin access to install Weka.)