Enhance your data analysis skills using spreadsheets and data visualization. Increase your productivity and make better business decisions.
This course is for all of those struggling with data analysis. That crazy spreadsheet from your boss? Megabytes of sensor data to analyze? Looking for a smart way to visualize your data in order to make sense of it? We’ve got you covered!
Using video lectures and hands-on exercises, we will teach you techniques and best practices that will boost your data analysis skills.
We will take a deep dive into data analysis with spreadsheets: PivotTables, VLOOKUPs, named ranges, what-if analyses, and making great graphs will all be covered in the first weeks of the course. Afterwards, we will investigate the quality of the spreadsheet model, and especially how to make sure your spreadsheet remains error-free and robust.
Finally, once we have mastered spreadsheets, we will demonstrate other ways to store and analyze data. We will also look into how Python, a programming language, can help us with analyzing and manipulating data in spreadsheets.
This course will be created using Excel 2013, but it can also be followed using another spreadsheet program.
The goal of this course is to help you overcome data analysis challenges in your work, research or studies. We therefore encourage you to participate actively and to raise the real data analysis problems you face in our discussion forums.
We assume some experience in working with spreadsheets. Not much is needed: you should have opened a spreadsheet and seen a SUM before (in software such as Excel, OpenOffice, Numbers or even Google Spreadsheets).
Use R to learn the basics of inferential statistics. In the second part of a two-part course, we’ll learn how to take data and use it to draw reasonable and useful conclusions. You’ll learn the basics of statistical thinking – starting with an interesting question and some data.
Large-scale biology projects such as the sequencing of the human genome and gene expression surveys using RNA-seq, microarrays and other technologies have created a wealth of data for biologists. However, the challenge facing scientists is analyzing and even accessing these data to extract useful information pertaining to the system being studied. This course focuses on employing existing bioinformatic resources – mainly web-based programs and databases – to access the wealth of data to answer questions relevant to the average biologist, and is highly hands-on.
This course introduces you to sampling and exploring data, as well as basic probability theory and Bayes' rule. You will examine various types of sampling methods, and discuss how such methods can impact the scope of inference. A variety of exploratory data analysis techniques will be covered, including numeric summary statistics and basic data visualization.
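As an illustration of the Bayes' rule topic mentioned above, here is a minimal Python sketch. The function name `posterior` and the screening-test numbers are hypothetical and are not taken from the course materials; they only show how a prior probability is updated after a positive test result.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Bayes' rule: P(condition | positive test).

    prior               -- P(condition) before seeing the test result
    sensitivity         -- P(positive | condition)
    false_positive_rate -- P(positive | no condition)
    """
    # Total probability of a positive result (the evidence term).
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(condition | positive) = {p:.3f}")
```

With these assumed inputs the posterior stays well below one half, because true positives from the rare condition are outnumbered by false positives from the much larger healthy group.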
In business, data and algorithms create economic value when they reduce uncertainty about financially important outcomes. This course teaches the concepts and mathematical methods behind the most powerful and universal metrics used by data scientists to evaluate the uncertainty reduction, or information gain, that predictive models provide. We focus on the two most common types of predictive model, binary classification and linear regression, and you will learn metrics to quantify for yourself the exact reduction in uncertainty each can offer. These metrics are applicable to any form of model that uses new information to improve predictions cast in the form of a known probability distribution, the standard way of representing forecasts in data science.
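The description does not say which metrics the course uses. One common way to quantify uncertainty reduction for a binary outcome is Shannon entropy, and the sketch below assumes that choice. The function names and the example probabilities are hypothetical.

```python
import math


def binary_entropy(p):
    """Shannon entropy, in bits, of a binary outcome with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(base_rate, groups):
    """Uncertainty removed by a model that sorts cases into groups.

    groups -- list of (weight, positive_rate) pairs; weights sum to 1.
    """
    # Expected entropy that remains once the model's group is known.
    remaining = sum(w * binary_entropy(rate) for w, rate in groups)
    return binary_entropy(base_rate) - remaining


# Hypothetical classifier: half the cases land in a group that is 90%
# positive, the other half in a group that is 10% positive.
gain = information_gain(0.5, [(0.5, 0.9), (0.5, 0.1)])
print(f"information gain = {gain:.3f} bits")
```

In this made-up example the outcome starts with one full bit of uncertainty, and knowing the model's group removes a little over half of it.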
This four-module course introduces users to Julia as a first language. Julia is a high-level, high-performance dynamic programming language developed specifically for scientific computing. This language will be particularly useful for applications in physics, chemistry, astronomy, engineering, data science, bioinformatics and many more.
This course focuses on one of the most important tools in your data analysis arsenal: regression analysis. Using either SAS or Python, you will begin with linear regression and then learn how to adapt when two variables do not present a clear linear relationship. You will examine multiple predictors of your outcome and be able to identify confounding variables, which can tell a more compelling story about your results. You will learn the assumptions underlying regression analysis, how to interpret regression coefficients, and how to use regression diagnostic plots and other tools to evaluate the quality of your regression model. Throughout the course, you will share with others the regression models you have developed and the stories they tell you.
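To make the linear regression step concrete, here is a small sketch of an ordinary least squares fit for a single predictor, using only the Python standard library. The data points and the helper name `fit_line` are invented for illustration. The course itself may well use a statistics package instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of x and sum of cross-products, both about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data that is roughly linear.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
# Residuals are what a diagnostic plot would display against x.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The slope is the expected change in the outcome for a one-unit change in the predictor. Plotting the residuals against `xs` is one simple diagnostic for spotting a relationship that is not linear.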
Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy-to-use software, the course provides data science knowledge that can be applied directly to analyze and improve processes in a variety of domains. Data science is the profession of the future, because organizations that are unable to use (big) data in a smart way will not survive. It is not sufficient to focus on data storage and data analysis. The data scientist also needs to relate data to process analysis.
In this course, you will develop and test hypotheses about your data. You will learn a variety of statistical tests, as well as strategies to know how to apply the appropriate one to your specific data and question. Using your choice of two powerful statistical software packages (SAS or Python), you will explore ANOVA, Chi-Square, and Pearson correlation analysis. This course will guide you through basic statistical principles to give you the tools to answer questions you have developed. Throughout the course, you will share your progress with others to gain valuable feedback and provide insight to other learners about their work.