We live today in a world flooded with data. In just one short decade, we have gone from a data-poor world to a data-rich one. The buzzword Big Data captures this phenomenon, and it’s one of the few cases where the reality can actually match the hype. Big Data is transforming every industry and human activity, including commerce, entertainment, agriculture, government, and the sciences.
For the past decade at Stanford, Jure Leskovec, Jeff Ullman, and I have been teaching a popular course called “Mining of Massive Datasets,” where we teach the fundamental techniques and tools to deal with Big Data. This class has trained a whole generation of data scientists and engineers who work at many major Silicon Valley companies and startups.
The Stanford course attracts hundreds of students, but the course textbook, also called Mining of Massive Datasets and published by Cambridge University Press, has been downloaded by hundreds of thousands of students and practitioners. This helped us realize that our Stanford students are just a small fraction of the vast number of people worldwide who might benefit from the course.
So we are now making this class available online, on Coursera, for the entire world. In this class we will introduce fundamental algorithms and techniques to deal with Big Data, such as MapReduce, Locality-Sensitive Hashing, PageRank, and algorithms for Large Graphs and Data Streams. We will also show how to apply our toolkit to important practical applications, such as Web Search, Recommender Systems, and Online Advertising.
The class starts September 29 and runs for 9 weeks. One of the key decisions we made is to not “water down” the material in any way from the course we teach at Stanford; the MOOC contains exactly the same material as the Stanford class. You can sign up for the class on the Coursera page. Here’s a short introductory video we recorded for the class.
In addition to the materials provided with this MOOC, the second edition of our textbook is now available for free download here. If you liked the first edition of the book, you should definitely check out the second edition -- we’ve added lots of new material, including graph algorithms, social network analysis, large-scale machine learning, and dimensionality reduction.
Marc Andreessen has famously pointed out that software is eating the world. Data is the fuel that powers software’s conquests. Data is created whenever humans and software interact, or when software interacts with other software. This virtuous cycle -- the success of software creates more data, and more data makes software even more powerful -- is a dynamic that is transforming the world we live in. Join us on Coursera to learn how to harness the power of data so that you can be an active participant, rather than a mere spectator, in this transformation.