Cloud Computing (Spring 2020) |
Instructor |
Lei Deng, Ph.D., Professor
Location: Rm. 404, No. 2 Comprehensive Laboratory Building
Office hours: 9:00-17:00
Email: leideng@csu.edu.cn |
Time and location | Weeks 2-9, mixed online and offline teaching |
Course description |
What is the "cloud"? How do we build software systems and components that scale to millions of users and petabytes of data, and are "always available"? In the modern Internet, virtually all large Web services run atop multiple geographically distributed data centers: Google, Yahoo, Facebook, iTunes, Amazon, eBay, Bing, etc. Services must scale across thousands of machines, tolerate faults, and support thousands of concurrent requests. Increasingly, the major providers (including Amazon, Google, Microsoft, HP, and IBM) are looking at "hosting" third-party applications in their data centers, forming so-called "cloud computing" services. A significant number of these services also process "streaming" data: geocoding information from cell phones, tweets, streaming video, etc. This course is aimed at sophomores who have been exposed to basic programming on a single machine. It focuses on the issues and programming models related to such cloud and distributed data processing technologies: data partitioning, storage schemes, stream processing, and "mostly shared-nothing" parallel algorithms. |
Topics covered | Google cloud computing, the MapReduce programming model, Hadoop, Spark, Amazon cloud, ... |
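As a rough illustration of the MapReduce programming model named in the topics, the sketch below runs a word count through map, shuffle, and reduce steps on a single machine. It is plain Python and does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example and are not part of any course material or library.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the list of values for one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence (hypothetical single-process driver)."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the cloud", "the data the cloud"]))
```

In a real cluster framework the map and reduce calls would run on different machines and the shuffle would move data over the network; here all three steps run in one process so that the data flow is easy to follow.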
Format | A 4-hour lecture each week, plus assigned readings. There will be regular homework assignments and a term project. |
Prerequisites |
Java/C++ Programming
Discrete Mathematics
Data Structures
Databases |
Texts and readings |
Hadoop: The Definitive Guide, Fourth Edition, by Tom White (O'Reilly)
Data-Intensive Text Processing with MapReduce, by Jimmy Lin and Chris Dyer (Morgan & Claypool)
Cloud Computing (3rd edition), by Peng Liu (in Chinese, Tsinghua University Press)
Additional materials will be provided as handouts or in the form of light technical papers. |
Grading | Homework/Participation/Presentation 50%, Final Project/Paper 50% |
Policies |
You are encouraged to discuss your homework assignments with your
classmates; however, any code you submit must be your own work. You may
not share code with others or copy code from outside sources,
except where the assignment specifically allows it. Plagiarism
can have serious consequences. |
Final project/paper |
Option 1: Build a small Facebook-like application using Amazon's SimpleDB. Based on network analysis, the application should make friend recommendations; it should also visualize the social network. A report of at least 6 pages in English is required.
Option 2: Design an experiment that uses Hadoop MapReduce or Spark to process some big data, then write a paper of at least 6 pages in English. The paper should include the following sections: introduction, methods, results, conclusion, and references. |
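Option 1 asks for friend recommendations "based on network analysis" without naming a method. One common starting point is to rank candidates by their number of mutual friends, and the sketch below assumes that approach. The in-memory dictionary stands in for whatever storage the application uses (it does not call SimpleDB), and `recommend_friends` and the sample names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def recommend_friends(
    graph: Dict[str, Set[str]], user: str, top_k: int = 3
) -> List[Tuple[str, int]]:
    """Rank people who are not yet friends of `user` by mutual-friend count."""
    friends = graph.get(user, set())
    scores: Counter = Counter()
    for friend in friends:
        # Every friend-of-a-friend is a candidate, unless it is the user
        # or someone the user is already connected to.
        for candidate in graph.get(friend, set()):
            if candidate != user and candidate not in friends:
                scores[candidate] += 1
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    # Hypothetical undirected friendship graph as adjacency sets.
    graph = {
        "alice": {"bob", "carol"},
        "bob": {"alice", "dave"},
        "carol": {"alice", "dave", "erin"},
        "dave": {"bob", "carol"},
        "erin": {"carol"},
    }
    print(recommend_friends(graph, "alice"))  # [('dave', 2), ('erin', 1)]
```

Other scoring schemes (for example, weighting mutual friends by how many connections they have) fit the same structure by changing how `scores` is incremented.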
Schedule |
Below is the tentative schedule for the course:
|
Experiment materials |
Virtual machine (VM) accounts
Client software
Linux Fundamentals
Unix Tutorial
Unix/Linux Command Reference
Hadoop tutorial
Experiment 1: Set up Hadoop
  Hadoop brief installation manual
  Hadoop
  JDK 1.7
Experiment 2: HDFS commands & APIs
  HDFS Java manual (by Nassor)
  Frequently used HDFS shell commands
  HDFS Java API
  apache-ant
Experiment 3: Process data with Hadoop/MapReduce
  Assignment: download
  Data: download
Experiment 4: Spark
  Spark manual
Courseware download |