CS 564 Database Management Systems: Design and Implementation

Noland Hall 132 on MonWedFri 2:30-3:45pm

Chat with us on the course Piazza site if you have any questions!

Description

CS 564 is designed to give students a solid background in database management systems, particularly relational database management systems (DBMSs). We will examine such systems from two perspectives: that of a DBMS user, and that of a DBMS implementor.

There will be 2 programming assignments that will explore database design and implementation and 2 SQL based assignments to investigate DBMS querying and optimization.

Class Logistics

Course Prerequisites

  • CS 367 (or equivalent data structures course) is absolutely essential. CS 537 might be helpful.

Programming Tools

The in-class activities, homework, and projects will require the use of C++:

  • The programming language C++ will be used for the DBMS internals project. To brush up your C++ skills, you can go through the lecture material for CS 368: C++ for Java Programmers, or the material from a more recent class found here. Check here and here for more C++ tutorials. Keep the links handy throught the semester as you may need to go to it for C++ help.
  • If many of you are coming from a Java background this FAQ might be particularly helpful to understand the syntax differences.
  • CLion is a C++ IDE from JetBrains that you may find useful for organizing the larger C++ projects. The professional edition of all of JetBrains' IDEs are available free with a student account, which you can sign up for here.

Recommended Textbook

  • Database management systems (3rd edition), by Raghu Ramakrishnan and Johannes Gehrke (also called the “cow book”).

Additional books that can be used

  • Database Systems: The Complete Book (2nd edition), by Hector Garcia-Molina, Jennifer Widom, and Jeffrey Ullman.

Assignments

  • All programming assignments are individual assignments. Each student must send us an individual submission.
  • Programming assignments are due by the end of day on the indicated dates.
  • All assignments will be submitted via Canvas.
  • The honor code described below will be enforced for both types of assignments.

Lecture Notes

  • Because Professor Graefe does not use lecture slides, we've set up a dropbox with pictures of the blackboard notes.
  • The dropbox can be found here.

Lecture Plan

Here is the link to detailed lecture schedule. The lecture plan is subject to change.

Lecture Materials


# Date Topic Lecture Materials Assignments
1 9/4 Course Logistics and DBMS Uses

Cow book: Chapter 1

2 9/6 Assignment #1 Introduction
References: See C++ references above.

Assignment #1: C++ Word Locator
3 9/9 ER Diagrams and Database Design
Cow book
  • 2.1, 2.3, 2.4.1 (ER Diagrams)
  • 3.2.1, 3.2.2 (Key Constraints)

4 9/11 SQL I
Cow book
  • 4.2 (Relational Algebra)
  • 5.1, 5.2 (SQL Query Basics)

5 9/16 SQL II
Cow book
  • 5.3 - 5.5 (More Advanced SQL)

6 9/18 DBMS Buffer Management
Cow book
  • 9.1 - 9.4 (Storing Data)

7 9/20 Assignment #2 Introduction
References: See posted assignment slides.

Assignment #2: SQL
8 9/23 Tree-based Storage Structures
Cow book
  • Chapter 10 (B Trees)

9 9/25 Sorting Lecture
References: Slides from Thanh Do

10 9/30 Query execution algorithms I
References: Parts of Cow book chapters 12-14. Look at the Lecture Plan for topics to focus on in in these chapters.

Assignment 2 due @ 11:59 pm
11 10/2 Query execution algorithms II
References: Parts of Cow book chapters 12-14. Look at the Lecture Plan for topics to focus on in in these chapters.

12 10/4 Assignment #3 Introduction
References:
  • See posted assignment slides
  • Cow book 10.3-10.5 will also be very useful


Assignment #3: B+ Tree
13 10/7 Database Metadata
Cow book
  • Section 12.1 (System catalog)
  • Section 15.1 (Relational algebra)
  • Section 15.2 (Estimating plan sizes)

14 10/9 Query Optimization I
Cow book
  • Chapter 15
10/11 slides
15 10/14 Query Optimization II
Cow book
  • Chapter 15

16 10/16 Query Optimization III
Cow book
  • Chapter 15

17 10/21 Review Session 1


N/A 10/25 Midterm Exam Released
References:
  • See all posted book chapters
  • Lecture blackboard pictures in dropbox (link above)

Midterm Exam
18 10/28 Transaction Management I
Cow book
  • Chapter 16.7 (Crash recovery and rollback)
  • Chapter 18.4 (Write-ahead logging)
  • Chapter 18.6 (Restarting after crash)


References: Instant Recovery Paper

19 10/30 Transaction Management II
See 10/28 resources

20 11/4 Concurrency Control I
Cow book
  • Chapter 16.1 (ACID)
  • Chapter 16.2 - 16.4 (Concurrency Control)
  • Chapter 17.1 - 17.4 (Lock Management)

21 11/6 Concurrency Control II
See 11/4 resources

22 11/8 Indexes and other DBMSs
Cow book
  • Chapter 8.3.1 (Hash Indexes)
  • Chapter 25.6.1 (Bitmap Indexes)
Key-value stores: comparison with RDBMSs here. Amazon's explanation here.
23 11/13 Columnar Storage and Compression
Complete Book Prof. Rekatsinas' Fall 2017 Lecture Slides
24 11/15 Assignment 4 out + merge forests Blog post overview of log-structured merge trees. Assignment #4: Query Optimization
25 11/18 Scaling & parallel query execution
Cow book
  • Chapter 22.1-22.3 (Parallelism)
  • Chapter 22.14 (Distributed Recovery)
Complete book
26 11/20 Cloud services
Online resources
27 11/22 In-Class Practice Final
In-Class Practice Final
28 11/25 In-memory databases
Online resources
29 11/27 Review Session
30 12/2 Data cleaning
31 12/4 Data analytics
33 12/9 In-class TA Review Session I
  • Prof. Rekatsinas' Join algorithm slides
  • Prof. Rekatsinas' external sort slides

  • Keep in mind that Prof. Rekatsinas' slides may contain material that won't appear on our exam. You can find the rest of his slides on his course website from 2017.

    34 12/11 In-class TA Review Session II


    Midterm Exam
    The midterm exam will be a take-home assignment. It will be assigned on Friday, October 25th and will be due Friday, November 1st. It can be found in the semester schedule above.
    Final Exam
    The final exam will be on Friday, December 13th from 10:05am - 12:05pm in room 105 Psychology. To focus on what to study, look at the lecture plan and make sure you know the main ideas behind each concept (when it should/shouldn't be used and why it's used). The possible exam questions at the end of the document are a good place to start. Also make sure to review the midterm and practice final.
    Grading
    Programming Assignments 55%
    Midterm 15%
    Final 30%
    Staff
    Goetz Graefe (graefe2@wisc.edu)
    Kyle Klassy (klassy@cs.wisc.edu)
    Zhihan Guo (zhihan@cs.wisc.edu)
    Ruohui Wang (ruohui@cs.wisc.edu)
    Office Hours

    Goetz Graefe: Monday 3:45 pm - 4:45 pm, Wednesday 1:15 pm - 2:15 pm @ Noland 132 (our classroom)

    • Also available by appointment.

    Kyle Klassy: Monday 9:00 am - 10:00 am @ CS4243

    Zhihan Guo: Friday 9:30 am - 10:30 am @ CS4241

    Ruohui Wang: Tuesday 2:15 pm - 3:15 pm @ CS3393

    Note: the schedule of office hours may change from time to time, in which case an announcement will be made on the course Piazza.

    Late Policy
    We will not accept late assignments or homework. Late work will be given 0 points.
    Honor Code and Collaboration Policy

    We encourage you to discuss the Programming Assignments with other students; it's fine to discuss overall strategy and collaborate with a partner or in a small group, as both giving and receiving advice will help you to learn.

    However, you must write your own solutions to all of the problems, and you must cite all people you worked with.

    It's not OK to share code or write code collaboratively. (This includes posting and/or sharing your code publicly, such as on GitHub!)

    If you do not do so, we will consider this a violation of the University of Wisconsin Honor Code.

    If you consult any resources outside of the materials provided in class, you must cite these sources. We reserve the right to assign a penalty if your answers are substantially derivative, but, as long as you provide appropriate citations, we will not consider this an Honor Code violation.