CMSC848E: Machine Learning for Data Management Systems

Amol Deshpande; Tue-Thu 12:30pm-1:45pm




Schedule and Readings

**This schedule is tentative. It lists relevant readings and will be filled out in more detail over the first two weeks of the semester. Only a few of these readings (1-2 per class) will be required.**
  • Weeks 1 and 2: Background

  • Weeks 3, 4, 5: Learned Indexes, Storage Layouts

  • Weeks 6-9: Query Processing, Query Optimization
    • Slides/Notes: [Sorting; Joins], [Eddies], [UCB; UCT; SkinnerDB], [AQP; Cardinality Estimation 1]

  • Weeks 10, 11: Natural Language to SQL

  • Pre-trained Models and Table-oriented Tasks

    • Readings - April 20, 2023:
      • (Required) TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data; ArXiv 2020.
    • Readings - April 25, 2023:
      • (Required) Deep Entity Matching with Pre-Trained Language Models; VLDB 2021.
    • Readings - April 27, 2023:
      • (Required) TCN: Table Convolutional Network for Web Table Interpretation; WWW 2021.
    • Readings - May 2, 2023:
      • (Required) TUTA: Tree-based Transformers for Generally Structured Table Pre-training; KDD 2021.
      • (Required) Annotating Columns with Pre-trained Language Models; SIGMOD 2022.
    • Readings - May 4, 2023:
      • (Required) DeepJoin: Joinable Table Discovery with Pre-trained Language Models.
    • Readings - May 9, 2023:
      • Integrating Data Lake Tables; VLDB 2022.
    • Readings - May 11, 2023:
      • (Required) Symphony: Towards Natural Language Query Answering over Multi-modal Data Lakes; CIDR 2023.
      • (Required) Can Foundation Models Wrangle Your Data?; ArXiv 2022.

  • Miscellaneous