CSE 5449: Intermediate Studies in Scientific Data Management

Course Number: CSE 5449

Class Number: 36664 and 36665

Class title: Intermediate Studies in Scientific Data Management

Instructor: Dr. Suren Byna

Credits: 2

Class Time: TR 11:15am - 12:10pm

Classroom: BE 180

Office Hours: Immediately after class, or by appointment.
I plan to be in my office (Dreese Labs, Room 791) for most of the semester, except when I am traveling.

Course Description:

The objective of this course is to understand the principles and practice of data management in science, HPC, AI, and cloud. We will study the life cycles of scientific data in various scientific applications, advances in storage and I/O technologies, file systems, and the data management demands of current and emerging applications on different platforms (HPC systems and cloud computing environments). This course will cover various data management software stacks aimed at HPC (HDF5, h5py, MPI-IO, netCDF, POSIX, and parallel file systems such as Lustre, GPFS, and BeeGFS) and at cloud (S3, Ceph, etc.). The course will also take a closer look at fundamental problems that affect data management in science, gaps in existing technologies, and challenges that scientists face, and will explore potential next-generation solutions to these challenges. This course will discuss state-of-the-art technologies from the U.S. Exascale Computing Project (ECP) and future research directions in managing scientific data efficiently.

Topics to be Covered (Tentative)

Reading materials:

Reference Books (NOT required; will discuss some topics in the class)

Reading list

Selected papers from the literature, including papers on past and ongoing research activities in my group at LBNL.
(Dr. Byna's home page).

Please refer to the following reading list and presentation schedule. (In development; more to be added)

Lectures:

Grading Plan:

There are four components:
  
- Attendance, participation in class discussion,   (15%)
  and evaluation of class presentations
- Class presentation                               (20%)
- Final exam (Time: TBD)                           (25%)
  (comprehensive, open book)
- Class project                                    (40%)

Class Presentation:

Each student will make one class presentation. Please refer to the reading list (see the list above) and the presentation schedule (TBD), and select one date (with the corresponding set of papers) for your presentation. If you are working on a particular project, it is preferable to choose a set of papers on a different topic for your presentation. Similarly, you can select papers from two different topics. This will allow you to learn different things in depth.

The presentation schedule will be filled in the order in which I receive preferences. To avoid conflicts with others, please give me your preferences for 3-4 dates. The class presentation will be evaluated based on preparation (knowledge assimilation), presentation style (organization, smoothness, and clarity), finishing the presentation on time, and answering questions during the discussion.

We will use an evaluation form to evaluate every student presentation. Each student needs to fill out this form for every student speaker and return it (hardcopy only) to me by the following class. I will pass the forms on to the speaker after removing the evaluator's information. I will use this information to evaluate the speaker's presentation as well as each evaluator's skill in evaluating a presentation.

You must consult me when preparing and finalizing your slides. This is to ensure that the presentations are compact and flow smoothly. Please discuss your presentation plans with me two weeks before the presentation date, and go over the slides with me one week before the presentation date.

Class Project:

Due to the research-oriented nature of this course, the project will play an important part in the learning experience and in the grading process. Projects will be evaluated based on their technical quality, originality, depth of analysis, and completeness.

Projects will mostly be done individually. A group of at most two people is allowed if the scope of the project is large and there is a clear understanding between the members that both will contribute equally to the success of the project (both members will receive the same project grade). The project will be research-oriented. Depending on the nature of the topic, it may consist of one or more of the following components: theory, design, analysis, simulation, or experimental results.

You are free to consult with me while defining the project and working on it during the semester. I will provide a list of possible projects during the second week of classes. Those who have already been involved in scientific data management research in an earlier or the current semester may continue their earlier projects after consulting me. New students can meet with me to discuss their research interests, and we will define project topics together.

I will meet with each of you frequently during the semester to discuss the progress of your project.

Project Schedule:

  
End of 3rd/4th week     - discussion with the instructor to select/focus topic
End of 4th/5th week     - proposal (around two pages) due
End of 5th week         - feedback from instructor and finalizing the topic
Project Report due      - final exam week 


Last Updated: March 14th, 2023