Machine Learning on Biomedical Data: Data Science Workshop at UKC 2023

Reported by

Benjamin C. Lee, DSW Chair                         June Park, DSW Co-Chair

The 4-hour Data Science Workshop (DSW) was held at UKC in Dallas, TX on the afternoon of Saturday, August 5, 2023. The sixth DSW aimed to provide a hands-on crash course in data science, machine learning, and deep learning for UKC attendees with little to no prior data science experience. This year’s workshop had 29 paid registrants and 20 attendees. Of the participants, half were faculty or researchers from academia, a quarter were industry professionals, and another quarter were graduate and undergraduate students. While most attendees were from the United States, 4 traveled from Korea.

This year’s DSW, themed “Machine Learning on Biomedical Data,” was open to participants from both biomedical and non-biomedical backgrounds. Also new this year was a team-based mini-project that tackled a data science problem from start to finish using real-world biomedical data. Participants began with data cleaning, built a machine learning model, and finished by presenting their own trained models. Instructors assisted teams of 2-3 participants on each mini task toward the final goal of presenting their finished projects. Participants were required to have some programming experience; all work was done in the popular Python programming language using the free Google Colab computing environment.

The first half of the program, the “Data Analysis of Tabular Biomedical Data” session, ran from 1:30 to 3:45 pm. It included an introduction to data science basics and machine learning concepts such as the distinctions between AI, machine learning, and deep learning; supervised vs. unsupervised learning; and the trade-off between accuracy and interpretability. Participants also did a hands-on review of data handling with the pandas package and of machine learning models, including logistic regression, random forests, and gradient boosting, using the scikit-learn package on a real-world breast cancer histopathology tabular dataset.
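For readers who want to retrace the first session, a minimal sketch of that workflow follows. It assumes scikit-learn’s bundled Wisconsin breast cancer dataset (tabular features computed from digitized images of breast mass samples) as a stand-in; the workshop’s actual dataset, notebooks, and hyperparameters are not described in this report, so the data split and model settings below are illustrative choices only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Stand-in dataset (assumption): scikit-learn's Wisconsin breast cancer data as a pandas DataFrame.
data = load_breast_cancer(as_frame=True)
df = data.frame  # 30 numeric feature columns plus a "target" column

# Basic pandas cleaning step: drop any rows with missing values (this dataset has none).
df = df.dropna()

X = df.drop(columns="target")
y = df["target"]

# Hold out 25% of the rows for evaluation, keeping the class balance in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The three model families named in the session; settings are illustrative, not the workshop's.
models = {
    # Logistic regression benefits from standardized features.
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Logistic regression is the most interpretable of the three (one coefficient per feature), while the tree ensembles can capture nonlinear interactions at some cost to interpretability, which mirrors the accuracy vs. interpretability discussion in the session.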

The second half of the program, the “Machine Learning Modeling and Project Presentations” session, ran from 4:00 to 6:00 pm. Two new biomedical datasets were provided: “Heart Failure Prediction” and “Fetal Health Classification.” Each team chose one of these datasets, which were provided already cleaned, and trained machine learning models using template code. Four instructors and teaching assistants circulated around the room to help the teams. At the end of this session, 3 teams presented slides describing their model-building process on the heart failure dataset and their resulting performance in predicting heart failure, with F1 scores ranging from 69% to 72%. To conclude, a bonus demonstration of deep learning models, such as artificial neural networks and convolutional neural networks for image classification, was presented and discussed. All participants were able to keep a copy of their code in their personal Google Drive for future reference.
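The teams reported their results as F1 scores. F1 is the harmonic mean of precision and recall, F1 = 2 · precision · recall / (precision + recall), so a reported 69% corresponds to 0.69 on a 0-to-1 scale. A common reason to report it instead of plain accuracy is that it stays informative when one class is much rarer than the other. The short sketch below computes F1 by hand and checks it against scikit-learn’s `f1_score`; the labels are hypothetical toy values, not data or results from the workshop.

```python
from sklearn.metrics import f1_score, precision_score, recall_score


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (1 = heart failure, 0 = no heart failure); not workshop data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]

# Count true positives, false positives, and false negatives.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

manual_f1 = f1_from_counts(tp, fp, fn)
sklearn_f1 = f1_score(y_true, y_pred)

# prints "precision=0.75, recall=0.60"
print(f"precision={precision_score(y_true, y_pred):.2f}, recall={recall_score(y_true, y_pred):.2f}")
# prints "F1 (by hand) = 0.667, F1 (scikit-learn) = 0.667"
print(f"F1 (by hand) = {manual_f1:.3f}, F1 (scikit-learn) = {sklearn_f1:.3f}")
```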

The instructors and teaching assistants were Benjamin Lee (Sr. Research Associate, Weill Cornell Medicine, NY), June Park (Data Engineer, Daugherty Business Solutions, TX), DK Kim (Senior Data Analytics Consultant, Zurich North America, IL), and Karl Kwon (Engineering Lead, MITRE, NJ). This year’s team-based Data Science Workshop, with its focus on biomedical data, was a success, and we hope future events will reach more KSEA members who wish to gain hands-on, practical experience in data science across various fields.

What DSW Participants Had to Say:

  • “Though brief, the Data Science Workshop was an invaluable experience. I extend my gratitude to the organizers and instructors!” – Hakjoo Kim (Ph.D. Student, Texas A&M University)
  • “I would highly recommend the workshop for researchers trying to learn and start using data science for their research. DSW was a well-organized workshop for beginners. Short project at the end was especially helpful and I felt I should be able to tackle more complicated projects in the future. Also, other online resources (YouTube, forums, etc.) should now make more sense to me when I try to learn something more specific for my research need.” – Hyun Jin Kim (Assistant Professor, University of Alabama)
