PharmKDD: Knowledge Discovery and Data Mining for Pharmaceutical Research and Development
A KDD 2025 Workshop
Description
Pharmaceutical research and development (PRD) refers to the process of discovering and developing medicines and treatments. It is an expensive ($1-2.6 billion on average) and time-consuming (10-15 years on average) process. Despite these investments of time and money, historical data show that the success rate of a new drug from discovery to final approval by the Food and Drug Administration is only around 10%. This low success rate highlights the urgent need for innovative methods to improve the efficiency and success rate of the PRD process.
The PRD pipeline has many steps, including target identification, molecule design and synthesis, pre-clinical development, human clinical trials, and post-marketing surveillance. Over the years, large volumes of data have accumulated from these steps, encoding evidence and insights about the PRD process. This provides an unprecedented opportunity to develop effective knowledge discovery and data mining (KDD) methods that extract insights from those data to improve the PRD process. Furthermore, advances in deep phenotyping using AI have greatly expanded the disease landscape, capturing a richer spectrum of disease attributes and patient subtypes. This in turn strengthens drug discovery efforts by refining target identification and validation, effectively turning the other side of the therapeutic equation into a more precise, data-driven area of innovation.
Recent research offers many examples of KDD methods developed for PRD. However, this research has mostly been isolated in separate communities, each focusing on a particular intermediate step, even though a drug can be developed successfully only if none of these steps fails. There is therefore an urgent need for a forum that brings together researchers and practitioners from both academia and industry working on different aspects of KDD for PRD to discuss state-of-the-art research and technologies and chart the future agenda.
Organizers
Fei Wang is currently a tenured Professor of Health Informatics in the Department of Population Health Sciences at Weill Cornell Medicine (WCM), where he also holds a secondary appointment as a Professor in the Department of Emergency Medicine. Dr. Wang is the Founding Director of the WCM Institute of AI for Digital Health (AIDH) and an Adjunct Scientist at the Hospital for Special Surgery (HSS). His research interests are machine learning and artificial intelligence in biomedicine. Dr. Wang has published over 350 papers in major venues of AI and biomedicine, which have received more than 35K citations to date. His H-index is 86. Dr. Wang is an elected fellow of the American Medical Informatics Association (AMIA), the American College of Medical Informatics (ACMI), and the International Academy of Health Sciences and Informatics (IAHSI), and a distinguished member of the Association for Computing Machinery (ACM).
Jian Tang is currently an associate professor at Mila - Quebec AI Institute, the leading AI institute in Canada, founded by A.M. Turing Award laureate Yoshua Bengio. He is a Canada CIFAR AI Research Chair. Dr. Tang is also the founder and CEO of BioGeometry, an AI startup focusing on protein design, with applications in antibody and enzyme design in synthetic biology. His main research interests are deep generative models, graph neural networks, geometric deep learning, and their applications to drug discovery. His work LINE on node representation learning has been widely recognized and has been cited more than 6,000 times. He has also done much pioneering work on AI for drug discovery, including the first machine learning framework for drug discovery, TorchDrug and TorchProtein. He received the best paper award at ICML and was nominated for the best paper award at WWW. He serves as an area chair for NeurIPS, ICML, AAAI, and other conferences, and is an action editor of JMLR.
Jie Shen is a Research Fellow and Director of Digital Science at AbbVie, where he spearheads the integration of digital health technologies and advanced data analytics in clinical development. With a strong commitment to accelerating drug development through innovative technologies and analytics, including AI and ML, Dr. Shen brings a wealth of experience and a passion for transformative solutions in healthcare. Before his recent roles at AbbVie, Dr. Shen held several roles at Eli Lilly and Company and the US FDA, driving AI applications in drug discovery and development. Dr. Shen is a key developer of Lilly’s internal deep learning tools for predicting drug ADMET properties and of the FDA estrogenic activity database. He has published 50+ papers with over 8,000 citations.
Ying Li is a Director of Health Economics and Outcomes Research at Regeneron Pharmaceuticals, Inc. With over 15 years of experience in medical informatics research, Dr. Li specializes in extracting, integrating, and transforming both structured and unstructured data into actionable insights to address critical healthcare challenges. After earning her PhD in Biomedical Informatics from Columbia University, Dr. Li spent five years at IBM Research as a Research Staff Member in the Center for Computational Health, where she led the development of the Watson for Patient Safety research prototype. In recent years, her work has focused on leveraging real-world data and AI techniques to meet diverse business needs across the drug development lifecycle. Dr. Li has authored 20+ peer-reviewed publications in top-tier, cross-disciplinary journals such as Nature Biotechnology, Diabetes Care, Movement Disorders, Nature Scientific Reports, JAMIA, and TKDE, as well as presented at leading conferences including AMIA and AAAI. She is also the inventor of four patents.
Benjamin Glicksberg is an Associate Professor at the Icahn School of Medicine at Mount Sinai in the Department of AI and Human Health. He holds secondary appointments in the Hasso Plattner Institute for Digital Health at Mount Sinai and the Mindich Child Health and Development Institute. Dr. Glicksberg’s research interests broadly span machine learning in health, focusing on translating multi-modal and multi-omic models into clinical practice. He has published over 185 papers in biomedical AI-related journals, which have received over 17K citations to date. Prior to this role, he served on the leadership team at Character Biosciences. As VP and Head of Data Science and Machine Learning, he led efforts analyzing patient clinical and genomic data from observational trials to create digital biomarkers of disease progression. These biomarkers, integrated with genomic data, helped identify novel drug targets that are soon to enter human clinical trials.
Schedule
- 8-8:05am: Introduction
- 8:05-9am: Keynote Talk
- 9-10am: Invited Talks
- 10-10:15am: Break
- 10:15-11am: Panel Discussion
- 11am-12pm: Paper Presentations
Timeline
- Paper Submission: May 8th, 2025.
- Paper Notification: June 8th, 2025.
© Fei Wang