PharmKDD: Knowledge Discovery and Data Mining for Pharmaceutical Research and Development

A KDD 2025 Workshop

Description
Pharmaceutical research and development (PRD) is the process of discovering and developing medicines and treatments. It is expensive ($1-2.6 billion on average) and time-consuming (10-15 years on average). Despite these investments of time and money, historical data show that only around 10% of new drugs make it from discovery to final approval by the Food and Drug Administration. This highlights the urgent need for innovative methods to improve the efficiency and success rate of the PRD process.

The PRD pipeline has many steps, including target identification, molecule design and synthesis, pre-clinical development, human clinical trials, and post-marketing surveillance. Over the years, these steps have accumulated large volumes of data that encode evidence and insights about the PRD process. This provides an unprecedented opportunity to develop effective knowledge discovery and data mining (KDD) methods that extract insights from those data to improve the PRD process. Furthermore, advances in AI-based deep phenotyping have greatly expanded the disease landscape, capturing a richer spectrum of disease attributes and patient subtypes. This in turn strengthens drug discovery by refining target identification and validation, making the other side of the therapeutic equation more precise and data-driven.

Much recent research develops KDD methods for PRD. However, this research has been largely siloed in separate communities, each focusing on a particular intermediate step, even though developing a drug successfully requires that none of these steps fail. Therefore, there is an urgent need for a forum that brings together researchers and practitioners from academia and industry working on different aspects of KDD for PRD to discuss state-of-the-art research and technologies and chart the future agenda.

Organizers
Fei Wang is currently a tenured Professor of Health Informatics in the Department of Population Health Sciences at Weill Cornell Medicine (WCM), where he also holds a secondary appointment as a Professor in the Department of Emergency Medicine. Dr. Wang is the Founding Director of the WCM Institute of AI for Digital Health (AIDH) and an Adjunct Scientist at the Hospital for Special Surgery (HSS). His research interests are machine learning and artificial intelligence in biomedicine. Dr. Wang has published over 350 papers in major AI and biomedicine venues, which have received more than 35K citations to date. His H-index is 86. Dr. Wang is an elected fellow of the American Medical Informatics Association (AMIA), the American College of Medical Informatics (ACMI), and the International Academy of Health Sciences and Informatics (IAHSI), and a distinguished member of the Association for Computing Machinery (ACM).
Jian Tang is currently an associate professor at Mila - Quebec AI Institute, the leading AI institute in Canada, founded by A.M. Turing Award laureate Yoshua Bengio. He is a Canada CIFAR AI Research Chair. Dr. Tang is also the founder and CEO of BioGeometry, an AI startup focusing on protein design, with applications in antibody and enzyme design in synthetic biology. His main research interests are deep generative models, graph neural networks, geometric deep learning, and their applications to drug discovery. His work LINE on node representation learning has been widely recognized and has been cited more than 6,000 times. He has also done much pioneering work on AI for drug discovery, including the first machine learning framework for drug discovery, TorchDrug and TorchProtein. He received a best paper award at ICML and was nominated for the best paper award at WWW. He serves as an area chair for NeurIPS, ICML, AAAI, etc., and is an action editor of JMLR.
Jie Shen is a Research Fellow and Director of Digital Science at AbbVie, where he spearheads the integration of digital health technologies and advanced data analytics in clinical development. With a strong commitment to accelerating drug development through innovative technologies and analytics, including AI and ML, Dr. Shen brings a wealth of experience and a passion for transformative solutions in healthcare. Before joining AbbVie, Dr. Shen held several roles at Eli Lilly and Company and the US FDA, driving AI applications in drug discovery and development. Dr. Shen is a key developer of Lilly’s internal deep learning tools for predicting drug ADMET properties and of the FDA estrogenic activity database. He has published 50+ papers with over 8,000 citations.
Ying Li is a Director of Health Economics and Outcomes Research at Regeneron Pharmaceuticals, Inc. With over 15 years of experience in medical informatics research, Dr. Li specializes in extracting, integrating, and transforming both structured and unstructured data into actionable insights to address critical healthcare challenges. After earning her PhD in Biomedical Informatics from Columbia University, Dr. Li spent five years at IBM Research as a Research Staff Member in the Center for Computational Health, where she led the development of the Watson for Patient Safety research prototype. In recent years, her work has focused on leveraging real-world data and AI techniques to meet diverse business needs across the drug development lifecycle. Dr. Li has authored 20+ peer-reviewed publications in top-tier, cross-disciplinary journals such as Nature Biotechnology, Diabetes Care, Movement Disorders, Nature Scientific Reports, JAMIA, and TKDE, and has presented at leading conferences including AMIA and AAAI. She is also the inventor of four patents.
Benjamin Glicksberg is an Associate Professor at the Icahn School of Medicine at Mount Sinai in the Department of AI and Human Health. He holds secondary appointments in the Hasso Plattner Institute for Digital Health at Mount Sinai and the Mindich Child Health and Development Institute. Dr. Glicksberg’s research interests broadly span machine learning in health, focusing on translating multi-modal and multi-omic models into clinical practice. He has published over 185 papers in biomedical AI-related journals culminating in over 17K citations to date. Prior to this role, he served on the leadership team at Character Biosciences. As VP and Head of Data Science and Machine Learning, he led efforts analyzing patient clinical and genomic data from observational trials to create digital biomarkers of disease progression. These biomarkers, integrated with genomic data, helped identify novel drug targets soon to be entering human clinical trials.
Keynote Speakers
Xia Ning is a Professor in the Biomedical Informatics Department (BMI), the Computer Science and Engineering Department, and the College of Pharmacy at The Ohio State University. She is the Division Chief of AI in Digital Health at BMI and the Associate Director of Biomedical Informatics at the OSU Clinical and Translational Science Institute (CTSI). She received her Ph.D. in Computer Science and Engineering from the University of Minnesota, Twin Cities, in 2012. Dr. Ning’s research is on Artificial Intelligence (AI) and Machine Learning with applications in drug discovery, health care, and e-commerce. Specific applications include new molecule generation and drug candidate prioritization for drug discovery, drug repurposing for Alzheimer’s disease, cancer drug selection for precision medicine, clinical decision support systems, and EHR predictive analysis and phenotyping. Her research has received several recognitions, including the Sanofi iDEA-TECH Award, a 10-year highest impact paper award, the AWS Machine Learning Research Award, and the OSU President Research Excellence Award. Dr. Ning is a Fellow of the American Medical Informatics Association (AMIA) and a Senior Member of the National Academy of Inventors.
Rinol Alaj is a Senior Director and Head of Clinical Outcomes Assessment and Patient Innovation at Regeneron Pharmaceuticals, Inc. Rinol’s professional focus centers on integrated innovation, design thinking, digital endpoints, and clinical operations, with 15 years of experience in the startup and pharma industries. He is a savvy, results-oriented leader with proven success in transforming and building eCOA organizations from the ground up.
Zhiyong Lu is a tenured Senior Investigator at the NIH/NLM, leading research in biomedical text/image processing, AI, and machine learning. In addition, Dr. Lu is an Adjunct Professor of Computer Science at the University of Illinois Urbana-Champaign (UIUC) and an Associate Editor for the journals JAMIA and Bioinformatics. Dr. Lu is a highly cited researcher with over 400 publications, and his AI research has been deployed in production resources such as PubMed and LitCovid, serving millions of users each day. Dr. Lu’s research has been frequently featured in major news outlets and recognized with numerous awards, including the NIH Director’s Challenge Award, Clinical Center CEO Award, and NLM Regents Award. Dr. Lu has been elected to the American College of Medical Informatics (ACMI) and the International Academy of Health Sciences Informatics (IAHSI).
Jianying Hu is an IBM Fellow; Global Science Leader, AI for Health; and Director of HCLS Research at IBM. She is also an Adjunct Professor at the Icahn School of Medicine at Mount Sinai. She has conducted and led extensive research in machine learning, data mining, statistical pattern recognition, and signal processing, with applications to healthcare analytics and medical informatics, business analytics, and multimedia content analysis. Her recent efforts focus on developing AI technologies for accelerated discovery of therapeutics. Dr. Hu is a fellow of the American College of Medical Informatics (ACMI), the International Academy of Health Sciences Informatics (IAHSI), IEEE, and the International Association of Pattern Recognition (IAPR). She received the Asian American Engineer of the Year Award in 2013.
Bin Chen is a tenured associate professor leading a multidisciplinary lab at Michigan State University, with a mission to leverage advanced machine learning and emerging big data to discover new therapeutics. He is also the Founding Director of the Center for AI-Enabled Drug Discovery in the College of Human Medicine at Michigan State University. He was previously a faculty member at UCSF and completed his postdoctoral training at Stanford. His current research areas include machine learning method development, integrative bioinformatics, and EHR mining. He has training in informatics, chemistry, and biology, and has worked in both large pharmaceutical companies and small startups. His lab strives to pioneer transcriptomics-based drug discovery, develop foundation models to understand how individual cells respond to perturbations, and use massive real-world data to assess drug efficacy.
Agenda

Timeline

© Fei Wang