The 3rd Workshop on Data Science with Human in the Loop @ KDD 2021

Program (August 15, 2021. All times are PDT.)

7:45 – 8:00: Workshop introduction

Lucian Popa, IBM Research

8:00 – 9:00: Keynote talk 1 (Sunita Sarawagi, IIT Bombay)

Title: Elevating the role of the human in human-in-the-loop learning

Abstract: The evolution of deep learning has eliminated the need for the human ingenuity and domain expertise that went into designing informative features and pipelines around light statistical models. Challenging tasks like image recognition, speech recognition, and translation can now be learned via generic end-to-end models where you feed raw input and get back the prediction. The primary role of humans in this learning loop is providing labeled examples. Anyone who has engaged in actually labeling examples would attest to the mind-numbing tedium of this task. While techniques like active learning attempt to reduce the number of labeled examples, we ask whether we can elevate the role of humans to one commensurate with their capacity for higher-level abstraction. We present multiple paradigms of high-level human supervision, including top-down rules with quality guides and bottom-up rules with exemplars. We discuss algorithms for learning deep models from such noisy yet efficient modes of supervision.

Session Chair: Yunyao Li, IBM Research

9:00 – 10:00: Paper Presentation (5 papers)

Session Chair: Slobodan Vucetic, Temple University

10:00 – 10:10: Break

10:10 – 11:00: Invited papers (2 paper highlights from recent conferences focused on Human-Computer Interaction)

Session Chair: Slobodan Vucetic, Temple University

11:00 – 13:00: Lunch Break

13:00 – 14:00: Keynote Talk 2 (Jiawei Han, University of Illinois at Urbana-Champaign)

Title: On the Power of Human Guidance at Turning Unstructured Text to Structured Knowledge

Abstract: Real-world big data is largely dynamic, interconnected, and unstructured text. It is highly desirable to transform such massive unstructured data into structured knowledge. Many researchers rely on labor-intensive labeling and curation to extract knowledge from such data. Such approaches, however, are not scalable. We envision that massive text data itself may disclose a large body of hidden structures and knowledge. Equipped with pretrained language models and text embedding methods, it is promising to transform unstructured data into structured knowledge. On the other hand, human guidance may still play a critical role in this process. In this talk, we study how minor human guidance may play a big role in discriminative topic mining, taxonomy construction, text classification, and taxonomy-guided text analysis. We show that a data-driven approach plus minimal human guidance can be promising for transforming massive text data into structured knowledge.

Session Chair: Eduard Dragut, Temple University

14:00 – 14:50: Invited Talks

Eser Kandogan, Megagon Labs

Title: Human(s)-in-the-Loop(s): Observations from the Data Science Practice

Abstract: Over the last several years at Megagon Labs, we conducted several data science projects ranging from exploratory to production work. Examining these projects from a human-computer interaction perspective, we observed that human-in-the-loop is very much present in data science practice; in fact, I would argue that there are many loops and many humans with different kinds of roles and input, impacting how machine learning solutions are developed and deployed in practical settings. In this talk, I will present some of the patterns we observed in data science practice and show how human(s)-in-the-loop(s) impacted projects that leveraged traditional machine learning algorithms as well as advanced neural network architectures.

Robert (Munro) Monarch

Title: Open Problems in Human-in-the-Loop Machine Learning

Abstract: This talk will feature excerpts from my recently published book "Human-in-the-Loop Machine Learning: Active learning and annotation for human-centered AI". I'll cover some of the most exciting problems in Human-in-the-Loop Machine Learning and promising recent advances that address some of these problems. The talk will start with one of the most basic and long-standing questions in machine learning: what are the different ways that we can interpret uncertainty in our models? The talk will then discuss recent advances in transfer learning, including active transfer learning for adaptive sampling and the implications of intermediate task transfer learning on the choice of annotation task and annotation workforce(s). Finally, I will talk about advances in annotation quality control and annotation interfaces, including ways to identify annotators with rare but valid subjective interpretations and human-computer interaction strategies for combining machine learning predictions with human annotations.

Session Chair: Eduard Dragut, Temple University

14:50 – 15:00: Break

15:00 – 16:00: Panel: Open challenges in human-computer cooperation in data science

Session Chair: Yunyao Li, IBM Research

16:00 – 16:15: Open discussion with the audience and workshop wrap-up