Human Factors in Data-Intensive Applications for Health

My work focuses on integrating machine and human intelligence to make healthcare data-intensive systems more scalable, efficient, effective, and sustainable. This includes investigating how to integrate different kinds of data (from social media, sensors, lab studies, clinical trials, interviews, etc.), how to embed technology in processes that are usually human-driven, and how to provide trustworthy human-machine interaction in data-intensive applications. I combine different HCI and data management areas: crowdsourcing, gamification, and user-generated content analysis on one side, and algorithms for data quality, integration, and querying on the other.

Human Factors in Data Collection and Pre-Processing

Data collection is the first and most crucial step in any data-intensive application. It is the foundation of every data analysis pipeline, whether based on machine learning or on traditional data querying and integration. Both academia and industry recognize that the quality of the collected data is a crucial factor affecting the outcome of any analysis performed on it.

In this line of work I study metrics and algorithms to evaluate the quality of data during the data collection process. These metrics can be computed automatically at collection time, generated through manual labeling by humans, or a hybrid of the two. The underlying assumption is that humans are knowledgeable about the data: they are domain experts, owners of the collected data, or otherwise able to provide additional insights on data quality that complement automatic algorithms.
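As a toy illustration, the sketch below blends an automatic range check with an optional human quality label. The field names, thresholds, and weights are hypothetical placeholders, not actual metrics from my work.

```python
# A minimal sketch of a hybrid quality metric; real metrics and weights
# would be defined together with domain experts.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    heart_rate: Optional[float]           # sensor reading
    expert_label: Optional[float] = None  # 0..1 quality judgment from a human

def automatic_score(rec: Record) -> float:
    """Cheap checks computable at data collection time."""
    if rec.heart_rate is None:
        return 0.0                        # missing value
    in_range = 30.0 <= rec.heart_rate <= 220.0
    return 1.0 if in_range else 0.2       # implausible values are penalized

def hybrid_score(rec: Record, human_weight: float = 0.6) -> float:
    """Blend the automatic score with a human label when one exists."""
    auto = automatic_score(rec)
    if rec.expert_label is None:
        return auto
    return human_weight * rec.expert_label + (1 - human_weight) * auto

print(hybrid_score(Record(heart_rate=250.0, expert_label=0.9)))  # ≈ 0.62
```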

Human Factors in Data Processing

Data quality is a crucial problem in any data processing pipeline, from traditional query-based pipelines to those employing machine-learning algorithms. However, in the absence of ground truth or domain knowledge, it may be impossible to determine how to correctly fix inconsistencies in the data.

In this line of work I am interested in:

  • the development of human-in-the-loop algorithms to repair inconsistencies in the data and enrich the information already present, both by engaging and directing users and by exploiting user-generated content available on other sources (e.g., the Web and social media); a minimal sketch of such a repair loop follows this list.
  • the development of hybrid data querying methodologies that allow users to choose among different query plans according to cost models integrating the cost of the query with information on the quality of the data (see the plan-selection sketch after this list).
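As an illustration of the first point, here is a minimal human-in-the-loop repair sketch. All names (`propose_repair`, `ask_human`, the sex-code example) are hypothetical: confident automatic repairs are applied directly, while ambiguous values are deferred to a person.

```python
# A minimal human-in-the-loop repair sketch, assuming a hypothetical
# `ask_human` callback supplied by the application.
from typing import Callable, Optional

def propose_repair(value: str) -> tuple[Optional[str], float]:
    """Toy repairer: normalizes known sex codes, with a confidence score."""
    mapping = {"m": ("M", 0.95), "male": ("M", 0.95),
               "f": ("F", 0.95), "female": ("F", 0.95)}
    return mapping.get(value.strip().lower(), (None, 0.0))

def repair(values: list[str], ask_human: Callable[[str], str],
           threshold: float = 0.9) -> list[str]:
    repaired = []
    for v in values:
        candidate, confidence = propose_repair(v)
        if candidate is not None and confidence >= threshold:
            repaired.append(candidate)     # trusted automatic repair
        else:
            repaired.append(ask_human(v))  # defer to the human in the loop
    return repaired

# Example: the "human" here is simulated by a simple callback.
print(repair(["Male", "f", "unknwn"], ask_human=lambda v: "M"))
```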
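For the second point, a toy sketch of quality-aware plan selection, again with hypothetical plans and an arbitrary scale factor: the user controls the trade-off between execution cost and data quality through a single weight.

```python
# A minimal sketch of quality-aware query plan selection.
from dataclasses import dataclass

@dataclass
class QueryPlan:
    name: str
    exec_cost: float  # e.g., estimated seconds; lower is better
    quality: float    # estimated quality of the sources used, in [0, 1]

def plan_score(plan: QueryPlan, quality_weight: float) -> float:
    """Lower is better: execution cost plus a penalty for low quality
    (the factor 100 is an arbitrary scale to make the terms comparable)."""
    return ((1 - quality_weight) * plan.exec_cost
            + quality_weight * (1 - plan.quality) * 100)

def choose_plan(plans: list[QueryPlan], quality_weight: float) -> QueryPlan:
    return min(plans, key=lambda p: plan_score(p, quality_weight))

plans = [QueryPlan("fast, noisy sources", exec_cost=2.0, quality=0.6),
         QueryPlan("slow, curated sources", exec_cost=30.0, quality=0.95)]
print(choose_plan(plans, quality_weight=0.2).name)  # favors speed
print(choose_plan(plans, quality_weight=0.9).name)  # favors quality
```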

Human Factors in Data Analysis

Data analysis tools are often built to maximize performance and the number of features, and are typically developed with end-users who have data analytics and programming skills in mind. However, domain experts often lack programming skills or are not proficient with query languages and data analysis APIs, so it may be difficult for them to translate their information needs into a format understood by existing data analysis tools.

Here I am interested in investigating how to put users at the center of the data analysis process by developing better algorithms, methods, and tools that allow them to manipulate the data.

  • We develop frameworks that allow users, e.g., domain experts, to define their own query language or data analysis API for analyzing their own data. Domain experts are often not proficient or familiar with query/programming languages or data analysis ecosystems (such as Spark). By directly involving end-users in the definition of these languages, 1) we can make sure that the query and analysis language satisfies the requirements of the domain experts, and 2) they have the opportunity to become familiar with the tools very early in the development stage. A minimal sketch of such a co-designed query layer follows this list.
  • We develop innovative user modeling techniques to understand additional aspects of the user asking the query, and novel ways for users to interact with (e.g., explore) the data. Here we explore novel interaction paradigms based on adaptive and interactive data visualization and conversational agents.
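To make the first point concrete, below is a deliberately tiny sketch of such a co-designed query layer (the vocabulary, column names, and data are hypothetical): each expert-facing phrase maps to an ordinary pandas operation, so the domain expert composes pipelines without touching the underlying API.

```python
# A minimal sketch of a domain-specific query layer, assuming a
# hypothetical vocabulary co-designed with clinicians.
import pandas as pd

VOCABULARY = {
    "adults": lambda df: df[df["age"] >= 18],
    "high heart rate": lambda df: df[df["heart_rate"] > 100],
    "average heart rate": lambda df: df["heart_rate"].mean(),
}

def run_query(df: pd.DataFrame, phrases: list[str]):
    """Apply each expert phrase in order, like a tiny pipeline."""
    result = df
    for phrase in phrases:
        result = VOCABULARY[phrase](result)
    return result

patients = pd.DataFrame({"age": [12, 34, 56], "heart_rate": [90, 110, 120]})
print(run_query(patients, ["adults", "high heart rate", "average heart rate"]))
# -> 115.0
```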