Research Topics

My research is on Responsible AI, Data Science, and Data Engineering. Here you may find my current research activities. A complete list of publications is here.

Algorithmic Fairness

Fairness of Recourse

We propose novel fairness definitions concerning algorithmic recourse. For an individual who receives an undesirable outcome (e.g., a rejected loan application), recourse is a way to reverse the outcome (e.g., increasing the down payment). Recourse incurs a cost for the individual that can be measured quantitatively. Our definitions investigate whether subpopulations have comparable costs, i.e., bear the same burden, for recourse. We have developed a method, termed FACTS, to audit a model for fairness, i.e., to find subgroups where unfairness exists.

Our approach is integrated into IBM's AI Fairness 360 (AIF360) toolkit. See a demo notebook here.
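To make the notion of recourse burden concrete, here is a minimal sketch that compares the average recourse cost of two subgroups. It is not the FACTS algorithm and not the AIF360 API: the linear scoring model, its weights, the feature names, and the applicants are all hypothetical, and recourse is restricted to raising a single feature.

```python
from statistics import mean

# Hypothetical linear scoring model: an application is approved when score >= 0.
WEIGHTS = {"income": 0.5, "down_payment": 1.0}
BIAS = -10.0

def score(x):
    return sum(WEIGHTS[f] * x[f] for f in WEIGHTS) + BIAS

def recourse_cost(x, feature="down_payment"):
    """Smallest increase of `feature` that reverses a rejection (0 if approved)."""
    s = score(x)
    return 0.0 if s >= 0 else -s / WEIGHTS[feature]

def mean_cost_by_group(individuals, group_key="group"):
    """Average recourse cost among rejected individuals, per subgroup."""
    costs = {}
    for x in individuals:
        if score(x) < 0:
            costs.setdefault(x[group_key], []).append(recourse_cost(x))
    return {g: mean(c) for g, c in costs.items()}

# Toy applicants (made-up numbers, for illustration only).
applicants = [
    {"group": "A", "income": 10, "down_payment": 3},
    {"group": "A", "income": 12, "down_payment": 2},
    {"group": "B", "income": 6, "down_payment": 1},
    {"group": "B", "income": 8, "down_payment": 2},
]

burden = mean_cost_by_group(applicants)
print(burden)
```

In this toy setting every applicant is rejected, but group B needs a larger change on average to reverse the outcome, which is the kind of disparity a recourse-fairness audit looks for.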

Publication

Loukas Kavouras, Konstantinos Tsopelas, Giorgos Giannopoulos, Dimitris Sacharidis, Eleni Psaroudaki, Nikolaos Theologitis, Dimitrios Rontogiannis, Dimitris Fotakis, Ioannis Z. Emiris (2023). Fairness Aware Counterfactuals for Subgroups. NeurIPS 2023 - 37th Conference on Neural Information Processing Systems.

PDF Cite Code DOI arXiv URL Rank A*

Spatial Fairness

In many cases, it is important to ensure that a model does not discriminate against individuals on the basis of their location (place of origin, home address, etc.). We consider location as the protected attribute, and we want the algorithm to exhibit spatial fairness. For example, consider a model that predicts whether mortgage loan applications are accepted. Its decisions should not discriminate based on the home address of the applicant. This could be to avoid redlining, i.e., indirectly discriminating based on ethnicity/race due to strong correlations between the home address and certain ethnic/racial groups, or to avoid gentrification, e.g., when applications in a poor urban area are systematically rejected to attract wealthier people.

*Discovered areas in the USA where spatial unfairness exists for mortgage loan applications.*

This work introduces PROMIS, a post-processing optimization framework designed to reduce spatial bias while maintaining predictive performance. Building on threshold-based equal opportunity adjustments and a robust definition of spatial fairness, PROMIS formulates an optimization problem that minimizes a normalized Spatial Bias Index (SBI), which quantifies expected spatial bias across regions. Unlike heuristic correction methods, PROMIS derives globally optimal, interpretable, and computationally efficient fairness adjustments through mathematical optimization, and, unlike white-box approaches, it can be applied to any classification model.

Overview of the PROMIS framework for spatial bias mitigation. Regions are colored by bias based on true positive rates (TPRs): blue (unfavored), red (favored), and grey (fair). Left: a global threshold (0.5) produces TPR disparities across regions. Right: PROMIS applies region-specific thresholds, aligning TPRs and reducing spatial bias, as reflected by the shift to grey in the bar plots.
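The effect of region-specific thresholds on true positive rates can be illustrated with a small sketch. This is not the PROMIS optimization and does not compute the Spatial Bias Index; it only picks, per region, the highest threshold from a hypothetical candidate grid that still reaches a chosen target TPR. The regions, scores, labels, and target value are made up.

```python
def tpr(scores, labels, threshold):
    """True positive rate: share of actual positives scored at or above the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)

def threshold_for_target_tpr(scores, labels, target, candidates):
    """Highest candidate threshold whose TPR is at least `target`."""
    feasible = [t for t in candidates if tpr(scores, labels, t) >= target]
    return max(feasible)

# Toy data: (model scores, true labels) per region.
regions = {
    "north": ([0.9, 0.8, 0.7, 0.6, 0.2], [1, 1, 1, 1, 0]),
    "south": ([0.6, 0.45, 0.4, 0.35, 0.1], [1, 1, 1, 1, 0]),
}
candidates = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

# A single global threshold of 0.5 yields very different TPRs per region.
global_tpr = {r: tpr(s, y, 0.5) for r, (s, y) in regions.items()}

# Region-specific thresholds that align the TPRs at a target of 0.75.
thresholds = {
    r: threshold_for_target_tpr(s, y, 0.75, candidates)
    for r, (s, y) in regions.items()
}
adjusted_tpr = {r: tpr(s, y, thresholds[r]) for r, (s, y) in regions.items()}

print(global_tpr, thresholds, adjusted_tpr)
```

Because it only reads scores and labels, a post-processing step of this kind can sit on top of any classifier.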

Publications

Dimitris Sacharidis, Giorgos Giannopoulos, George Papastefanatos, Kostas Stefanidis (2023). Auditing for Spatial Fairness. Proceedings of the 26th International Conference on Extending Database Technology, EDBT 2023, Ioannina, Greece, March 28-31, 2023.

PDF Cite Code Slides DOI arXiv URL Rank A

Dimitris Kyriakopoulos, Dimitris Sacharidis, Giorgos Giannopoulos, Dimitrios Gunopulos, Theodore Dalamagas (2025). PROMIS: A Post-Processing Framework for Mitigating Spatial Bias. Proceedings of the 33rd ACM International Conference on Advances in Geographic Information Systems, SIGSPATIAL 2025, The Graduate Hotel Minneapolis, Minneapolis, MN, USA, November 3-6, 2025.

PDF Cite DOI URL

Fairness in Recommender Systems

In recommender systems, fairness may concern either the consumers (end users, buyers, etc.) that receive recommendations, or the providers (producers, sellers, etc.) of the items being recommended. We have developed a common method that post-processes recommendations so as to ensure either consumer or provider fairness.

When recommendations concern a group of people, rather than an individual, the system must also consider fairness within the group. That means each member of the group should receive roughly the same utility from the recommendations.
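One simple way to read "roughly the same utility" is to prefer the item set whose worst-off member is best served. The sketch below contrasts that max-min objective with plain total utility on a toy group. It is an illustrative assumption, not the method from the publications listed below; the ratings and names are hypothetical, and the exhaustive search is only practical for tiny catalogs.

```python
from itertools import combinations

# Hypothetical predicted ratings of two group members for four items.
ratings = {
    "ann": {"a": 5, "b": 4, "c": 1, "d": 2},
    "bob": {"a": 2, "b": 2, "c": 4, "d": 3},
}

def member_utilities(items, ratings):
    """Utility of each member: the sum of their ratings for the chosen items."""
    return {user: sum(r[i] for i in items) for user, r in ratings.items()}

def best_by(objective, ratings, n):
    """Exhaustively pick the n items that maximize objective(member utilities)."""
    catalog = sorted(next(iter(ratings.values())))
    return max(
        combinations(catalog, n),
        key=lambda items: objective(member_utilities(items, ratings).values()),
    )

top_total = best_by(sum, ratings, 2)  # maximizes total group utility
top_fair = best_by(min, ratings, 2)   # maximizes the least satisfied member

print(top_total, member_utilities(top_total, ratings))
print(top_fair, member_utilities(top_fair, ratings))
```

Here the total-utility choice strongly favors one member, while the max-min choice gives both members the same utility at a small loss in total.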

Publications

Dimitris Sacharidis, Kyriakos Mouratidis, Dimitrios Kleftogiannis (2019). A Common Approach for Consumer and Provider Fairness in Recommendations. ACM Conference on Recommender Systems (RecSys), Late-Breaking Results.

PDF Cite Rank A

Dimitris Sacharidis (2019). Top-N group recommendations with fairness. ACM/SIGAPP Symposium on Applied Computing (SAC).

PDF Cite Code DOI URL

Model Explainability

Global Counterfactual Explainability

We propose a method for global explainability of black box models using counterfactual explanations. A counterfactual explanation locally explains an outcome by providing the minimal changes necessary to reverse the outcome, e.g., “if you had five more years of experience, your job application would have been accepted”. We develop a method, termed GLANCE, that summarizes all counterfactual explanations for a given model.

*Three actions that summarize all counterfactual explanations.*

Solving this global version of counterfactual explainability is different from finding the local counterfactual explanations and picking among them.

A toy example depicting two negative instances x1, x2, and five actions. (a) The feature space; the line is the decision boundary. (b) The action space; l1, l2 depict the decision boundary from the perspective of x1, x2, respectively.
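The idea of summarizing counterfactuals with a few global actions can be sketched as a coverage problem: an action is a fixed feature change applied to every rejected instance, and a small set of actions should reverse as many outcomes as possible. The code below uses a generic greedy coverage heuristic, not the GLANCE algorithm, and it ignores action cost for brevity. The model, instances, and candidate actions are hypothetical.

```python
def predict(x):
    """Hypothetical black-box model: accept when both features are high enough."""
    experience, skills = x
    return experience >= 5 and skills >= 2

def apply_action(action, x):
    """An action is a vector of feature changes added to the instance."""
    return tuple(xi + ai for xi, ai in zip(x, action))

def flipped(action, instances):
    """Indices of instances whose outcome the action reverses."""
    return {i for i, x in enumerate(instances) if predict(apply_action(action, x))}

def greedy_summary(actions, instances, k):
    """Greedily pick up to k actions that together flip the most instances."""
    chosen, covered = [], set()
    for _ in range(k):
        best = max(actions, key=lambda a: len(flipped(a, instances) - covered))
        gain = flipped(best, instances) - covered
        if not gain:
            break
        chosen.append(best)
        covered |= gain
    return chosen, len(covered) / len(instances)

# Toy rejected instances and candidate actions.
rejected = [(3, 2), (4, 3), (6, 0), (7, 1), (2, 1)]
actions = [(2, 0), (0, 2), (3, 1), (1, 0)]

summary, effectiveness = greedy_summary(actions, rejected, k=2)
print(summary, effectiveness)
```

Note that the selected actions are judged by how many instances they serve jointly, which is why picking each instance's own cheapest local counterfactual does not directly yield a good summary.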

Publication

Loukas Kavouras, Eleni Psaroudaki, Konstantinos Tsopelas, Dimitrios Rontogiannis, Nikolaos Theologitis, Dimitris Sacharidis, Giorgos Giannopoulos, Dimitrios Tomaras, Kleopatra Markou, Dimitrios Gunopulos, Dimitris Fotakis, Ioannis Z. Emiris (2026). GLANCE: Global Actions in a Nutshell for Counterfactual Explainability. The Fortieth AAAI Conference on Artificial Intelligence (AAAI-26).

PDF Cite Code Rank A*

Example-Based Explanations

In many use cases, it is important to explain the prediction of a black-box model by identifying the most influential training data samples. We propose AIDE (Antithetical, Intent-based, and Diverse Example-Based Explanations), an approach for providing antithetical (i.e., contrastive), intent-based, and diverse explanations for opaque and complex models. AIDE distinguishes three types of explainability intents: interpreting a correct prediction, investigating a wrong prediction, and clarifying an ambiguous prediction. For each intent, AIDE selects an appropriate set of influential training samples that support or oppose the prediction, either directly or by contrast. To provide a succinct summary, AIDE uses diversity-aware sampling to avoid redundancy and increase coverage of the training data.

*To explain an ambiguous prediction, AIDE provides influential training samples that support and oppose it.*
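The role of diversity-aware selection can be illustrated with a small greedy sketch: among influential training samples, it prefers ones that are also far from those already chosen. This is a generic heuristic under assumed inputs, not AIDE's actual sampling procedure. The influence scores, feature vectors, and the `diversity_weight` parameter are hypothetical.

```python
import math

def min_distance(features, chosen):
    """Distance from a sample to its nearest already-chosen sample."""
    return min(math.dist(features, c["features"]) for c in chosen)

def diverse_top_k(samples, k, diversity_weight):
    """Greedily pick k samples by influence plus a bonus for being far from chosen ones."""
    chosen, remaining = [], list(samples)
    while remaining and len(chosen) < k:
        def gain(s):
            bonus = diversity_weight * min_distance(s["features"], chosen) if chosen else 0.0
            return s["influence"] + bonus
        best = max(remaining, key=gain)
        chosen.append(best)
        remaining.remove(best)
    return [s["id"] for s in chosen]

# Toy training samples that support a prediction (made-up influence scores).
supporters = [
    {"id": "t1", "influence": 0.9, "features": (0.0, 0.0)},
    {"id": "t2", "influence": 0.85, "features": (0.0, 0.1)},  # near-duplicate of t1
    {"id": "t3", "influence": 0.6, "features": (3.0, 4.0)},   # different region
]

by_influence_only = diverse_top_k(supporters, k=2, diversity_weight=0.0)
with_diversity = diverse_top_k(supporters, k=2, diversity_weight=0.1)
print(by_influence_only, with_diversity)
```

With the diversity bonus, the near-duplicate sample is skipped in favor of one from a different part of the training data, so the explanation covers more ground with the same number of examples.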

Publications

Ikhtiyor Nematov, Dimitris Sacharidis, Katja Hose, Tomer Sagi (2024). AIDE: Antithetical, Intent-based, and Diverse Example-Based Explanations. AIES.

PDF Cite DOI URL

Ikhtiyor Nematov, Dimitris Sacharidis, Tomer Sagi, Katja Hose (2024). The Susceptibility of Example-Based Explainability Methods to Class Outliers. CoRR.

PDF Cite DOI URL

Counterfactual Explanations for Recommendations

We develop a post-hoc, model-agnostic explanation mechanism for recommender systems. It returns counterfactual explanations defined as those minimal changes to the user’s interaction history that would result in the system not making the recommendation that is to be explained. Because counterfactuals achieve the desired output on the recommender itself, rather than a proxy, our explanation mechanism has the same fidelity as model-specific post-hoc explanations. Moreover, it is completely private, since no other information besides the user’s interaction history is required to extract counterfactuals. Finally, owing to their simplicity, counterfactuals are scrutable, as they present specific interactions from the user’s history, and potentially actionable.

*Counterfactual explanations for recommender systems.*
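A minimal sketch of this idea follows, assuming a toy item-similarity recommender that is queried only as a black box. The brute-force search over removal sets is a stand-in for illustration, not the search strategy of the publication below, and the similarity table and item names are hypothetical.

```python
from itertools import combinations

# Hypothetical item-to-item similarities used by a toy recommender.
SIM = {
    "alien": {"blade_runner": 0.9, "notting_hill": 0.1},
    "matrix": {"blade_runner": 0.8, "notting_hill": 0.1},
    "amelie": {"blade_runner": 0.1, "notting_hill": 0.7},
}
CANDIDATES = ["blade_runner", "notting_hill"]

def recommend(history):
    """Black-box recommender: the candidate most similar to the user's history."""
    return max(CANDIDATES, key=lambda c: sum(SIM[h][c] for h in history))

def counterfactual(history):
    """Smallest set of interactions whose removal changes the recommendation."""
    target = recommend(history)
    for size in range(1, len(history)):
        for removed in combinations(history, size):
            rest = [h for h in history if h not in removed]
            if recommend(rest) != target:
                return removed
    return None

history = ["alien", "matrix", "amelie"]
explanation = counterfactual(history)
print(recommend(history), explanation)
```

The search only calls `recommend` on variations of the user's own history, which reflects the two properties stressed above: it needs no access to the model's internals and no data from other users.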

Publication

Vassilis Kaffes, Dimitris Sacharidis, Giorgos Giannopoulos (2021). Model-Agnostic Counterfactual Explanations of Recommendations. Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, UMAP 2021, Utrecht, The Netherlands, June 21-25, 2021.

PDF Cite Code Slides Video DOI URL

Text Mining

Entity Extraction using Structured Data

We propose THOR, a novel method to extract information from text that, unlike related approaches, relies neither on complex rules nor on models trained with large annotated corpora. Instead, THOR is lightweight and exploits integrated data and its schema without the need for human annotations. THOR significantly outperforms state-of-the-art Large Language Models in text conceptualization, particularly in entity recognition tasks, for data integration settings.

*An example where THOR conceptualizes text based on available structured data.*

Publication

Md Ataur Rahman, Sergi Nadal, Oscar Romero, Dimitris Sacharidis (2024). Mitigating Data Sparsity in Integrated Data through Text Conceptualization. ICDE 2024 - 40th IEEE International Conference on Data Engineering.

PDF Cite Code Rank A*

Record Linkage for Complex Records

We propose TokenJoin, a method for linking complex records, i.e., identifying similar pairs among a collection of complex records. A complex record is a set of simpler text entities, such as a set of addresses. To increase robustness, our approach is based on a relaxed match criterion, the fuzzy set similarity join, which calculates the similarity of two complex records based on maximum weighted bipartite matching instead of overlap.

*TokenJoin is able to match records where exact matching would fail.*
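The matching-based similarity can be sketched as follows. This is not TokenJoin's filtering algorithm, only the similarity it targets, computed by brute force for tiny records. Two assumptions are made here: element similarity is Jaccard over whitespace tokens, and the matching weight M is normalized as M / (|R| + |S| - M), a common choice for fuzzy set similarity. The example records are hypothetical.

```python
from itertools import permutations

def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def max_matching_weight(r, s):
    """Weight of the maximum weighted bipartite matching (brute force, small inputs)."""
    small, large = sorted((r, s), key=len)
    return max(
        sum(jaccard(x, y) for x, y in zip(small, perm))
        for perm in permutations(large, len(small))
    )

def fuzzy_set_similarity(r, s):
    """Matching weight normalized in the style of Jaccard similarity."""
    m = max_matching_weight(r, s)
    return m / (len(r) + len(s) - m)

# Two complex records, each a set of addresses written slightly differently.
r = ["12 Main Street", "5 Oak Avenue"]
s = ["12 Main St", "5 Oak Avenue Apt 2"]

exact_overlap = len(set(r) & set(s))
similarity = fuzzy_set_similarity(r, s)
print(exact_overlap, round(similarity, 3))
```

No address appears verbatim in both records, so an overlap-based join sees nothing in common, while the matching-based similarity still gives the pair a clearly positive score.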

Publication

Alexandros Zeakis, Dimitrios Skoutas, Dimitris Sacharidis, Odysseas Papapetrou, Manolis Koubarakis (2022). TokenJoin: Efficient Filtering for Set Similarity Join with Maximum Weighted Bipartite Matching. Proc. VLDB Endow.

PDF Cite Code DOI Rank A*

Data Intensive Pipelines

Cost-Aware Automated Machine Learning

Selecting an effective machine learning pipeline requires searching through many alternatives that differ in algorithms, hyperparameters, and data preparation steps. This complexity has led to the development of Automated Machine Learning (AutoML), which systematically explores pipeline configurations with minimal human input. However, AutoML is computationally expensive, often evaluating hundreds or thousands of pipelines per dataset, even though only a small fraction are ultimately selected for deployment.

In an experiment using a popular AutoML system with a five-hour budget per task, approximately 3,000 pipelines were evaluated based on classification accuracy and execution time. The results categorized pipelines into four groups: Baseline (low cost, low performance), Efficient (low cost, high performance), Premium (high cost, high performance), and Waste (high cost, low performance). Notably, about 14% of pipelines fell into the Waste category but consumed 60% of the total computational budget, highlighting substantial inefficiencies in the AutoML search process.

*Traditional AutoML is wasteful. About 14% of pipelines fall into the Waste category, yet they consume 60% of the total computational budget.*
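The four-way categorization and the budget share per category can be computed as in the sketch below. The cut-off values and the pipeline measurements are invented for illustration; they are not the data or the thresholds of the experiment.

```python
def categorize(accuracy, seconds, acc_cut, cost_cut):
    """Assign a pipeline to one of the four cost/performance groups."""
    high_perf = accuracy >= acc_cut
    high_cost = seconds >= cost_cut
    if high_cost:
        return "Premium" if high_perf else "Waste"
    return "Efficient" if high_perf else "Baseline"

def budget_share(pipelines, acc_cut, cost_cut):
    """Fraction of total execution time spent in each group."""
    total = sum(seconds for _, seconds in pipelines)
    share = {}
    for accuracy, seconds in pipelines:
        group = categorize(accuracy, seconds, acc_cut, cost_cut)
        share[group] = share.get(group, 0.0) + seconds / total
    return share

# Toy (accuracy, execution seconds) pairs with made-up values.
pipelines = [
    (0.70, 10), (0.90, 20), (0.92, 200), (0.60, 300),
    (0.72, 15), (0.91, 25), (0.88, 30),
]

shares = budget_share(pipelines, acc_cut=0.8, cost_cut=100)
print(shares)
```

Even in this toy run, a single high-cost, low-accuracy pipeline accounts for half of the total execution time, which mirrors the kind of imbalance reported above.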

Cost-Aware ML Pipeline Selection (CAPS) enhances AutoML by introducing an intermediate selection step between pipeline generation and evaluation, enabling explicit prioritization based on a performance–cost trade-off. While the generation phase focuses on exploring diverse and promising candidates, the selection phase filters them according to desired cost efficiency, significantly reducing wasted computation and allowing more pipelines to be explored within the same budget. CAPS achieves this by estimating both predictive performance and execution cost at a fine-grained level, modeling the runtime of individual pipeline functions and leveraging execution-environment data from prior runs. This detailed cost modeling overcomes limitations of black-box approaches and accounts for system-level optimizations that can obscure true execution costs.

*CAPS is a cost-aware AutoML system that selects pipelines for execution by balancing their predicted performance and computational cost.*
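A selection step of this kind can be sketched as follows. It is a simplified illustration, not the CAPS implementation: the per-operator runtime table, the predicted accuracies, the linear utility, and the `trade_off` parameter are all hypothetical. The only idea carried over from the description is that cost is estimated per pipeline function and used, together with predicted performance, to decide what gets executed.

```python
# Hypothetical per-operator runtime estimates in seconds (e.g., learned from prior runs).
OP_SECONDS = {
    "impute": 2.0, "scale": 1.0, "pca": 5.0,
    "logreg": 10.0, "gbt": 120.0, "svm": 400.0,
}

def estimated_cost(pipeline):
    """Fine-grained cost estimate: sum of the estimates of the pipeline's operators."""
    return sum(OP_SECONDS[op] for op in pipeline)

def select(candidates, budget, trade_off):
    """Pick pipelines to execute, ranked by predicted accuracy minus weighted cost."""
    ranked = sorted(
        candidates,
        key=lambda c: c[1] - trade_off * estimated_cost(c[0]),
        reverse=True,
    )
    chosen, spent = [], 0.0
    for pipeline, _predicted_accuracy in ranked:
        cost = estimated_cost(pipeline)
        if spent + cost <= budget:
            chosen.append(pipeline)
            spent += cost
    return chosen, spent

# Generated candidates as (pipeline, predicted accuracy); values are made up.
candidates = [
    (("impute", "scale", "logreg"), 0.80),
    (("impute", "pca", "gbt"), 0.88),
    (("scale", "svm"), 0.82),
    (("impute", "gbt"), 0.87),
]

to_run, spent = select(candidates, budget=300.0, trade_off=0.001)
print(to_run, spent)
```

The expensive pipeline with only moderate predicted accuracy is never executed, so its cost is saved before evaluation rather than discovered afterwards.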

Publication

Antonios Kontaxakis, Dimitris Sacharidis, Alberto Abelló, Sergi Nadal, Alkis Simitsis (2026). CAPS: Cost-Aware ML Pipeline Selection. Proc. VLDB Endow.

PDF Cite Rank A*

Optimizing Machine Learning Pipelines

We propose HYPPO, a novel system to optimize pipelines encountered in exploratory machine learning. HYPPO exploits alternative computational paths of artifacts from past executions to derive better execution plans while reusing materialized artifacts. Adding alternative computations introduces new challenges for exploratory machine learning regarding workload representation, system architecture, and optimal execution plan generation. To this end, we present a novel workload representation based on directed hypergraphs, and we formulate the problem of discovering the optimal execution plan as a search problem over directed hypergraphs and that of selecting artifacts to materialize as an optimization problem.

*Overview of the pipeline optimization process in HYPPO.*
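To illustrate why a directed hypergraph is a natural representation, the sketch below models each task as a hyperedge from a set of input artifacts to one output artifact and searches for the cheapest way to derive a target. It is a simplified stand-in for HYPPO's plan search: costs are simply added along derivations, so an input shared by several tasks is counted more than once, and the artifact names, task costs, and load costs are hypothetical.

```python
import math

def cheapest_cost(target, tasks, materialized):
    """Cheapest additive cost to derive `target`.

    tasks: list of (input artifacts, output artifact, task cost) hyperedges.
    materialized: {artifact: cost of loading it from storage}.
    """
    cost = dict(materialized)
    changed = True
    while changed:  # relax hyperedges until no artifact gets cheaper
        changed = False
        for inputs, output, task_cost in tasks:
            if all(i in cost for i in inputs):
                total = task_cost + sum(cost[i] for i in inputs)
                if total < cost.get(output, math.inf):
                    cost[output] = total
                    changed = True
    return cost.get(target, math.inf)

# Two alternative computational paths lead to the same "features" artifact.
tasks = [
    (("raw",), "clean", 10.0),
    (("clean",), "features", 20.0),
    (("scaled",), "features", 3.0),   # equivalent computation from another artifact
    (("features", "labels"), "model", 50.0),
]

without_reuse = cheapest_cost("model", tasks, {"raw": 1.0, "labels": 1.0})
with_reuse = cheapest_cost("model", tasks, {"raw": 1.0, "labels": 1.0, "scaled": 2.0})
print(without_reuse, with_reuse)
```

When the `scaled` artifact from a past execution is available, the alternative path makes the target much cheaper to derive, which is the effect that reusing materialized artifacts through equivalent computations aims for.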

Publication

Antonios Kontaxakis, Dimitris Sacharidis, Alkis Simitsis, Alberto Abelló, Sergi Nadal (2024). HYPPO - Using Equivalences to Optimize Pipelines in Exploratory Machine Learning. ICDE 2024 - 40th IEEE International Conference on Data Engineering.

PDF Cite Code Rank A*

Summarizing Streaming Big Data

When analyzing big data, it is often necessary to work with data synopses, approximate summaries of the data that come with guarantees. We propose a novel synopsis-as-a-service paradigm and design a Synopses Data Engine as a Service (SDEaaS) system, built on top of Apache Flink, that combines the virtues of parallel processing and stream summarization towards delivering interactive analytics at extreme scale.

*Architecture of the Synopses Data Engine as a Service.*
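As an example of what a synopsis with guarantees looks like, here is a minimal Count-Min sketch, a classic stream summary for frequency estimation. It is shown only to illustrate the concept; it is not code from SDEaaS and does not use Apache Flink. The class name, the hashing scheme, and the sizes are choices made for this sketch.

```python
import hashlib

class CountMinSketch:
    """Fixed-size frequency summary; estimates never fall below the true count."""

    def __init__(self, width, depth):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One deterministic hash function per row, derived from SHA-256.
        digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum is the best estimate.
        return min(self.table[row][self._index(row, item)] for row in range(self.depth))

stream = ["a", "b", "a", "c", "a", "b"]
sketch = CountMinSketch(width=64, depth=4)
for item in stream:
    sketch.add(item)

print(sketch.estimate("a"), sketch.estimate("b"), sketch.estimate("c"))
```

The memory footprint depends only on `width` and `depth`, not on the stream length, and the error is one-sided: an estimate may exceed the true frequency because of hash collisions but is never smaller.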

Publication

Antonios Kontaxakis, Nikos Giatrakos, Dimitris Sacharidis, Antonios Deligiannakis (2023). And synopses for all: A synopses data engine for extreme scale analytics-as-a-service. Inf. Syst.

PDF Cite DOI URL