My professional experience includes internships at Glassdoor and Expedia as a Machine Learning Scientist, and a year as a Data Scientist at MAPFRE Economics.
I have also contributed to two European research projects, InteGrid and RAYUELA.
Additionally, I am proud to have published my work at ICML (2024 and 2025), one of the top AI conferences worldwide.
Research
My research focuses on machine learning and deep learning, particularly in representation learning, generative AI, graph analytics, and their practical applications.
I aim to develop innovative solutions to real-world challenges by leveraging cutting-edge technology, bridging the gap between theory and practice.
Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning
Vicente Balmaseda, Bokun Wang, Ching-Long Lin, Tianbao Yang
ICML 2025 — Forty-second International Conference on Machine Learning, February 2025
code / poster / arXiv / venue
In self-supervised contrastive learning, negative samples may inadvertently share semantics with the anchor, leading to false negatives that degrade representation quality. We introduce GloFND, a scalable optimization-based method that discovers false negatives during training. It identifies them globally across the dataset while keeping computation confined to each mini-batch and independent of dataset size. GloFND integrates seamlessly with unimodal and bimodal contrastive frameworks, consistently improving representation quality with minimal overhead.
Discriminative Finetuning of Generative Large Language Models without Reward Models and Preference Data
Siqi Guo, Ilgee Hong, Vicente Balmaseda, Changlong Yu, Liang Qiu, Xin Liu, Haoming Jiang, Tuo Zhao, Tianbao Yang
ICML 2025 — Forty-second International Conference on Machine Learning, February 2025
code / poster / arXiv / venue
We introduce Discriminative Fine-Tuning (DFT), an approach for fine-tuning LLMs without preference data or reward models. Unlike Supervised Fine-Tuning (SFT), DFT adopts a discriminative paradigm to efficiently optimize the likelihood of an answer among all possible outputs given an input.
Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better
Vicente Balmaseda, Ying Xu, Yixin Cao, Nate Veldt
ICML 2024 — Forty-first International Conference on Machine Learning, April 2024
paper / code / poster / arXiv / venue
We provide improved deterministic approximation algorithms and guarantees for Cluster Deletion, and the first combinatorial algorithm for the Strong Triadic Closure (STC) relaxation, enabling us to handle graphs with millions of nodes and edges on a standard 16GB laptop.
Predicting systemic risk in financial systems using Deep Graph Learning
Vicente Balmaseda, María Coronado, Gonzalo de Cadenas-Santiago
Intelligent Systems with Applications, June 2023
paper / code
We propose Graph Neural Networks (GNNs) for systemic risk analysis in financial networks, leveraging network structure to overcome ML limitations. Additionally, we introduce the C2R approach to reduce pre-labeling effort for costly continuous systemic risk metrics, allowing these metrics (e.g., quantiles) to be modeled using data labeled into just a few classes.
Other Projects
These include coursework and side projects.
TransformerKart: A Full SuperTuxKart AI Controller