Research/May 17, 2026

OTPO: Refining LLM Alignment through Optimal Transport Policy Optimization

Researchers have introduced Optimal Transport Policy Optimization (OTPO), a method that replaces the uniform weighting used in preference learning with dynamic, data-dependent weights. By prioritizing the most informative data points during alignment, OTPO is reported to improve model performance and stability compared with standard Direct Preference Optimization (DPO).
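To make the contrast with uniform weighting concrete, here is a minimal Python sketch of a DPO-style loss that accepts per-example weights. It is an illustration under stated assumptions, not the OTPO algorithm: the summary above does not say how OTPO derives its weights, so `weights` is a hypothetical input supplied by the caller, and the function names are invented for this example.

```python
import math


def dpo_pair_loss(policy_chosen_lp, policy_rejected_lp,
                  ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Standard DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    # Margin: how much more the policy favors the chosen response than the
    # reference model does, measured in log-probability.
    margin = ((policy_chosen_lp - ref_chosen_lp)
              - (policy_rejected_lp - ref_rejected_lp))
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_dpo_loss(pairs, weights=None, beta=0.1):
    """Weighted average of per-pair DPO losses.

    `weights` is a hypothetical placeholder for whatever importance scores a
    method assigns; passing None gives uniform weighting.
    """
    losses = [dpo_pair_loss(*pair, beta=beta) for pair in pairs]
    if weights is None:
        weights = [1.0] * len(losses)
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, losses)) / total


# Each tuple: (policy chosen, policy rejected, reference chosen, reference rejected)
# log-probabilities. The values are made up for illustration.
pairs = [(-10.0, -12.0, -11.0, -11.5), (-8.0, -7.5, -8.2, -8.0)]
uniform = weighted_dpo_loss(pairs)                   # every pair counts equally
reweighted = weighted_dpo_loss(pairs, [0.2, 0.8])    # second pair emphasized
```

With `weights=None` the function reduces to the plain average used in uniformly weighted preference learning. Any non-uniform weighting shifts the training signal toward the examples with the larger weights.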

ORIGINAL SOURCE

View at Towards Data Science