OTPO: Refining LLM Alignment through Optimal Transport Policy Optimization

Researchers have introduced Optimal Transport Policy Optimization (OTPO), a method that replaces the uniform weighting used in preference learning with dynamically computed weights. By prioritizing the most informative data points during the alignment phase, OTPO improves model performance and stability compared with standard Direct Preference Optimization (DPO).
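
To make the contrast between uniform and non-uniform weighting concrete, here is a minimal sketch in plain Python. It uses the standard DPO loss, the negative log-sigmoid of a scaled log-ratio margin, and averages it over preference pairs with optional weights. It is not OTPO itself: this summary does not say how the weights are derived or at what granularity they apply, so the sketch assumes one weight per preference pair and takes the weights as given inputs. The function names and the example numbers are hypothetical.

```python
import math


def dpo_margin(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Scaled log-ratio margin that the standard DPO loss passes through a sigmoid.

    Each argument is the summed log-probability of a response under the
    policy or the frozen reference model.
    """
    return beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def weighted_dpo_loss(margins, weights=None):
    """Weighted average of per-pair DPO losses.

    weights=None reproduces the usual uniform average. The weights here are
    plain inputs (an assumption for illustration); how OTPO computes them is
    not covered by this sketch.
    """
    if weights is None:
        weights = [1.0] * len(margins)
    total = sum(weights)
    return sum(w * -log_sigmoid(m) for w, m in zip(weights, margins)) / total


# Hypothetical log-probabilities for two preference pairs.
margins = [
    dpo_margin(-1.0, -2.0, -1.5, -1.5),  # policy already prefers the chosen response
    dpo_margin(-2.0, -1.0, -1.5, -1.5),  # policy prefers the rejected response
]
uniform = weighted_dpo_loss(margins)
reweighted = weighted_dpo_loss(margins, weights=[0.2, 0.8])  # emphasize the second pair
```

With uniform weights every pair contributes equally to the gradient. Shifting weight toward the second pair, where the policy currently ranks the responses the wrong way round, raises the loss and makes that pair dominate the update, which is the general effect a non-uniform weighting scheme is after.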