According to the authors, removing the intermediate reward model makes DPO between several and six times more efficient than RLHF, while matching or exceeding its performance on tasks such as text summarisation. Its simplicity is already enabling smaller companies to tackle the problem of alignment.
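To make the "no middleman" idea concrete, here is a minimal sketch of the DPO objective for a single preference pair. Rather than training a separate reward model as in RLHF, DPO treats the scaled log-probability ratio between the policy and a frozen reference model as an implicit reward, and applies a logistic loss to the margin between the chosen and rejected responses. The function name and arguments below are illustrative, not from the original post:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    The implicit reward of a response is
        beta * (log pi_theta(y|x) - log pi_ref(y|x)),
    so no separately trained reward model is needed; the loss is a
    logistic loss on the reward margin between the chosen and the
    rejected response.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)); small when the policy already prefers
    # the chosen response more strongly than the reference does
    return math.log(1.0 + math.exp(-margin))
```

When the policy agrees with the reference on both responses the margin is zero and the loss is log 2; the loss shrinks as the policy assigns relatively more probability to the chosen response. This single supervised-style objective is what replaces RLHF's reward-model-plus-PPO pipeline.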