Google’s BLEURT is BERT for Evaluating Natural Language Generation Models
The new method uses BERT pretrained models to evaluate the quality of the output of NLG models.
Natural language generation (NLG) is one of the fastest growing areas of research in deep learning. NLG applications are all around us, in areas such as text summarization, question answering, translation and many others. One of the recurring challenges in the NLG space is evaluating model quality. Most methods today rely on human evaluation, which has obvious subjectivity and scale limitations. In a recent paper, Google Research proposed BLEURT, a transfer learning model that can achieve human-level quality in the scoring of NLG systems.
The idea of BLEURT is to address some of the limitations of human evaluation of NLG systems while helping improve NLG models. Transformer architectures like Google's BERT have achieved record results on different natural language understanding (NLU) tasks. To do that, BERT had to build implicit knowledge about text quality. BLEURT tries to leverage these capabilities of BERT to develop a scoring method for NLG systems that matches human performance.
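The interface such a metric exposes is simple: pairs of reference and candidate sentences go in, and one quality score per pair comes out. The sketch below illustrates that interface shape with a trivial unigram-overlap scorer as a stand-in, so the example runs anywhere without model checkpoints; BLEURT replaces the stand-in with a fine-tuned BERT model, and the function names here are only illustrative, not part of the BLEURT library.

```python
# Sketch of the scoring interface a learned metric such as BLEURT
# exposes. The unigram-overlap scorer is a placeholder, NOT BLEURT:
# it only shows the references-in / scores-out shape of the API.

def overlap_score(reference: str, candidate: str) -> float:
    """F1-style unigram overlap between a reference and a candidate."""
    ref_tokens = set(reference.lower().split())
    cand_tokens = set(candidate.lower().split())
    if not ref_tokens or not cand_tokens:
        return 0.0
    common = ref_tokens & cand_tokens
    precision = len(common) / len(cand_tokens)
    recall = len(common) / len(ref_tokens)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def score(references, candidates):
    """Score each (reference, candidate) pair, mirroring the
    parallel-lists calling convention used by learned metrics."""
    return [overlap_score(r, c) for r, c in zip(references, candidates)]

refs = ["the cat sat on the mat", "the cat sat on the mat"]
cands = ["a cat sat on the mat", "the weather is nice today"]
print(score(refs, cands))  # close paraphrase scores high, unrelated text low
```

In the released BLEURT library (the `google-research/bleurt` repository), the equivalent call is a `BleurtScorer` loaded from a downloaded checkpoint, scored with `references=` and `candidates=` lists; the key difference from overlap metrics is that its scores are learned to correlate with human ratings rather than computed from surface n-gram matches.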