Evaluating Natural Language using Fine-tuned LLMs

A technical deep dive into the JudgeLM paper on evaluating natural language generation