Self-RAG | Pikachu

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

@cs.washington & ibm

Brief Intro

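LLMs often make factual errors because they rely too heavily on their own parametric knowledge. Retrieval-Augmented Generation (RAG), which augments the LLM with relevant knowledge, reduces these errors. However, retrieving passages indiscriminately, whether or not they are needed, can lead to unhelpful generations.

This motivates Self-Reflective Retrieval-Augmented Generation (Self-RAG).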

This framework trains a single arbitrary LM that adaptively retrieves passages on demand, and generates and reflects on retrieved passages and its own generations using special tokens called reflection tokens.

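Reflection tokens are divided into two groups: retrieval tokens and critique tokens.

Self-RAG first outputs a retrieval token when retrieval is needed, then processes the retrieved passages and generates critique tokens to assess its own outputs and select the best one.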

generator - LM (trained)

critic model

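Framework Description

The model's inputs and outputs can be summarized in the following table.

The bottom three rows of the table are the critique token types, and the bold entries are the most desirable values. Combining these inputs and outputs, the Self-RAG algorithm can be described as follows:

This framework is easiest to understand when read alongside Figure 1 of the paper.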
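The control flow described in these notes (decide whether to retrieve, generate per passage, critique, keep the best) can be sketched as below. This is a minimal illustration and not the paper's algorithm listing: `self_rag_step`, `Candidate`, and the four callables are hypothetical names, and in Self-RAG a single LM plays all of these roles by emitting reflection tokens.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    text: str                  # generated segment
    passage: Optional[str]     # passage it was conditioned on, if any
    score: float               # critique-based score used for ranking


def self_rag_step(
    prompt: str,
    wants_retrieval: Callable[[str], bool],                 # stands in for the Retrieve token
    retrieve: Callable[[str], List[str]],                   # hypothetical retriever
    generate: Callable[[str, Optional[str]], str],          # segment generation
    critique: Callable[[str, Optional[str], str], float],   # stands in for critique tokens
) -> Candidate:
    """Produce one output segment, retrieving only when the model asks for it."""
    passages = retrieve(prompt) if wants_retrieval(prompt) else []
    if not passages:
        # No retrieval (or nothing found): generate directly and score the result.
        text = generate(prompt, None)
        return Candidate(text, None, critique(prompt, None, text))

    # Retrieval: generate one candidate per passage, then keep the best-scored one.
    candidates = []
    for passage in passages:
        text = generate(prompt, passage)
        candidates.append(Candidate(text, passage, critique(prompt, passage, text)))
    return max(candidates, key=lambda c: c.score)
```

Passing the four behaviours in as callables keeps the sketch runnable with simple stubs, which is convenient for seeing how the retrieval decision changes the path taken.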

Self-RAG Training

Data Collection For Critic Model

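GPT-4 can generate this kind of feedback effectively, but relying on it raises API costs and reduces reproducibility.

The paper therefore prompts GPT-4 to generate reflection tokens and then distills that knowledge into an in-house critic model.

Take the Retrieve token as an example:

GPT-4 is given the instruction "Given an instruction, make a judgement on whether finding some external documents from the web helps to generate a better response", followed by few-shot demonstrations and the original task input and output, and is asked to predict an appropriate reflection token.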
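A labelling prompt of this shape can be assembled as follows. Only the instruction text comes from the notes above; the function name `build_retrieve_prompt`, the `Input:` / `Output:` / `Judgement:` field labels, and the demonstration format are assumptions made for illustration.

```python
from typing import List, Tuple

RETRIEVE_INSTRUCTION = (
    "Given an instruction, make a judgement on whether finding some external "
    "documents from the web helps to generate a better response"
)


def build_retrieve_prompt(
    demonstrations: List[Tuple[str, str, str]],  # (input, output, label) triples
    task_input: str,
    task_output: str,
) -> str:
    """Assemble instruction + few-shot demonstrations + the example to be labelled."""
    parts = [RETRIEVE_INSTRUCTION]
    for demo_input, demo_output, demo_label in demonstrations:
        parts.append(
            f"Input: {demo_input}\nOutput: {demo_output}\nJudgement: {demo_label}"
        )
    # The final block is left open so the labelling model fills in the judgement.
    parts.append(f"Input: {task_input}\nOutput: {task_output}\nJudgement:")
    return "\n\n".join(parts)
```

The returned string would then be sent to the labelling model, and its completion recorded as the Retrieve label for that example.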

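A manual review found that GPT-4's predictions agree closely with human judgements.

Training data for the other reflection token types is collected in a similar way.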

Critic Learning

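After collecting the training data, the critic is initialized from a pretrained LM and trained on the reflection tokens with a maximum likelihood objective. Although the initial model can be any LM, the paper uses the same model as the generator. The resulting critic agrees with GPT-4's predictions about 90% of the time.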

Training The Generator Model

Data Collection For Generator

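Given an input-output pair (X, Y), each segment of the output is processed as follows. The critic first predicts Retrieve to decide whether retrieval is needed. If Retrieve = True, the top K passages are retrieved. The critic then judges whether each passage is relevant (IsRel) and, if it is, whether the passage supports the model's generation (IsSup).

Finally, the critic model judges the overall utility of the output (IsUse).

The reflection tokens predicted in these steps are inserted into the output, and the augmented output together with the original input X forms the training data for the generator.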
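The augmentation step above can be illustrated with a small sketch that interleaves critic labels with the original output segments. The token spellings (`[Retrieve=Yes]`, `[IsRel=...]`, `[IsSup=...]`, `[IsUse=...]`), the label strings, and the names `SegmentLabels` and `augment_output` are hypothetical; they mirror the structure described here rather than the paper's exact format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SegmentLabels:
    text: str                       # one segment of the original output
    retrieve: bool                  # critic's Retrieve decision for this segment
    passage: Optional[str] = None   # retrieved passage kept for this segment
    is_rel: Optional[str] = None    # critic's IsRel label
    is_sup: Optional[str] = None    # critic's IsSup label


def augment_output(segments: List[SegmentLabels], is_use: int) -> str:
    """Interleave reflection tokens and passages with the original output segments."""
    pieces = []
    for seg in segments:
        if seg.retrieve and seg.passage is not None:
            pieces.append("[Retrieve=Yes]")
            pieces.append(f"<p>{seg.passage}</p>")
            pieces.append(f"[IsRel={seg.is_rel}]")
            pieces.append(seg.text)
            # IsSup is only predicted when the passage was judged relevant.
            if seg.is_rel == "Relevant":
                pieces.append(f"[IsSup={seg.is_sup}]")
        else:
            pieces.append("[Retrieve=No]")
            pieces.append(seg.text)
    # Overall utility is predicted once, at the end of the output.
    pieces.append(f"[IsUse={is_use}]")
    return " ".join(pieces)
```

Each augmented string, paired with its original input, would become one training example for the generator.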

Generator Learning

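The generator is trained on this augmented data with the standard next-token prediction objective.

Unlike critic training, the generator learns to predict both the target output and the reflection tokens. During training, the retrieved text chunks (delimited by <p> and </p>) are masked out of the loss calculation, and the original vocabulary V is expanded with the set of reflection tokens {Critique, Retrieve}.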
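Masking retrieved passages out of the loss can be sketched as a label-building step. The sketch assumes that `<p>` and `</p>` are each a single token and that the delimiters themselves are also masked; `mask_passage_labels` is a hypothetical helper. The value -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so positions carrying it contribute nothing to the loss.

```python
from typing import List

IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def mask_passage_labels(tokens: List[str], token_ids: List[int]) -> List[int]:
    """Copy token ids into labels, ignoring everything from <p> through </p>."""
    labels = []
    inside_passage = False
    for token, token_id in zip(tokens, token_ids):
        if token == "<p>":
            inside_passage = True
        labels.append(IGNORE_INDEX if inside_passage else token_id)
        if token == "</p>":
            inside_passage = False
    return labels
```

With this labelling the generator still conditions on the passage text but is only trained to predict the reflection tokens and the target output.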

Self-RAG Inference

Generating reflection tokens to evaluate its own output during generation helps the model handle a wider range of tasks.

Adaptive Retrieval With Threshold

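The model can decide for itself whether it needs to retrieve additional passages, and the framework also allows a threshold to be set. Specifically, retrieval is triggered when the probability of generating Retrieve=Yes exceeds a preset threshold.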
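A minimal sketch of the threshold rule follows. It assumes the probabilities of the Retrieve=Yes and Retrieve=No tokens have already been read from the model, and it normalizes over just those two values, which is an assumption of this sketch. The name `should_retrieve` and the default of 0.5 are illustrative.

```python
def should_retrieve(p_yes: float, p_no: float, threshold: float = 0.5) -> bool:
    """Trigger retrieval when the normalized probability of Retrieve=Yes exceeds the threshold."""
    total = p_yes + p_no
    if total <= 0.0:
        return False  # no probability mass on either token; default to no retrieval
    return p_yes / total > threshold
```

Raising the threshold makes retrieval rarer, and lowering it makes the model retrieve more often.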

Tree-Decoding With Critique Tokens

For each segment, a scoring rule with adjustable weights is introduced so that the candidates we prefer are selected.
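Since the scoring formula itself is not reproduced in these notes, the following is only one plausible form, stated as an assumption: the segment's generation probability plus a weighted sum, over critique types, of the normalized probability of the most desirable token. The names `desirable_share` and `segment_score` and the label strings used in the example are hypothetical.

```python
from typing import Dict


def desirable_share(probs: Dict[str, float], desirable: str) -> float:
    """Probability of the most desirable token, normalized over that critique type."""
    total = sum(probs.values())
    return probs.get(desirable, 0.0) / total if total > 0 else 0.0


def segment_score(
    segment_prob: float,
    critique_probs: Dict[str, Dict[str, float]],  # critique type -> token -> probability
    desirable: Dict[str, str],                    # critique type -> most desirable token
    weights: Dict[str, float],                    # critique type -> adjustable weight
) -> float:
    """Generation probability plus a weighted sum of per-type critique scores."""
    critique_score = sum(
        weights.get(kind, 0.0) * desirable_share(probs, desirable[kind])
        for kind, probs in critique_probs.items()
    )
    return segment_prob + critique_score
```

For example, increasing the IsSup weight would favor candidates that are better supported by their passage, at the possible cost of fluency or overall utility.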

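Summary

Self-RAG is a fairly novel framework. It differs from conventional RAG in that it adds a self-evaluation component.

It resembles CRAG, but differs in that it also adds a step that decides whether retrieval is needed.

It also differs from CRAG in a number of details.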