You may have noticed in the previous lesson that you ran a unit test that validated the interaction with the LLM.
Unit tests are not only a convenient way to test individual elements of an application. They also provide an automated way to test the response.
The actual response, however, can be more challenging to validate.
LLM Evaluation
The test uses an evaluation chain to determine whether the response from the LLM has answered the original question. The test is a take on a technique called LLM as Judge, where the LLM is used to judge the quality of the response.
Before running any tests, the beforeAll() function creates an instance of the following chain.
evalChain = RunnableSequence.from([
  PromptTemplate.fromTemplate(`
Does the following response answer the question provided?
Question: {question}
Response: {response}
Respond simply with "yes" or "no".
`),
  llm,
  new StringOutputParser(),
]);
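To make the data flow through the chain concrete, here is a dependency-free sketch of the same three steps: format the prompt, call the model, extract the text. Every name in it (`formatEvalPrompt`, `parseOutput`, `evaluate`, `fakeModel`) is a hypothetical stand-in written for illustration, not a LangChain API, and the fake model simply pattern-matches on the prompt so the sketch runs without an API key.

```typescript
// Hypothetical stand-ins for the three links in the chain (not LangChain APIs).
type EvalInput = { question: string; response: string };
type ModelMessage = { content: string };
type Model = (prompt: string) => Promise<ModelMessage>;

// Step 1: fill the {question} and {response} placeholders, as the prompt template would.
function formatEvalPrompt({ question, response }: EvalInput): string {
  return [
    "Does the following response answer the question provided?",
    `Question: ${question}`,
    `Response: ${response}`,
    'Respond simply with "yes" or "no".',
  ].join("\n");
}

// Step 3: reduce the model's message to a plain string, as an output parser would.
function parseOutput(message: ModelMessage): string {
  return message.content;
}

// Run the steps in order, each one receiving the previous step's output.
async function evaluate(model: Model, input: EvalInput): Promise<string> {
  const prompt = formatEvalPrompt(input);
  const message = await model(prompt); // Step 2: call the judge model
  return parseOutput(message);
}

// Assumption: a fake judge that answers "yes" only when the response is "Paris".
const fakeModel: Model = async (prompt) => ({
  content: prompt.includes("Response: Paris") ? "yes" : "no",
});

evaluate(fakeModel, {
  question: "What is the capital of France?",
  response: "Paris",
}).then((verdict) => console.log(verdict)); // logs "yes"
```

The value of composing the steps this way is that the test only has to supply `question` and `response`; the prompt wording can change without touching the test code.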
Each test in the suite uses this chain to evaluate the answer provided in the test.
const evaluation = await evalChain.invoke({ question, response });
expect(`${evaluation.toLowerCase()} - ${response}`).toContain("yes");
The test uses a concatenation of the output of the evaluation chain and the original response, so if the evaluation does not return a yes, the response is appended to the test output to help debug the issue.
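One caveat with the concatenated assertion: toContain("yes") inspects the whole string, so a response that itself contains the word "yes" would pass even when the verdict is "no". A minimal sketch of a stricter check follows. The helper names `parseVerdict` and `assertAnswered` are hypothetical and not part of the course repository, and the normalization rules are an assumption about how a judge model might format its answer.

```typescript
type Verdict = "yes" | "no" | "unclear";

// Normalize the judge's raw output: trim whitespace, lowercase,
// and strip leading quotes plus trailing quotes, periods, or exclamation marks.
function parseVerdict(evaluation: string): Verdict {
  const cleaned = evaluation
    .trim()
    .toLowerCase()
    .replace(/^["']+|["'.!]+$/g, "");
  if (cleaned === "yes") return "yes";
  if (cleaned === "no") return "no";
  return "unclear"; // anything else, e.g. "yes, it does"
}

// Throw with the original response in the message, so a failing test
// still shows what the LLM actually said.
function assertAnswered(evaluation: string, response: string): void {
  const verdict = parseVerdict(evaluation);
  if (verdict !== "yes") {
    throw new Error(`Judge verdict was "${verdict}" for response: ${response}`);
  }
}
```

Inside a test, calling `assertAnswered(evaluation, response)` keeps the debugging benefit of the original approach while checking only the verdict itself.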
Prompt Iteration
As you develop your application and test different models, you may need to modify the prompt iteratively until the model returns a consistent response. Running the tests in watch mode will allow you to receive instant feedback.
This repository includes an npm run test:watch command.
npm run test:watch
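While iterating on a prompt, "consistent" can be made measurable by repeating the same evaluation several times and looking at the share of "yes" verdicts. The sketch below assumes a hypothetical `Judge` function that wraps a single evaluation call; `sampleVerdicts` and `yesShare` are illustrative names, not part of the repository.

```typescript
// Hypothetical wrapper around one evaluation call, e.g.
// () => evalChain.invoke({ question, response })
type Judge = () => Promise<string>;

// Call the judge `runs` times, one after another, and collect the raw verdicts.
async function sampleVerdicts(judge: Judge, runs: number): Promise<string[]> {
  const verdicts: string[] = [];
  for (let i = 0; i < runs; i++) {
    verdicts.push(await judge());
  }
  return verdicts;
}

// Fraction of verdicts that are exactly "yes" after trimming and lowercasing.
// Returns 0 for an empty list.
function yesShare(verdicts: string[]): number {
  if (verdicts.length === 0) return 0;
  const yesCount = verdicts.filter(
    (v) => v.trim().toLowerCase() === "yes"
  ).length;
  return yesCount / verdicts.length;
}
```

A share well below 1 for a response that clearly answers the question suggests the prompt, rather than the response, needs another iteration.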
Check your understanding
Purpose of LLM Evaluation
What is the purpose of using a unit test in the context of evaluating LLM responses?
- ❏ To manually review the responses for accuracy.
- ✓ To automate the testing of individual elements of an application and its interaction with the LLM.
- ❏ To replace the need for an evaluation chain in determining the quality of LLM responses.
- ❏ To solely test the computational efficiency of the LLM.
Hint
The correct answer focuses on automating tests for application components, especially their interaction with an LLM, to ensure they work as expected.
Solution
You can use unit tests to automate the testing of individual elements of an application and its interaction with the LLM.
Summary
In this lesson, you learned how to run automated tests that evaluate the response generated by your chains.
In the next module, you will learn how to use conversation history to rephrase questions and provide more accurate responses.