Evaluation (beta)
The Evaluation module is an integral part of AutoFlow's Chat Engine, designed to assess the performance and reliability of the Chat Engine's outputs.
Currently, the module provides evaluations based on two key metrics:
- Factual Correctness: This metric measures the degree to which the generated responses align with verified facts. It ensures that the Chat Engine delivers accurate and trustworthy information.
- Semantic Similarity: This metric evaluates the closeness in meaning between the generated responses and the expected outputs. It helps gauge the contextual relevance and coherence of the Chat Engine's performance.
With these metrics, the Evaluation component empowers developers and users to analyze and optimize the Chat Engine’s capabilities effectively.
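Both metrics are computed by the module itself, but the sketch below illustrates the general idea behind semantic similarity: embed the generated and expected answers, then compare the vectors with cosine similarity. This is a minimal illustration using the sentence-transformers library, not AutoFlow's actual implementation; the model name and example strings are arbitrary choices. Factual correctness is a different kind of check, typically performed by an LLM judge that compares the claims in a response against the reference.

```python
# Minimal sketch of semantic similarity scoring; NOT AutoFlow's implementation.
# Model choice ("all-MiniLM-L6-v2") and the example strings are arbitrary.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

generated = "TiDB scales compute and storage independently."
reference = "TiDB supports scaling out its compute and storage layers separately."

# Embed both answers, then score their closeness in meaning with cosine similarity.
embeddings = model.encode([generated, reference])
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"Semantic similarity: {score:.3f}")  # values near 1.0 mean close in meaning
```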
Prerequisites
- An admin account to access the Evaluation panel.
- (Optional) A CSV dataset with at least two columns:
  - `query`: i.e. question.
  - `reference`: i.e. expected answer.
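For example, a minimal dataset file could look like the following (the rows are made-up samples for illustration):

```csv
query,reference
"What is the capital of France?","The capital of France is Paris."
"How many continents are there?","There are seven continents on Earth."
```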
How to Evaluate
To evaluate the Chat Engine, follow these steps:
- Create an evaluation dataset:
  - Click on Evaluation in the left panel, and then click the Datasets button.
  - Click on the New Evaluation Dataset button.
  - Type in the dataset name. If you have a CSV file with the required columns, you can upload it to initialize the evaluation dataset.
  - Click on the Create button.
- Create an evaluation task:
  - Click on Evaluation in the left panel, and then click the Tasks button.
  - Click on the New Evaluation Task button.
  - Type in the task name, select the evaluation dataset, select the Chat Engine to evaluate, and type in the run size.

    Note: Run Size limits how many rows of the dataset the evaluation task processes (see the sketch after these steps).
    - For example, if your dataset has 1000 rows and you set the run size to 100, the task will only evaluate the first 100 rows.
    - Run size does not modify the evaluation dataset itself; it only controls how much data the task runs over.
  - Click on the Create button.
- Wait for the evaluation task to finish; you can then see the evaluation result in the task detail view.
  - Click on Evaluation in the left panel, and then click the Tasks button.
  - Click on the name of the task whose result you want to see.
  - Draw your insights from the evaluation result.
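The Run Size behavior described in step 2 amounts to truncating the dataset before the task runs. The following hypothetical sketch (not AutoFlow's actual code; the file name is made up) shows the effect:

```python
# Hypothetical illustration of the Run Size behavior; not AutoFlow's code.
import csv

def select_rows_for_task(dataset_path: str, run_size: int) -> list[dict]:
    """Return only the first `run_size` rows of the evaluation dataset."""
    with open(dataset_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    # The dataset file itself is untouched; only the task's workload shrinks.
    return rows[:run_size]

# A dataset with 1000 rows and a run size of 100 yields a task over 100 rows.
rows = select_rows_for_task("evaluation_dataset.csv", run_size=100)
print(len(rows))
```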