- setting 3.1: Select the best-performing LLM pair from Setting 2.1, then initialize the core evaluator and the critic in one of the two ways below:
- setting 3.1.1: Self-evaluation. Use one LLM from the selected pair as both the core evaluator and the critic. Total experiments: 1 combination x ___ datasets = ___ total runs
- setting 3.1.2: Cross-evaluation. Assign one LLM from the selected pair as the core evaluator and the other as the critic. Total experiments: 1 combination x ___ datasets = ___ total runs
- Further settings will be defined once the results from task 3 are available.
total experiments:
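
As a rough sketch of how the runs for settings 3.1.1 and 3.1.2 could be enumerated once the pair and the dataset list are known: the names `Run` and `plan_runs`, the model identifiers, and the dataset names are all hypothetical placeholders. The notes do not say which LLM of the pair is used for self-evaluation or which takes the evaluator role in cross-evaluation, so the sketch simply assumes the first model in the pair for both.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Run:
    setting: str
    core_evaluator: str
    critic: str
    dataset: str


def plan_runs(pair, datasets):
    """Enumerate one run per (role assignment, dataset) for settings 3.1.1 and 3.1.2."""
    model_a, model_b = pair
    # Assumption: the first model of the pair is the self-evaluator (3.1.1)
    # and the core evaluator in cross-evaluation (3.1.2).
    role_assignments = [
        ("3.1.1", model_a, model_a),  # self-evaluation: same LLM in both roles
        ("3.1.2", model_a, model_b),  # cross-evaluation: one LLM per role
    ]
    return [
        Run(setting, evaluator, critic, dataset)
        for (setting, evaluator, critic), dataset in product(role_assignments, datasets)
    ]


if __name__ == "__main__":
    # Placeholder names; substitute the pair selected in Setting 2.1 and the real datasets.
    for run in plan_runs(("model_a", "model_b"), ["dataset_1", "dataset_2"]):
        print(run)
```

Each setting contributes one combination per dataset, so the run count per setting equals the number of datasets, and the section total is twice that.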