Create Evaluations from Templates

The SDK provides two ways to create evaluations programmatically from templates:

Option 1: Using Template ID

1. Copy the Template ID from the Web

Navigate to the “Templates” tab in the Workspace, select the template you want to use, and copy its Template ID.

2. Initialize the SDK

Initialize the SDK with your API key. For details, refer to Get API Key.

import podonos

client = podonos.init("<API_KEY>")

3. Create an Evaluation Using the Template

Use the Template ID to create a new evaluation.

evaluator = client.create_evaluator_from_template(
    name="Evaluate the English voice",
    desc="model_1_and_model_2",
    num_eval=10,
    template_id="HTGRqU"  # Template ID copied from the web
)

4. Add Files to the Evaluation

Add files with relevant metadata to the evaluator.

from podonos import File  # assuming File is exported at the package root

for i in range(60):
    file = File(path=f"path/speech_{i}.mp3", tags=["man", "noisy"],
                model_tag="Model 1")
    evaluator.add_file(file)

5. Finalize the Evaluation

Close the evaluator to finalize the setup.

evaluator.close()

Option 2: Using Template JSON

You can create evaluations by defining the template structure either directly in your code as a Python dictionary or by loading it from a JSON file.

Intro

A template consists of two main sections:

  1. Questions Section: Contains the core evaluation questions

    • SCORED: Rating scale questions (e.g., 1-5 Likert scale)
    • NON_SCORED: Multiple/Single choice questions
    • COMPARISON: Comparison questions (for double evaluations)
  2. Instructions Section (Optional): Contains guidance messages for the evaluators.

    • These messages help ensure that evaluators understand how to conduct the evaluation properly. Clear, well-targeted guidance can significantly improve the quality of the evaluation results.
    • Use one of the following types: DO, WARNING, DONT (a DONT sketch follows this list)
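
For example, a minimal DONT instruction might look like this (the texts here are illustrative; the fields match the instruction objects in the full template example below):

{
    "type": "DONT",
    "instruction": "Do not rate the loudness of the audio",
    "description": "Volume differences between files are expected and should not affect your rating"
}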

Types

{
    "type": "SCORED",
    "question": "Please evaluate the overall quality of the audio",
    "description": "You can ignore the wrong pronunciation of the audio",
    "options": [
        {"label_text": "Excellent"},
        {"label_text": "Good"},
        {
            "label_text": "Fair",
            "reference_file": "path/to/reference/file"  # A reference clip that helps the evaluator judge this quality level
        },
        {"label_text": "Poor"},
        {"label_text": "Bad"}
    ]
}
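
COMPARISON questions (for double evaluations) follow the same overall shape. This minimal sketch uses only the common type, question, and description fields; the question text is illustrative, and the full set of fields COMPARISON questions support is covered in the reference documentation:

{
    "type": "COMPARISON",
    "question": "Which audio sounds more natural?",
    "description": "Compare the two audio files and choose the one that sounds more natural"
}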

For options in SCORED questions:

  • score is automatically generated only for SCORED questions. If there are 5 options, the first option in the list receives a score of 5, the second receives a score of 4, and so on, down to a score of 1 for the last option (see the table after this list).
  • order is the index of the option in the list, starting from 0.
  • For more detailed explanations, please refer to the reference documentation.
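
For the five options in the SCORED example above, the generated values would be:

  label_text   order   score
  Excellent    0       5
  Good         1       4
  Fair         2       3
  Poor         3       2
  Bad          4       1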

Example

Here’s a step-by-step guide to creating an evaluation from a template JSON:

1. Prepare the Template

Define your template as a Python dictionary or save it as a JSON file:

# Single evaluation template example, defined as a Python dictionary
SINGLE_TEMPLATE = {
    "instructions": [
        {
            "type": "DO",
            "instruction": "Use the provided reference file to assess the quality of the audio.",
            "reference_file": "path/to/reference/file" # Helps the evaluator to select the quality of the audio
        },
        {
            "type": "WARNING",
            "instruction": "Please evaluate only the audio's natural pronunciation",
            "description": "Just focus on the audio's naturalness, the background noise or audio quality is not considered for the evaluation",
        },
    ],
    "questions": [
        {
            "type": "SCORED",
            "question": "Please evaluate the overall quality of the audio",
            "description": "You can ignore the wrong pronunciation of the audio",
            "options": [
                {"label_text": "Excellent"},
                {"label_text": "Good"},
                {"label_text": "Fair", "reference_file": "path/to/reference/file"},
                {"label_text": "Poor"},
                {"label_text": "Bad"},
            ]
        },
        {
            "type": "NON_SCORED",
            "question": "Select all the bad audio characteristics",
            "description": "Select only the audio characteristics that are distracting or undesirable.",
            "options": [
                {"label_text": "Background Noise"},
                {"label_text": "Echo", "reference_file": "path/to/reference/file"},
                {"label_text": "Distortion"}
            ],
            "allow_multiple": true,
            "has_other": true
        }
    ]
}

When using a JSON file (see the loading sketch after this list):

  • Save the file with a .json extension
  • Ensure the contents are valid JSON (no comments or trailing commas)
  • Use UTF-8 encoding
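
A minimal loading sketch (template.json is a hypothetical filename):

import json

# Read the UTF-8 encoded template file and parse it into a Python dictionary
with open("template.json", encoding="utf-8") as f:
    SINGLE_TEMPLATE = json.load(f)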

2. Create the Evaluation

For single and double evaluations (evaluating one or two files at a time):

# Create a single evaluation from the template defined above
evaluator = client.create_evaluator_from_template_json(
    json=SINGLE_TEMPLATE,
    name="Single Audio Quality Test",
    desc="model_a_and_model_b",
    num_eval=10,
    custom_type="SINGLE" # Specify SINGLE type
)

# Add single files
for i in range(3):
    file = File(path=f"./audio/speech_{i}.mp3", 
                model_tag="Model A",
                tags=["clean", "male"])
    evaluator.add_file(file)

For single evaluations:

  • Use the add_file() method to add individual files
  • Each file is evaluated independently
  • Avoid COMPARISON type questions; they require file pairs

For double evaluations:

  • Use the add_files() method to add pairs of files (see the sketch after this list)
  • Files are always evaluated in pairs
  • COMPARISON type questions are supported
  • Use a distinct model_tag value to distinguish the two files in each pair
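
A minimal double evaluation sketch. The DOUBLE custom_type value and the file0/file1 keyword arguments to add_files() are assumptions here, and DOUBLE_TEMPLATE is a hypothetical template that includes COMPARISON questions; check the reference documentation for the exact signature:

# Create a double evaluation (custom_type value is an assumption)
evaluator = client.create_evaluator_from_template_json(
    json=DOUBLE_TEMPLATE,  # hypothetical template with COMPARISON questions
    name="Double Audio Quality Test",
    desc="model_a_vs_model_b",
    num_eval=10,
    custom_type="DOUBLE"
)

# Add file pairs; each pair is shown to evaluators together
# (file0/file1 keyword names are assumptions)
for i in range(3):
    evaluator.add_files(
        file0=File(path=f"./audio/model_a/speech_{i}.mp3", model_tag="Model A"),
        file1=File(path=f"./audio/model_b/speech_{i}.mp3", model_tag="Model B")
    )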

3. Finalize the Evaluation

Close the evaluator to finalize the setup:

evaluator.close()

Tip: Using the SDK is ideal for integrating evaluations into automated workflows.