Available Test Containers¶

ASQI provides several pre-built test containers for different testing scenarios. All containers are available on Docker Hub and can be pulled using the installation commands.

Container Overview¶

Mock Tester (asqiengineer/test-container:mock_tester-latest): Basic test container for development and validation
Mock RAG Tester (asqiengineer/test-container:mock_rag_tester-latest): Mock test container for RAG systems development and validation
Garak Security Tester (asqiengineer/test-container:garak-latest): LLM security vulnerability assessment with 40+ attack vectors
Inspect Evals (asqiengineer/test-container:inspect_evals-latest): Comprehensive evaluation suite with 80+ tasks across cybersecurity, mathematics, reasoning, knowledge, bias, and safety domains
Chatbot Simulator (asqiengineer/test-container:chatbot_simulator-latest): Persona-based conversational testing with multi-turn dialogue
TrustLLM (asqiengineer/test-container:trustllm-latest): Comprehensive trustworthiness evaluation framework
DeepTeam (asqiengineer/test-container:deepteam-latest): Red teaming library for adversarial robustness testing
LLMPerf (asqiengineer/test-container:llmperf-latest): Token throughput and latency benchmark using Ray LLMPerf against OpenAI-compatible LLM APIs
Resaro Judge (asqiengineer/test-container:resaro_judge-latest): Judge generated answers against gold answers with optional LLM-as-judge for accuracy evaluation

All containers are available on Docker Hub and can be pulled using the commands shown in the installation section.

Test Container Examples¶

Note: Certain tests include volume mounting to save detailed logs. You might need to configure the volume output mount accordingly.

ASQI provides ready-to-use example configurations for each test container. Download and run these examples to get started quickly:

Mock Tester Example¶

Basic test container for development and validation:

# Download and run the basic demo
curl -O https://raw.githubusercontent.com/asqi-engineer/asqi-engineer/main/config/suites/demo_test.yaml
asqi execute-tests -t demo_test.yaml -s demo_systems.yaml -o results.json

Garak Security Testing Example¶

LLM security vulnerability assessment with multiple attack probes:

# Download security test configuration
curl -O https://raw.githubusercontent.com/asqi-engineer/asqi-engineer/main/config/suites/garak_test.yaml

# Run security tests (includes encoding attacks and prompt injection)
asqi execute-tests -t garak_test.yaml -s demo_systems.yaml -o security_results.json

Note: Certain tests requires a OPENAI_API_KEY so it is recommended to pass it in via the env_file field as part of the system config.

Inspect Evals Example¶

Comprehensive evaluation suite with 80+ academic benchmarks:

# Download inspect evals configuration (cybersecurity focus)
curl -O https://raw.githubusercontent.com/asqi-engineer/asqi-engineer/main/config/suites/inspect_evals/cybersecurity.yaml

# Run cybersecurity evaluation
asqi execute-tests -t cybersecurity.yaml -s demo_systems.yaml -o inspect_results.json

# Or run knowledge evaluation
curl -O https://raw.githubusercontent.com/asqi-engineer/asqi-engineer/main/config/suites/inspect_evals/knowledge.yaml
asqi execute-tests -t knowledge.yaml -s demo_systems.yaml -o knowledge_results.json

Chatbot Simulator Example¶

Persona-based conversational testing with multi-turn dialogue:

# Download chatbot simulation configuration  
curl -O https://raw.githubusercontent.com/asqi-engineer/asqi-engineer/main/config/suites/chatbot_simulator_test.yaml

# Run conversational tests 
asqi execute-tests -t chatbot_simulator_test.yaml -s demo_systems.yaml -o chatbot_results.json

TrustLLM Example¶

Comprehensive trustworthiness evaluation across multiple dimensions:

# Download trustworthiness evaluation configuration
curl -O https://raw.githubusercontent.com/asqi-engineer/asqi-engineer/main/config/suites/trustllm_test.yaml

# Run trustworthiness evaluation
asqi execute-tests -t trustllm_test.yaml -s demo_systems.yaml -o trustllm_results.json

DeepTeam Red Teaming Example¶

Advanced adversarial robustness testing:

# Download red teaming configuration
curl -O https://raw.githubusercontent.com/asqi-engineer/asqi-engineer/main/config/suites/deepteam_test.yaml

# Run red teaming tests
asqi execute-tests -t deepteam_test.yaml -s demo_systems.yaml -o redteam_results.json

LLMPerf Benchmark Example¶

Token throughput and latency benchmark for LLM performance testing:

# Download LLMPerf benchmark configuration
curl -O https://raw.githubusercontent.com/asqi-engineer/asqi-engineer/main/config/suites/llmperf_test.yaml

# Run performance benchmark
asqi execute-tests -t llmperf_test.yaml -s demo_systems.yaml -o llmperf_results.json

Resaro Judge Example¶

Judge generated answers against gold answers for accuracy evaluation:

# Resaro Judge requires custom dataset configuration
# Create a test configuration file with question-answer pairs or summaries
# See test_containers/resaro_judge/manifest.yaml for input schema details

# Run with facts test type (question-answer evaluation)
asqi execute-tests -t your_resaro_config.yaml -s demo_systems.yaml -o judge_results.json

Evaluating Score Cards¶

Score cards provide automated assessment of test results against business-relevant criteria. ASQI engineer includes a flexible grading engine that evaluates individual test executions and provides structured feedback.

How Score Cards Work¶

Score cards consist of indicators that evaluate specific metrics from test results:

Apply to specific tests: Target individual test ids from your test suite
Extract metrics: Pull any field from test container JSON output
Assessment criteria: Define pass/fail thresholds with business-friendly outcomes
Individual evaluation: Each test execution is assessed separately (no aggregation)

Basic Score Card Example¶

Using the simple example score card for mock tester results:

# First run a test to generate results
curl -O https://raw.githubusercontent.com/asqi-engineer/asqi-engineer/main/config/suites/demo_test.yaml
asqi execute-tests -t demo_test.yaml -s demo_systems.yaml -o test_results.json

# Download and apply basic score card
curl -O https://raw.githubusercontent.com/asqi-engineer/asqi-engineer/main/config/score_cards/example_score_card.yaml
asqi evaluate-score-cards --input-file test_results.json -r example_score_card.yaml -o results_with_grades.json

# Or run end-to-end (tests + score card evaluation)
asqi execute -t demo_test.yaml -s demo_systems.yaml -r example_score_card.yaml -o complete_results.json

For detailed information about specific test containers, see LLM Test Containers.