Authoring is in invite-only private beta; request access to use it. Generation uses an LLM to compile your sources into constraint logic. You supply an Anthropic API key per request, and it is never stored.

The TDD loop

Rule generation is test-driven. The engine generates rules from your source text. Your test suite validates the output. Failing tests become the basis for targeted guidance — feedback that points to the specific source clause the engine missed. The loop repeats until all tests pass.
aethis_generate_and_test → tests failing → aethis_refine (with guidance) → repeat → aethis_publish
Rules are compiled from your source text and guidance — not reverse-engineered from your tests. Tests validate the output. Better tests catch more edge cases; better guidance converges faster.
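The loop can be sketched as a small driver. This is an illustration only: the callables are hypothetical wrappers around the aethis_generate_and_test, aethis_refine, and aethis_publish tools, each assumed to return the tool's JSON response as a dict, and write_guidance stands in for the human step of turning failures into clause-level feedback.

```python
from typing import Callable

Result = dict  # parsed JSON response from a tool call (assumed shape)


def run_tdd_loop(
    project_id: str,
    generate_and_test: Callable[[str], Result],
    refine: Callable[[str, str], Result],
    publish: Callable[[str], Result],
    write_guidance: Callable[[list], str],
    max_rounds: int = 5,
) -> Result:
    """Generate once, refine until no test fails, then publish."""
    result = generate_and_test(project_id)
    for _ in range(max_rounds):
        if not result["failures"]:
            return publish(project_id)
        # Failing tests become targeted guidance for the next attempt.
        feedback = write_guidance(result["failures"])
        result = refine(project_id, feedback)
    if not result["failures"]:
        return publish(project_id)
    raise RuntimeError(
        f"{len(result['failures'])} test(s) still failing after {max_rounds} refinements"
    )
```

The max_rounds cap is a design choice of this sketch, not a documented limit. It stops the loop when guidance is not converging, so a person can review the failures.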

Step 1 — Create a ruleset

If you completed Phase 2 (field vocabulary), you already have confirmed field names and a project_id — write your test cases using those names and skip to Step 2. If starting fresh on a single-section domain, you need test cases before creating the ruleset. Two approaches:
  1. Inspect an existing ruleset in the same domain with aethis_schema to learn the field naming convention, then write tests using those names.
  2. Make your best guess at field names, create the ruleset with preliminary tests, then run aethis_discover_fields to confirm. If names don’t match, add corrected tests via aethis_create_ruleset with a new project — the cost is low.
Either way, the field names in your test cases must exactly match what the engine discovers. Phase 2 exists to prevent this mismatch.
aethis_create_ruleset({
  name: "Child eligibility gate",
  section_id: "child_eligibility",
  domain: "uk_fsm",
  source_text: "...",  // text of the source documents for this section
  test_cases: [
    {
      name: "Age 3 — too young, not eligible",
      field_values: { "child.age": 3, "child.school_type": "state_funded" },
      expected_outcome: "not_eligible"
    },
    {
      name: "Age 4 — minimum age, eligible",
      field_values: { "child.age": 4, "child.school_type": "state_funded" },
      expected_outcome: "eligible"
    },
    {
      name: "Age 15 — maximum age, state school, eligible",
      field_values: { "child.age": 15, "child.school_type": "state_funded" },
      expected_outcome: "eligible"
    },
    {
      name: "Age 16 — over maximum age, not eligible",
      field_values: { "child.age": 16, "child.school_type": "state_funded" },
      expected_outcome: "not_eligible"
    },
    {
      name: "Age 10, independent school — not eligible",
      field_values: { "child.age": 10, "child.school_type": "independent" },
      expected_outcome: "not_eligible"
    },
    {
      name: "Age 10, state school — eligible",
      field_values: { "child.age": 10, "child.school_type": "state_funded" },
      expected_outcome: "eligible"
    }
  ]
})
Returns:
{ "project_id": "proj_8CzLVwyx53rTGEJv" }
Write test cases after running aethis_discover_fields. Test field names must exactly match the engine’s discovered field names. A mismatched name causes the engine to treat that field as absent — the test may silently pass or fail for the wrong reason.
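A quick local check can catch mismatched names before you create the ruleset. The sketch below assumes you have already extracted the discovered field names into a set; the response shape of aethis_discover_fields is not shown here, so that extraction is left to the caller.

```python
def unknown_fields(test_cases: list[dict], discovered: set[str]) -> dict[str, list[str]]:
    """Map each test name to the field names the engine did not discover."""
    problems: dict[str, list[str]] = {}
    for case in test_cases:
        missing = sorted(set(case["field_values"]) - discovered)
        if missing:
            problems[case["name"]] = missing
    return problems


discovered = {"child.age", "child.school_type"}
tests = [
    {"name": "typo in field", "field_values": {"child.age": 10, "child.school_kind": "independent"}},
    {"name": "all names match", "field_values": {"child.age": 4, "child.school_type": "state_funded"}},
]
mismatches = unknown_fields(tests, discovered)
# mismatches == {"typo in field": ["child.school_kind"]}
```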

Step 2 — Generate and test

aethis_generate_and_test({ project_id: "proj_8CzLVwyx53rTGEJv" })
Takes 60–120 seconds for most sections. Returns (first attempt — partial failure):
{
  "ruleset_id": "aethis/uk-fsm/child-eligibility",
  "tests_passing": 4,
  "tests_total": 6,
  "failures": [
    {
      "name": "Age 4 — minimum age, eligible",
      "expected": "eligible",
      "got": "not_eligible",
      "hint": "The lower age bound may be using strict > rather than ≥ 4"
    },
    {
      "name": "Age 10, independent school — not eligible",
      "expected": "not_eligible",
      "got": "eligible",
      "hint": "The school type restriction may not be captured. Check Regulation 3(2)(b)."
    }
  ]
}
Two failures: the boundary condition at age 4 is wrong (strict > instead of ≥), and the school type restriction isn’t compiled. Each failure includes a hint pointing to the likely cause.
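When a run returns several failures, it can help to flatten the response into a short checklist before writing guidance. A minimal sketch, using only the keys visible in the response above (tests_passing, tests_total, failures, and each failure's name, expected, got, hint):

```python
def failure_summary(result: dict) -> list[str]:
    """Turn a generate-and-test response into one line per failing test."""
    lines = [f"{result['tests_passing']}/{result['tests_total']} tests passing"]
    for failure in result["failures"]:
        lines.append(
            f"- {failure['name']}: expected {failure['expected']}, "
            f"got {failure['got']} (hint: {failure['hint']})"
        )
    return lines
```

Each line then maps to one piece of guidance, which keeps feedback tied to a specific test rather than to the run as a whole.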

Step 3 — Refine with guidance

Add guidance that references the specific source clause:
aethis_refine({
  project_id: "proj_8CzLVwyx53rTGEJv",
  feedback: "FSM Regulations 2014 Regulation 3(1): eligibility applies to children aged 4 to 15 inclusive. The lower bound is ≥ 4, not > 4. Regulation 3(2)(b): eligibility is restricted to children attending state-funded schools. The child.school_type field must be checked — only 'state_funded' qualifies. Independent and home-educated do not."
})
aethis_refine adds the guidance, then makes the minimal edit to fix the failing tests — seeded from the section’s active ruleset and keeping the passing tests green, rather than re-authoring the whole section. Returns (second attempt — all passing):
{
  "ruleset_id": "aethis/uk-fsm/child-eligibility",
  "tests_passing": 6,
  "tests_total": 6,
  "failures": []
}
All tests pass after one refinement. Move to publish.

Guidance variants

Add guidance without regenerating

Useful when you want to accumulate several pieces of guidance before triggering a generation run:
aethis_add_guidance({
  project_id: "proj_8CzLVwyx53rTGEJv",
  guidance_text: "Regulation 3(2)(b): school_type must be state_funded. Independent and home_educated do not qualify.",
  process_type: "rule_generation"
})
Then trigger generation separately:
aethis_generate_and_test({ project_id: "proj_8CzLVwyx53rTGEJv" })

Check accumulated guidance

aethis_list_guidance({ project_id: "proj_8CzLVwyx53rTGEJv" })
Returns:
{
  "guidance": [
    {
      "id": "hint_001",
      "process_type": "rule_generation",
      "guidance_text": "Regulation 3(2)(b): school_type must be state_funded. Independent and home_educated do not qualify.",
      "created_at": "2026-04-16T10:22:00Z"
    }
  ]
}

Domain-level guidance

Add once, applies to all projects in the domain — no need to repeat cross-section principles on each ruleset:
aethis_add_domain_guidance({
  domain: "uk_fsm",
  guidance_text: "Discretionary clauses (where the authority 'may' act) must produce 'undetermined', not 'not_eligible'. The system flags for human review — it never exercises discretion on behalf of the decision-maker.",
  process_type: "rule_generation"
})

Guidance examples

Real guidance from the UK Free School Meals household criteria section (11 tests, 8 qualifying routes). Each hint addresses a specific compilation gap.
Problem: The engine treats all qualifying criteria as AND conditions, so a household must meet every route to qualify when meeting any one should be enough.
Guidance:
This section uses OR logic across multiple qualifying routes. A household
qualifies if it meets ANY ONE of the following: Universal Credit with net
earnings at or below £7,400/year; Income Support; income-based JSA;
income-related ESA; Child Tax Credit only (no Working Tax Credit) with
income at or below £16,190/year; NASS support; the child is looked-after;
or the child is a care leaver.
Problem: The Universal Credit route passes when receives_universal_credit is true, ignoring the income cap.
Guidance:
The Universal Credit route requires TWO conditions simultaneously:
household.receives_universal_credit must be true AND
household.annual_net_earnings must be less than or equal to 7400
(pounds sterling, annual, after tax and NI). These are AND conditions
within the UC route, which is then OR'd with other routes.
Problem: Looked-after children are being checked against benefit criteria when they should bypass them entirely.
Guidance:
Looked-after children (child.is_looked_after) and care leavers
(child.is_care_leaver) qualify automatically with no income or benefit
requirement. These are unconditional boolean fields — if either is true,
Section B passes regardless of all other fields.
Problem: The Child Tax Credit route qualifies anyone receiving CTC, missing the requirement that they must not also receive Working Tax Credit.
Guidance:
The Child Tax Credit route (household.receives_child_tax_credit_only)
requires that the household receives CTC but does NOT receive Working
Tax Credit. Use a single boolean field that encodes "CTC without WTC".
Pattern: Every effective hint references a specific regulatory clause, names the fields involved, and states the logical relationship (AND, OR, NOT, unconditional). Vague feedback (“fix the UC check”) doesn’t converge.
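The logical shape these hints describe (AND inside a route, OR across routes, unconditional passes) can be written out as plain Python to sanity-check the expected outcomes in your tests. This is a reference sketch, not the compiled rule. It covers only the routes whose fields are named above; household.annual_income is a hypothetical field name for the CTC income cap, and the remaining routes (Income Support, JSA, ESA, NASS) are omitted because their field names are not given.

```python
def household_qualifies(f: dict) -> bool:
    """Reference logic: meeting any one route is enough (OR across routes)."""
    # UC route: two conditions that must hold together (AND within the route).
    uc_route = (
        f.get("household.receives_universal_credit", False)
        and f.get("household.annual_net_earnings", float("inf")) <= 7400
    )
    # CTC route: single boolean encoding "CTC without WTC", plus the income cap.
    ctc_route = (
        f.get("household.receives_child_tax_credit_only", False)
        # Hypothetical field name; the source does not name the income field.
        and f.get("household.annual_income", float("inf")) <= 16190
    )
    # Unconditional routes: no income or benefit requirement.
    unconditional = (
        f.get("child.is_looked_after", False)
        or f.get("child.is_care_leaver", False)
    )
    return bool(uc_route or ctc_route or unconditional)
```

Missing fields default to "does not qualify" in this sketch. How the engine itself treats absent fields is a separate question and is not modelled here.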

Diagnosing a specific failure

If a test is failing but the hint isn’t clear enough, use aethis_explain_failure to get a deeper diagnosis:
aethis_explain_failure({
  ruleset_id: "aethis/uk-fsm/child-eligibility",
  field_values: { "child.age": 10, "child.school_type": "independent" },
  expected_outcome: "not_eligible",
  test_name: "Age 10, independent school — not eligible"
})
Returns:
{
  "test_name": "Age 10, independent school — not eligible",
  "expected": "not_eligible",
  "got": "eligible",
  "failing_criterion": "school_type_check",
  "compiled_form": "child.age IN [4..15]",
  "diagnosis": "The compiled rule checks age only. The school_type constraint is missing — Regulation 3(2)(b) restricts eligibility to state-funded schools but is not yet reflected in the compiled rules.",
  "suggested_guidance": "Add guidance referencing Regulation 3(2)(b): eligibility requires child.school_type = state_funded."
}

Test coverage strategy

Good coverage catches failures before production:
  • Boundary values — test just below, at, and above every threshold (age 3, 4, 15, 16)
  • Every Enum value — one test per enum value for each Enum field
  • Every code path — combinations of fields that exercise a distinct branch
  • Five tests minimum per section — more is better; sparse test suites let edge-case bugs through
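Boundary tests are mechanical enough to generate. The helper below is a sketch for an inclusive numeric range: it emits cases just below, at, and just above each bound in the test_cases shape used by aethis_create_ruleset. It assumes the other fields in base would make the case eligible on their own, so the range is the only deciding factor.

```python
def boundary_cases(field: str, low: int, high: int, base: dict) -> list[dict]:
    """Build test cases around an inclusive [low, high] range for one field."""
    cases = []
    for value in (low - 1, low, high, high + 1):
        inside = low <= value <= high
        cases.append({
            "name": f"{field} = {value}, {'inside' if inside else 'outside'} range",
            "field_values": {**base, field: value},
            "expected_outcome": "eligible" if inside else "not_eligible",
        })
    return cases


age_cases = boundary_cases("child.age", 4, 15, {"child.school_type": "state_funded"})
# Produces cases for ages 3, 4, 15 and 16.
```

Generated cases cover thresholds only. Enum values and branch combinations still need hand-written tests.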

Step 4 — Publish

Once all tests pass:
aethis_publish({ project_id: "proj_8CzLVwyx53rTGEJv" })
Returns:
{
  "ruleset_id": "aethis/uk-fsm/child-eligibility",
  "name": "Child Eligibility",
  "version": "v1",
  "tests_passing": 6,
  "tests_total": 6
}
The name is auto-derived from the section ID (e.g. child_eligibility → "Child Eligibility"). To override it, pass name in the publish call:
aethis_publish({ project_id: "proj_8CzLVwyx53rTGEJv", name: "FSM Child Gate" })
The ruleset_id is now ready for aethis_decide. Published rulesets are locked and versioned.
To update rules (e.g. for a legislative change), generate a new ruleset in the same project and publish again. The previous ruleset remains available by its specific version ID.

Sources: budget, duplicates, and lifecycle

Generation builds its context from every active source on the project, so source hygiene directly affects rule quality.
Token budget. Every upload response includes estimated_tokens per file, a project_estimated_tokens running total, and the generation_token_budget it must fit (the model’s context window minus headroom reserved for the generation loop). Before any generation starts, the engine counts the exact prompt. If it exceeds the budget, the request is rejected with 422 token_budget_exceeded (including the count, the budget, and the largest sources) before any model cost is incurred.
Duplicates. Re-uploading identical content (or a same-named file) is flagged in the response under possible_duplicates: surfaced, never silently merged. Conflicting versions of the same guidance are a correctness hazard; resolve them by superseding the stale copy.
Lifecycle. GET /projects/{project_id}/sources lists every source with its status; PATCH /projects/{project_id}/sources/{source_id} sets it:
  • active — included in generation (the default)
  • superseded — replaced by a newer upload (optionally point at it with superseded_by); kept for provenance, excluded from generation
  • reference_only — kept on the project, excluded from generation
DELETE removes a source outright; prefer superseded when generated rulesets already cite it.
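A client-side pre-flight check can flag a likely budget overrun before you trigger generation. The sketch below reads project_estimated_tokens and generation_token_budget from an upload response. It also assumes each entry in the source list carries status and estimated_tokens keys, which is a guess about the list response shape. The figures are estimates, so the engine's exact count remains the authority.

```python
def budget_headroom(upload_response: dict) -> int:
    """Estimated tokens left in the budget; negative means likely over budget."""
    return (
        upload_response["generation_token_budget"]
        - upload_response["project_estimated_tokens"]
    )


def largest_active_sources(sources: list[dict], n: int = 3) -> list[dict]:
    """Return the n largest sources that still count toward generation."""
    # Assumed keys: "status" and "estimated_tokens" on each source entry.
    active = [s for s in sources if s.get("status", "active") == "active"]
    return sorted(active, key=lambda s: s["estimated_tokens"], reverse=True)[:n]
```

If the headroom is negative, the largest active sources are the first candidates to mark superseded or reference_only.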

Grounding report

Every generated rule cites the source passages it is grounded in. After generation, the job (and the generate-and-test response) carries a provenance_report:
  • totals — how many citations resolved against the uploaded sources (verified), cited passages that don’t exist (flagged), and rules or fields with no citation (uncited)
  • coverage — per source: how many passages are cited by at least one rule, plus a sample of passages nothing cites
The coverage list is the review signal golden tests can’t give you: a statutory exception that no rule cites — and no test exercises — stays green in testing but shows up here. The report never blocks generation; treat it as the reviewer’s punch list.
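One way to act on the report is to flatten it into review items. The key names in this sketch (totals.flagged, totals.uncited, and per-source source and uncited_sample under coverage) are guesses based on the description above, not a documented schema; adjust them to match the actual response.

```python
def review_punch_list(report: dict) -> list[str]:
    """Flatten a provenance report into items for a human reviewer."""
    items = []
    totals = report.get("totals", {})
    if totals.get("flagged", 0):
        items.append(f"{totals['flagged']} citation(s) point at passages that do not exist")
    if totals.get("uncited", 0):
        items.append(f"{totals['uncited']} rule(s) or field(s) have no citation")
    # Assumed shape: one coverage entry per source, with a sample of uncited passages.
    for source in report.get("coverage", []):
        for passage in source.get("uncited_sample", []):
            items.append(f"{source['source']}: no rule cites {passage!r}")
    return items
```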

Handling generation timeouts

Rule generation takes 5–15 minutes for complex sections. If the client times out, the server continues generating. Do not re-trigger generation — it creates a duplicate run.
  1. Wait 10–15 minutes
  2. Call aethis_list_rulesets({ project_id: "proj_8CzLVwyx53rTGEJv" }) to check if a new ruleset appeared
  3. If a ruleset is present, run your test suite and publish
  4. If not, wait and check again
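The wait-and-check steps can be sketched as a polling helper that never re-triggers generation. Here list_rulesets is a hypothetical wrapper around aethis_list_rulesets, assumed to return a list of dicts with a ruleset_id key; known_ids holds the rulesets that existed before the run started.

```python
import time
from typing import Callable, Optional


def wait_for_ruleset(
    list_rulesets: Callable[[], list],
    known_ids: set,
    interval_s: float = 600.0,
    max_checks: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Wait, then check for a new ruleset; repeat up to max_checks times."""
    for _ in range(max_checks):
        sleep(interval_s)  # wait first: the server is still generating
        for ruleset in list_rulesets():
            if ruleset["ruleset_id"] not in known_ids:
                return ruleset
    return None  # nothing appeared; investigate before re-triggering
```

The sleep parameter is injectable so the helper can be exercised without real delays. The default interval of 600 seconds follows the 10 to 15 minute wait suggested above.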

After all sections are published

If your domain has multiple sections, compose them into a rulebook — a container that references published rulesets and defines outcome logic.

Create a rulebook

Via the REST API:
curl -X POST https://api.aethis.ai/api/v1/public/rulebooks/ \
  -H "Content-Type: application/json" \
  -H "x-api-key: ak_live_..." \
  -d '{
    "name": "UK Free School Meals Eligibility",
    "domain": "uk_fsm",
    "ruleset_refs": [
      { "section_id": "child_eligibility", "pin_mode": "latest_active" },
      { "section_id": "household_qualifying_criteria", "pin_mode": "latest_active" },
      { "section_id": "universal_infant_fsm", "pin_mode": "latest_active" }
    ],
    "outcome_logic": { "expr": "A AND (B OR C)" }
  }'

Ruleset references

Each ruleset_ref has:
Field | Type | Required | Description
section_id | string | yes | Section identifier matching the published ruleset
ruleset_id | string | no | Pin to a specific ruleset version. Omit to use pin_mode.
pin_mode | string | no | "pinned" (specific version, default) or "latest_active" (auto-updates when a new version is published)

Outcome logic

Defines how section outcomes compose. Each section is assigned a letter (A, B, C…) in the order listed in ruleset_refs. Supported operators: AND, OR, NOT, parentheses for grouping.
Example | Meaning
A AND B | Both sections must be eligible
A AND (B OR C) | A required; either B or C sufficient
A AND B AND NOT C | A and B required; C must not be eligible
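To check an expression before creating a rulebook, you can evaluate it locally against sample section outcomes. The evaluator below is an illustrative sketch, not the engine's implementation. It assumes the usual precedence (NOT binds tightest, then AND, then OR) and treats each section as a plain boolean, so outcomes such as 'undetermined' are not modelled.

```python
import re


def section_letters(section_ids: list[str]) -> dict[str, str]:
    """Assign A, B, C... to sections in the order they are listed."""
    return {chr(ord("A") + i): sid for i, sid in enumerate(section_ids)}


def evaluate_outcome(expr: str, sections: dict[str, bool]) -> bool:
    """Evaluate an AND/OR/NOT expression over lettered section results."""
    tokens = re.findall(r"\(|\)|[A-Za-z]+", expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "OR":
            take()
            rhs = parse_and()  # parse first so tokens are always consumed
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "AND":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "NOT":
            take()
            return not parse_not()
        if peek() == "(":
            take()
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return value
        tok = take() if peek() is not None else None
        if tok not in sections:
            raise ValueError(f"unknown section {tok!r}")
        return sections[tok]

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```

For example, evaluate_outcome("A AND (B OR C)", {"A": True, "B": False, "C": True}) returns True, and the same inputs with "A AND B AND NOT C" return False.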

Activate and evaluate

# Activate (validates all ruleset references exist)
curl -X POST .../rulebooks/{rulebook_id}/activate -H "x-api-key: ak_live_..."

# Evaluate against the composed rulebook (requires an API key with `decide` scope —
# anonymous /decide only accepts public `ruleset_id`/slug, not `rulebook_id`)
curl -X POST .../decide \
  -H "Content-Type: application/json" \
  -H "x-api-key: $AETHIS_API_KEY" \
  -d '{ "rulebook_id": "{rulebook_id}", "field_values": { ... } }'

# Get combined schema from all sections
curl .../rulebooks/{rulebook_id}/schema

# Get combined explanations
curl .../rulebooks/{rulebook_id}/explain
The /decide endpoint accepts either a ruleset_id (single section) or a rulebook_id (composed multi-section). When evaluating against a rulebook, the response includes section_results showing each section’s individual decision. See the UK Free School Meals worked example for a complete multi-section rulebook with live rulesets.