Worked examples

UK Free School Meals

The primary worked example. Three sections composing to A AND (B OR C), three source documents that cross-reference each other, 23 test cases. Source documents:

Education Act 1996 (s.512, s.512ZA) — child eligibility gate
The Education (Free School Meals) (England) Regulations 2014 (Reg 3, 4, 4A, 5) — appears in all three sections
Children and Families Act 2014 (s.105) — universal infant entitlement

Section structure:

Section	Covers	Source documents
A — `child_eligibility`	Age 4–15, state-funded school	Education Act 1996 + Free School Meals Regulations Reg 3
B — `household_qualifying_criteria`	7 benefit routes + looked-after/care leaver	Free School Meals Regulations Reg 4 + Reg 4A
C — `universal_infant_fsm`	Reception, Year 1, Year 2 — automatic, no income test	Children and Families Act 2014 + Free School Meals Regulations Reg 5

Live composed rulebook: aethis/uk-fsm. It combines the three sections under one outcome_logic and requires an authenticated decide call (rulebook_id: aethis/uk-fsm).

Live public section ruleset: aethis/uk-fsm/child-eligibility. This is the Section A gate, queryable by itself; anonymous decide calls are accepted.

Sections B (household_criteria) and C (universal_infant) live as rulesets inside the aethis/uk-fsm rulebook (Phase B.2.2 of the converged 2-term model) rather than as standalone published slugs. Query them via rulebook_id: aethis/uk-fsm; the rulebook’s outcome_logic combines them with Section A.

Try a decision now (no API key needed):

CLI:

aethis decide \
  -b aethis/uk-fsm/child-eligibility \
  -i '{"child.age": 10, "child.school_type": "state_funded"}' \
  --explain

Python SDK:

from aethis_sdk import Aethis

with Aethis() as client:
    response = client.decide(
        ruleset_id="aethis/uk-fsm/child-eligibility",
        field_values={"child.age": 10, "child.school_type": "state_funded"},
        include_trace=True,
    )
    print(response.decision)

curl:

curl -X POST https://api.aethis.ai/api/v1/public/decide \
  -H "Content-Type: application/json" \
  -d '{
    "ruleset_id": "aethis/uk-fsm/child-eligibility",
    "field_values": { "child.age": 10, "child.school_type": "state_funded" },
    "include_trace": true
  }'

MCP: ask your coding agent in natural language:

“Use Aethis to check whether a 10-year-old at a state-funded school qualifies for free school meals under aethis/uk-fsm/child-eligibility. Include the trace.”

Your agent invokes aethis_decide for you.

{
  "decision": "eligible",
  "trace": {
    "status": "eligible",
    "path": "school_type_state_funded",
    "answered": ["child.age", "child.school_type"],
    "group_statuses": {
      "school_type_check": "satisfied",
      "age_check": "satisfied",
      "age_upper_check": "satisfied"
    }
  }
}
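When consuming a response like the one above in code, it is often useful to pull out any rule group that did not come back as satisfied. The following is a minimal sketch that works on the JSON shape shown in this example only; the helper name `unsatisfied_groups` is hypothetical and is not part of the Aethis SDK.

```python
# Response body copied from the example above, as a plain Python dict.
RESPONSE = {
    "decision": "eligible",
    "trace": {
        "status": "eligible",
        "path": "school_type_state_funded",
        "answered": ["child.age", "child.school_type"],
        "group_statuses": {
            "school_type_check": "satisfied",
            "age_check": "satisfied",
            "age_upper_check": "satisfied",
        },
    },
}


def unsatisfied_groups(response: dict) -> list[str]:
    """Return the names of rule groups whose status is not 'satisfied'.

    Hypothetical helper: assumes the trace carries a 'group_statuses'
    mapping of group name to status string, as in the example response.
    """
    statuses = response.get("trace", {}).get("group_statuses", {})
    return sorted(name for name, status in statuses.items() if status != "satisfied")


print(RESPONSE["decision"])
print(unsatisfied_groups(RESPONSE))
```

For the response above the list is empty, because all three groups are satisfied.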

What this example demonstrates:

Multi-section composition with shared source documents
OR logic across sections (B or C is sufficient)
Automatic entitlement override (Section C has no income test)
Integer arithmetic with threshold comparison (£7,400 UC threshold)
Enum fields (child.school_type, child.year_group)
Unconditional boolean flags (child.is_looked_after, child.is_care_leaver)

Full source, test cases, and guidance: github.com/Aethis-ai/aethis-examples

How Section A was authored

The child eligibility section is the simplest: two fields, six tests, no refinement needed. Here is the complete authoring journey.

Step 1 — Source documents. Two statutory texts were provided:

Education Act 1996 (s.512, s.512ZA) — defines “relevant school” (maintained schools, Academies, non-maintained special schools, pupil referral units) and compulsory school age
Free School Meals Regulations 2014 (Reg 3) — establishes entitlement for children aged 4–15 at a relevant school

Step 2 — Domain guidance. Before authoring began, two domain-level hints were added that apply to all three sections. One of them:

aethis_add_domain_guidance({
  domain: "uk_fsm",
  guidance_text: "Use child.* prefix for child fields, household.* for household fields.",
  process_type: "field_extraction",
  adherence: "exact"
})

Step 3 — Section-level guidance. Three hints for this section:

“This section determines only whether the child is eligible based on age and school type. It does not assess household income — that is Section B.”
“child.age should represent the child’s age in whole years at the start of the academic year (1 September).”
“child.school_type should be an enum with values: state_funded, independent, home_educated. Only state_funded schools are within scope.”

Step 4 — Test cases. Six scenarios covering both dimensions (age range and school type):

tests:
  - name: "Age 4 at state-funded school — eligible"
    inputs: { child.age: 4, child.school_type: state_funded }
    expect: { outcome: eligible }

  - name: "Age 15 at state-funded school — eligible"
    inputs: { child.age: 15, child.school_type: state_funded }
    expect: { outcome: eligible }

  - name: "Age 3 — too young"
    inputs: { child.age: 3, child.school_type: state_funded }
    expect: { outcome: not_eligible }

  - name: "Age 16 — above upper limit"
    inputs: { child.age: 16, child.school_type: state_funded }
    expect: { outcome: not_eligible }

  - name: "Age 10 at independent school — not eligible"
    inputs: { child.age: 10, child.school_type: independent }
    expect: { outcome: not_eligible }

  - name: "Age 8, home educated — not eligible"
    inputs: { child.age: 8, child.school_type: home_educated }
    expect: { outcome: not_eligible }

Test strategy: boundary values (4 and 15), below boundary (3), above boundary (16), and every excluded enum value with an age that would otherwise pass.

Step 5 — Generate and test. All 6 tests passed on the first generation, so no refinement loop was needed. The source text was unambiguous and the guidance hints were specific enough.

Step 6 — Publish. Ruleset aethis/uk-fsm/child-eligibility published with label “v1 — child eligibility gate (age 4–15, state-funded schools)”.
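As a cross-check on the six Section A test cases, the same expectations can be run against a few lines of hand-written Python. This is a sketch of the rule as described in the guidance hints (age 4 to 15 inclusive, state_funded only); it is a stand-in for illustration, not the ruleset that Aethis generates, and the function name `child_eligibility` is an assumption.

```python
# Hand-written reference for Section A, based only on the hints and tests above.
MIN_AGE = 4
MAX_AGE = 15
IN_SCOPE_SCHOOL_TYPES = {"state_funded"}


def child_eligibility(age: int, school_type: str) -> str:
    """Return 'eligible' or 'not_eligible' for the Section A gate."""
    in_age_range = MIN_AGE <= age <= MAX_AGE
    in_scope_school = school_type in IN_SCOPE_SCHOOL_TYPES
    return "eligible" if in_age_range and in_scope_school else "not_eligible"


# The six scenarios from the test file: (age, school_type, expected outcome).
CASES = [
    (4, "state_funded", "eligible"),        # lower boundary
    (15, "state_funded", "eligible"),       # upper boundary
    (3, "state_funded", "not_eligible"),    # below boundary
    (16, "state_funded", "not_eligible"),   # above boundary
    (10, "independent", "not_eligible"),    # excluded enum value
    (8, "home_educated", "not_eligible"),   # excluded enum value
]

for age, school_type, expected in CASES:
    actual = child_eligibility(age, school_type)
    assert actual == expected, (age, school_type, actual, expected)

print(f"{len(CASES)} cases agree with the reference")
```

Each excluded school type is paired with an in-range age, so a failure in those cases can only come from the school-type check.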

Composition

The three published sections compose into a rulebook with outcome logic A AND (B OR C):

sections:
  - section_id: child_eligibility
    pin_mode: latest_active
  - section_id: household_qualifying_criteria
    pin_mode: latest_active
  - section_id: universal_infant_fsm
    pin_mode: latest_active

outcome_logic: "A AND (B OR C)"

Section A is a prerequisite gate: both routes (means-tested and universal infant) require it to pass. A Year 1 child at a state-funded school passes A and C with no income test. A Year 6 child falls outside Section C, so must pass both A and B.
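The outcome logic itself is plain boolean composition. A minimal sketch, assuming each section has already been reduced to a pass/fail boolean (the function name `fsm_outcome` is hypothetical; the rulebook evaluates outcome_logic for you):

```python
def fsm_outcome(a: bool, b: bool, c: bool) -> bool:
    """Evaluate the rulebook's outcome logic: A AND (B OR C).

    a: Section A, child eligibility gate
    b: Section B, household qualifying criteria
    c: Section C, universal infant entitlement
    """
    return a and (b or c)


# Year 1 child at a state-funded school, no qualifying benefit: A and C pass.
print(fsm_outcome(a=True, b=False, c=True))
# Year 6 child at a state-funded school, no qualifying benefit: only A passes.
print(fsm_outcome(a=True, b=False, c=False))
# Infant at an independent school: C alone is not enough without the A gate.
print(fsm_outcome(a=False, b=False, c=True))
```

The third case shows why A sits outside the OR: neither route can succeed when the gate fails.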

Construction All Risks insurance

Benchmark domain. A five-level exception chain in a London market endorsement — the failure pattern used to test frontier LLMs.

Access damage is excluded (Clause 8)
  → unless project value ≥ £100M — enhanced cover reinstates it (Clause 9(1))
  → unless defect is a design defect — enhanced cover doesn't apply (Clause 9(2))
  → unless project value ≥ £500M — pioneer override reinstates it (Clause 9(3))
  → unless defect was known prior — pioneer override is blocked (Clause 9A(1))
  → unless there's an engineer assessment — the block is lifted (Clause 9A(2))
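One way to read the chain above is as a sequence of nested early returns, where each "unless" flips the result of the level before it. The sketch below encodes that reading only. The field names and the `Claim` type are hypothetical, and the real endorsement wording and the Aethis encoding may nest the clauses differently.

```python
from dataclasses import dataclass

ENHANCED_COVER_THRESHOLD_GBP = 100_000_000    # Clause 9(1)
PIONEER_OVERRIDE_THRESHOLD_GBP = 500_000_000  # Clause 9(3)


@dataclass(frozen=True)
class Claim:
    """Hypothetical input shape for this sketch."""
    project_value_gbp: int
    is_design_defect: bool
    defect_known_prior: bool
    has_engineer_assessment: bool


def access_damage_covered(claim: Claim) -> bool:
    """Walk the five-level exception chain, one clause per step."""
    # Clause 8 excludes access damage; Clause 9(1) reinstates it at >= £100M.
    if claim.project_value_gbp < ENHANCED_COVER_THRESHOLD_GBP:
        return False
    # Clause 9(2): enhanced cover does not apply to design defects.
    if not claim.is_design_defect:
        return True
    # Clause 9(3): the pioneer override reinstates cover at >= £500M.
    if claim.project_value_gbp < PIONEER_OVERRIDE_THRESHOLD_GBP:
        return False
    # Clause 9A(1): the override is blocked if the defect was known prior.
    if not claim.defect_known_prior:
        return True
    # Clause 9A(2): an engineer assessment lifts the block.
    return claim.has_engineer_assessment


# Just under the pioneer threshold with a design defect: not covered.
print(access_damage_covered(Claim(499_999_999, True, False, False)))
# At the threshold, known defect, engineer assessment present: covered.
print(access_damage_covered(Claim(500_000_000, True, True, True)))
```

Every level depends on the outcome of the level above it, which is why a single skipped clause changes the final answer.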

Frontier LLM accuracy on the v3.8 adversarial CAR extension (20 newly-authored scenarios, Simpson et al. v3.8 2026, Table 8c):

Model	Accuracy (N=20)	Notes
Aethis Engine	20/20 (100%)	deterministic, <5ms, same answer every time
GPT-5.4 (`reasoning_effort=low`)	20/20 (100%)	16–126 reasoning tokens per scenario
Claude Sonnet 4.6	19/20 (95%)	fails E4 (DE3/LEG3 carveback gap)
GPT-5.4 (default)	19/20 (95%)	0 reasoning tokens on every scenario — short-circuits on E4
Claude Opus 4.7 (strongest current Anthropic model)	18/20 (90%)	fails E4 + B3 (£499M boundary)

Three of the four frontier configurations fail the same scenario (E4), across both the Anthropic and OpenAI families. The Aethis engine is invariant by construction.

The shifting-ground problem (v3.8, §6.5 Finding 6). Several accuracy gaps reported in the v3.7 paper closed silently between March and April 2026 under the same model alias: GPT-5.4 on construction-CAR moved from 96.6% to 100%, and Opus 4.6 on spacecraft from 89.7% to 98.5%. The 11-scenario v3.7 exception-chain subset cited in earlier examples has been replaced by the v3.8 adversarial extension above, because current frontier configurations all reach 100% on the smaller subset. Frontier-LLM accuracy on a fixed benchmark is a moving target, which is exactly what regulated workflows cannot tolerate.

External validation (v3.8, §6.10). On the peer-reviewed LegalBench benchmark (9 tasks, 949 held-out cases), the Aethis Engine is significantly more accurate than each of three frontier LLMs: combined paired-binomial McNemar’s p < 0.001 vs Sonnet 4.6, p = 0.003 vs Opus 4.7, p < 0.001 vs GPT-5.4. The structural advantage holds on randomly-sampled tasks chosen without fit inspection. Full harness: confidently-wrong-benchmark/legalbench/.

Full benchmark data, paper, and reproduction scripts: github.com/Aethis-ai/confidently-wrong-benchmark.

Research paper: Confidently Wrong: Exception Chain Collapse in Frontier LLM Rule Evaluation (Simpson, Kozak, Doake, v3.8, 2026).
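For readers unfamiliar with the McNemar comparison cited in the external-validation paragraph, the exact (binomial) form of the test looks only at the cases where two systems disagree. The sketch below shows that calculation in general terms. The counts in the usage line are invented for illustration and are not taken from the paper, and the paper's procedure for combining results across tasks may differ from this single-table version.

```python
from math import comb


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: cases system 1 got right and system 2 got wrong
    c: cases system 1 got wrong and system 2 got right
    Under the null hypothesis, each discordant case is a fair coin flip.
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, so no evidence either way
    k = min(b, c)
    # Probability of seeing k or fewer in the smaller cell under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Illustrative counts only: 9 disagreements favour one system, 1 the other.
print(mcnemar_exact_p(9, 1))
```

Cases that both systems get right, or both get wrong, do not enter the calculation at all.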

Spacecraft Crew Certification Act 2049

A deliberately simple public demo domain — 11 fields across 7 rule groups, ideal for first experiments. Two demonstrations: a one-field short-circuit, and a fully-specified happy path.

One field, decision reached (Vogon)

A Vogon is disqualifying under §3(1) regardless of any other answer, so the engine short-circuits the moment space.crew.species: "Vogon" is provided:

CLI:

aethis decide -b aethis/spacecraft-crew-certification \
  -i '{"space.crew.species": "Vogon"}' \
  --explain

Python SDK:

from aethis_sdk import Aethis

with Aethis() as client:
    response = client.decide(
        ruleset_id="aethis/spacecraft-crew-certification",
        field_values={"space.crew.species": "Vogon"},
        include_trace=True,
    )
    print(response.decision)
    print(response.trace["failure_reasons"])

curl:

curl -X POST https://api.aethis.ai/api/v1/public/decide \
  -H "Content-Type: application/json" \
  -d '{
    "ruleset_id": "aethis/spacecraft-crew-certification",
    "field_values": {"space.crew.species": "Vogon"},
    "include_trace": true
  }'

MCP:

“Use Aethis to check whether a Vogon is eligible under aethis/spacecraft-crew-certification, with trace.”

{
  "decision": "not_eligible",
  "fields_provided": 1,
  "fields_evaluated": 11,
  "trace": {
    "status": "ineligible",
    "failure_reasons": [
      ["species_not_vogon", [
        { "type": "answer", "field": "space.crew.species", "value": "vogon" },
        { "type": "condition",
          "expression": "Not(spacecraft-crew-certification:v1:space.crew.species == Vogon)" }
      ]]
    ],
    "group_statuses": {
      "species_eligibility": "not_satisfied",
      "flight_readiness": "pending",
      "medical_certification": "pending",
      "medical_cert_validity": "pending",
      "radiation_certification": "pending",
      "propulsion_compliance": "pending",
      "towel_compliance": "pending"
    }
  }
}

fields_provided: 1, fields_evaluated: 11 — the engine reasoned across all 11 fields and discharged the case as soon as the species check failed. The other groups stay pending because no further questions need answering.
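The short-circuit behaviour can be pictured as three-valued evaluation over rule groups: each group is satisfied, not satisfied, or still pending, and a single failed group settles a conjunction however many others are pending. The sketch below illustrates that idea with just two of the seven groups. The check functions, the exact towel rule, and the "undetermined" label are assumptions made for the example, not the engine's implementation.

```python
from typing import Callable, Optional

Fields = dict[str, object]
# A group check returns True (satisfied), False (not satisfied), or None (pending).
GroupCheck = Callable[[Fields], Optional[bool]]


def species_eligibility(fields: Fields) -> Optional[bool]:
    species = fields.get("space.crew.species")
    if species is None:
        return None  # not answered yet
    return str(species).lower() != "vogon"


def towel_compliance(fields: Fields) -> Optional[bool]:
    has_towel = fields.get("space.crew.has_towel")
    return None if has_towel is None else bool(has_towel)


GROUPS: dict[str, GroupCheck] = {
    "species_eligibility": species_eligibility,
    "towel_compliance": towel_compliance,
}


def decide(fields: Fields) -> tuple[str, dict[str, Optional[bool]]]:
    """All groups must hold; one failure decides the case immediately."""
    statuses = {name: check(fields) for name, check in GROUPS.items()}
    if any(status is False for status in statuses.values()):
        decision = "not_eligible"
    elif all(status is True for status in statuses.values()):
        decision = "eligible"
    else:
        decision = "undetermined"  # hypothetical label: more answers needed
    return decision, statuses


print(decide({"space.crew.species": "Vogon"}))
```

With only the species supplied, the towel group stays pending (None) and the decision is already not_eligible.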

All fields, happy path (Human + Improbability Drive)

To pass every gate, the applicant needs a valid licence, medical, radiation cert, towel, and a vessel running on something more exciting than conventional propulsion (§7(2) — see aethis/spacecraft-crew-certification/explain):

CLI:

aethis decide -b aethis/spacecraft-crew-certification -i '{
  "space.crew.species": "Human",
  "space.crew.age": 35,
  "space.crew.flight_hours": 600,
  "space.crew.has_pilot_license": true,
  "space.crew.has_gaa_exam": true,
  "space.crew.has_approved_provider_cert": true,
  "space.medical.cert_valid": true,
  "space.mission.type": "orbital",
  "space.crew.has_radiation_cert": true,
  "space.vessel.propulsion_type": "Infinite Improbability Drive",
  "space.crew.has_towel": true
}'

Python SDK:

from aethis_sdk import Aethis

with Aethis() as client:
    response = client.decide(
        ruleset_id="aethis/spacecraft-crew-certification",
        field_values={
            "space.crew.species": "Human",
            "space.crew.age": 35,
            "space.crew.flight_hours": 600,
            "space.crew.has_pilot_license": True,
            "space.crew.has_gaa_exam": True,
            "space.crew.has_approved_provider_cert": True,
            "space.medical.cert_valid": True,
            "space.mission.type": "orbital",
            "space.crew.has_radiation_cert": True,
            "space.vessel.propulsion_type": "Infinite Improbability Drive",
            "space.crew.has_towel": True,
        },
    )
    print(response.decision)

curl:

curl -X POST https://api.aethis.ai/api/v1/public/decide \
  -H "Content-Type: application/json" \
  -d '{
    "ruleset_id": "aethis/spacecraft-crew-certification",
    "field_values": {
      "space.crew.species": "Human", "space.crew.age": 35,
      "space.crew.flight_hours": 600, "space.crew.has_pilot_license": true,
      "space.crew.has_gaa_exam": true, "space.crew.has_approved_provider_cert": true,
      "space.medical.cert_valid": true, "space.mission.type": "orbital",
      "space.crew.has_radiation_cert": true,
      "space.vessel.propulsion_type": "Infinite Improbability Drive",
      "space.crew.has_towel": true
    }
  }'

MCP:

“Use Aethis to check whether a 35-year-old human with 600 flight hours, all valid certs, an orbital mission, an Infinite Improbability Drive vessel, and a towel is eligible under aethis/spacecraft-crew-certification.”

decision: eligible

Swap "Infinite Improbability Drive" for "Conventional" and the decision flips to not_eligible with propulsion_compliance: not_satisfied — the engine names the exact group that failed.
