UK Free School Meals
The primary worked example. Three sections composing to A AND (B OR C), three source documents that cross-reference each other, 23 test cases.
Source documents:
- Education Act 1996 (s.512, s.512ZA) — child eligibility gate
- The Education (Free School Meals) (England) Regulations 2014 (Reg 3, 4, 4A, 5) — appears in all three sections
- Children and Families Act 2014 (s.105) — universal infant entitlement
Section structure:
| Section | Covers | Source documents |
|---|---|---|
| A: child_eligibility | Age 4–15, state-funded school | Education Act 1996 + Free School Meals Regulations Reg 3 |
| B: household_qualifying_criteria | 7 benefit routes + looked-after/care leaver | Free School Meals Regulations Reg 4 + Reg 4A |
| C: universal_infant_fsm | Reception, Year 1, Year 2 (automatic, no income test) | Children and Families Act 2014 + Free School Meals Regulations Reg 5 |
Live ruleset IDs (all tests passing):
aethis/uk-fsm/child-eligibility # 6/6 tests
aethis/uk-fsm/household-criteria # 11/11 tests
aethis/uk-fsm/universal-infant # 6/6 tests
Try a decision now (no API key needed):
aethis_decide({
  ruleset_id: "aethis/uk-fsm/child-eligibility",
  field_values: {
    "child.age": 10,
    "child.school_type": "state_funded"
  },
  include_trace: true
})

{
  "decision": "eligible",
  "trace": {
    "age_check": "PASS — age 10 is within 4–15",
    "school_type_check": "PASS — school type is state_funded"
  }
}
What this example demonstrates:
- Multi-section composition with shared source documents
- OR logic across sections (B or C is sufficient)
- Automatic entitlement override (Section C has no income test)
- Integer arithmetic with threshold comparison (£7,400 UC threshold)
- Enum fields (child.school_type, child.year_group)
- Unconditional boolean flags (child.is_looked_after, child.is_care_leaver)
Full source, test cases, and guidance: github.com/Aethis-ai/aethis-examples
How Section A was authored
The child eligibility section is the simplest — two fields, six tests, no refinement needed. Here is the complete authoring journey.
Step 1 — Source documents. Two statutory texts were provided:
- Education Act 1996 (s.512, s.512ZA) — defines “relevant school” (maintained schools, Academies, non-maintained special schools, pupil referral units) and compulsory school age
- Free School Meals Regulations 2014 (Reg 3) — establishes entitlement for children aged 4–15 at a relevant school
Step 2 — Domain guidance. Before authoring began, two domain-level hints were added that apply to all three sections:
aethis_add_domain_guidance({
  domain: "uk_fsm",
  guidance_text: "Use child.* prefix for child fields, household.* for household fields.",
  process_type: "field_extraction",
  adherence: "exact"
})
Step 3 — Section-level guidance. Three hints for this section:
- “This section determines only whether the child is eligible based on age and school type. It does not assess household income — that is Section B.”
- “child.age should represent the child’s age in whole years at the start of the academic year (1 September).”
- “child.school_type should be an enum with values: state_funded, independent, home_educated. Only state_funded schools are within scope.”
Step 4 — Test cases. Six scenarios covering both dimensions (age range and school type):
tests:
  - name: "Age 4 at state-funded school — eligible"
    inputs: { child.age: 4, child.school_type: state_funded }
    expect: { outcome: eligible }
  - name: "Age 15 at state-funded school — eligible"
    inputs: { child.age: 15, child.school_type: state_funded }
    expect: { outcome: eligible }
  - name: "Age 3 — too young"
    inputs: { child.age: 3, child.school_type: state_funded }
    expect: { outcome: not_eligible }
  - name: "Age 16 — above upper limit"
    inputs: { child.age: 16, child.school_type: state_funded }
    expect: { outcome: not_eligible }
  - name: "Age 10 at independent school — not eligible"
    inputs: { child.age: 10, child.school_type: independent }
    expect: { outcome: not_eligible }
  - name: "Age 8, home educated — not eligible"
    inputs: { child.age: 8, child.school_type: home_educated }
    expect: { outcome: not_eligible }
Test strategy: boundary values (4 and 15), below boundary (3), above boundary (16), and every excluded enum value with an age that would otherwise pass.
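The six scenarios can be replayed against a hand-written stand-in for Section A. This is a minimal sketch, not the ruleset that Aethis generates: the function `child_eligibility` and the `SchoolType` enum are illustrative names, and the age bounds and enum values are taken from the section-level guidance in Step 3.

```python
from enum import Enum


class SchoolType(Enum):
    STATE_FUNDED = "state_funded"
    INDEPENDENT = "independent"
    HOME_EDUCATED = "home_educated"


def child_eligibility(age: int, school_type: SchoolType) -> str:
    """Hypothetical stand-in for Section A: age 4-15 at a state-funded school."""
    in_age_range = 4 <= age <= 15
    in_scope_school = school_type is SchoolType.STATE_FUNDED
    return "eligible" if in_age_range and in_scope_school else "not_eligible"


# The six scenarios from Step 4: (age, school type, expected outcome).
CASES = [
    (4, SchoolType.STATE_FUNDED, "eligible"),        # lower boundary
    (15, SchoolType.STATE_FUNDED, "eligible"),       # upper boundary
    (3, SchoolType.STATE_FUNDED, "not_eligible"),    # below boundary
    (16, SchoolType.STATE_FUNDED, "not_eligible"),   # above boundary
    (10, SchoolType.INDEPENDENT, "not_eligible"),    # excluded enum value
    (8, SchoolType.HOME_EDUCATED, "not_eligible"),   # excluded enum value
]

results = [child_eligibility(age, school) == expected
           for age, school, expected in CASES]
print(f"{sum(results)}/{len(CASES)} tests passing")
```

Pairing each excluded school type with an age that would otherwise pass ensures a failure can only come from the school-type check.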
Step 5 — Generate and test. All 6 tests passed on the first generation — no refinement loop needed. The source text was unambiguous and the guidance hints were specific enough.
Step 6 — Publish. Ruleset aethis/uk-fsm/child-eligibility published with label “v1 — child eligibility gate (age 4–15, state-funded schools)”.
Composition
The three published sections compose into a rulebook with outcome logic A AND (B OR C):
sections:
  - section_id: child_eligibility
    pin_mode: latest_active
  - section_id: household_qualifying_criteria
    pin_mode: latest_active
  - section_id: universal_infant_fsm
    pin_mode: latest_active
outcome_logic: "A AND (B OR C)"
Section A is a prerequisite gate: both routes (means-tested and universal infant) require it to pass. A Year 1 child at a state-funded school passes A and qualifies through C with no income test. A Year 6 child must pass A and then qualify through B.
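The outcome logic can be read as a plain boolean expression over the three section results. The following is a minimal sketch, assuming each section reduces to a pass/fail boolean. `compose` is a hypothetical helper for illustration, not part of the Aethis API.

```python
def compose(a: bool, b: bool, c: bool) -> bool:
    """Evaluate A AND (B OR C) over per-section pass/fail results."""
    return a and (b or c)


# Year 1 child at a state-funded school, no qualifying benefit: A and C pass.
year_1 = compose(a=True, b=False, c=True)

# Year 6 child in a household on a qualifying benefit: A and B pass, C does not.
year_6 = compose(a=True, b=True, c=False)

# Pupil outside the Section A gate: neither route can rescue the outcome.
gate_fails = compose(a=False, b=True, c=True)

print(year_1, year_6, gate_fails)  # True True False
```

Because A sits outside the parentheses, a failed gate decides the outcome regardless of what B and C return.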
Construction All Risks insurance
Benchmark domain. A five-level exception chain in a London market endorsement — the failure pattern used to test frontier LLMs.
Access damage is excluded (Clause 8)
→ unless project value ≥ £100M — enhanced cover reinstates it (Clause 9(1))
→ unless defect is a design defect — enhanced cover doesn't apply (Clause 9(2))
→ unless project value ≥ £500M — pioneer override reinstates it (Clause 9(3))
→ unless defect was known prior — pioneer override is blocked (Clause 9A(1))
→ unless there's an engineer assessment — the block is lifted (Clause 9A(2))
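One way to read the chain is as nested guards, where each "unless" flips the result of the level above it. The sketch below encodes that reading in Python. The `Claim` fields and the function name are hypothetical, and the nesting is inferred from the summary above rather than from the endorsement wording, so treat it as an illustration of the pattern rather than the benchmark's reference logic.

```python
from dataclasses import dataclass

ENHANCED_COVER_THRESHOLD = 100_000_000  # Clause 9(1): project value >= £100M
PIONEER_THRESHOLD = 500_000_000         # Clause 9(3): project value >= £500M


@dataclass
class Claim:
    project_value: int          # pounds sterling
    design_defect: bool
    defect_known_prior: bool
    engineer_assessment: bool


def access_damage_covered(claim: Claim) -> bool:
    """Walk the exception chain top-down; each level can flip the one above."""
    if claim.project_value < ENHANCED_COVER_THRESHOLD:
        return False                  # Clause 8: access damage excluded
    if not claim.design_defect:
        return True                   # Clause 9(1): enhanced cover reinstates
    if claim.project_value < PIONEER_THRESHOLD:
        return False                  # Clause 9(2): design defect removes enhanced cover
    if not claim.defect_known_prior:
        return True                   # Clause 9(3): pioneer override reinstates
    return claim.engineer_assessment  # Clause 9A(1) blocks unless 9A(2) lifts the block


# A design-defect claim on either side of the £500M boundary.
just_below = Claim(499_000_000, True, False, False)
at_threshold = Claim(500_000_000, True, False, False)
print(access_damage_covered(just_below), access_damage_covered(at_threshold))  # False True
```

Written this way, the result depends on exactly which level the inputs reach, which is why boundary values and deep branches make useful test scenarios.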
Frontier LLM accuracy on the v3.8 adversarial CAR extension (20 newly-authored scenarios, Simpson et al. v3.8 2026, Table 8c):
| Model | Accuracy (N=20) | Notes |
|---|---|---|
| Aethis Engine | 20/20 (100%) | deterministic, <5ms, same answer every time |
| GPT-5.4 (reasoning_effort=low) | 20/20 (100%) | 16–126 reasoning tokens per scenario |
| Claude Sonnet 4.6 | 19/20 (95%) | fails E4 (DE3/LEG3 carveback gap) |
| GPT-5.4 (default) | 19/20 (95%) | 0 reasoning tokens on every scenario; short-circuits on E4 |
| Claude Opus 4.7 (current Anthropic strongest) | 18/20 (90%) | fails E4 + B3 (£499M boundary) |
Three of the four frontier configurations fail the same scenario (E4), and the failures span both the Anthropic and OpenAI families. The Aethis Engine is invariant by construction.
The shifting-ground problem (v3.8, §6.5 Finding 6). Several v3.7 paper cells closed silently between March and April 2026 under the same model alias — GPT-5.4 on construction-CAR moved from 96.6% to 100%; Opus 4.6 on spacecraft from 89.7% to 98.5%. The v3.7 11-scenario exception-chain subset that earlier examples cited has been replaced by the v3.8 adversarial extension above because current frontier configurations all hit 100% on the smaller subset. Frontier-LLM accuracy on a fixed benchmark is a moving target — exactly what regulated workflows cannot tolerate.
External validation (v3.8, §6.10). On the peer-reviewed LegalBench benchmark — 9 tasks, 949 held-out cases — the Aethis Engine is significantly more accurate than each of three frontier LLMs: combined paired-binomial McNemar’s p < 0.001 vs Sonnet 4.6, p = 0.003 vs Opus 4.7, p < 0.001 vs GPT-5.4. The structural advantage holds on randomly-sampled tasks chosen without fit inspection. Full harness: confidently-wrong-benchmark/legalbench/.
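For readers unfamiliar with the statistic, the exact (binomial) form of McNemar's test uses only the discordant pairs: cases where one system is right and the other is wrong. The sketch below computes the two-sided p-value for a single pair of counts. The counts in the usage line are hypothetical, not the paper's data, and the paper's method for combining results across the nine tasks is not reproduced here.

```python
from math import comb


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: cases system 1 answered correctly and system 2 answered incorrectly
    c: cases system 1 answered incorrectly and system 2 answered correctly
    """
    n = b + c
    if n == 0:
        return 1.0  # the systems never disagree, so there is no evidence of a difference
    # Under the null hypothesis each discordant pair is a fair coin flip.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical counts for illustration only.
print(mcnemar_exact_p(9, 1))  # 0.021484375
```

Cases that both systems get right, or both get wrong, do not enter the calculation.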
Full benchmark data, paper, and reproduction scripts: github.com/Aethis-ai/confidently-wrong-benchmark. Research paper: Confidently Wrong: Exception Chain Collapse in Frontier LLM Rule Evaluation (Simpson, Kozak, Doake, v3.8, 2026).
Spacecraft Crew Certification Act 2049
A deliberately simple public demo domain — ideal for first experiments.
aethis decide -b aethis/spacecraft-crew-certification \
-i '{"space.crew.species": "Human", "space.crew.age": 35, "space.crew.flight_hours": 600}'
aethis decide -b aethis/spacecraft-crew-certification \
-i '{"space.crew.species": "Vogon"}'
Decision: not_eligible
Trace: species_check FAIL — Vogon is disqualifying (Section 3)
One field. Decision reached. A Vogon is disqualified regardless of flight hours, so no further questions are asked.
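The early exit can be pictured as a check over partial inputs: if a supplied field already settles the outcome, the remaining fields are never requested. This is a minimal sketch of that idea, not the engine's implementation. `next_step` and its status strings are hypothetical, and only the species rule is taken from the example, so the age and flight-hours rules are deliberately left unmodelled.

```python
# The only disqualifying value named in the example above.
DISQUALIFYING_SPECIES = {"Vogon"}

# Field names as used in the CLI calls above.
REQUIRED_FIELDS = [
    "space.crew.species",
    "space.crew.age",
    "space.crew.flight_hours",
]


def next_step(fields: dict) -> tuple[str, list[str]]:
    """Return (status, fields still needed) for a partial set of inputs."""
    if fields.get("space.crew.species") in DISQUALIFYING_SPECIES:
        # Decided: no other field can change the outcome.
        return "not_eligible", []
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        return "undetermined", missing
    # All inputs present; the remaining rules are not modelled in this sketch.
    return "ready_to_evaluate", []


print(next_step({"space.crew.species": "Vogon"}))
print(next_step({"space.crew.species": "Human"}))
```

The first call returns a decision with nothing left to ask, while the second reports which fields are still needed.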