UK Free School Meals

The primary worked example. Three sections composing to A AND (B OR C), three source documents that cross-reference each other, 23 test cases.
Source documents:
  • Education Act 1996 (s.512, s.512ZA) — child eligibility gate
  • The Education (Free School Meals) (England) Regulations 2014 (Reg 3, 4, 4A, 5) — appears in all three sections
  • Children and Families Act 2014 (s.105) — universal infant entitlement
Section structure:
| Section | Covers | Source documents |
| --- | --- | --- |
| A (child_eligibility) | Age 4–15, state-funded school | Education Act 1996 + Free School Meals Regulations Reg 3 |
| B (household_qualifying_criteria) | 7 benefit routes + looked-after/care leaver | Free School Meals Regulations Reg 4 + Reg 4A |
| C (universal_infant_fsm) | Reception, Year 1, Year 2 (automatic, no income test) | Children and Families Act 2014 + Free School Meals Regulations Reg 5 |
Live ruleset IDs (all tests passing):
aethis/uk-fsm/child-eligibility          # 6/6 tests
aethis/uk-fsm/household-criteria  # 11/11 tests
aethis/uk-fsm/universal-infant        # 6/6 tests
Try a decision now (no API key needed):
aethis_decide({
  ruleset_id: "aethis/uk-fsm/child-eligibility",
  field_values: {
    "child.age": 10,
    "child.school_type": "state_funded"
  },
  include_trace: true
})
Response:
{
  "decision": "eligible",
  "trace": {
    "age_check": "PASS — age 10 is within 4–15",
    "school_type_check": "PASS — school type is state_funded"
  }
}
What this example demonstrates:
  • Multi-section composition with shared source documents
  • OR logic across sections (B or C is sufficient)
  • Automatic entitlement override (Section C has no income test)
  • Integer arithmetic with threshold comparison (£7,400 UC threshold)
  • Enum fields (child.school_type, child.year_group)
  • Unconditional boolean flags (child.is_looked_after, child.is_care_leaver)
Full source, test cases, and guidance: github.com/Aethis-ai/aethis-examples

How Section A was authored

The child eligibility section is the simplest: two fields, six tests, no refinement needed. Here is the complete authoring journey.
Step 1 — Source documents. Two statutory texts were provided:
  • Education Act 1996 (s.512, s.512ZA) — defines “relevant school” (maintained schools, Academies, non-maintained special schools, pupil referral units) and compulsory school age
  • Free School Meals Regulations 2014 (Reg 3) — establishes entitlement for children aged 4–15 at a relevant school
Step 2 — Domain guidance. Before authoring began, two domain-level hints were added that apply to all three sections. The field-naming hint is shown here:
aethis_add_domain_guidance({
  domain: "uk_fsm",
  guidance_text: "Use child.* prefix for child fields, household.* for household fields.",
  process_type: "field_extraction",
  adherence: "exact"
})
Step 3 — Section-level guidance. Three hints for this section:
  1. “This section determines only whether the child is eligible based on age and school type. It does not assess household income — that is Section B.”
  2. “child.age should represent the child’s age in whole years at the start of the academic year (1 September).”
  3. “child.school_type should be an enum with values: state_funded, independent, home_educated. Only state_funded schools are within scope.”
Step 4 — Test cases. Six scenarios covering both dimensions (age range and school type):
tests:
  - name: "Age 4 at state-funded school — eligible"
    inputs: { child.age: 4, child.school_type: state_funded }
    expect: { outcome: eligible }

  - name: "Age 15 at state-funded school — eligible"
    inputs: { child.age: 15, child.school_type: state_funded }
    expect: { outcome: eligible }

  - name: "Age 3 — too young"
    inputs: { child.age: 3, child.school_type: state_funded }
    expect: { outcome: not_eligible }

  - name: "Age 16 — above upper limit"
    inputs: { child.age: 16, child.school_type: state_funded }
    expect: { outcome: not_eligible }

  - name: "Age 10 at independent school — not eligible"
    inputs: { child.age: 10, child.school_type: independent }
    expect: { outcome: not_eligible }

  - name: "Age 8, home educated — not eligible"
    inputs: { child.age: 8, child.school_type: home_educated }
    expect: { outcome: not_eligible }
Test strategy: boundary values (4 and 15), below boundary (3), above boundary (16), and every excluded enum value with an age that would otherwise pass.
Step 5 — Generate and test. All 6 tests passed on the first generation, so no refinement loop was needed. The source text was unambiguous and the guidance hints were specific enough.
Step 6 — Publish. Ruleset aethis/uk-fsm/child-eligibility published with label “v1 — child eligibility gate (age 4–15, state-funded schools)”.
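The Section A rule and its six test cases are small enough to restate in plain Python. The sketch below is only an illustration of the logic described above, not the generated ruleset: the function name `child_eligibility` and the `SchoolType` enum class are hypothetical, while the age bounds, enum values, and test scenarios are taken from Steps 3 and 4.

```python
from enum import Enum


class SchoolType(Enum):
    # Enum values from the section-level guidance (hint 3).
    STATE_FUNDED = "state_funded"
    INDEPENDENT = "independent"
    HOME_EDUCATED = "home_educated"


MIN_AGE = 4   # lower boundary from the source (age 4-15)
MAX_AGE = 15  # upper boundary from the source


def child_eligibility(age: int, school_type: SchoolType) -> str:
    """Hypothetical stand-in for Section A: age gate AND school-type gate."""
    in_age_range = MIN_AGE <= age <= MAX_AGE
    in_scope_school = school_type is SchoolType.STATE_FUNDED
    return "eligible" if in_age_range and in_scope_school else "not_eligible"


# The six scenarios from Step 4: (age, school type, expected outcome).
CASES = [
    (4, SchoolType.STATE_FUNDED, "eligible"),
    (15, SchoolType.STATE_FUNDED, "eligible"),
    (3, SchoolType.STATE_FUNDED, "not_eligible"),
    (16, SchoolType.STATE_FUNDED, "not_eligible"),
    (10, SchoolType.INDEPENDENT, "not_eligible"),
    (8, SchoolType.HOME_EDUCATED, "not_eligible"),
]

results = [child_eligibility(age, st) == expected for age, st, expected in CASES]
print(f"{sum(results)}/{len(results)} tests passing")  # prints "6/6 tests passing"
```

Each excluded case fails on exactly one dimension, which is why the test strategy pairs out-of-range ages with an in-scope school and out-of-scope schools with an in-range age.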

Composition

The three published sections compose into a rulebook with outcome logic A AND (B OR C):
sections:
  - section_id: child_eligibility
    pin_mode: latest_active
  - section_id: household_qualifying_criteria
    pin_mode: latest_active
  - section_id: universal_infant_fsm
    pin_mode: latest_active

outcome_logic: "A AND (B OR C)"
Section A is a prerequisite gate: both routes (means-tested and universal infant) require it to pass. A Year 1 child at a state-funded school passes A and C with no income test. A Year 6 child is outside the universal infant years, so must pass both A and B.
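To make the outcome logic concrete, here is a minimal sketch that treats each section result as a boolean and combines them as A AND (B OR C). The function name `compose` and the example scenarios are illustrative assumptions; only the expression itself comes from the rulebook above.

```python
def compose(a: bool, b: bool, c: bool) -> bool:
    """Outcome logic A AND (B OR C) over three section results."""
    return a and (b or c)


# Year 1 child at a state-funded school, no qualifying benefit:
# A passes, B fails, C passes (universal infant route).
year_1 = compose(a=True, b=False, c=True)

# Year 6 child at a state-funded school on a qualifying benefit:
# A passes, B passes, C fails (not an infant year).
year_6_qualifying = compose(a=True, b=True, c=False)

# Year 6 child with no qualifying benefit: neither route is open.
year_6_no_benefit = compose(a=True, b=False, c=False)

# Child who fails the Section A gate: B and C cannot rescue the outcome.
gate_failed = compose(a=False, b=True, c=True)

print(year_1, year_6_qualifying, year_6_no_benefit, gate_failed)
# prints "True True False False"
```

The last case shows why A is described as a gate: no combination of B and C results in eligibility when A fails.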

Construction All Risks insurance

Benchmark domain. A five-level exception chain in a London market endorsement — the failure pattern used to test frontier LLMs.
Access damage is excluded (Clause 8)
  → unless project value ≥ £100M — enhanced cover reinstates it (Clause 9(1))
  → unless defect is a design defect — enhanced cover doesn't apply (Clause 9(2))
  → unless project value ≥ £500M — pioneer override reinstates it (Clause 9(3))
  → unless defect was known prior — pioneer override is blocked (Clause 9A(1))
  → unless there's an engineer assessment — the block is lifted (Clause 9A(2))
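One way to read the chain above is as a sequence of early returns, where each clause either settles the answer or hands over to the next exception. The sketch below encodes that reading in Python. It is an interpretation of the six lines as listed, not the endorsement wording or the Aethis engine's implementation; the function and parameter names are hypothetical.

```python
ENHANCED_COVER_THRESHOLD = 100_000_000  # Clause 9(1): project value >= £100M
PIONEER_THRESHOLD = 500_000_000         # Clause 9(3): project value >= £500M


def access_damage_covered(
    project_value: int,
    is_design_defect: bool,
    known_prior: bool,
    has_engineer_assessment: bool,
) -> bool:
    """Walk the exception chain top to bottom; True means access damage is covered."""
    # Clause 8: excluded unless enhanced cover applies (Clause 9(1)).
    if project_value < ENHANCED_COVER_THRESHOLD:
        return False
    # Clause 9(2): enhanced cover does not apply to design defects.
    if not is_design_defect:
        return True
    # Clause 9(3): pioneer override reinstates cover only at >= £500M.
    if project_value < PIONEER_THRESHOLD:
        return False
    # Clause 9A(1): the override is blocked if the defect was known prior.
    if not known_prior:
        return True
    # Clause 9A(2): an engineer assessment lifts the block.
    return has_engineer_assessment


# A design defect just under the pioneer threshold stays excluded.
print(access_damage_covered(499_000_000, True, False, False))  # prints "False"
# At the threshold, the pioneer override reinstates cover.
print(access_damage_covered(500_000_000, True, False, False))  # prints "True"
```

Each level flips the answer of the level before it, so stopping one clause early yields exactly the opposite result.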
Frontier LLM accuracy on the v3.8 adversarial CAR extension (20 newly-authored scenarios, Simpson et al. v3.8 2026, Table 8c):
| Model | Accuracy (N=20) | Notes |
| --- | --- | --- |
| Aethis Engine | 20/20 (100%) | deterministic, <5ms, same answer every time |
| GPT-5.4 (reasoning_effort=low) | 20/20 (100%) | 16–126 reasoning tokens per scenario |
| Claude Sonnet 4.6 | 19/20 (95%) | fails E4 (DE3/LEG3 carveback gap) |
| GPT-5.4 (default) | 19/20 (95%) | 0 reasoning tokens on every scenario; short-circuits on E4 |
| Claude Opus 4.7 (current Anthropic strongest) | 18/20 (90%) | fails E4 + B3 (£499M boundary) |
Three of the four frontier configurations fail the same scenario (E4), across both the Anthropic and OpenAI families. The Aethis engine is invariant by construction.
The shifting-ground problem (v3.8, §6.5 Finding 6). Several v3.7 paper cells closed silently between March and April 2026 under the same model alias: GPT-5.4 on construction-CAR moved from 96.6% to 100%, and Opus 4.6 on spacecraft from 89.7% to 98.5%. The v3.7 11-scenario exception-chain subset that earlier examples cited has been replaced by the v3.8 adversarial extension above because current frontier configurations all hit 100% on the smaller subset. Frontier-LLM accuracy on a fixed benchmark is a moving target, which is exactly what regulated workflows cannot tolerate.
External validation (v3.8, §6.10). On the peer-reviewed LegalBench benchmark (9 tasks, 949 held-out cases), the Aethis Engine is significantly more accurate than each of three frontier LLMs: combined paired-binomial McNemar’s p < 0.001 vs Sonnet 4.6, p = 0.003 vs Opus 4.7, p < 0.001 vs GPT-5.4. The structural advantage holds on randomly-sampled tasks chosen without fit inspection. Full harness: confidently-wrong-benchmark/legalbench/.
Full benchmark data, paper, and reproduction scripts: github.com/Aethis-ai/confidently-wrong-benchmark.
Research paper: Confidently Wrong: Exception Chain Collapse in Frontier LLM Rule Evaluation (Simpson, Kozak, Doake, v3.8, 2026).
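For readers unfamiliar with the paired test named under External validation: McNemar's test compares two systems on the same cases and looks only at the discordant ones, where exactly one system is correct. Below is a minimal sketch of the exact (binomial) two-sided form. The function name and the counts are made up for illustration and are not taken from the paper, and how the paper combines results across the nine tasks is not specified here.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value.

    b: cases where only system 1 is correct
    c: cases where only system 2 is correct
    Under the null hypothesis, each discordant case is a fair coin flip.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant cases, no evidence of a difference
    k = min(b, c)
    # Probability of a split at least this lopsided in one direction.
    one_sided = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double for the two-sided test, capped at 1.
    return min(1.0, 2 * one_sided)


# Hypothetical counts: system 1 alone is right on 9 cases, system 2 alone on 1.
print(mcnemar_exact(9, 1))  # prints "0.021484375"
```

Cases where both systems are right, or both wrong, do not enter the calculation, which is why the test suits head-to-head comparisons on a shared held-out set.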

Spacecraft Crew Certification Act 2049

A deliberately simple public demo domain — ideal for first experiments.
aethis decide -b aethis/spacecraft-crew-certification \
  -i '{"space.crew.species": "Human", "space.crew.age": 35, "space.crew.flight_hours": 600}'
Decision: eligible
aethis decide -b aethis/spacecraft-crew-certification \
  -i '{"space.crew.species": "Vogon"}'
Decision: not_eligible
Trace: species_check FAIL — Vogon is disqualifying (Section 3)
One field. Decision reached. A Vogon is disqualified regardless of flight hours, so no further questions are asked.
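The early exit described above can be sketched as a three-valued evaluation: each check passes, fails, or is unknown because its field has not been supplied, and a single failure settles the decision. This is an illustrative sketch, not the Aethis engine. The Vogon disqualification and the field names come from the example; the age and flight-hour thresholds and the `undetermined` outcome are hypothetical placeholders.

```python
from typing import Any, Callable, Optional

Fields = dict[str, Any]
# A check returns True (pass), False (fail), or None (field not supplied yet).
Check = Callable[[Fields], Optional[bool]]

MIN_AGE = 18            # hypothetical threshold, not from the source
MIN_FLIGHT_HOURS = 500  # hypothetical threshold, not from the source


def species_check(f: Fields) -> Optional[bool]:
    species = f.get("space.crew.species")
    return None if species is None else species != "Vogon"


def age_check(f: Fields) -> Optional[bool]:
    age = f.get("space.crew.age")
    return None if age is None else age >= MIN_AGE


def flight_hours_check(f: Fields) -> Optional[bool]:
    hours = f.get("space.crew.flight_hours")
    return None if hours is None else hours >= MIN_FLIGHT_HOURS


CHECKS: list[Check] = [species_check, age_check, flight_hours_check]


def decide(fields: Fields) -> str:
    results = [check(fields) for check in CHECKS]
    if any(r is False for r in results):
        return "not_eligible"   # one failure decides, whatever else is missing
    if all(r is True for r in results):
        return "eligible"
    return "undetermined"       # hypothetical label: more fields are needed


print(decide({"space.crew.species": "Vogon"}))  # prints "not_eligible"
print(decide({"space.crew.species": "Human"}))  # prints "undetermined"
```

With only the species supplied, a Vogon is already decided, while a Human still needs the remaining fields before an outcome can be reached.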