UK Free School Meals

The primary worked example. Three sections composing to A AND (B OR C), three source documents that cross-reference each other, 23 test cases.
Source documents:
  • Education Act 1996 (s.512, s.512ZA) — child eligibility gate
  • The Education (Free School Meals) (England) Regulations 2014 (Reg 3, 4, 4A, 5) — appears in all three sections
  • Children and Families Act 2014 (s.105) — universal infant entitlement
Section structure:
| Section | Covers | Source documents |
|---|---|---|
| A: child_eligibility | Age 4–15, state-funded school | Education Act 1996 + Free School Meals Regulations Reg 3 |
| B: household_qualifying_criteria | 7 benefit routes + looked-after/care leaver | Free School Meals Regulations Reg 4 + Reg 4A |
| C: universal_infant_fsm | Reception, Year 1, Year 2 (automatic, no income test) | Children and Families Act 2014 + Free School Meals Regulations Reg 5 |
Live composed rulebook: aethis/uk-fsm. It combines the three sections under one outcome_logic and requires an authenticated decide call (rulebook_id: aethis/uk-fsm).
Live public section ruleset: aethis/uk-fsm/child-eligibility. This is the Section A gate, queryable by itself; anonymous decide calls are accepted.
Sections B (household_criteria) and C (universal_infant) live as rulesets inside the aethis/uk-fsm rulebook (Phase B.2.2 of the converged 2-term model) rather than as standalone published slugs. Query them via rulebook_id: aethis/uk-fsm; the rulebook’s outcome_logic combines them with Section A.
Try a decision now (no API key needed):
aethis decide \
  -b aethis/uk-fsm/child-eligibility \
  -i '{"child.age": 10, "child.school_type": "state_funded"}' \
  --explain
{
  "decision": "eligible",
  "trace": {
    "status": "eligible",
    "path": "school_type_state_funded",
    "answered": ["child.age", "child.school_type"],
    "group_statuses": {
      "school_type_check": "satisfied",
      "age_check": "satisfied",
      "age_upper_check": "satisfied"
    }
  }
}
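The same call can be scripted. The sketch below only assembles the command line shown above: the `-b`, `-i`, and `--explain` flags are taken from that example, while the helper name `build_decide_command` is hypothetical. Actually running the command assumes the `aethis` CLI is installed and on the PATH.

```python
import json
import shlex


def build_decide_command(ruleset: str, inputs: dict, explain: bool = True) -> list[str]:
    """Assemble the argv for `aethis decide`, mirroring the example above."""
    argv = ["aethis", "decide", "-b", ruleset, "-i", json.dumps(inputs)]
    if explain:
        argv.append("--explain")
    return argv


command = build_decide_command(
    "aethis/uk-fsm/child-eligibility",
    {"child.age": 10, "child.school_type": "state_funded"},
)

# shlex.join quotes the JSON payload so the line can be pasted into a shell.
print(shlex.join(command))

# To execute it (assumption: the CLI is installed and prints the JSON shown above):
#   import subprocess
#   result = subprocess.run(command, capture_output=True, text=True, check=True)
#   print(result.stdout)
```

Passing the arguments as a list avoids shell-quoting problems with the JSON payload when the command is eventually handed to `subprocess.run`.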
What this example demonstrates:
  • Multi-section composition with shared source documents
  • OR logic across sections (B or C is sufficient)
  • Automatic entitlement override (Section C has no income test)
  • Integer arithmetic with threshold comparison (£7,400 UC threshold)
  • Enum fields (child.school_type, child.year_group)
  • Unconditional boolean flags (child.is_looked_after, child.is_care_leaver)
Full source, test cases, and guidance: github.com/Aethis-ai/aethis-examples

How Section A was authored

The child eligibility section is the simplest: two fields, six tests, no refinement needed. Here is the complete authoring journey.
Step 1 — Source documents. Two statutory texts were provided:
  • Education Act 1996 (s.512, s.512ZA) — defines “relevant school” (maintained schools, Academies, non-maintained special schools, pupil referral units) and compulsory school age
  • Free School Meals Regulations 2014 (Reg 3) — establishes entitlement for children aged 4–15 at a relevant school
Step 2 — Domain guidance. Before authoring began, two domain-level hints were added that apply to all three sections. The field-naming hint is shown here:
aethis_add_domain_guidance({
  domain: "uk_fsm",
  guidance_text: "Use child.* prefix for child fields, household.* for household fields.",
  process_type: "field_extraction",
  adherence: "exact"
})
Step 3 — Section-level guidance. Three hints for this section:
  1. “This section determines only whether the child is eligible based on age and school type. It does not assess household income — that is Section B.”
  2. “child.age should represent the child’s age in whole years at the start of the academic year (1 September).”
  3. “child.school_type should be an enum with values: state_funded, independent, home_educated. Only state_funded schools are within scope.”
Step 4 — Test cases. Six scenarios covering both dimensions (age range and school type):
tests:
  - name: "Age 4 at state-funded school — eligible"
    inputs: { child.age: 4, child.school_type: state_funded }
    expect: { outcome: eligible }

  - name: "Age 15 at state-funded school — eligible"
    inputs: { child.age: 15, child.school_type: state_funded }
    expect: { outcome: eligible }

  - name: "Age 3 — too young"
    inputs: { child.age: 3, child.school_type: state_funded }
    expect: { outcome: not_eligible }

  - name: "Age 16 — above upper limit"
    inputs: { child.age: 16, child.school_type: state_funded }
    expect: { outcome: not_eligible }

  - name: "Age 10 at independent school — not eligible"
    inputs: { child.age: 10, child.school_type: independent }
    expect: { outcome: not_eligible }

  - name: "Age 8, home educated — not eligible"
    inputs: { child.age: 8, child.school_type: home_educated }
    expect: { outcome: not_eligible }
Test strategy: boundary values (4 and 15), below boundary (3), above boundary (16), and every excluded enum value paired with an age that would otherwise pass.
Step 5 — Generate and test. All 6 tests passed on the first generation, so no refinement loop was needed. The source text was unambiguous and the guidance hints were specific enough.
Step 6 — Publish. Ruleset aethis/uk-fsm/child-eligibility was published with the label “v1 — child eligibility gate (age 4–15, state-funded schools)”.
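The six scenarios from Step 4 can be replayed against a hand-written stand-in for the Section A gate. This is a minimal sketch, not the generated ruleset: the function name `child_eligible` and the constants are assumptions for illustration, with the age range and enum values taken from the guidance hints in Step 3.

```python
MIN_AGE = 4   # lower bound of the 4 to 15 age range
MAX_AGE = 15  # upper bound of the 4 to 15 age range
IN_SCOPE_SCHOOL_TYPES = {"state_funded"}  # independent and home_educated are out of scope


def child_eligible(age: int, school_type: str) -> str:
    """Hand-written stand-in for Section A: age range AND school type."""
    in_age_range = MIN_AGE <= age <= MAX_AGE
    in_scope_school = school_type in IN_SCOPE_SCHOOL_TYPES
    return "eligible" if in_age_range and in_scope_school else "not_eligible"


# The six Step 4 scenarios as (age, school_type, expected outcome).
CASES = [
    (4, "state_funded", "eligible"),
    (15, "state_funded", "eligible"),
    (3, "state_funded", "not_eligible"),
    (16, "state_funded", "not_eligible"),
    (10, "independent", "not_eligible"),
    (8, "home_educated", "not_eligible"),
]

results = [child_eligible(age, school) == expected for age, school, expected in CASES]
print(f"{sum(results)}/{len(CASES)} scenarios match")
```

Each excluded enum value is paired with an in-range age, so a failure in those rows can only come from the school-type check.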

Composition

The three published sections compose into a rulebook with outcome logic A AND (B OR C):
sections:
  - section_id: child_eligibility
    pin_mode: latest_active
  - section_id: household_qualifying_criteria
    pin_mode: latest_active
  - section_id: universal_infant_fsm
    pin_mode: latest_active

outcome_logic: "A AND (B OR C)"
Section A is a prerequisite gate: both routes (means-tested and universal infant) require it to pass. A Year 1 child at a state-funded school satisfies both A and C with no income test. A Year 6 child must pass both A and B.
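As a reading aid, the outcome logic can be written as a plain boolean function. The sketch below assumes each section has already been reduced to a pass/fail result; `compose_fsm` is a hypothetical name, not part of the Aethis API.

```python
def compose_fsm(a_child_eligible: bool, b_household: bool, c_universal_infant: bool) -> bool:
    """Evaluate the rulebook outcome logic A AND (B OR C)."""
    return a_child_eligible and (b_household or c_universal_infant)


# Universal infant route: Section C passes, so the household result is irrelevant.
year_1_child = compose_fsm(a_child_eligible=True, b_household=False, c_universal_infant=True)

# Year 6 child: Section C does not apply, so eligibility depends on Section B.
year_6_without_benefits = compose_fsm(True, False, False)
year_6_with_benefits = compose_fsm(True, True, False)

# Section A is a gate: failing it blocks both routes.
failed_gate = compose_fsm(False, True, True)

print(year_1_child, year_6_without_benefits, year_6_with_benefits, failed_gate)
```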

Construction All Risks insurance

Benchmark domain. A five-level exception chain in a London market endorsement — the failure pattern used to test frontier LLMs.
Access damage is excluded (Clause 8)
  → unless project value ≥ £100M — enhanced cover reinstates it (Clause 9(1))
  → unless defect is a design defect — enhanced cover doesn't apply (Clause 9(2))
  → unless project value ≥ £500M — pioneer override reinstates it (Clause 9(3))
  → unless defect was known prior — pioneer override is blocked (Clause 9A(1))
  → unless there's an engineer assessment — the block is lifted (Clause 9A(2))
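One way to read the chain is as alternating "reinstate" and "block" layers, each reachable only if the layer above it fired. The sketch below encodes that reading in Python. It is an interpretation of the six lines above, not the endorsement wording or the engine's rule representation, and the function and parameter names are hypothetical. Values are in pounds.

```python
ENHANCED_COVER_THRESHOLD = 100_000_000    # Clause 9(1): project value >= £100M
PIONEER_OVERRIDE_THRESHOLD = 500_000_000  # Clause 9(3): project value >= £500M


def access_damage_covered(
    project_value: int,
    design_defect: bool,
    defect_known_prior: bool,
    engineer_assessment: bool,
) -> bool:
    """Walk the exception chain from the innermost clause outwards."""
    # Clause 9A(1)/(2): prior knowledge blocks the pioneer override
    # unless an engineer assessment lifts the block.
    pioneer_blocked = defect_known_prior and not engineer_assessment
    # Clause 9(3): the pioneer override needs the higher value threshold.
    pioneer_override = project_value >= PIONEER_OVERRIDE_THRESHOLD and not pioneer_blocked
    # Clause 9(2): a design defect switches enhanced cover off
    # unless the pioneer override reinstates it.
    enhanced_cover = project_value >= ENHANCED_COVER_THRESHOLD and (
        not design_defect or pioneer_override
    )
    # Clause 8: access damage is excluded unless enhanced cover applies.
    return enhanced_cover


below_enhanced = access_damage_covered(50_000_000, False, False, False)
design_defect_below_pioneer = access_damage_covered(499_000_000, True, False, False)
known_defect_blocked = access_damage_covered(500_000_000, True, True, False)
block_lifted = access_damage_covered(500_000_000, True, True, True)

print(below_enhanced, design_defect_below_pioneer, known_defect_blocked, block_lifted)
```

Under this reading the answer flips at every level of the chain, which is why a value just under a threshold or a single overlooked clause changes the outcome.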
Frontier LLM accuracy on the v3.8 adversarial CAR extension (20 newly-authored scenarios, Simpson et al. v3.8 2026, Table 8c):
| Model | Accuracy (N=20) | Notes |
|---|---|---|
| Aethis Engine | 20/20 (100%) | deterministic, <5ms, same answer every time |
| GPT-5.4 (reasoning_effort=low) | 20/20 (100%) | 16–126 reasoning tokens per scenario |
| Claude Sonnet 4.6 | 19/20 (95%) | fails E4 (DE3/LEG3 carveback gap) |
| GPT-5.4 (default) | 19/20 (95%) | 0 reasoning tokens on every scenario; short-circuits on E4 |
| Claude Opus 4.7 (current Anthropic strongest) | 18/20 (90%) | fails E4 + B3 (£499 M boundary) |
Three of four frontier configurations fail the same scenario across both Anthropic and OpenAI families. The Aethis engine is invariant by construction.
The shifting-ground problem (v3.8, §6.5 Finding 6). Several v3.7 paper cells closed silently between March and April 2026 under the same model alias: GPT-5.4 on construction-CAR moved from 96.6% to 100%; Opus 4.6 on spacecraft from 89.7% to 98.5%. The v3.7 11-scenario exception-chain subset that earlier examples cited has been replaced by the v3.8 adversarial extension above because current frontier configurations all hit 100% on the smaller subset. Frontier-LLM accuracy on a fixed benchmark is a moving target, which is exactly what regulated workflows cannot tolerate.
External validation (v3.8, §6.10). On the peer-reviewed LegalBench benchmark (9 tasks, 949 held-out cases) the Aethis Engine is significantly more accurate than each of three frontier LLMs: combined paired-binomial McNemar’s p < 0.001 vs Sonnet 4.6, p = 0.003 vs Opus 4.7, p < 0.001 vs GPT-5.4. The structural advantage holds on randomly-sampled tasks chosen without fit inspection.
Full harness: confidently-wrong-benchmark/legalbench/. Full benchmark data, paper, and reproduction scripts: github.com/Aethis-ai/confidently-wrong-benchmark.
Research paper: Confidently Wrong: Exception Chain Collapse in Frontier LLM Rule Evaluation (Simpson, Kozak, Doake, v3.8, 2026).
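For readers unfamiliar with the McNemar's test quoted in the external-validation paragraph above: it compares two systems on the same cases using only the discordant pairs, the cases where exactly one of them is correct. The sketch below is a generic exact (binomial) version in Python. It does not reproduce the paper's combination across tasks, the function name is hypothetical, and the counts are made up purely to show the arithmetic.

```python
from math import comb


def mcnemar_exact(only_first_correct: int, only_second_correct: int) -> float:
    """Two-sided exact (binomial) McNemar p-value from the discordant pairs."""
    n = only_first_correct + only_second_correct
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(only_first_correct, only_second_correct)
    # Under the null, each discordant pair favours either system with probability 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Made-up counts for illustration only; these are NOT figures from the paper.
p_value = mcnemar_exact(only_first_correct=20, only_second_correct=5)
print(f"p = {p_value:.4f}")
```

Cases that both systems get right, or both get wrong, do not enter the statistic at all, which is what makes the test suitable for paired evaluation on a shared set of held-out cases.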

Spacecraft Crew Certification Act 2049

A deliberately simple public demo domain — 11 fields across 7 rule groups, ideal for first experiments. Two demonstrations: a one-field short-circuit, and a fully-specified happy path.

One field, decision reached (Vogon)

A Vogon is disqualifying under §3(1) regardless of any other answer, so the engine short-circuits the moment space.crew.species: "Vogon" is provided:
aethis decide -b aethis/spacecraft-crew-certification \
  -i '{"space.crew.species": "Vogon"}' \
  --explain
{
  "decision": "not_eligible",
  "fields_provided": 1,
  "fields_evaluated": 11,
  "trace": {
    "status": "ineligible",
    "failure_reasons": [
      ["species_not_vogon", [
        { "type": "answer", "field": "space.crew.species", "value": "vogon" },
        { "type": "condition",
          "expression": "Not(spacecraft-crew-certification:v1:space.crew.species == Vogon)" }
      ]]
    ],
    "group_statuses": {
      "species_eligibility": "not_satisfied",
      "flight_readiness": "pending",
      "medical_certification": "pending",
      "medical_cert_validity": "pending",
      "radiation_certification": "pending",
      "propulsion_compliance": "pending",
      "towel_compliance": "pending"
    }
  }
}
With fields_provided: 1 and fields_evaluated: 11, the engine reasoned across all 11 fields in the rulebook yet needed only the single answer supplied: it discharged the case as soon as the species check failed. The other groups stay pending because no further questions need answering.
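The short-circuit behaviour can be illustrated with three-valued group statuses. The sketch below is not the Aethis engine. It models just three of the seven groups with hand-written predicates, and `GROUPS`, `evaluate`, and the "undetermined" label are assumptions for illustration. It shows how one failed group settles an all-groups-must-pass decision while the rest stay pending.

```python
from typing import Callable, Optional

Answers = dict[str, object]


# Each check returns True/False once its field is answered, or None while pending.
def species_eligibility(answers: Answers) -> Optional[bool]:
    species = answers.get("space.crew.species")
    return None if species is None else species != "Vogon"


def towel_compliance(answers: Answers) -> Optional[bool]:
    has_towel = answers.get("space.crew.has_towel")
    return None if has_towel is None else bool(has_towel)


def propulsion_compliance(answers: Answers) -> Optional[bool]:
    propulsion = answers.get("space.vessel.propulsion_type")
    return None if propulsion is None else propulsion != "Conventional"


GROUPS: dict[str, Callable[[Answers], Optional[bool]]] = {
    "species_eligibility": species_eligibility,
    "towel_compliance": towel_compliance,
    "propulsion_compliance": propulsion_compliance,
}

STATUS = {True: "satisfied", False: "not_satisfied", None: "pending"}


def evaluate(answers: Answers) -> tuple[str, dict[str, str]]:
    """Decide as early as the answers allow; unanswered groups stay pending."""
    results = {name: check(answers) for name, check in GROUPS.items()}
    statuses = {name: STATUS[result] for name, result in results.items()}
    if any(result is False for result in results.values()):
        decision = "not_eligible"   # one failed group is enough
    elif all(result is True for result in results.values()):
        decision = "eligible"       # every group satisfied
    else:
        decision = "undetermined"   # hypothetical label: more answers needed
    return decision, statuses


decision, statuses = evaluate({"space.crew.species": "Vogon"})
print(decision, statuses)
```

A conjunction is false as soon as any conjunct is false, so the species answer alone is sufficient here and no further questions need to be asked.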

All fields, happy path (Human + Improbability Drive)

To pass every gate, the applicant needs a valid licence, medical, radiation cert, towel, and a vessel running on something more exciting than conventional propulsion (§7(2) — see aethis/spacecraft-crew-certification/explain):
aethis decide -b aethis/spacecraft-crew-certification -i '{
  "space.crew.species": "Human",
  "space.crew.age": 35,
  "space.crew.flight_hours": 600,
  "space.crew.has_pilot_license": true,
  "space.crew.has_gaa_exam": true,
  "space.crew.has_approved_provider_cert": true,
  "space.medical.cert_valid": true,
  "space.mission.type": "orbital",
  "space.crew.has_radiation_cert": true,
  "space.vessel.propulsion_type": "Infinite Improbability Drive",
  "space.crew.has_towel": true
}'
decision: eligible
Swap "Infinite Improbability Drive" for "Conventional" and the decision flips to not_eligible with propulsion_compliance: not_satisfied — the engine names the exact group that failed.