Generator: known limits, verification notes, and refinement backlog

Companion to README.md. Captures what the generator does well, where errors can creep in, and what should happen before a generated pack is sent to a client.

Verification: what was checked

On the most recent run against ../rag-chatbot-client-questionnaire.md with the example-client metadata:

All 54 numbered questions from the markdown appear in the Word doc and as 54 rows in the Excel Questionnaire sheet — no gaps, no duplicates.
Soft-wrapped questions (where the markdown source wraps a single question across multiple lines) are folded into one logical paragraph.
Markdown emphasis (**bold**) is stripped from rendered text.
Sub-bullets under multi-choice questions render as bulleted options in the Word doc and as a newline-separated "Options" column in Excel.
The five metadata blocks (engagement / Eidos team / client / engagement context / confidentiality) flow into both Word tables and the Excel Cover sheet.
"What happens next" prose at the end of the source markdown is excluded from the questionnaire body (it belongs in the cover note, not as a numbered question).
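The parsing behaviours listed above (folding soft-wrapped questions, stripping bold markers, collecting sub-bullets as options, ignoring trailing prose) can be illustrated with a minimal sketch. This is not the generator's actual parser: the function name `parse_questions`, the regular expressions, and the rule that a blank line or heading ends a question are all assumptions made for illustration.

```python
import re

# Hypothetical patterns; the real generator may recognise questions differently.
QUESTION_RE = re.compile(r"^(\d+)\.\s+(.*)$")   # "12. What is ...?"
BULLET_RE = re.compile(r"^\s*[-*]\s+(.*)$")     # "- option" or "* option"
BOLD_RE = re.compile(r"\*\*(.+?)\*\*")          # "**bold**" becomes "bold"


def parse_questions(markdown):
    """Return a list of {"number", "text", "options"} dicts."""
    questions = []
    current = None
    for raw in markdown.splitlines():
        line = raw.rstrip()
        # Assumption: a blank line or a heading closes the current question,
        # so free prose such as a "What happens next" section is skipped.
        if not line.strip() or line.lstrip().startswith("#"):
            current = None
            continue
        match = QUESTION_RE.match(line)
        if match:
            current = {"number": int(match.group(1)),
                       "text": match.group(2),
                       "options": []}
            questions.append(current)
            continue
        if current is None:
            continue  # prose outside any numbered question
        bullet = BULLET_RE.match(line)
        if bullet:
            current["options"].append(bullet.group(1))
        elif current["options"]:
            # Soft-wrapped continuation of the last option.
            current["options"][-1] += " " + line.strip()
        else:
            # Soft-wrapped continuation of the question text.
            current["text"] += " " + line.strip()
    # Strip emphasis only after folding, so bold spanning a wrap is handled.
    for q in questions:
        q["text"] = BOLD_RE.sub(r"\1", q["text"])
        q["options"] = [BOLD_RE.sub(r"\1", o) for o in q["options"]]
    return questions
```

With a structure like this, the Excel "Options" cell would simply be `"\n".join(q["options"])`, and the question count check reduces to comparing `len(parse_questions(source))` with the number of rows written.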

Room for error: what to eyeball before sending

The generator is deterministic, but the markdown source can drift in ways the parser will not catch. Spot-check the output before sending:

Question numbering gaps. If the source markdown has a gap (e.g. question 17 missing) or duplicates, the generator passes that through unchanged. Skim the # column in Excel before sending.
Section ordering vs question numbering. If a section is inserted mid-document, its question numbers must be higher than those of the section before it; otherwise readers see questions appearing in non-monotonic order (e.g. ...36, 41-50, 51-54). This was a real bug, since fixed, but it remains a risk on future edits. Convention: never renumber existing questions, always extend at the end, and keep section blocks contiguous.
Tables, images, code blocks in the source. The parser does not handle these. Anything wrapped in a fenced code block, table, or image syntax will either be dropped or appear as raw text in the intro. The current questionnaires have none — keep it that way, or extend the parser.
Nested bullets. Only one level of sub-bullet under a numbered question is supported. A bullet of a bullet will be flattened.
Non-ASCII punctuation. Em-dashes, curly quotes, etc. flow through correctly, but copy-paste from Word can inject zero-width characters that are invisible on screen and break search later. Run cat -v on the questionnaire source if anything looks off.
Empty answer boxes look like an editing accident. Word users sometimes assume the shaded box was supposed to contain a prompt. The cover page now says "leave blank if unsure" — confirm that sentence survives any client-specific edits before sending.
Branding ceiling. The Word output uses Calibri + Eidos brand green; no logo, no header/footer styling, no document properties (author, company, title). If the client expects a fully brand-styled document, open the .docx in Word and apply the Eidos template, or paste the content into a Word template — easier than extending python-docx.
Confidentiality on the Excel file. The .xlsx contains the primary, technical and procurement contacts in plain text on the Cover sheet. If the client wants the spreadsheet shared widely internally, consider whether those names should be there.
Currency / region in worked examples. Any GBP numbers shown in the source markdown carry through unchanged. For non-UK clients, review and substitute before sending.
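Two of the checks above (question numbering and invisible characters) are mechanical enough to script. The following is a rough sketch of a pre-send lint over the markdown source, not part of the generator: the function names are hypothetical, questions are assumed to start a line as `N. `, and "invisible" is assumed to mean Unicode format characters (category `Cf`), which covers zero-width spaces, joiners, and byte-order marks.

```python
import re
import unicodedata

# Assumption: every numbered question starts its line as "<number>. ".
QUESTION_RE = re.compile(r"^(\d+)\.\s")


def numbering_problems(markdown):
    """Report gaps, duplicates, and out-of-order question numbers."""
    numbers = []
    for line in markdown.splitlines():
        match = QUESTION_RE.match(line)
        if match:
            numbers.append(int(match.group(1)))

    problems = []
    seen = set()
    previous = 0  # highest question number seen so far
    for n in numbers:
        if n in seen:
            problems.append(f"duplicate question {n}")
        elif n < previous:
            problems.append(f"question {n} appears after {previous}")
        elif n > previous + 1:
            problems.append(f"gap before question {n} (expected {previous + 1})")
        seen.add(n)
        previous = max(previous, n)
    return problems


def invisible_characters(text):
    """Return (line, column, codepoint) for each Unicode format character."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # Category "Cf" includes U+200B, U+200C, U+200D, and U+FEFF.
            if unicodedata.category(ch) == "Cf":
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits
```

Legitimate punctuation such as curly quotes falls outside category `Cf`, so it is not flagged. An empty list from both functions does not replace skimming the # column in Excel; it only catches the cases that are easy to miss by eye.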

Refinement backlog: questions that should probably be in the questionnaire

Either add these to the markdown so the next generation picks them up, or keep them as an "ask on the discovery call" cheat sheet. They are listed here so the team can debate them rather than leaving them in chat history.

Current ticket volume the chatbot is supposed to deflect. If the client cannot put a number on the inbox / helpdesk traffic the chatbot is meant to reduce, the success metric in Section 1 is weak. Add: "What is the current monthly volume of questions the chatbot is intended to deflect?"
Existing chatbot or AI pilot. "Have you tried any AI / chatbot tooling already, even informally?" Surfaces ChatGPT Enterprise, Copilot, M365 Chat usage that affects the buying argument.
Acceptable answer latency. A frontier API answers in roughly 2-4s; a self-hosted open model can take 5-15s depending on hardware. Some clients have hard ceilings ("must answer in under 3 seconds"). Worth asking before quoting hosting.
Multi-turn vs single-turn expectations. Does the client expect the chatbot to hold conversation state ("can you also tell me…"), or are these mostly one-shot lookups? Affects token spend and UX work.
Feedback loop. Will users be allowed to thumbs-up / thumbs-down answers? Will those signals be surfaced to admins? Drives 1-3 extra days of admin-UI work.
Out-of-hours expectation. Some clients expect the chatbot to answer 24/7; others only during working hours. Affects monitoring and SLA tier choice.
Branch / multi-site rollout pattern. If the client has many physical sites, do they want a phased site-by-site rollout or simultaneous? Affects communications and support sizing.
Existing knowledge ownership. "Who today is asked the same questions repeatedly that this should replace?" Naming those people surfaces both the success metric AND the political risk (the chatbot replacing their unique value).
Data classification scheme. If the client already classifies documents (public / internal / confidential / restricted), capture that vocabulary — the retrieval permissions will need to honour it.
Disaster scenario for incorrect answers. "What is the worst plausible outcome of the chatbot giving a wrong answer?" Often surfaces a regulator or liability concern the discovery missed.
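To make the data classification point concrete, here is a minimal sketch of what "retrieval permissions honour the scheme" could mean in practice. Everything in it is an assumption for illustration: the `Chunk` type, the `visible_chunks` function, and the idea that the four labels form a simple least-to-most-sensitive ordering. A real client scheme may be role-based rather than linear, which is exactly why the vocabulary is worth capturing up front.

```python
from dataclasses import dataclass

# Assumed ordering, least to most sensitive. The client's own vocabulary
# and rules would replace this list.
LEVELS = ["public", "internal", "confidential", "restricted"]


@dataclass
class Chunk:
    """A retrieved passage plus the classification of its source document."""
    text: str
    classification: str


def visible_chunks(chunks, user_clearance):
    """Keep only chunks at or below the user's clearance level.

    Chunks with an unrecognised label are dropped (fail closed) rather
    than shown.
    """
    ceiling = LEVELS.index(user_clearance)
    return [
        c for c in chunks
        if c.classification in LEVELS
        and LEVELS.index(c.classification) <= ceiling
    ]
```

Filtering before the passages reach the model, rather than after an answer is generated, keeps restricted text out of the prompt entirely.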

Sessions that may need to follow up the questionnaire

A returned questionnaire is rarely enough on its own. Plan for one or more of these depending on what came back:

Document landscape workshop (90 minutes). Triggered when the answer to "where do documents live today?" is anything other than a single tidy source. Bring an SME from the client's content side plus IT. Outcome: a concrete in-scope list with owners.
Compliance / DPIA workshop (90-120 minutes). Triggered when the client names GDPR-DPIA, ISO 27001, or any sector-specific regulation. Outcome: agreed data-flow diagram and the evidence pack scope.
Identity & access workshop (60 minutes). Triggered when per-role permissions are required or when SSO is not already universal. Outcome: identity provider confirmed, permission model drawn, forward-auth scope agreed.
Hosting decision workshop (60 minutes). Triggered when the client picks position B / C / D on AI hosting (anything other than "frontier API is fine"). Outcome: residency and infrastructure decision in writing, GPU sizing or cloud region pinned.
Success-metric workshop (45 minutes). Triggered when Section 1 comes back vague. Outcome: 1-3 measurable metrics the client agrees will define go/no-go at 6 months.
Pilot scope agreement (45 minutes). Default for almost every deal. Outcome: phase-1 boundaries written down — which department, which document set, which users.
Pricing read-through (30 minutes). Always. Walk the client through the assumptions section of the proposal so the numbers do not surprise them later.

Operational hygiene

The generator is deliberately uncached. Re-running it on the same YAML produces a fresh output with the current markdown — so updating the questionnaire and regenerating is always safe.
Bump quote_version in the YAML on every send, even if only typos changed. Clients refer to documents by version when they reply.
The out/ directory is gitignored. Filled questionnaires returned by clients belong in the deal's CRM record or a designated SharePoint / deal folder, not in this repo.
If a question gets repeatedly skipped by clients, that is data — rework the question rather than nagging clients for it.