
Medico-Legal Agency Engagement

Problem reframing, prioritisation, and AI-powered expert matching and response generation for a specialist consultancy

1. The Engagement

A medico-legal consultancy approached me to help improve their operational efficiency. They connect solicitors with medical expert witnesses: when a solicitor needs an independent medical opinion for a legal case, the consultancy identifies the right expert, arranges the instruction, and manages the process through to report delivery.

The majority of incoming enquiries were not converting into business. The client had set an ambitious target to reach roughly three times the current conversion rate, which would require a fundamentally different approach. The gap was not just about speed. Handlers were junior, had no medical background, and were expected to matchmake across clinical specialisms they did not understand. Their existing use of AI amounted to pasting enquiries into ChatGPT alongside the company website and hoping for a reasonable match. The results were inconsistent and the approach did not scale.

Before the first meeting, the client had prepared a structured spreadsheet of pain points across multiple departments. Each row described a problem, what it involved, why it was a current issue, a proposed solution, and a desired outcome.

2. Reframing the Problems

I took the original spreadsheet and reframed each problem on the same day. For every item I added a root problem analysis, a hypothesis about what would actually move the needle, an identification of the leverage user whose behaviour change would have the greatest impact, a concrete proposal, and MVP success criteria.

The shift in perspective was significant. The client’s original framing centred on tools: build a matching engine, build a QA assistant, build a proposal generator. My reframing centred on where the friction actually sat and who was best placed to reduce it. Here are three examples that illustrate the pattern:

Problem | Original framing | Reframed root cause
Expert matching | Build stronger AI matchmaking from a knowledge base | The expert data itself is unstructured, inconsistent, and manually maintained. Build a structured, expert-curated profile system and matching accuracy improves as a natural consequence.
Written proposals | Build an AI-powered proposal generator for fee proposals sent to solicitors | Each proposal is manually assembled from per-expert templates, taking 15 to 20 minutes and varying in quality by handler. Generate AI drafts instead, let handlers review and edit, and use their edits as a feedback loop to improve the prompts over time.
Report QA | Build an AI-driven QA assistant | QA happens too late in the process, causing repetitive corrections and missed learning. Shift it to the point of report upload and let experts self-correct before submission.

Many of these pain points could be addressed at two levels: at the operational level, by giving handlers better tools to do the work they were already doing, or at a structural level, by redesigning who does the work entirely. Report QA, for example, could be improved by helping the operations team review reports more efficiently, or it could be restructured so that automated checks gate submission before the operations team is involved at all.

The approach during this engagement was focused on the client’s immediate business and operational goals: strengthening the existing team with better tools, better data, and better reasoning to work with. Within that framing, one insight from the reframing stood out immediately and shaped everything that followed. The expert matching problem was not really a matching problem. It was a data problem. The existing expert profiles were unstructured multi-page CVs with no standardised taxonomy, no consistent categorisation of specialisms, and no way for software to reason about what an expert actually covers. I identified in the same-day response that the first thing to build was a structured expert profile system. If profiles were well structured, with specialisms broken into sub-specialisms and tagged with searchable keywords, then matching accuracy would improve as a natural consequence. Without that foundation, no amount of AI sophistication would compensate.

Product thinking: When someone asks for a matching engine, the instinct is to start building the matching engine. But matching quality is bounded by data quality. The first delivery needed to be the expert schema and the pipeline to populate it, not the matching logic itself.

3. The Priority

From that analysis, the client and I agreed on a single priority for Phase 1: expert matching and response generation. It had the most direct link to the conversion rate and the clearest path to a demonstrable proof of concept.

The formal brief was to build a system that could accurately match an expert to an incoming enquiry, generate a CPR Part 35-compliant justification explaining why the expert is appropriate for the case, and adapt its behaviour across three tiers of enquiry detail: low, moderate, and high. Where the named expert was not a good match, the system should explain why and recommend a better-suited alternative.

Before any of that could work, the expert data needed to exist in a structured form. That meant the delivery sequence started not with the matching engine but with the expert schema and the pipeline to populate it from existing CVs. Only then could matching and response generation be built on a reliable foundation. Everything else, including fees, scheduling, QA, and the expert portal, was deliberately deferred. The goal was to prove the core capability and build outward from there.

A deliberate design principle ran through the entire project: the goal was to keep the operations handler fully in control while enhancing and augmenting their expertise. The system would surface reasoning, suggest matches, and draft responses, but the handler would always make the final decision. Every stage was built as an independent, observable step that the handler could review, override, or approve. A fully automated mode was also built and demonstrated, but as a capability the client could choose to adopt when ready, not as the default operating model.

4. Discovery

The initial assumption was that enquiries would arrive as formal Letters of Instruction: structured, multi-page legal documents containing detailed clinical information, named parties, and specific questions for the expert to address. If that were the case, the system could parse a predictable format and extract what it needed with reasonable confidence.

The operations lead corrected this in the first working session. Most enquiries arrive as ordinary emails. Some are four sentences long. Others are detailed briefs with injury specifics and a named expert. Many land somewhere in between. The system would need to handle unstructured text with widely varying levels of detail rather than relying on a known document format.

That changed the design in a fundamental way. Rather than a single extraction pipeline, I needed a classification layer that could assess how much information was actually present and adapt downstream behaviour accordingly. A three-sentence enquiry about a knee injury produces a very different workflow from a four-page Letter of Instruction naming a specific neurologist.
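As a rough illustration of that classification layer, the sketch below assigns one of the three detail tiers from a handful of signals. The signal names, the word-count cut-off, and the scoring rule are all assumptions made for illustration. The case study does not say how the production classifier decides, and it may well rely on a language model rather than fixed rules.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class DetailLevel(Enum):
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"


@dataclass
class EnquirySignals:
    """Hypothetical signals extracted from an enquiry email."""
    word_count: int
    medical_terms: list[str] = field(default_factory=list)
    named_expert: Optional[str] = None
    has_specific_questions: bool = False


def classify_detail(signals: EnquirySignals) -> DetailLevel:
    """Score how much usable information is present (illustrative thresholds)."""
    score = 0
    if signals.word_count >= 150:          # assumed cut-off for a "long" enquiry
        score += 1
    if len(signals.medical_terms) >= 3:    # several distinct clinical terms
        score += 1
    if signals.named_expert is not None:   # solicitor has asked for someone specific
        score += 1
    if signals.has_specific_questions:     # explicit questions for the expert
        score += 1

    if score >= 3:
        return DetailLevel.HIGH
    if score >= 1:
        return DetailLevel.MODERATE
    return DetailLevel.LOW


# A short knee-injury email versus a detailed brief naming an expert.
short_email = EnquirySignals(word_count=40, medical_terms=["knee"])
detailed_brief = EnquirySignals(
    word_count=1200,
    medical_terms=["seizure", "EEG", "head injury"],
    named_expert="Dr Example",
    has_specific_questions=True,
)
print(classify_detail(short_email).value, classify_detail(detailed_brief).value)
```

Whatever produces the tier, the point is that downstream steps branch on it rather than assuming a fixed document format.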

The other important discovery concerned matching itself. Roughly 70% of enquiries already name a specific expert. That does not mean the match is correct. The consultancy still needs to verify the fit and sometimes redirect to someone better suited. A solicitor might request a general orthopaedic surgeon for a spinal decompression case when a dedicated spinal specialist would be a stronger match. Matching is therefore not just for the 30% of open enquiries. It is a verification and quality layer across every enquiry that arrives.

5. The Expert Schema

Before matching could work, expert profiles needed structure. A 50-page CV contains the right information but not in a form that software can reason about. I designed a hierarchical schema that decomposes each expert into discrete, searchable layers.

1. Expert Identity: Name, aliases, availability status.
2. Clinical Profile: Posts, qualifications, languages, locations.
3. Specialisms: Primary disciplines such as Neurology or Orthopaedics.
4. Sub-specialisms: Specific areas within each discipline, such as Epilepsy or Traumatic Brain Injury.
5. Keywords and Synonyms: Searchable terms mapped to each sub-specialism for semantic matching.
6. Witness Profile: Medico-legal experience, report types, court appearances.
7. Testimonials and Media: Quotes, publications, press mentions.

The specialism hierarchy is what makes matching granular. An orthopaedic surgeon with hip and spine sub-specialisms gets separate embedding vectors, one for each sub-specialism with its own keywords. When a spinal injury enquiry arrives, it matches the spine vector strongly without being diluted by the hip expertise. That specificity is what turns broad “orthopaedics” into a meaningful, targeted match.

Every factual field in the schema carries a source and confidence score. A qualification extracted directly from a CV carries high confidence. A location inferred from an NHS trust name carries lower confidence. This provenance tracking supports auditability and allows the system to distinguish between verified facts and reasonable inferences when generating responses.
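A minimal sketch of how such a schema might look in code, covering only the specialism hierarchy and the provenance fields. The class names, field names, source labels, and the 0.8 confidence floor are assumptions for illustration, not the schema actually delivered.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SourcedValue:
    """A single fact plus where it came from and how far it is trusted."""
    value: Optional[str]   # None when the CV does not state it
    source: str            # e.g. "cv" or "inferred" (labels are assumptions)
    confidence: float      # 0.0 to 1.0


@dataclass
class SubSpecialism:
    name: str
    keywords: list[str] = field(default_factory=list)


@dataclass
class Specialism:
    name: str
    sub_specialisms: list[SubSpecialism] = field(default_factory=list)


@dataclass
class ExpertProfile:
    expert_id: str
    name: str
    qualifications: list[SourcedValue] = field(default_factory=list)
    locations: list[SourcedValue] = field(default_factory=list)
    specialisms: list[Specialism] = field(default_factory=list)

    def verified_facts(self, minimum: float = 0.8) -> list[SourcedValue]:
        """Return only the facts at or above the confidence floor."""
        facts = self.qualifications + self.locations
        return [f for f in facts if f.value is not None and f.confidence >= minimum]


# Hypothetical expert: one fact read from the CV, one inferred.
profile = ExpertProfile(
    expert_id="EXP-0001",
    name="Dr Example",
    qualifications=[SourcedValue("FRCS", source="cv", confidence=0.95)],
    locations=[SourcedValue("Leeds", source="inferred", confidence=0.5)],
    specialisms=[Specialism("Orthopaedics", [SubSpecialism("Spine", ["decompression"])])],
)
print([f.value for f in profile.verified_facts()])
```

Keeping the source and confidence next to each value lets response generation filter on them directly instead of trusting every field equally.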

6. From CVs to Searchable Profiles

The consultancy had over 200 expert CVs stored as documents in Google Drive. These needed to become structured, searchable, and embeddable profiles, and the process needed to be operable by non-technical staff without manual data entry for each expert.

CV document → Text extraction → GPT structuring → Spreadsheet review → Encrypted database

I built a two-part pipeline. The first part is a Google Apps Script that staff can operate directly from a custom menu inside Google Sheets. It opens each CV from a Drive link, extracts the document text, sends it to GPT with a strict schema-driven grounding prompt, and writes the structured JSON result back to the spreadsheet. The prompt enforces exact field ordering, British English, qualification type classification, source attribution on every field, and zero fabrication. Where information is absent from the CV, the field is left null rather than inferred.

The second part is a Python utility that takes the structured output and prepares it for the live system. It generates unique identifiers with collision checking, encrypts sensitive fields using Fernet symmetric encryption, upserts records to the database, and builds the embedding chunks that power vector search.
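The identifier step can be sketched as below. The ID format, the `EXP` prefix, and the retry limit are assumptions, and the encryption, upsert, and embedding steps are deliberately left out.

```python
import secrets


def new_expert_id(existing_ids: set[str], prefix: str = "EXP", attempts: int = 10) -> str:
    """Generate a random ID and retry if it collides with one already issued."""
    for _ in range(attempts):
        candidate = f"{prefix}-{secrets.token_hex(4).upper()}"  # e.g. EXP-9F2C41AB
        if candidate not in existing_ids:
            existing_ids.add(candidate)  # reserve it so later calls cannot reuse it
            return candidate
    raise RuntimeError("could not generate a unique expert ID")


issued: set[str] = set()
first = new_expert_id(issued)
second = new_expert_id(issued)
print(first, second)
```

In a real deployment the set of existing IDs would come from the database rather than from memory.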

How chunking works

Each expert produces one embedding chunk per sub-specialism. A chunk is a concatenated string containing the expert ID, name, specialism label, sub-specialism label, and up to eight keywords. For an expert with three specialisms and ten sub-specialisms across them, that produces ten separate vectors in the database, each tightly focused on a specific area of expertise.

This is the design choice that makes matching precise rather than vague. Embedding an entire CV would produce one blended vector that averages everything together. Embedding at the sub-specialism level means a search for “epilepsy, seizure, EEG” finds the neurology-epilepsy chunk directly, without interference from the same expert’s stroke rehabilitation or headache disorder expertise.
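The chunking rule described above can be sketched as follows. The dictionary layout and the ` | ` separator are assumptions. Only the chunk contents (expert ID, name, specialism, sub-specialism, up to eight keywords) come from the description.

```python
MAX_KEYWORDS = 8  # "up to eight keywords" per chunk


def build_chunks(expert: dict) -> list[dict]:
    """Produce one text chunk per sub-specialism, ready to be embedded."""
    chunks = []
    for specialism in expert["specialisms"]:
        for sub in specialism["sub_specialisms"]:
            keywords = sub.get("keywords", [])[:MAX_KEYWORDS]
            text = " | ".join([
                expert["expert_id"],
                expert["name"],
                specialism["name"],
                sub["name"],
                ", ".join(keywords),
            ])
            chunks.append({
                "expert_id": expert["expert_id"],
                "sub_specialism": sub["name"],
                "text": text,
            })
    return chunks


# Hypothetical neurologist with two sub-specialisms.
expert = {
    "expert_id": "EXP-0002",
    "name": "Dr Example",
    "specialisms": [{
        "name": "Neurology",
        "sub_specialisms": [
            {"name": "Epilepsy", "keywords": ["epilepsy", "seizure", "EEG"]},
            {"name": "Stroke Rehabilitation", "keywords": ["stroke", "rehabilitation"]},
        ],
    }],
}
for chunk in build_chunks(expert):
    print(chunk["text"])
```

Each chunk mentions only its own sub-specialism, so the epilepsy vector carries nothing about stroke rehabilitation.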

7. Two-Stage Matching

The matching engine needed to solve a tension between context and cost. Semantic search is fast and inexpensive, but it operates on compressed representations of expertise rather than full profiles. It can surface candidates whose specialism vectors align with the enquiry terms, but it cannot reason about the nuance of why one neurologist is better suited than another for a specific case involving post-concussion syndrome in a child.

The alternative, passing every expert’s complete profile through a language model alongside the enquiry, would provide that reasoning. But with a panel of over 200 experts, the cost and latency of prompt-stuffing the entire database on every enquiry would be prohibitive. The approach does not scale.

The solution is a two-stage design where each stage does what it is best at.

Stage 1: Semantic search

The system takes the extracted medical terms, primary specialism, and key phrases from the enquiry and combines them into a query string. That string is embedded using the same model used for the expert chunks. The resulting vector is compared against every pre-computed chunk in the database using cosine similarity.

Chunks scoring above a 0.4 similarity threshold are kept. They are grouped by expert and scores are aggregated with a weighting that favours higher-ranked chunks. The result is a ranked shortlist of the top candidates with numeric similarity scores. This entire process takes seconds and costs fractions of a penny. It reduces 200+ experts to a manageable set of the most plausible matches.
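A minimal sketch of that shortlisting step, using toy two-dimensional vectors in place of real embeddings. The 0.4 threshold comes from the description above. The decay factor used to favour higher-ranked chunks and the `top_n` cut-off are assumptions, since the exact weighting is not specified.

```python
import math
from collections import defaultdict

SIMILARITY_THRESHOLD = 0.4  # stated in the case study
RANK_DECAY = 0.5            # assumption: each lower-ranked chunk counts half as much


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def shortlist(query_vec, chunks, top_n: int = 5):
    """chunks is a list of (expert_id, vector); returns [(expert_id, score)]."""
    per_expert = defaultdict(list)
    for expert_id, vec in chunks:
        score = cosine(query_vec, vec)
        if score > SIMILARITY_THRESHOLD:   # keep only chunks above the threshold
            per_expert[expert_id].append(score)

    ranked = []
    for expert_id, scores in per_expert.items():
        scores.sort(reverse=True)
        # The best chunk counts in full; weaker chunks contribute progressively less.
        total = sum(s * RANK_DECAY ** i for i, s in enumerate(scores))
        ranked.append((expert_id, total))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


# Toy data: expert A has two relevant chunks, B has one, C has none.
toy_chunks = [
    ("A", [1.0, 0.0]),
    ("A", [1.0, 1.0]),
    ("B", [0.6, 0.8]),
    ("C", [0.0, 1.0]),
]
print(shortlist([1.0, 0.0], toy_chunks))
```

In this toy run expert A ranks first on the strength of two matching chunks, B follows with one, and C is dropped because its only chunk falls below the threshold.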

Stage 2: Reasoned ranking

For the shortlisted candidates from Stage 1, the system retrieves their full, detailed profiles from the database. These complete profiles, along with the original enquiry text, are then passed to a language model that can reason about the match with the full context available. The model returns a re-ranked list with an AI-generated explanation for each expert: why one has direct sub-specialism alignment, why another has broader experience but less specific relevance to this particular case, and why a third was excluded despite surface-level similarity.

This second pass is meaningfully more expensive, but it only runs against a small number of candidates rather than the entire panel. That is the core architectural insight: use the cheap pass to reduce the search space, then apply the expensive pass where it can operate with complete information and produce genuinely useful reasoning.
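The second pass might be wired up roughly as below. The prompt wording, the JSON reply shape, and the `load_profile` and `llm` callables are all hypothetical stand-ins. The sketch does not call any real model API, and the filter that drops experts outside the shortlist is an added safeguard rather than something the case study describes.

```python
import json
from typing import Callable


def build_rerank_prompt(enquiry: str, profiles: list[dict]) -> str:
    """Combine the enquiry with the full profiles of the shortlisted experts."""
    return (
        "Rank these experts for the enquiry and explain each decision.\n"
        'Reply with JSON: [{"expert_id": "...", "rank": 1, "explanation": "..."}]\n\n'
        f"ENQUIRY:\n{enquiry}\n\n"
        f"CANDIDATE PROFILES:\n{json.dumps(profiles, indent=2)}"
    )


def rerank(
    enquiry: str,
    shortlisted_ids: list[str],
    load_profile: Callable[[str], dict],
    llm: Callable[[str], str],
) -> list[dict]:
    """Run the expensive reasoning pass over the shortlist only."""
    profiles = [load_profile(expert_id) for expert_id in shortlisted_ids]
    raw_reply = llm(build_rerank_prompt(enquiry, profiles))
    results = json.loads(raw_reply)
    allowed = set(shortlisted_ids)
    # Ignore any expert the model mentions that was never in the shortlist.
    results = [r for r in results if r["expert_id"] in allowed]
    return sorted(results, key=lambda r: r["rank"])


# Stand-ins for the database lookup and the language model.
fake_db = {
    "E1": {"expert_id": "E1", "specialism": "General orthopaedics"},
    "E2": {"expert_id": "E2", "specialism": "Spinal surgery"},
}


def fake_llm(prompt: str) -> str:
    return json.dumps([
        {"expert_id": "E1", "rank": 2, "explanation": "Broader, less specific experience."},
        {"expert_id": "E2", "rank": 1, "explanation": "Direct sub-specialism alignment."},
    ])


for item in rerank("Spinal decompression enquiry", ["E1", "E2"], fake_db.get, fake_llm):
    print(item["rank"], item["expert_id"], item["explanation"])
```

Because the prompt only ever contains the shortlisted profiles, its size depends on the shortlist length rather than on the size of the full panel.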

The handler sees both stages and can accept the vector ranking, review the reasoned output, or override either. Experts who were not shortlisted are hidden by default but can be revealed alongside the explanation of why they were excluded.

Crucially, the handler can also override the system entirely and select any expert from the full panel, including ones the matching engine did not rank highly or did not surface at all. When they do, the system does not simply accept the choice. It runs the same detailed assessment against the selected expert specifically, evaluating their profile against the enquiry and returning a full explanation of how strong or weak the match is. The handler gets an informed second opinion on their own decision before committing to it. This means the system never blocks human judgement but always ensures that judgement is exercised with context rather than in the dark.

Systems thinking: Semantic search is excellent at narrowing a large field quickly and cheaply. Language models are excellent at reasoning about fit when given full context. Neither is sufficient alone at scale. Combining them in sequence, with the vector pass reducing the candidate pool before the reasoning pass provides depth, produces high-quality results at a cost that remains proportional to volume rather than growing with the size of the expert panel.

8. Response Generation

Once an expert is selected, the system generates a response appropriate to the scenario. The classification metadata, expert profile, and original enquiry text are assembled and passed to a prompt designed to produce CPR Part 35-compliant correspondence.

The prompt went through three iterations. The first version assumed formal Letters of Instruction as input, which reflected the pre-discovery understanding of the domain. The second adapted for informal email enquiries with a warmer, more conversational tone after the operations lead revealed the reality of how enquiries arrive. The third, informed by client feedback, tightened the focus: rather than showcasing the expert’s full skill set, the response was rewritten to address the specific medical requirements raised in the enquiry. Each iteration made the output more relevant to the person reading it.

Every response must be factually grounded. It cannot fabricate qualifications, claim expertise the profile does not support, or oversell the match. It must comply with CPR Part 35 requirements for expert witness communications. The handler can edit the draft before sending, and the system preserves the distinction between generated content and manual modifications.
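One possible way to keep generated content and handler edits distinguishable is to store the original draft and compute a word-level diff against the version that was sent. The sketch below uses Python's standard `difflib`. It is an illustration of the idea, not a description of how the delivered system records edits.

```python
from difflib import SequenceMatcher


def handler_edits(generated: str, edited: str) -> list[tuple[str, str, str]]:
    """Return (operation, generated_text, edited_text) for each changed span."""
    before, after = generated.split(), edited.split()
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # op is "replace", "delete", or "insert"
            changes.append((op, " ".join(before[i1:i2]), " ".join(after[j1:j2])))
    return changes


draft = "The expert has spinal experience"
sent = "The expert has extensive spinal experience"
print(handler_edits(draft, sent))
```

The same list of changes could also feed the prompt-improvement loop, since it shows exactly what handlers tend to alter.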

Scenario handling

The classification output from the earlier stage flows directly into response generation. The detail level, extracted terms, cited expert status, and matching outcome combine to determine not just which expert to recommend but which type of response to produce. Different scenarios generate genuinely different correspondence.

This matters because the previous approach applied a single template regardless of context. The classification layer drives substantively different output rather than cosmetically different phrasing over the same structure.

9. End to End

Enquiry received → Classification → Expert matching → Response draft → Human review → Send

During the handover demonstration, a real enquiry was forwarded mid-call. I ran it through the system live. It classified the enquiry, matched against the expert panel with reasoning, generated a CPR Part 35-compliant response, and completed the full cycle in under six minutes against a sixty-minute target.

The system also demonstrated a fully autonomous mode in which an incoming email triggers the entire pipeline without human interaction: classification, matching, response generation, and reply sent automatically. A semi-autonomous mode runs the same flow through the visual interface, completing the process in 38 seconds while the handler watches the reasoning unfold and can intervene at any step.

The human-in-the-loop mode is the one that matters most. At every stage the handler sees how the enquiry was classified, which terms were extracted and at what confidence, which experts were considered and why some were excluded, and what the generated response contains. They can override any decision. That transparency builds trust in the system over time and, just as importantly, helps junior handlers learn the domain by observing how the system reasons about specialisms and fit. That capability development was one of the underlying problems the consultancy needed to solve alongside the immediate efficiency gains.

The operations lead’s assessment after the live demonstration: “Definitely better than what we currently have in place. This is a very good starting point. Really impressed.”

10. What Makes It Different

The engagement started with problem reframing, not solution building. It identified that the matching problem was really a data problem, delivered the schema and ingestion pipeline first, and only then built the matching and response generation on top of a reliable foundation. Every stage was designed to augment the handler rather than replace them, with full automation available as a choice rather than the default.

The two-stage matching architecture solves a real constraint that simpler approaches ignore: semantic search is fast but shallow, and prompt-stuffing an entire expert panel is too expensive to sustain. Chaining the two produces results that are both specific and well-reasoned at a cost that scales with query volume rather than panel size.

The proof of concept was successfully demonstrated live. I have continued developing it since the engagement and have brought in a business partner to take it further.