What to Look for in an AI Notes Tool for Therapy (Buyer's Checklist)
The market for AI-assisted clinical notes has exploded. Two years ago, there were a handful of options. Today, virtually every EHR has an AI feature, and standalone AI note tools are multiplying rapidly. Everyone claims to be "clinically accurate" and "HIPAA compliant."
The problem is not a lack of options. The problem is that most therapists do not know what to evaluate when choosing an AI notes tool, because the marketing language is nearly identical across every product. They all say they save time. They all say they produce better notes. They all say they are secure.
So how do you actually tell the difference?
This checklist covers the criteria that matter -- organized by priority, with specific questions to ask vendors and red flags to watch for. Whether you are evaluating your first AI notes tool or considering a switch, these are the factors that separate genuinely useful tools from expensive autocomplete.
Key Takeaway
The two criteria that matter most when evaluating AI note tools are clinical intelligence (does it understand your therapeutic modality?) and HIPAA compliance (does it have a BAA and clear data handling?). A tool that scores well on these two categories is worth considering even if other areas are weaker. A tool that fails on either is not worth your time.
Category 1: Clinical Intelligence (Most Important)
This is the most important category and the one where tools differ most significantly. Clinical intelligence determines whether the AI produces notes you can sign in two minutes or notes you have to rewrite from scratch.
Does the tool understand your therapeutic modality?
This is the single most important question. A tool that produces the same generic output regardless of whether you practice CBT, IFS, EMDR, ACT, or psychodynamic therapy is operating at the summarization level. It can tell you what happened in the session. It cannot tell you what clinical work was done.
What to ask: "If I run a CBT session through your tool, will the output identify specific cognitive distortions by name? If I run an IFS session, will it use parts language? If I run an EMDR session, will it track SUD levels and desensitization phases?"
Red flag: If the vendor answers by talking about templates ("We have a CBT template") rather than clinical understanding ("The AI identifies cognitive distortions from the session content"), the tool operates at the template level, not the intelligence level.
Does the output use clinical vocabulary?
There is a meaningful difference between "client discussed negative thoughts" and "client identified catastrophizing pattern: predicted job termination based on a single critical email." The first is a summary. The second is clinical documentation.
What to test: Run a sample session (or read the vendor's example output carefully). Does the output use the specific vocabulary of your framework? Cognitive distortions by name? Parts and Self? Hexaflex processes? Desensitization phases? Or does it default to generic therapy language?
Red flag: Output that consistently uses vague terms like "explored feelings," "provided therapeutic interventions," or "discussed coping strategies" regardless of modality. These phrases indicate the AI does not understand the clinical framework.
Can the tool produce meaningfully different notes for different modalities?
A modality-aware tool should produce fundamentally different output for the same client concern processed through different therapeutic frameworks. Anxiety addressed through CBT (cognitive distortion identification, Socratic questioning, behavioral experiment) should look completely different from anxiety addressed through ACT (acceptance, defusion, values-linked committed action).
What to test: If possible, describe the same clinical scenario and ask the vendor to show output for two different modalities. If the outputs are structurally similar with different vocabulary swapped in, the tool is using templates, not clinical intelligence.
Does the AI understand note structure beyond SOAP?
SOAP is fine for many contexts, but it is not the only format -- and for some modalities, it is not the best one. BIRP maps more naturally to CBT. IFS documentation benefits from parts-based formats. DBT notes should reflect diary card review and the treatment hierarchy. EMDR notes need phase tracking.
What to ask: "What note formats does your tool support? Can it produce BIRP, DAP, or modality-specific formats? Or only SOAP?"
Red flag: A tool that only outputs SOAP notes and positions this as sufficient for all modalities. Format flexibility is not a luxury -- it is a reflection of how well the tool understands different clinical workflows.
Category 2: HIPAA Compliance and Security
AI note tools handle protected health information. This is non-negotiable territory.
Does the vendor have a Business Associate Agreement (BAA)?
A BAA is required by HIPAA for any vendor that handles PHI on your behalf. No BAA means you are operating outside of HIPAA compliance, regardless of what the vendor claims about their security.
What to ask: "Do you provide a signed BAA? Can I review it before signing up?"
Where does session data go, and how long is it retained?
When you record a session or submit session notes to an AI tool, that data travels somewhere to be processed. You need to know where it goes, whether it is stored, and for how long.
What to ask: "Where is my session audio/transcript processed? Is it sent to a third-party AI provider? Does the AI provider retain any data? Is data encrypted in transit and at rest?"
Best answer: Data processed through a HIPAA-compliant AI provider (like AWS Bedrock) with a BAA in place, zero data retention by the AI model provider, encryption in transit and at rest.
Red flag: Vague answers like "our data is secure" without specifying where processing occurs, whether data is retained by AI subprocessors, or whether BAAs are in place with every entity in the data flow.
Is session audio deleted after processing?
Some tools retain session recordings indefinitely. Others delete them immediately after generating the note. Understanding the data lifecycle matters for your risk profile.
What to ask: "After the AI generates the note, what happens to the session recording or transcript? Is it deleted automatically? Can I control retention settings?"
Red flag: Session recordings stored indefinitely without a clear data lifecycle policy or client consent framework.
Who can access my client data?
Understand the access model. Can vendor employees see your client data? Is there audit logging? Are there role-based access controls?
What to ask: "Can your employees access my session recordings, transcripts, or notes? Under what circumstances? Is there an audit log?"
Category 3: Integration and Workflow
A tool that produces perfect notes but creates workflow friction will eventually sit unused.
Does it integrate with your EHR?
If you are using a standalone AI notes tool (not built into your EHR), how do the notes get into your clinical record? Copy and paste? Direct integration? API?
What to evaluate: A standalone AI tool that requires you to copy-paste notes into your EHR creates a manual step that adds time and introduces error potential. Integrated solutions (AI notes built into the EHR) eliminate this friction entirely.
The integrated advantage: When AI notes live inside your practice management platform alongside scheduling, billing, and client records, the AI can access clinical context that standalone tools cannot -- treatment plans, previous session notes, client history. This context makes the AI more accurate.
How does session input work?
Different tools capture session content differently:
- Live recording: The tool records the session in real time and processes the full audio.
- Voice memo: You record a brief post-session summary and the AI expands it.
- Text input: You type key observations and the AI generates the full note.
Each has trade-offs. Live recording is comprehensive but raises client consent and comfort questions. Voice memos are fast but depend on what you remember to include. Text input is the most controlled but requires the most manual effort.
What to consider: Which input method matches your workflow? Does the tool support multiple input options? Can you switch between methods based on the session?
Is the note-generation process fast enough for between-session use?
If the tool takes 10 minutes to generate a note, it is not useful in the 10-minute gap between sessions. Practical AI tools should generate notes in under 2 minutes.
What to test: Time the tool from input to output. If it takes longer than 2 to 3 minutes, you will end up batching notes at the end of the day, which defeats much of the purpose.
Category 4: Cost and Pricing Model
AI note tools use different pricing models, and the one that looks cheapest on the pricing page may not be the cheapest at scale.
What is the total monthly cost at your caseload?
Pricing models include:
- Flat monthly subscription: One price regardless of session volume (e.g., $40-$90/month).
- Per-session pricing: Charged per note generated (e.g., $0.99-$1.49/session).
- Included in EHR: AI notes bundled into the EHR subscription at no additional cost.
Per-session pricing can be deceptive. At 5 sessions per week, $0.99/session is about $21/month. At 25 sessions per week, it is roughly $106/month. The more you use it, the more it costs.
How to calculate: Multiply the per-session rate by your average weekly sessions, then multiply by 4.3 (average weeks per month). Compare that number to flat-rate alternatives.
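The calculation above can be sketched in a few lines. The rates, caseloads, and the $59 flat rate are illustrative figures drawn from this article, not vendor quotes:

```python
# Sketch of the per-session vs. flat-rate comparison described above.
# All dollar amounts and caseloads are illustrative, not vendor pricing.
WEEKS_PER_MONTH = 4.3  # average weeks per month

def per_session_monthly(rate, sessions_per_week):
    """Monthly cost under a per-session pricing model."""
    return rate * sessions_per_week * WEEKS_PER_MONTH

# A $0.99/session tool at different caseloads:
for sessions in (5, 15, 25):
    monthly = per_session_monthly(0.99, sessions)
    print(f"{sessions} sessions/week -> ${monthly:.2f}/month")

# Caseload at which per-session pricing overtakes a hypothetical
# $59/month flat rate:
flat_rate = 59
break_even = flat_rate / (0.99 * WEEKS_PER_MONTH)
print(f"Per-session costs more above ~{break_even:.0f} sessions/week")
```

The break-even point is the useful number: below it, per-session pricing wins; above it, a flat subscription does.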
Is the AI a paid add-on or included in the platform?
Some EHRs include AI notes in their base or standard plan. Others charge a separate add-on fee. The add-on model means your total EHR cost is the base subscription plus the AI fee.
Example comparison:
- EHR at $69/month + AI add-on at $40/month = $109/month total
- Integrated platform with AI included at $59/month = $59/month total
The $50/month difference adds up to $600/year -- roughly ten months of the integrated platform's subscription.
Are there hidden per-use fees?
Beyond AI note generation, check for per-use charges on SMS reminders, claim submissions (if applicable), additional storage, or premium features.
What to ask: "Beyond the subscription fee, are there any per-use charges I should be aware of?"
Category 5: Clinical Customization
Can you customize the note format?
You may prefer BIRP over SOAP, or a modality-specific format over a generic one. Can the tool accommodate your preference?
What to ask: "Can I choose between note formats (SOAP, BIRP, DAP)? Can I customize what sections appear in the output?"
Can you edit the output before finalizing?
The AI note should always be a draft. You review it, make corrections, and finalize it. Any tool that positions AI output as a finished note -- or makes it difficult to edit -- is a clinical risk.
What to confirm: "Are AI-generated notes presented as drafts for my review? Can I edit any part of the note before finalizing?"
Red flag: Tools that automatically file notes without requiring therapist review. No AI is accurate enough to bypass clinical review.
Does the AI improve over time?
Some tools adapt to your writing style, clinical preferences, and feedback. Others produce the same output quality regardless of how long you use them.
What to ask: "Does the AI learn from my edits and preferences? Does output quality improve over time?"
Category 6: Support and Reliability
What happens when the AI is wrong?
AI-generated notes will contain errors. The question is how the tool handles that reality.
What to evaluate: Is there a clear review workflow? Can you flag errors? Does the vendor communicate about known accuracy limitations honestly?
Red flag: Any vendor that claims 100% accuracy or positions their tool as eliminating the need for note review. AI-assisted does not mean AI-autonomous.
Is there responsive customer support?
When something goes wrong with a tool that handles PHI, you need help quickly.
What to check: Response time commitments, support channels (email, chat, phone), and whether support is available during business hours in your time zone.
What is the vendor's track record?
For newer tools, this is harder to assess. But you can look at:
- How long they have been operating
- Whether they have published information about their clinical validation process
- Reviews from other therapists (not just marketing testimonials)
- Transparency about limitations and roadmap
The Quick-Reference Checklist
Use this checklist when evaluating any AI notes tool. Score each criterion on a 1-3 scale (1 = does not meet, 2 = partially meets, 3 = fully meets).
Clinical Intelligence
- Understands my therapeutic modality (not just templates)
- Uses clinical vocabulary specific to my framework
- Produces different output for different modalities
- Supports note formats beyond SOAP
Security and Compliance
- Provides a signed BAA
- Data processed by HIPAA-compliant AI providers
- Clear data retention and deletion policies
- Transparent about data access and audit logging
Workflow Integration
- Integrates with my EHR (or is built into it)
- Supports my preferred input method
- Generates notes fast enough for between-session use
Cost
- Total monthly cost at my caseload is reasonable
- No hidden per-use fees
- AI is included, not a paid add-on
Customization
- Supports my preferred note format
- Notes are editable drafts, not auto-filed
- AI improves with use
Support
- Responsive customer support
- Honest about limitations
- Transparent track record
A tool that scores 3 on Clinical Intelligence and Security is worth considering even if other categories are 2s. A tool that scores 1 on either of those categories is not worth your time, regardless of how well it performs elsewhere.
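That decision rule can be expressed as a small function. The category names and 1-3 scale follow the checklist above; the gating logic is the article's stated rule, and the dictionary keys are only an illustrative encoding:

```python
# Sketch of the checklist's decision rule: Clinical Intelligence and
# Security/Compliance are gating categories; the other categories inform
# the choice but do not decide it. Scores use the article's 1-3 scale
# (1 = does not meet, 2 = partially meets, 3 = fully meets).

GATING = ("clinical_intelligence", "security")

def evaluate(scores):
    """Return a recommendation for a dict of category scores."""
    if any(scores.get(cat, 1) <= 1 for cat in GATING):
        return "not worth your time"   # fails a gating category
    if all(scores.get(cat, 1) == 3 for cat in GATING):
        return "worth considering"     # strong on both gates
    return "evaluate trade-offs"       # middling on a gate

print(evaluate({"clinical_intelligence": 3, "security": 3, "cost": 2}))
print(evaluate({"clinical_intelligence": 1, "security": 3, "cost": 3}))
```

Note that a perfect score on cost or support never rescues a failing gate: the rule checks the two gating categories before anything else.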
Applying This Checklist
No AI tool is perfect. Every option involves trade-offs. The purpose of this checklist is not to find a perfect score -- it is to help you evaluate trade-offs deliberately rather than choosing based on marketing claims.
The tools that score highest tend to share two characteristics: they were built specifically for therapy (not adapted from medical or general-purpose AI), and they integrate clinical intelligence into the platform rather than bolting it on as an add-on.
TherapyDesk was designed around these principles -- modality-aware AI that understands CBT, DBT, IFS, EMDR, ACT, and psychodynamic frameworks, built into an integrated practice management platform with HIPAA compliance, free SMS, and no per-session AI fees. All at $59/month.
If you want to evaluate for yourself, try the demo -- it takes two minutes, and you can see how the AI handles your specific modality.