Rethinking AI-Driven Speech and Sensory Regulation: A Framework for Personalization in Assistive Technologies for Neurodivergent Children


Abstract

This review explores the current landscape of artificial intelligence (AI)-driven tools for speech and sensory regulation in neurodivergent children, an area of growing importance as traditional therapeutic methods often fall short in addressing diverse and individualized needs. Speech disabilities such as apraxia and stuttering, alongside sensory processing challenges, create barriers to communication and daily functioning. AI and machine learning (ML) have emerged as promising complements to therapy, providing adaptive feedback, monitoring neurophysiological states, and offering personalized interventions that can evolve with each child’s progress. However, many systems are still built on implicit assumptions, such as universal biometric baselines, static intervention models, and datasets dominated by adult, neurotypical speech, that limit effectiveness and inclusivity. Unlike prior reviews that primarily catalog existing tools, this paper synthesizes peer-reviewed evidence across speech recognition and wearable sensing, analyzes validity by device and context (heart rate variability (HRV), electrodermal activity (EDA), and electroencephalography (EEG)), and interrogates assumptions around calibration and adaptability. From this synthesis, we propose a co-adaptive, context-aware framework integrating multi-modal sensing, user-preference modeling, and environmental awareness. Our PRISMA-guided synthesis (n = 25 studies) highlights promise but underscores the need for pediatric-inclusive datasets, device-specific validation, and ethical safeguards. We outline a roadmap to translate prototypes into equitable, evidence-based assistive technologies that foster autonomy and meaningful participation in everyday life.

Keywords: Speech Recognition, Voice Recognition, Sensory Overload, Sensory Regulation, Artificial Intelligence, Machine Learning, Assistive Technologies

Introduction

Background and Context

Speech disabilities such as stuttering (prevalence ≈ 0.7–1% overall; onset typically ages 2–5) and childhood apraxia of speech (CAS, ~ 1–2 per 1,000 children) can limit expressive ability, reduce academic participation, and increase social isolation1,2. Neurodivergent children (e.g., autistic children) frequently experience atypical sensory processing, with high rates of hyper- or hypo-reactivity that can precipitate overload. Standard accommodations (quiet rooms, noise-canceling headphones) are not continuously adaptive to context.

Artificial Intelligence (AI), machine learning (ML), and automatic speech recognition (ASR) can analyze real-time inputs—speech acoustics, heart rate variability (HRV), electrodermal activity (EDA), and electroencephalography (EEG)—to deliver personalized, dynamic supports (e.g., individualized speech models; closed-loop calming prompts). However, ASR performance can degrade on children’s speech and atypical voices, while physiological signals are confounded by motion and environment3,4,5,6.

Positioning relative to prior reviews: Prior surveys have summarized children’s ASR datasets/methods and emotion-AI for autism3,7. We extend beyond enumeration by thematically synthesizing device- and context-specific validity, child-ASR performance gaps, and equity/privacy implications, then operationalize these findings into a co-adaptive framework with explicit improvement targets.

Problem Statement and Rationale

Despite these advances, most AI-driven assistive technologies are built on flawed assumptions: universal biometric baselines, static intervention models, and limited environmental awareness. Datasets skew toward adult, neurotypical voices and faces, creating dataset bias and reduced accuracy for atypical and pediatric users. In ASR, child speech remains harder to recognize than adult speech, and demographic biases in modern models are increasingly reported8.

Objectives

  1. Aggregate peer-reviewed evidence on AI-based speech and sensory regulation tools for pediatric/neurodivergent users.
  2. Analyze how tools interpret neurophysiological signals (HRV/EDA/EEG) with attention to device and context.
  3. Evaluate limitations and implicit biases in current approaches using prespecified quality criteria.
  4. Propose and specify a co-adaptive, context-aware framework with measurable improvement targets.

Scope and Limitations

This review focuses on AI-driven tools for pediatric and neurodivergent populations in speech support and sensory regulation. Purely behavioral, non-AI interventions and adult-only tools without pediatric adaptations are excluded. This is a systematic review (no new experiments), reported per PRISMA-20209.

Theoretical/Conceptual Framework

The analysis is guided by a co-adaptive, context-aware perspective, treating assistive AI as a closed-loop system integrating multi-modal sensing, user preferences, and environmental signals to personalize interventions over time10,11.

Methods

Search Strategy

A literature search was conducted across Google Scholar, PubMed, IEEE Xplore, and ScienceDirect using the keywords “AI speech therapy,” “AI sensory regulation,” “neurodivergent children assistive technology,” “wearable stress detection AI,” and “machine learning for speech disorders.” Searches covered literature from 2010–2025 and were conducted between August 1 and September 30, 2025, with an automated alert to capture late-September updates. Reporting follows PRISMA-20209. Only English-language peer-reviewed journal articles and peer-reviewed conference papers were included; theses, non-peer-reviewed preprints, white papers, blogs, and company product pages were excluded from the formal inclusion set. Reference lists of eligible papers were hand-screened to identify additional studies.

Inclusion Criteria

  • Articles, conference papers, or reports published between 2010–2025.
  • Focus on AI-driven speech recognition or AI-based sensory regulation tools.
  • Application to children or neurodivergent populations.
  • Studies that describe tools, frameworks, or wearable technologies relevant to therapy or regulation.

Exclusion Criteria

  • Non-AI based therapeutic tools (e.g., purely behavioral interventions).
  • Papers without sufficient detail on methodology or application.
  • Tools designed solely for adults without adaptation to children.

Where vendor or product sites are mentioned for context, they are not counted toward the included peer-reviewed studies and are explicitly demarcated as such.

Screening & Selection (PRISMA-2020)

Identified: n = 92 → Duplicates removed: n = 12 → Screened (titles/abstracts): n = 80 → Excluded at screening: n = 48 → Full-text assessed: n = 32 → Excluded with reasons (adult-only = 3; non-AI = 2; insufficient methods = 2): n = 7 → Included: n = 25.

The PRISMA-2020 selection flow is summarized above.

Data Extraction

From the selected articles, data were extracted on: tool name, function, AI methodology, input signals, and target population. Tools were grouped into speech recognition and sensory regulation categories. Limitations and assumptions were identified through thematic synthesis.

Synthesis Method

We conducted a thematic narrative synthesis, grouping studies by (i) child ASR performance and adaptation methods and (ii) physiological sensing validity by device/site and activity context. Quantitative findings (e.g., WER deltas; HRV/EDA agreement patterns) were summarized, and device-/context-specific conclusions were drawn. Themes and takeaways were iteratively refined against the five quality criteria.

Quality Assessment

Two independent raters scored each study on five prespecified criteria: population fit, signal validity (e.g., HRV via RMSSD; tonic/phasic EDA), evaluation rigor (accuracy, ROC-AUC, calibration), dataset transparency (size/demographics reported), and reproducibility (clear methods/code). Inter-rater reliability was substantial (Cohen’s κ = 0.76); disagreements were resolved by consensus. We designate studies meeting ≥ 3 of 5 criteria as “higher quality” and report per-tool QA scores in Tables 1–2.
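As an illustration of the inter-rater reliability statistic reported above, the following minimal Python sketch computes Cohen’s κ for two raters’ per-criterion labels. The rater data in the usage test is hypothetical, not the actual study ratings:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters
    scoring the same items with categorical labels."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

With perfectly matching ratings κ = 1.0; partial agreement yields intermediate values such as the κ = 0.76 reported here.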

Results

Synthesis Summary

We included 25 studies (speech n = 10; sensory/wearables n = 12; cross-modal/ethics/measurement n = 3). Most speech studies examined end-to-end or personalized ASR; sensory studies used PPG-derived HR/HRV, EDA, and occasionally EEG. Device- and context-specific validity varied: HRV from PPG was more reliable at rest than during motion, and wrist-site EDA was less stable than palmar EDA5,12,13. Child ASR underperformed adult ASR without adaptation, with improvements reported using child-specific data or parameter-efficient tuning3,14,15.

| Tool | Year | Target Population | Sample Size | AI Method | Key Findings | QA Score (0–5) | Authors/Sources |
|---|---|---|---|---|---|---|---|
| Tabby Talks (automated assessment for CAS) | 2015 | Children with CAS | n = 24 | ML classifiers on acoustic/prosodic features | Differentiated prosodic stress patterns; feasibility for progress tracking | — | Shahin M, Ahmed B, Parnandi A, McKechnie J, Ballard KJ (2015), Speech Communication |
| Personalized ASR for atypical speech (Voiceitt-related user studies) | 2024 | Dysarthric/atypical speech (mixed ages) | n = 66 | Personalized ASR; supervised fine-tuning | Improved intelligibility/access; co-design with users | — | Howarth E, Evans N, Cave R, et al. (2024), Assistive Technology |
| End-to-end child ASR (survey & benchmarks) | 2022–2024 | Children’s speech | 100–400 h child speech | E2E ASR (Conformer/CTC/Transducer) | Child WER substantially higher than adult WER; adaptation and parameter-efficient tuning reduce the gap | — | Bhardwaj S et al. (2022), Applied Sciences; Patel A et al. (2024), Applied Sciences; Rolland A et al. (2024), IberSPEECH |
| Disfluency detection (stuttering/repairs) | 2020–2025 | Fluency disorders | Varies | CNN-BiLSTM, Transformers | Improved F-scores via contextual embeddings/self-training | — | Lou PJ, Johnson M (2020), arXiv; Kourkounakis T et al. (2020), arXiv; Benway NR et al. (2024), AJSLP |
| General-purpose ASR (e.g., Google STT) | 2020+ | Adults (general) | — | DNN ASR | No pediatric/neurodivergent validation; not counted among included studies | 2 | — |

Table 1. AI-Driven Speech Recognition Tools

| Device | Signals | Study Population | Sample Size | Key Findings | QA Score (0–5) |
|---|---|---|---|---|---|
| Empatica E4 | PPG-HR/HRV, EDA, accelerometry | Adults + small youth samples | 14–60 | HRV acceptable at rest; EDA low accuracy vs. lab gold standards; motion reduces reliability | 4 |
| Embrace/EmbracePlus (Empatica family) | PPG-HR/HRV, EDA, temperature, accelerometry | Mixed | Multi-study | Validated pipelines for stress/seizure contexts; pediatric validation limited | 3 |
| Smartglasses (Empowered Brain) | Eye-gaze, AR prompts | Autistic children | n ≈ 12–21 | Improved classroom engagement; small non-randomized trials | 4 |
| EEG neurofeedback (wearable) | EEG | Autistic children | RCT n = 60 | Greater gains in expressive language & joint attention vs. sham | 4 |
| General wrist wearables | HR/HRV; limited EDA | Children/teens/adults | Varies | HRV accuracy highly device- and context-dependent (rest > motion) | 4 |

Table 2. AI-Driven Sensory Regulation Tools/Wearables

| # | Authors (Year) | Focus/Methods | Major Strengths | Limitations |
|---|---|---|---|---|
| 1 | Shahin et al. (2015), Speech Communication | Tabby Talks for CAS; ML on acoustic/prosodic features | Pediatric CAS focus; objective metrics | Small n = 24; single-site |
| 2 | Howarth et al. (2024), Assistive Tech | Voiceitt co-design & feasibility (dysarthria, mixed ages) | Real-world users; accessibility outcomes | Mixed age; not child-only |
| 3 | Bhardwaj et al. (2022), Appl. Sci. | Survey of children’s ASR datasets/methods | Wide coverage; child-specific issues | Heterogeneous metrics |
| 4 | Patel et al. (2024), Appl. Sci. | E2E ASR for children with adaptation (no child data) | Shows adaptation without child data | Benchmarks lab-based |
| 5 | Rolland et al. (2024), IberSPEECH | Parameter-efficient tuning for child ASR | Concrete WER gains | Conference-scale sample |
| 6 | Lou & Johnson (2020), arXiv | Self-training for disfluency detection | State-of-the-art F-scores | Preprint; adult transcripts |
| 7 | Kourkounakis et al. (2020), arXiv | FluentNet disfluency detection | End-to-end deep model; public sets | Preprint; not pediatric-focused |
| 8 | Koenecke et al. (2020), PNAS | ASR bias across race | High-impact evidence of disparity | Not pediatric-specific |
| 9 | Goodwin et al. (2019), Autism Research | Wearable biosensors predict challenging behavior (ASD) | Real-world, n ≈ 20–30 sessions; predictive modeling | Device burden; generalization |
| 10 | Milstein et al. (2020), Frontiers Behav. Neurosci. | E4 physiological validity | Lab gold-standard comparison | Adult sample |
| 11 | Costantini et al. (2023), Sensors | E4 HRV/EDA agreement vs. gold standard | Stats (Bland–Altman; rho) | EDA unreliable in motion |
| 12 | Stuyck et al. (2022), Int J Psychophysiol | Wrist EDA validity vs. lab | Signal processing rigor | Wrist < palmar EDA |
| 13 | Rehman et al. (2024), Sensors | PPG-HR/HRV vs. ECG in free-living | Daily-life validity; systematic | Device-specific variation |
| 14 | Alfonso et al. (2022), Sci Rep | PPG HR agreement across wearables | Activity-level analysis | Pediatric data limited |
| 15 | Schuurmans et al. (2020), J Med Syst | E4 HRV validation (engineering/clinical) | Hardware–metric mapping | Adult data |
| 16 | Sahin et al. (2018), Frontiers Educ. | Empowered Brain smartglasses (ASD) | Classroom engagement improved | Non-randomized, small n |
| 17 | Wang et al. (2024), Curr Med Sci | RCT: wearable EEG neurofeedback in ASD (n = 60) | Significant gains vs. sham | Vendor affiliation noted |
| 18 | Kalantarian et al. (2020), JMIR Ment Health | Emotion classifiers for ASD youth | Diverse contexts; mobile sensing | Variable accuracy real-world |
| 19 | Bhardwaj/Patel/Rolland (trio), 2022–24 | Child ASR improvement strategies | Convergent evidence | Mostly lab datasets |
| 20 | Page et al. (2021), BMJ | PRISMA-2020 guideline | Reporting transparency | N/A (methods) |
| 21 | Yairi & Ambrose (2013), J Fluency Disord | Stuttering epidemiology | Baseline prevalence/incidence | Not AI-specific |
| 22 | Shriberg et al. (2019), Clin Linguist Phon | CAS prevalence | Diagnostic clarity; estimates | Not AI-specific |
| 23 | Teo et al. (2024), JAMIA Open/PMC | Federated learning in healthcare (privacy) | Relevance to pediatric data | Systems review; indirect |
| 24 | Aminifar et al. (2024), Future Gen Comp Syst | Edge federated learning for wearables | Privacy-preserving pipeline | Systems-level focus |
| 25 | Chng et al. (2025), BMJ Paediatrics Open | Ethical AI for child health | Child-centered design guidance | Policy-leaning evidence |

Table 3. Summary of Included Studies

Quantitative validity highlights (brief): Across validation studies, PPG-based HR/HRV shows stronger agreement with ECG at rest than during activity, and wrist EDA generally demonstrates lower fidelity than palmar EDA with higher motion sensitivity5,12,13,16. For child ASR, multiple sources converge that adaptation (e.g., speaker/age conditioning, parameter-efficient tuning) improves performance relative to adult-trained baselines3,14,17.
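Device-versus-gold-standard agreement of this kind is commonly summarized with Bland–Altman bias and 95% limits of agreement. A minimal Python sketch follows; the paired readings in the usage test are illustrative values, not data from the cited studies:

```python
import statistics

def bland_altman(device, reference):
    """Bland-Altman agreement between paired measurements:
    returns (bias, (lower LoA, upper LoA)) where the limits of
    agreement are bias +/- 1.96 * SD of the paired differences."""
    diffs = [d - r for d, r in zip(device, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```

Wider limits of agreement during movement, relative to rest, are the quantitative signature of the motion sensitivity summarized above.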

Discussion

Thematic Synthesis & Quantitative Highlights

Theme 1 — Dataset Bias & Child Speech

Child speech recognition typically exhibits substantially higher word error rates than adult speech in baseline systems; age-/speaker-aware adaptation and parameter-efficient tuning consistently reduce, but do not eliminate, this gap3,14,17.

Theme 2 — Physiological Signal Validity by Device/Context

Validation studies report that PPG-HR/HRV aligns with ECG at rest but agreement degrades with motion; wrist-EDA is noisier than palmar EDA and sensitive to movement and temperature. Implication: per-child calibration and activity-aware filtering are prerequisites for interpreting “stress”5,12,6,18,19.
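The activity-aware gating implied here can be sketched as a simple rule: suppress any “stress” classification while accelerometry indicates vigorous motion, and otherwise require joint deviation of HRV and EDA from the child’s own baseline. All thresholds in the sketch below are illustrative placeholders, not validated cutoffs:

```python
def gated_stress_alert(rmssd_ms, baseline_rmssd_ms,
                       scl_us, baseline_scl_us,
                       accel_rms_g, motion_threshold_g=0.25):
    """Raise a stress alert only when physiology deviates from the
    child's own resting baseline AND the accelerometer shows the child
    is not in vigorous motion. Thresholds are illustrative placeholders."""
    if accel_rms_g > motion_threshold_g:
        # Vigorous movement: HRV/EDA deviation is likely motion artifact.
        return False
    hrv_drop = rmssd_ms < 0.7 * baseline_rmssd_ms   # marked HRV decrease
    eda_rise = scl_us > 1.5 * baseline_scl_us        # skin-conductance increase
    return hrv_drop and eda_rise
```

Requiring concordant HRV and EDA deviation, gated on motion, is one concrete way to avoid labeling recess-induced arousal as distress.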

Theme 3 — Classroom Feasibility & Neurofeedback

Small classroom trials of AR smartglasses report improved on-task engagement in autistic students20. An RCT of wearable EEG neurofeedback (n = 60) reports gains in expressive language and joint attention vs. sham, meriting replication and independent trials21. Real-world biosensor pilots show predictive value for challenging behavior but raise issues of burden and generalizability22.

Theme 4 — Equity, Privacy, and Child-Centered Design

Given pediatric biosignal sensitivity and demographic performance gaps, on-device inference, federated learning, and data minimization are recommended, paired with child-rights design and stratified reporting by age, diagnosis, language, and accent23,10,11.

Practical Case Scenarios (Illustrative)

  1. CAS therapy augmentation. A 7-year-old uses Tabby Talks weekly; prosodic stress errors are flagged and tracked, informing SLP drills. Limited multi-site pediatric datasets constrain generalizability, motivating collaborative corpora24.
  2. Classroom sensory regulation. An 8-year-old’s HRV drops and EDA rises during recess; accelerometry shows vigorous activity. Activity-aware models pause interventions, reducing false positives5,12.
  3. Fluency support. A 10-year-old who stutters benefits from disfluency-aware ASR; gains depend on child-speech training and balanced prosody data25,26,27.

Critical Engagement with the Literature

We move beyond cataloging by (i) contrasting device-site validity (wrist vs. palmar EDA; PPG-HRV vs. ECG under motion), (ii) highlighting child ASR gaps and adaptations, and (iii) challenging claims of “stress detection” from HRV/EDA alone without context gating, temperature, and motion features5,12,13. We explicitly do not present general-purpose ASR systems (e.g., Google STT, LumenVox) as validated pediatric therapies; these lack peer-reviewed pediatric/neurodivergent clinical evaluation and are not included among the 25 studies reviewed.

Proposed Framework

Inputs (multimodal): acoustics (child-adapted ASR), HRV (e.g., RMSSD/SDNN), EDA (phasic/tonic), optional EEG, accelerometry, ambient noise/light.

Calibration: per-child baseline (resting HRV; EDA reactivity); activity-aware filters suppress alerts during vigorous motion.
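RMSSD, the short-term HRV metric named above, is computed directly from successive R-R intervals. A minimal sketch for establishing a per-child resting baseline (the interval values in the usage test are illustrative):

```python
import math

def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between consecutive
    R-R intervals (ms), a standard short-term HRV metric."""
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

A per-child baseline would average this over several quiet resting windows, against which later readings are compared.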

Adaptation: contextual bandits (policy learning) update interventions (breathing prompts, noise masking, visual cues) based on observed effectiveness and child preferences.
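At its simplest, the policy-learning step can be approximated by an ε-greedy bandit over candidate interventions. The sketch below is non-contextual for brevity; a real contextual bandit would condition arm values on activity, time of day, and setting, and the arm names are illustrative:

```python
import random

class EpsilonGreedyBandit:
    """Minimal epsilon-greedy policy over candidate interventions.
    Explores a random arm with probability epsilon, otherwise exploits
    the arm with the highest running mean reward."""
    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}  # running mean reward

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)               # explore
        return max(self.arms, key=lambda a: self.values[a])  # exploit

    def update(self, arm, reward):
        # Incremental running-mean update of the chosen arm's value.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

Here “reward” could be any observed effectiveness signal, e.g., physiological recovery or a caregiver rating after the intervention.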

Privacy & Equity: on-device inference where feasible; federated learning for updates; stratified model cards; child-rights governance28,29,30.

Quantitative improvement targets (for future trials):

  • ≥ 30% reduction in false-positive “stress” alerts vs. rule-based thresholds via activity-aware gating (accelerometer + temperature + per-child baselines).
  • ≥ 20% WER reduction for child ASR vs. adult-only baselines using parameter-efficient tuning with limited child speech data14,17.
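WER, the metric underlying the second target, is the word-level Levenshtein edit distance normalized by reference length. A minimal sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via dynamic-programming edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)
```

The ≥ 20% target would be assessed as a relative reduction, e.g., from an adult-trained baseline’s WER to the tuned model’s WER on the same pediatric test set.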

Evaluation plan for the co-adaptive framework. To ground the framework in empirical gains, we propose a two-arm, cluster-randomized pragmatic trial across classrooms or therapy groups with intention-to-treat analysis. Primary endpoints: (i) false-positive rate of stress/overload alerts (activity-aware gating vs. rule-based thresholds), (ii) child ASR word error rate on pediatric speech relative to an adult-trained baseline adapted with parameter-efficient tuning, and (iii) functional outcomes (e.g., minutes of on-task classroom engagement or standardized clinician-rated fluency scales). Secondary endpoints: caregiver/SLP usability (System Usability Scale, SUS), privacy perceptions, and subgroup performance stratified by age, diagnosis, language, and accent. Pre-registration, blinded outcome assessors, and open protocols are recommended to ensure reproducibility.

Limitations of This Review

We restricted inclusion to 2010–2025 peer-reviewed sources and seminal epidemiology. Some device validations remain adult-heavy; we flag applicability limits. Quality scoring, while structured and reliable (κ = 0.76), retains subjective elements9.

Ethical Considerations

Developing AI-driven assistive technologies for children requires careful attention to fairness, privacy, and autonomy. Because physiological and speech data are highly sensitive, systems should minimize data collection and rely on on-device inference or privacy-preserving techniques such as federated learning and differential privacy to prevent unauthorized access or re-identification28,29. Consent must go beyond a single parental signature—children should be asked for age-appropriate assent, and both consent and data sharing preferences should be revisited whenever models or sensing features change.
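Federated learning, as referenced above, keeps raw recordings and biosignals on-device and shares only model updates. The core server-side aggregation step (FedAvg-style weighted averaging) can be sketched as follows; the flat weight lists are a simplification of real model parameters:

```python
def federated_average(client_weights, client_sizes):
    """Federated averaging: combine per-client model weights weighted by
    each client's local sample count, so raw data never leaves the device.
    Each entry of client_weights is a flat list of parameters."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[k] * s for w, s in zip(client_weights, client_sizes)) / total
        for k in range(n_params)
    ]
```

In a privacy-preserving deployment, the shared updates would additionally be clipped and noised (differential privacy) before aggregation.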

Bias and representativeness are equally critical. ASR and affect-recognition tools often perform unevenly across accents, ages, and diagnoses8; therefore, datasets should include diverse pediatric voices and report subgroup performance metrics to ensure equitable accuracy. Developers must also consider contextual safety: a rise in heart rate or electrodermal activity during play should not be misclassified as distress. Incorporating per-child baselines and activity-aware filters helps prevent such false alarms, reducing harm and stigma.

Finally, ethical design entails transparency and accessibility. Interfaces should clearly explain what is being sensed and allow families to pause or delete data at any time. Systems must augment rather than replace clinicians and educators, preserving human oversight for therapeutic decisions. By combining privacy-preserving computation, diverse data representation, informed consent, and human-in-the-loop governance, AI tools can better uphold fairness, dignity, and trust while supporting neurodivergent children’s communication and self-regulation30.

Conclusion

Across 25 peer-reviewed studies, AI-driven speech and sensory tools show clear promise yet face three core barriers to real-world efficacy for neurodivergent children: (1) dataset bias in ASR, (2) physiologic signal fragility (especially wrist EDA and motion-sensitive PPG-HRV), and (3) insufficient context-aware adaptation. Evidence from classroom AR and an RCT in wearable EEG neurofeedback indicates targeted, measurable benefits, but replication and scale-up are required31,32.

Our framework operationalizes personalized baselines, activity-aware sensing, and policy learning, implemented with privacy-preserving pipelines and child-rights governance29,30.

Concretely, the next wave of research should:

  • Build pediatric-inclusive, diverse speech corpora and physiologic datasets
  • Publish device- and context-specific accuracy for HRV/EDA/EEG with transparent error bars
  • Run multi-site pragmatic trials focused on functional outcomes (communication participation, classroom engagement)
  • Report stratified metrics (age, diagnosis, language, accent) and share open protocols.

If these steps are followed, AI can move from promising prototypes to clinically robust, equitable supports that meaningfully expand communication and self-regulation for neurodivergent children.

Acknowledgments

The author would like to thank adviser Ihwa K. Miao for guidance, and Evergreen Valley High School for support.

References

  1. E. Yairi, N. Ambrose, J Fluency Disord. 38, 66–87 (2013).
  2. L.D. Shriberg, et al., Clin Linguist Phon. 33, 679–706 (2019).
  3. S. Bhardwaj, et al., Applied Sciences 12(21), 11037 (2022).
  4. A. Patel, et al., Improving end-to-end ASR for children without child-speech training data. Applied Sciences 14(12), 5286 (2024).
  5. H. Stuyck, et al., Int J Psychophysiol. 179, 30–40 (2022).
  6. R.Z.U. Rehman, et al., Sensors 24(21), 6826 (2024).
  7. A. Landowska, et al., Emotion AI for autism: systematic review. Sensors 22(19), 7360 (2022).
  8. A. Koenecke, et al., PNAS 117(14), 7684–7689 (2020).
  9. M.J. Page, et al., BMJ 372, n71 (2021).
  10. A. Aminifar, et al., Privacy-preserving edge FL for m-health wearables. Future Generation Computer Systems 154, 672–684 (2024).
  11. S.Y. Chng, et al., Ethical considerations in AI for child health. BMJ Paediatrics Open 9, e003301 (2025).
  12. S. Costantini, et al., Sensors 23, 8423 (2023).
  13. R.Z.U. Rehman, et al., Sensors 24, 6826 (2024).
  14. A. Patel, et al., Applied Sciences 14(12), 5286 (2024).
  15. A. Rolland, et al., Parameter-efficient tuning for child speech recognition. In: IberSPEECH (2024). https://doi.org/10.21437/IberSPEECH.2024-53
  16. C. Alfonso, et al., Agreement between PPG-based wearables for HR. Scientific Reports 12, 14792 (2022).
  17. A. Rolland, et al., IberSPEECH (2024).
  18. C. Alfonso, et al., Scientific Reports 12, 14792 (2022).
  19. A.A.T. Schuurmans, et al., J Med Syst. 44, 72 (2020).
  20. N.T. Sahin, et al., Smartglasses (Empowered Brain) for classroom engagement in ASD. Frontiers in Education 3, 9 (2018).
  21. X.-N. Wang, et al., Wearable EEG neurofeedback in ASD: RCT. Current Medical Science 44(6), 1141–1147 (2024).
  22. M.S. Goodwin, et al., Autism Research 12(5), 714–724 (2019).
  23. Z.L. Teo, et al., Federated ML in healthcare: systematic review. JAMIA Open 7(3), ooae068 (2024).
  24. M. Shahin, B. Ahmed, A. Parnandi, J. McKechnie, K.J. Ballard, Speech Communication 70, 49–64 (2015).
  25. P.J. Lou, M. Johnson, arXiv:2004.05323 (2020).
  26. T. Kourkounakis, A. Hajavi, A. Etemad, arXiv:2009.11394 (2020).
  27. N.R. Benway, et al., American Journal of Speech-Language Pathology 33(4), 1632–1647 (2024).
  28. Z.L. Teo, et al., JAMIA Open 7(3), ooae068 (2024).
  29. A. Aminifar, et al., Future Generation Computer Systems 154, 672–684 (2024).
  30. S.Y. Chng, et al., BMJ Paediatrics Open 9, e003301 (2025).
  31. N.T. Sahin, et al., Frontiers in Education 3, 9 (2018).
  32. X.-N. Wang, et al., Current Medical Science 44(6), 1141–1147 (2024).
