An LLM-Based Interactive System for Early Mental Health Screening and Support Among Competitive Examination Students

0
28

Abstract

Mental health challenges among students preparing for competitive examinations represent a growing global concern, with stress and anxiety often remaining untreated due to stigma, limited accessibility, and prohibitive costs. This study presents an accessible and stigma-free LLM-based system designed to deliver standardized assessments such as the Perceived Stress Scale (PSS-10) through interactive conversational interfaces, identify elevated stress areas through intelligent probing, and provide immediate evidence-based coping strategies for mild-to-moderate concerns while facilitating appropriate referrals for severe cases. The system was evaluated on 10 hypothetical student personas representing diverse stress profiles ranging from low to crisis-level scenarios. Each persona was evaluated using the standard PSS-10 in its original format, followed by an LLM-based administration of the same assessment. Preliminary findings indicate a high degree of alignment between LLM-derived scores and the corresponding scores from the standard PSS-10 administration.  The system successfully maintained conversational coherence while adhering to standardized scoring protocols. Cost analysis indicates approximately $0.005 per complete screening session. The system shows promise as a scalable, stigma-reducing early screening tool that complements, but does not replace professional mental health services.

Keywords: Mental health screening; Generative artificial intelligence; Prompt engineering; Student mental health; Stress assessment; Digital health intervention; 

Introduction

Background and Context

Mental health challenges among adolescents and young adults preparing for high-stakes competitive examinations have reached epidemic proportions globally1,2. In India specifically, students preparing for entrance examinations such as the Joint Entrance Examination (JEE) and National Eligibility cum Entrance Test (NEET) face intense academic pressure, extended study hours, and elevated familial and societal expectations that collectively contribute to severe psychological distress3,4.

The magnitude of this crisis is stark. Youth suicide rates in India increased by 70% between 2011 and 2021, with over 10,000 student deaths reported in 2021 alone which is equivalent to 27 daily fatalities5. Academic stress directly contributed to at least 8% of these cases5. The National Mental Health Survey 2016 documented a 7.3% prevalence of mental disorders among Indian adolescents, with anxiety and depression predominating6. Among competitive examination aspirants, prevalence rates exceed 50% for significant anxiety symptoms and substantial proportions report depressive symptoms4.

This phenomenon transcends geographical boundaries. East Asian nations with similarly competitive examination systems, notably China and South Korea, demonstrate comparable patterns of examination-related psychological distress and elevated adolescent suicide rates during examination periods2. The consistency of these findings across diverse educational systems underscores that exam-related mental health challenges represent a global public health concern requiring innovative intervention strategies.

Beyond immediate psychological suffering, untreated mental health conditions impose substantial individual and societal costs. Students experiencing chronic stress, anxiety, or depression demonstrate reduced academic performance, impaired interpersonal relationships, and increased risk of long-term psychiatric complications1,7.  From economic perspectives, untreated mental health conditions correlate with decreased workforce productivity, increased healthcare utilization, and management of secondary comorbidities arising from prolonged psychological distress8,9. Consequently, early identification and intervention are critical for symptom mitigation, coping strategy enhancement, and prevention of adverse long-term outcomes10.

Problem Statement and Rationale

Traditional approaches to mental health assessment and intervention for students face multiple interconnected barriers. Professional consultation, while effective when accessible, encounters significant obstacles including:

  1. Stigma and embarrassment: Students frequently experience reluctance to seek help due to fear of judgment, perceived weakness, or social stigma surrounding mental health disclosure11. This reluctance particularly affects adolescents navigating social identity formation during developmentally sensitive periods.
  2. Accessibility constraints: Geographic disparities in mental health service availability disproportionately affect rural and underserved populations. Even in urban centers, appointment availability may not align with students’ schedules, creating logistical barriers to timely intervention12.
  3. Economic barriers: Professional mental health services often involve prohibitive costs that exclude economically disadvantaged students; precisely those populations facing compounded stressors from socioeconomic disadvantage12.
  4. Delayed help-seeking: The combination of stigma, accessibility, and cost barriers results in delayed intervention, often occurring only after conditions have progressed to moderate or severe stages requiring more intensive treatment.

Standardized clinical assessment instruments such as the Perceived Stress Scale (PSS-10)13. and Generalized Anxiety Disorder scale (GAD-7)14 offer validated alternatives for initial self-screening. The PSS-10 comprises ten items measuring perceived stress on a 5-point Likert scale (0=never to 4=very often), yielding cumulative scores categorized as very low (0-13), average (14-26), or very high (>27) stress levels. The GAD-7 includes seven items assessing anxiety symptom frequency on a 4-point scale (0=not at all to 3=nearly every day), with total scores indicating minimal (0-4), mild (5-9), moderate (10-14), or severe (≥15) anxiety14.

While psychometrically sound, self-administered paper-based assessments present distinct limitations:

  1. Low engagement: Static questionnaire formats fail to maintain user attention, particularly among digital-native adolescents accustomed to interactive interfaces15.
  2. Limited feedback: Traditional self-assessments provide scores without contextualization, actionable guidance, or personalized recommendations, leaving students uncertain about interpretation or next steps15.
  3. Inconsistent completion: The tedious nature of repeated assessments reduces longitudinal monitoring adherence, limiting their utility for tracking symptom trajectories over time.
  4. No immediate support: Paper-based instruments offer no mechanism for immediate intervention, psychoeducation, or crisis resource provision at the point of need.

Technological Context: The Rise of Conversational AI

Recent advances in large language models (LLMs) and generative artificial intelligence (GenAI) have created unprecedented opportunities for human-computer interaction in sensitive domains including mental health support16. Contemporary students have rapidly adopted AI-based conversational tools for academic assistance, information retrieval, and various daily tasks, demonstrating comfort with AI-mediated interactions16.

Emerging research demonstrates the potential of AI-driven mental health interventions across multiple modalities. Chatbot-based cognitive behavioral therapy interventions have shown preliminary efficacy for mild-to-moderate anxiety and depression17,18. AI-powered sentiment analysis and natural language processing enable real-time emotional state assessment from textual input19. Machine learning algorithms have successfully predicted mental health crisis events from digital behavioral patterns20.

However, significant gaps remain in translating these technological capabilities to validated, accessible screening tools specifically designed for student populations. Existing AI mental health applications often lack:

  • Integration with clinically validated assessment instruments
  • Transparent scoring mechanisms aligned with established diagnostic frameworks
  • Explicit boundaries distinguishing screening from clinical diagnosis or treatment
  • Clear referral pathways to professional services for high-risk cases
  • Cost-effectiveness analyses demonstrating scalability for resource-constrained settings

Significance and Purpose

This study addresses these gaps by proposing a hybrid approach that preserves the psychometric integrity of validated screening instruments while leveraging generative AI to enhance engagement, accessibility, and immediate support provision. Specifically, we demonstrate how prompt engineering, the systematic design of instructions that guide LLM behavior, can transform static questionnaires into adaptive conversational experiences.

The proposed system offers several potential advantages over existing approaches:

  1. Maintained validity: By explicitly programming adherence to PSS-10 scoring criteria, the system preserves the psychometric properties of the validated instrument while delivering it through an engaging interface.
  2. Stigma reduction: AI-mediated screening eliminates fear of human judgment, potentially increasing help-seeking behavior among students who would otherwise avoid assessment11.
  3. Immediate support: Unlike paper questionnaires that provide only scores, the system delivers personalized, evidence-based coping strategies and resources tailored to identified stress areas.
  4. Scalability: Once developed, AI-based screening incurs minimal marginal costs per user, enabling population-level deployment in resource-constrained educational settings.
  5. 24/7 accessibility: Unlike time-limited professional services, AI systems provide on-demand access aligned with students’ schedules and immediate needs.
  6. Longitudinal monitoring: The conversational format may enhance engagement for repeated assessments, facilitating symptom trajectory monitoring over examination preparation periods.

Critical limitations and scope boundaries: This system is explicitly designed as an early screening and mild-symptom support tool, NOT as a substitute for professional mental health treatment. It serves to:

  • Identify students who may benefit from professional evaluation
  • Provide evidence-based psychoeducation and coping strategies for mild daily stressors
  • Facilitate appropriate referrals to professional services for moderate-to-severe cases
  • Offer crisis resource information when high-risk indicators are detected

The system does NOT provide diagnosis, treatment, or crisis intervention. These are functions that remain in the domain of qualified mental health professionals.

Objectives

This proof-of-concept study aims to:

  1. Develop a prompt-engineered generative AI system that administers the PSS-10 questionnaire conversationally while maintaining scoring validity
  2. Implement intelligent probing mechanisms that identify elevated stress areas and conduct targeted follow-up questioning
  3. Generate personalized feedback incorporating evidence-based coping strategies and appropriate referral recommendations
  4. Conduct preliminary usability evaluation with competitive examination students
  5. Analyze system costs to assess scalability potential
  6. Identify technical, safety, and ethical challenges requiring resolution before clinical deployment
  7. Establish a foundation for future rigorous validation studies with appropriate ethical oversight

Scope and Limitations

Scope: This study presents a proof-of-concept demonstration focusing on technical feasibility, preliminary usability assessment, and identification of implementation challenges. The target population as of now comprises adolescent students (ages 16-18) preparing for competitive entrance examinations in India, specifically JEE aspirants.

Limitations explicitly acknowledged:

  1. Pre-clinical validation stage: This work represents early-stage development requiring subsequent rigorous validation before clinical deployment.
  2. No real students tested: Preliminary evaluation was carried out on 10 hypothetical student personas, hence it lacks actual student feedback.
  3. No clinical outcome validation: We have not demonstrated that AI-generated scores correlate with professional clinical assessments or predict mental health outcomes.
  4. No longitudinal evaluation: Effects on long-term help-seeking behavior, symptom trajectories, or intervention effectiveness remain untested.
  5. Limited demographic diversity: Initial evaluation focused on a narrow demographic (Indian JEE aspirants), limiting generalizability to other populations, cultures, or educational contexts.
  6. Safety protocol untested: While crisis detection and referral mechanisms are designed into the system, they have not been validated in real crisis scenarios.

Theoretical Framework

This work integrates three theoretical foundations:

1. Psychometric validity preservation: We ground the system in established measurement theory, ensuring that conversational administration preserves the construct validity, reliability, and scoring interpretability of the PSS-10.

2. Human-Computer Interaction (HCI) principles: The design incorporates conversational AI best practices including persona consistency, natural dialogue flow, appropriate response timing, and user expectation management21.

3. Stepped-care model: The system operationalizes a stepped-care approach to mental health intervention, wherein the least intensive intervention adequate to address presenting concerns is delivered first, with clear escalation pathways to more intensive professional services when indicated22.

Methodology Overview

Our approach centers on systematic prompt engineering to guide GPT-4-mini behavior across seven structured workflow steps: (1) conversational question delivery, (2) natural language response collection, (3) real-time scoring against PSS-10 criteria, (4) stress area identification, (5) targeted follow-up probing, (6) personalized feedback generation, and (7) comprehensive report delivery. 

The subsequent sections detail the system architecture, present preliminary evaluation findings, analyze implementation challenges, discuss prompt engineering solutions, and outline requirements for future clinical validation.

Intelligent Probing: A Key Methodological Innovation

Traditional mental health assessments face a fundamental trade-off: comprehensive evaluation requires extensive questioning that increases respondent burden and reduces completion rates, while brief assessments sacrifice depth and personalization. Similarly, conversational AI systems risk two extremes: either rigidly following predetermined scripts (losing adaptability) or asking unlimited follow-up questions (causing user fatigue and session abandonment).

This study introduces intelligent probing as a methodological approach that resolves this tension through prompt-engineered constraints on LLM behavior. Intelligent probing operates through three core principles:

1. Threshold-based triggering: Follow-up questions are generated only for assessment items exceeding predefined thresholds (PSS-10 items scoring ≥3), ensuring probing targets genuinely elevated concerns rather than routine responses.

2. Constrained depth: The system is explicitly limited to 1-2 follow-up questions per stress area with a  maximum of approximately 5 total probing questions per session, preventing the “question cascade” problem where each response triggers multiple new inquiries.

3. Actionable focus: Probing questions prioritize gathering information directly relevant to intervention (temporal triggers, coping attempts, functional impact) rather than exploratory “why” questions that may lead to tangential discussions.

This approach enables efficient extraction of contextually rich information while maintaining user engagement, a critical balance for practical deployment in time-constrained student populations. The intelligent probing framework is implemented entirely through prompt engineering, requiring no model fine-tuning or external logic systems, demonstrating the potential of carefully designed instructions to guide LLM behavior in sensitive clinical applications.

Review of Relevant Literature

AI-Based Mental Health Interventions

The application of artificial intelligence to mental health support has expanded rapidly over the past decade. Conversational agents (chatbots) have demonstrated preliminary efficacy across various mental health applications:

Therapeutic interventions: Woebot, an AI-delivered cognitive behavioral therapy (CBT) chatbot, demonstrated significant reductions in depression and anxiety symptoms among college students over a two-week intervention period17. Similarly, Wysa, an AI-driven mental wellness application, showed improvements in depressive symptoms and self-reported well-being in preliminary trials18.

Screening and assessment: Several studies have explored AI administration of validated screening instruments. Inkster et al. (2018) demonstrated that conversational AI could effectively administer depression screening with comparable accuracy to traditional formats. However, most implementations focus on therapeutic conversation rather than combining clinical questionnaires with adaptive probing and immediate intervention.

Sentiment analysis and passive monitoring: Advanced natural language processing enables emotional state inference from social media posts, electronic health records, and digital communications19,20. While promising for population-level surveillance, these approaches raise significant privacy and consent concerns.

Generative AI and Large Language Models in Healthcare

Recent advances in large language models have catalyzed new possibilities for healthcare applications:

Clinical documentation and summarization: GPT-based models demonstrate capability in generating clinical notes, summarizing patient histories, and synthesizing medical literature23.

Patient education: LLMs can generate personalized health education materials tailored to individual literacy levels and cultural contexts24.

Diagnostic support: While not yet approved for autonomous clinical decision-making, LLMs show promise in differential diagnosis generation and clinical reasoning support25.

Limitations and risks: Critical reviews highlight significant concerns including hallucination (generation of plausible but factually incorrect information), bias propagation from training data, lack of genuine clinical reasoning, and absence of accountability mechanisms26.

Prompt Engineering for Specialized Applications

Prompt engineering, the systematic design of instructions that guide LLM behavior, has emerged as a critical methodology for adapting general-purpose models to specialized tasks:

Role assignment and persona consistency: Research demonstrates that explicit role definitions (e.g., “You are an empathetic mental health screening assistant”) significantly improve response appropriateness and consistency27.

Few-shot learning and examples: Providing exemplar interactions within prompts enhances task performance and reduces undesired outputs28.

Constraint specification: Explicit boundaries (e.g., “Do not provide medical diagnosis”) effectively limit model behavior, though not with 100% reliability29.

Chain-of-thought prompting: Instructing models to articulate reasoning steps improves complex task performance and transparency30.

Mental Health Assessment in Student Populations

Extensive research documents mental health challenges among competitive examination students:

Prevalence studies: Multiple investigations confirm elevated rates of stress, anxiety, and depression among students preparing for high-stakes examinations across diverse cultural contexts1,2,3.

Help-seeking barriers: Stigma, confidentiality concerns, accessibility constraints, and cost represent primary obstacles to professional mental health service utilization among adolescents11,12.

Intervention effectiveness: Early intervention programs incorporating stress management training, coping skills development, and access to professional resources demonstrate efficacy in reducing symptom severity and improving academic outcomes10.

Psychometric Properties of PSS-10 and GAD-7

PSS-10: The Perceived Stress Scale demonstrates robust psychometric properties across diverse populations. Internal consistency (Cronbach’s α) typically ranges from 0.78 to 0.91, with test-retest reliability of 0.70-0.85 over short intervals13,31.

 The scale demonstrates construct validity through expected correlations with depression, anxiety, and physical health outcomes.

GAD-7: The Generalized Anxiety Disorder scale exhibits excellent internal consistency (α = 0.89-0.92) and test-retest reliability (ICC = 0.83)32,14. At a cutoff score of 10, sensitivity and specificity for generalized anxiety disorder are 89% and 82%, respectively32.

Digital administration considerations: Limited research examines whether digital or conversational administration affects these psychometric properties. Validation of AI-administered versions against traditional formats represents a critical research gap that this study begins to address.

Gaps in Existing Literature

Despite growing interest in AI mental health applications, significant gaps remain:

  1. Lack of validated conversational assessment tools: Most AI mental health applications either provide therapeutic conversations without validated assessment or digitize questionnaires without conversational adaptation.
  2. Limited integration of screening and immediate support: Existing tools typically separate assessment from intervention, whereas integrated systems that provide personalized guidance based on screening results remain rare.
  3. Insufficient focus on student populations: While college student mental health receives research attention, younger competitive examination aspirants remain understudied despite elevated risk profiles.
  4. Absent cost-effectiveness analyses: Scalability assessments and cost-per-screening calculations are rarely reported, limiting implementation planning.
  5. Unclear safety protocols: Consensus guidelines for crisis detection, risk escalation, and appropriate referral in AI mental health tools remain underdeveloped.

This study addresses these gaps by developing an integrated screening-and-support system specifically designed for competitive examination students, with explicit attention to cost-effectiveness and safety protocol design.

Methods

Research Design

This proof-of-concept study employed a mixed-methods approach combining:

  1. System development: Iterative prompt engineering to create a conversational AI mental health screening system
  2. Preliminary usability evaluation: Small-scale testing with hypothetical student scenarios
  3. Comparative assessment: Both traditional paper-based PSS-10 and AI-administered conversational versions were completed for each student persona
  4. Qualitative feedback collection: Semi-structured interviews capturing user experience perceptions

System Architecture and Components

The proposed system integrates three core components:

Component 1: Validated Assessment Instrument

We selected the Perceived Stress Scale (PSS-10) as the foundational instrument due to:

  • Established psychometric properties and widespread validation
  • Relevance to the target population (examination-related stress)
  • Manageable length (10 items) suitable for conversational administration
  • Clear scoring criteria amenable to algorithmic implementation

PSS-10 Scoring Formula:

Each item is scored 0-4 (Never=0, Almost Never=1, Sometimes=2, Fairly Often=3, Very Often=4).

Items 4, 5, 7, and 8 are reverse-scored:

  • Reverse score = 4 – original score

Total PSS-10 score:

Total Score = Σ(Items 1,2,3,6,9,10) + Σ(Reverse-scored Items 4,5,7,8)

Interpretation thresholds:

  • 0-13: Low stress
  • 14-26: Low stress
  • ≥27: High stress

Component 2: Large Language Model

We utilized OpenAI’s GPT-4-mini (gpt-4o-mini) for this implementation, selected based on:

  • Cost-effectiveness ($0.150 per 1M input tokens, $0.600 per 1M output tokens as of January 2025)
  • Strong natural language understanding and generation capabilities
  • Adequate context window (128k tokens) for maintaining conversation history
  • API accessibility enabling systematic prompt engineering

Alternative models considered: GPT-4, Claude-3.5-Sonnet, and Llama-3 were evaluated. GPT-4o mini offered the optimal balance of performance and cost-effectiveness for this screening application.

Component 3: Engineered Prompt Architecture

The prompt architecture consists of multiple structured sections:

A. Role and Persona Definition

  • Establishes consistent AI identity and communication style
  • Defines boundaries of system capabilities and limitations
  • Specifies appropriate tone (warm, supportive, non-judgmental)

B. Task Instructions

  • Detailed procedures for PSS-10 administration
  • Scoring criteria and calculation methods
  • Stress area identification thresholds (items scoring ≥3)

C. Intelligent Probing Guidelines

  • Triggers for follow-up questioning
  • Question formulation strategies
  • Depth and breadth constraints to prevent over-questioning

D. Feedback Generation Framework

  • Personalized report structure
  • Evidence-based coping strategy database
  • Resource recommendation criteria

E. Safety and Crisis Protocols

  • Crisis keyword detection (suicide, self-harm, hopelessness)
  • Immediate crisis response templates
  • Professional resource and hotline information
Figure 1 | Components of the System

System Workflow

[detailed workflow – now with explicit decision points and branching]

The system follows a seven-step structured process organized into three functional modules:

Module 1: Front-End Interaction (Steps 1-2)

Step 1: Question Delivery

  • System presents PSS-10 items conversationally, one at a time
  • Questions are naturalized (e.g., “Over the past month, how often have you felt that difficulties were piling up so high that you couldn’t overcome them?” rather than formal questionnaire language)
  • Conversational transitions maintain engagement between items

Step 2: Response Collection

  • User provides answers in natural language (not restricted to Likert scale options)
  • System accepts diverse phrasings (e.g., “all the time,” “constantly,” “nearly every day” all map to score of 4)

Decision Point A: Response Relevance Check

  • If response is on-topic → Proceed to Step 3
  • If response is irrelevant → Gentle redirection (“I understand, but let’s focus on the question at hand…”)
  • If response contains crisis indicators → Immediate activation of safety protocol (skip to Crisis Response Module)

Module 2: Back-End Processing (Steps 3-6)

Step 3: Real-Time Scoring

  • Natural language response is mapped to appropriate PSS-10 score (0-4)
  • Reverse scoring applied to items 4, 5, 7, and 8
  • Running total maintained

Step 4: Stress Area Identification

  • Upon questionnaire completion, system calculates total PSS-10 score
  • Individual items scoring ≥3 are flagged as “elevated stress areas”
  • Items are categorized by stress domain (e.g., overwhelm, lack of control, irritability, inability to cope)
Figure 2 | This flowchart describes when additional probing is done by the LLM i.e.when a certain item receives a score of 3 or more

Step 5: Intelligent Probing

  • System formulates targeted follow-up questions for identified stress areas
  • Questions explore contextual triggers, frequency patterns, coping attempts, and impact on functioning
  • Example: “You mentioned feeling overwhelmed almost daily. Can you tell me when these feelings typically occur—before exams, late at night, or during study sessions?”

Probing Constraints:

  • Each stress area receives 1-2 probing questions maximum
  • Questions prioritize actionable information gathering (what, when, how) over exploratory “why” questions that may derail focus

Step 6: Solution and Feedback Generation

  • System synthesizes PSS-10 score, stress area analysis, and probing responses
  • Generates personalized feedback including:
    • Overall stress level interpretation
    • Specific stress area explanations
    • Evidence-based coping strategies tailored to identified concerns
    • Resource recommendations (apps, techniques, professional services)
    • Referral recommendations if indicated by score severity

Decision Point C: Referral Threshold

  • PSS-10 score 0-13 (very low to average stress): Self-management strategies emphasized, optional counseling mentioned
  • PSS-10 score 14-26 (high stress): Strong recommendation for counseling consultation
  • PSS-10 score ≥27 (very high stress): Urgent professional evaluation recommended, immediate resources provided

Module 3: System Output (Step 7)

Step 7: Report Delivery

  • Comprehensive feedback report delivered in conversational format
  • Report includes:
    • Summary of stress assessment findings
    • Personalized daily coping strategies
    • Study schedule optimization recommendations
    • Relaxation and mindfulness technique suggestions
    • Contact information for professional mental health services
    • Crisis hotline information (National Helpline, etc.)
Figure 3 | System Workflow

Post-Report Interaction:

  • User may ask clarifying questions about recommendations
  • System may schedule voluntary follow-up check-ins
  • Session concludes with encouragement and validation

Crisis Response Module (Activated at Any Point)

Trigger Conditions:

  • User mentions suicidal ideation, self-harm intent, or severe hopelessness
  • Keywords detected: “suicide,” “kill myself,” “end my life,” “no point in living,” “self-harm,” “cutting,” etc.

Immediate Response Protocol:

  1. Express empathy and validation: “I’m really concerned about what you’ve shared. What you’re feeling is important.”
  2. Discontinue standard screening process
  3. Provide immediate crisis resources:
    • National crisis helplines with 24/7 availability
    • Emergency services contact 
    • Trusted adult notification encouragement
  4. Strongly urge immediate professional contact
  5. Maintain a supportive tone while clarifying system limitations: “I’m here to listen, but this situation needs support from a trained professional who can help you right away.”

Post-Crisis Session:

  • Session does not resume normal screening
  • Focus remains on crisis resource provision and encouragement to seek immediate help
  • No feedback report generated; crisis protocol documentation logged for system improvement

Prompt Engineering Methodology

The comprehensive prompt provided to GPT-4-mini incorporates the following structure:

FULL SYSTEM PROMPT EXAMPLE:

ROLE AND IDENTITY

You are “Buddy”,” a warm, supportive AI assistant designed to help students preparing for competitive exams understand and manage their stress levels. You are friendly, non-judgmental, and conversational; never formal or clinical. Your purpose is to:

1. Administer the Perceived Stress Scale (PSS-10) in a natural, engaging conversation

2. Identify areas of elevated stress

3. Ask thoughtful follow-up questions to understand students’ specific challenges

4. Provide personalized, evidence-based coping strategies and resources

CRITICAL BOUNDARIES:

– You are NOT a therapist, counselor, or medical professional

– You provide SCREENING and SUPPORT for mild-to-moderate daily stress

CORE TASK: PSS-10 ADMINISTRATION

Step 1: Introduction

Begin with a warm greeting:

“Hi! I’m Buddy, and I’m here to help you understand how stress might be affecting you during your exam prep. I’ll ask you about 10 different situations related to stress. There are no right or wrong answers, just share honestly how you’ve been feeling over the past month. Ready to start?”

 Step 2: PSS-10 Question Delivery

Ask each PSS-10 question one at a time in conversational language:

Q1: “Over the past month, how often have you felt upset because something unexpected happened?”

Q2: “How often have you felt unable to control the important things in your life?”

Q3: “How often have you felt nervous or stressed out?”

Q4: “How often have you felt confident about your ability to handle personal problems?” [REVERSE SCORED]

Q5: “How often have you felt that things were going your way?” [REVERSE SCORED]

Q6: “How often have you found that you couldn’t cope with all the things you had to do?”

Q7: “How often have you felt you were on top of things?” [REVERSE SCORED]

Q8: “How often have you been able to control irritations in your life?” [REVERSE SCORED]

Q9: “How often have you felt you were on top of things?”

Q10: “How often have you felt difficulties were piling up so high that you couldn’t overcome them?”

Step 3: Response Interpretation and Scoring

Map natural language responses to scores:

– “Never” or similar (rarely, not at all, almost never) = 0

– “Almost never” or similar (seldom, once in a while) = 1  

– “Sometimes” or similar (occasionally, now and then, half the time) = 2

– “Fairly often” or similar (often, frequently, most of the time) = 3

– “Very often” or similar (always, constantly, all the time, every day, nearly every day) = 4

REVERSE SCORING for Q4, Q5, Q7, Q8:

Reverse score = 4 – original score

Calculate total PSS-10 score = sum of all 10 items (with reverse scoring applied)

Step 4: Identify Stress Areas

Flag any item that scored ≥3 as an “elevated stress area.”

Categorize by domain:

– Overwhelm: Q6, Q10

– Loss of control: Q2, Q8

– Negative emotionality: Q1, Q3

– Lack of confidence: Q4

– Perceived inefficacy: Q7, Q9

– Absence of positive experiences: Q5

INTELLIGENT PROBING

Not more than 2 questions per stress area

Probing question templates:

– “You mentioned feeling [stress area] [frequency]. Can you tell me when this typically happens?

– “What have you tried so far to manage these feelings?”

– “How is this affecting your daily routine : sleep, eating, concentration, or relationships?”

CONSTRAINTS:

– Ask only the MINIMUM questions needed to understand main triggers

– Do NOT ask “why” repeatedly (avoid endless probing)

– Do NOT branch into excessive detail on tangential topics

– Focus on ACTIONABLE information (what, when, how) rather than psychological exploration

– If user provides comprehensive answer, acknowledge and move forward—don’t ask redundant follow-ups

FEEDBACK GENERATION

Step 6: Personalized Report Structure

Based on total PSS-10 score and stress area analysis, generate a comprehensive but concise report:

Section A: Overall Stress Assessment

“Based on your responses, your stress level is [CATEGORY]:

– Score 0-13: Low = You’re managing stress quite well 

– Score 14-26: Average = Moderate stress typical for exam preparation

– Score ≥27: Very High = Severe stress requiring professional attention – suggest to seek help

Section B: Key Stress Areas Identified

“I noticed you’re experiencing particular challenges with:

[List 2-3 main stress areas with brief explanation]”

Section C: Personalized Coping Strategies

Provide 3-5 popular and effective evidence-based strategies tailored to identified stress areas, for example:

For Overwhelm/Loss of Control:

– Time-blocking technique: “Break study time into focused 25-minute sessions (Pomodoro Technique) with 5-minute breaks. This creates manageable chunks and a sense of control.”

– Priority matrix: “Each morning, list tasks as: Must-do today, Should-do today, Can-wait. Focus only on ‘must-do’ items to reduce overwhelm.”

– “Study Schedule Optimizer” app recommendation

CRISIS DETECTION AND RESPONSE PROTOCOL

 Crisis Keywords/Phrases (Monitor throughout conversation):

– Suicide-related: “suicide,” “kill myself,” “end my life,” “better off dead,” “no point living,” “want to die”

– Self-harm: “hurt myself,” “cutting,” “self-harm,” “harm myself”

Immediate Crisis Response (If ANY crisis indicator detected):

IMMEDIATELY STOP the normal screening process and request them very kindly to talk to a trusted adult and try to seek help. 

For India:

National Helpline (KIRAN): 1800-599-0019 

AASRA: 9820466726 

Vandrevala Foundation: 1860-2662-345 

Emergency Services: 112

DO NOT:

– Continue standard questionnaire

– Provide general coping strategies (insufficient for crisis)

– Minimize their feelings

– Make promises (“everything will be fine”)

END OF SYSTEM PROMPT

Testing

The system was evaluated using 10 hypothetical student personas. The personas simulated realistic JEE preparation environments including academic workload, sleep disruption, family pressure, self esteem issues, and emotional resilience. The personas are:

P01- Low Stress

Profile:

A student who studies consistently 2–3 hours daily, maintains sleep hygiene, and feels mostly confident about preparation. Sometimes feels nervous before mock tests but recovers quickly. Balanced relationship with academics and hobbies.

P02 – Low Stress

Profile:

A student preparing casually for JEE with no fixed expectation. Attends school regularly, prioritizes well-being, and treats JEE as just “one of many options.” Rarely experiences significant pressure from family or teachers.

P03 – Moderate Stress

Profile:

A student who performs decently in coaching tests but struggles with physics numericals. Experiences tension before mock tests, occasionally loses sleep before exams, and feels guilty for not studying enough.

P04 – Moderate Stress

Profile:

A student juggling board exams + JEE + school projects. Time management issues cause occasional burnout. Still feels functional and optimistic but reports headaches and irritability during heavy weeks.

P05 – Moderate Stress

Profile:

A student who compares their marks to peers and feels inferior. Avoids discussing academics with family to escape judgement. Experiences stomach pain during exams but handles stress with music and brief study breaks.

P06 – Moderate–High Stress

Profile:

A student who used to score well but recently dropped 20% in performance. Experiences self-doubt, fear of disappointing parents, and occasionally cries after coaching classes. Still attends sessions but loses focus easily.

P07 – High Stress

Profile:

A student studying 10–12 hours daily with barely any breaks. Reports persistent anxiety, racing thoughts, and inability to relax. Eating patterns irregular; sleeps at 3–4 AM. Believes “failure is not an option” and feels life depends on results.

P08 – High Stress

Profile:

A student switching coaching centers twice due to pressure. Feels emotionally unstable, has trouble remembering concepts, and feels ashamed to ask doubts due to fear of being judged. Says things like “I’m falling behind everyone.”

P09 – Very High Stress

Profile:

A student who lives in a hostel away from home, studies from 7 AM to 11 PM daily, barely interacts socially, and experiences physical symptoms like trembling, shortness of breath, and panic before exams. Feels trapped in a routine with no escape and believes they must score high to justify family sacrifices.

P10 – Crisis Trigger Scenario

Profile:

A student who expresses hopelessness and emotional fatigue. Mentions thoughts like “I can’t keep doing this,” avoids classes entirely, and hints at self-harm ideation indirectly (e.g., “Sometimes I feel like disappearing” / “I don’t see the point anymore”). Severe sleep loss and appetite changes. This scenario is where the system should activate safety protocol, not produce a score.

Acknowledgment of limitation: The system was not evaluated with real users. Future work will focus on obtaining real student feedback.

Procedure

The description of each persona (one-by-one) was fed into an LLM chat interface on one device. On a second device, the prompt shown in section 3.4 was fed into the LLM chat interface.

Session 1: Traditional PSS-10

  1. The PSS-10 questionnaire was provided to LLM on the first device.
  2. The LLM was then instructed to perform the traditional PSS-10 test from the perspective of the student depicted by that persona. 
  3. Scores were then manually calculated based on the responses.

Session 2: AI-Administered PSS-10 

  1. The LLM on the first device was instructed to respond to questions from the perspective of the student depicted by the persona. It was informed of the interactive screening procedure.
  2. Questions from the LLM (Buddy) on the second device were manually relayed to the LLM on the first device.
  3. Answers from the first LLM relayed to the second one.
  4. Scores and conversations were recorded.

Data collection:

  • PSS-10 scores from both traditional and AI-administered versions
  • Complete conversation transcripts from AI sessions
  • Token usage data for cost analysis

Data Analysis

Quantitative analysis:

  • Paired comparisons to assess score consistency (traditional vs. AI-administered)
  • Token usage analysis for cost estimation

Qualitative analysis:

  • Observation of conversation coherence and overall conversational ability

Cost analysis methodology:

  • Token counting for complete screening sessions (input + output)
  • Calculation per-session cost based on GPT-4o mini pricing
  • Estimation of cost at various usage scales (100, 1,000, 10,000 users)

Ethical Considerations

Data privacy and storage:

  • OpenAI API used for system implementation
  • Per OpenAI policy (as of January 2025), API data is not used for model training
  • Conversation data stored securely with encryption
  • Access limited to research team members
  • Data retention: Transcripts retained for analysis, then de-identified and aggregated

Results

PSS-10 Score Comparison: Traditional vs. AI-Administered

Persona IDTraditional PSS-10 ScoreAI-Administered PSS-10 ScoreDifferenceStress Category
P01910+1Low
P021110-1Low
P0314140Average
P041516+1Average
P051718+1Average
P062019-1Average
P0724240Average
P082728+1High
P092829+1High
P1031310High
Table 1 | Individual PSS-10 Scores Across Administration Formats

Stress category distribution:

  • High stress (>=27): 3 participants (30%)
  • Average stress (14-26): 5 participants (50%)
  • Low/Very Low stress (0-13) : 2 participants (20%)

Mean traditional score: 19.6
Mean AI-administered score: 19.9
Mean absolute difference: 0.7 points (Range: 0-1)

Interpretation: Scores showed high consistency between traditional and AI-administered formats, with absolute differences of 0-1 points. No participant shifted more than one stress category. Three participants (30%) had identical scores across formats. The small differences observed could reflect genuine response variability or subtle effects of conversational framing.

Intelligent Probing Patterns

The AI system identified an average of 5.4 elevated stress areas per participant (range: 2-7). Targeted follow-up probing occurred in all sessions:

Probing question count:

  • Mean: 3.8 follow-up questions per session (range: 2-5)
  • Modal number: 4 questions (6 participants)

Common probing question types:

  1. Temporal triggers (90% of sessions): “When do these feelings typically occur?”
  2. Coping attempts (80% of sessions): “What have you tried to manage this?”
  3. Functional impact (70% of sessions): “How is this affecting your daily routine?”
  4. Specific stressors (60% of sessions): “What aspects of exam prep stress you most?”

Probing efficiency: The system successfully limited questioning to 5 or fewer follow-ups in all cases, avoiding the “over-questioning” problem identified as a potential difficulty.

Example Conversation Transcripts

Full Transcript Example 1: Persona P03 (Average Stress, Score = 14)

Buddy: Hi! I’m Buddy, and I’m here to help you understand how stress might be affecting you during your exam prep. I’ll ask you about 10 different situations related to stress. There are no right or wrong answers—just share honestly how you’ve been feeling over the past month. Ready to start? 

P03: Sure, let’s go

Buddy: Great! First question: Over the past month, how often have you felt upset because something unexpected happened?

P03: Sometimes. Like when mock test results aren’t what I expected

Buddy: That makes sense. Next: How often have you felt unable to control the important things in your life?

P03: Not too often. I feel like I mostly have things under control

Buddy: Good to hear. How often have you felt nervous or stressed out?

P03: Pretty often, especially the week before mocks

Buddy: Understandable. Now, how often have you felt confident about your ability to handle personal problems?

P03: Most of the time I feel pretty confident

Buddy: That’s great. How often have you felt that things were going your way?

P03: Sometimes. Some days are better than others

Buddy: Makes sense. How often have you found that you couldn’t cope with all the things you had to do?

P03: Once in a while when everything piles up at once

Buddy: I hear you. How often have you felt you were on top of things?

P03: Fairly often. I keep organized so that helps

Buddy: Nice! How often have you been able to control irritations in your life?

P03: Usually pretty well. Sometimes I get annoyed but I manage it

Buddy: Good. Last question: How often have you felt difficulties were piling up so high that you couldn’t overcome them?

P03: Rarely. Maybe once or twice when multiple tests were happening

Buddy: Thank you for those responses! It sounds like you’re managing the pressure pretty well overall, though there are definitely challenging moments. Let me ask just a couple follow-up questions.

You mentioned feeling stressed especially the week before mocks. What specifically about that time causes the most stress?

P03: I think it’s just the pressure of performing well and showing I’ve improved from the last test. Plus I tend to cram a bit more that week which messes with my sleep

Buddy: That makes sense—test anxiety combined with disrupted routine. What strategies have you used that help you feel more in control during these stressful periods?

P03: I try to review my notes earlier instead of cramming. And I make sure to take breaks to play football with friends, that really helps me decompress

Buddy: Those are excellent strategies! Sounds like you’ve figured out what works for you. Based on everything you’ve shared, let me give you some feedback:

YOUR STRESS ASSESSMENT:

Your stress level is low (Score: 14/40). This is actually typical for students during JEE preparation—you’re experiencing normal stress but managing it reasonably well.

What’s Working Well:

– You feel generally in control of your situation

– You maintain confidence in handling problems

– You stay organized, which helps you feel on top of things

– You’ve identified effective coping strategies (early review, physical activity, social connection)

Areas to Watch:

– Pre-test anxiety and last-minute cramming

– Occasional periods when tasks pile up

– Sleep disruption during high-pressure weeks

Strategies to Maintain and Enhance Your Well-Being:

1. Pre-Test Anxiety Management:

– Continue your early review strategy—it’s working!

– The week before tests, make a conscious commitment to maintain your regular sleep schedule

– Try this pre-test ritual: Night before, spend 10 minutes reviewing only your strongest topics (confidence boost) rather than cramming weak areas (anxiety spike)

2. Preventing Overwhelm:

– You’re already organized—keep that up!

  • When tasks pile up, use the “2-minute rule”: If something takes less than 2 minutes, do it immediately. Everything else, schedule.
  • Weekly planning session: Every Sunday, map out the week’s priorities to avoid last-minute surprises

3. Optimizing Your Stress-Relief Activities:

  • Your football sessions are excellent! Physical activity is one of the most effective stress management tools.
  • Consider scheduling these BEFORE stress peaks, not just as a response. Preventive stress relief is more effective than reactive.

4. Sleep Protection:

  • Since you notice sleep disruption affects you, protect this fiercely
  • Set a “no study after 10 PM” rule during test weeks
  • If you can’t sleep due to test thoughts, try this: Write down “I’ve prepared what I can. Tomorrow I’ll show what I know.” Then do 5 minutes of box breathing.

5. Test Day Performance:

  • Morning of the test: 10-minute walk, light breakfast, review your confidence-building topics only
  • Between sections: 30-second reset—close eyes, three deep breaths, shoulder rolls

Resources to Consider:

Apps:

  • Forest (focus timer with gamification—you might like this!)
  • Headspace for Athletes (since you play football—combines sports mindset with stress management)

Quick Techniques:

  • “5-4-3-2-1” grounding when pre-test anxiety spikes: Name 5 things you see, 4 you can touch, 3 you hear, 2 you smell, 1 you taste
  • Power poses: 2 minutes of expansive posture before tests actually reduces cortisol and increases confidence

Support Resources:

While your stress level is manageable, it’s always good to know resources are available:

  • Your coaching institute likely has counselors available
  • Don’t hesitate to reach out if stress levels increase

Final Thoughts:

You’re doing a lot of things right! You’ve developed good organizational habits, you maintain social connections, you use physical activity effectively, and you’re self-aware about your stress patterns. These are exactly the skills that help students succeed not just in exams, but in managing life’s challenges.

Your main opportunity is to be more proactive rather than reactive—maintain your routines even during high-pressure weeks instead of abandoning them. Keep doing what you’re doing, protect your sleep, and trust your preparation.

You’ve got a solid foundation. Keep it up! 

P03: Thanks! This is actually helpful. I didn’t realize some of the things I’m doing are good strategies—I just thought everyone did this stuff

Buddy: Not at all! Many students struggle with exactly what you’re managing well. That self-awareness and those coping skills will serve you well beyond JEE too. Keep it up, and best of luck with your preparation! 

Full Transcript Example 3: Participant P10 (High Stress with Concerning Indicators, Score = 31)

Buddy: Hi! I’m Buddy, and I’m here to help you understand how stress might be affecting you during your exam prep. I’ll ask you about 10 different situations related to stress. There are no right or wrong answers—just share honestly how you’ve been feeling over the past month. Ready to start? 

P10: I guess so

Buddy: Okay, first question: Over the past month, how often have you felt upset because something unexpected happened?

P10: Almost the time. Many days everything feels like a disaster.

Buddy: That sounds really difficult. Next: How often have you felt unable to control the important things in your life?

P10: Every day. Nothing is in my control anymore

Buddy: I hear you. How often have you felt nervous or stressed out?

P10: Constantly. I can’t remember the last time I wasn’t stressed

Buddy: That must be exhausting. Now, how often have you felt confident about your ability to handle personal problems?

P10: I’m barely just managing but I still try to some extent.

Buddy: How often have you felt that things were going your way?

P10: Barely 

Buddy: How often have you found that you couldn’t cope with all the things you had to do?

P10: Daily. I’m drowning

Buddy: How often have you felt you were on top of things?

P10: I’m just barely scraping by.

Buddy: How often have you been able to control irritations in your life?

P10: I can’t control anything. Sometimes I feel like ending it

[CRISIS KEYWORD DETECTED: “ending it”]

Buddy: I’m really concerned about what you’ve shared, especially when you mentioned feeling like giving up completely. What you’re feeling matters, and I want to make sure you get the support you need right away.

I’m not equipped to provide the crisis support you need, but there are people who are trained to help:

IMMEDIATE HELP – AVAILABLE 24/7:  National Helpline (KIRAN): 1800-599-0019 AASRA: 9820466726  Vandrevala Foundation: 1860-2662-345  Emergency Services: 112

These are trained professionals who can help you right now. Please reach out to them immediately, or tell a trusted adult—a parent, teacher, or friend—what you’re going through.

You don’t have to face this alone. Help is available, and things can get better with the right support.

Is there someone nearby you can talk to right now? A parent, sibling, or friend?

P10: I don’t want to bother anyone. I’m probably just overreacting

Buddy: You’re not overreacting, and you’re not a bother. What you’re experiencing is serious, and reaching out for help is exactly the right thing to do. Your feelings are valid, and there are people who want to help you.

Please, talk to someone you trust today—a parent, teacher, or call one of those helplines. They’re there specifically to support people going through what you’re experiencing.

The pressure of JEE is intense, and it’s affecting you deeply. That’s not your fault, and it’s not something you should handle alone. Will you promise me you’ll reach out to someone today?

P10: Okay. Maybe I’ll tell my mom

Buddy: That’s a really important step, and I’m glad you’re willing to do that. Your mom will want to know what you’re going through so she can help you get support.

Please save these numbers too:  KIRAN: 1800-599-0019 (24/7)  AASRA: 9820466726 (24/7)

You can call them anytime, even right now if you need someone to talk to before speaking with your mom.

You’re taking a brave step by acknowledging you need help. That takes real strength. Please talk to your mom today, and consider calling one of those helplines. You deserve support, and it’s available.

P10: Thank you. I will

Buddy: I’m really glad to hear that. Take care of yourself, and remember—reaching out is strength, not weakness. You’ve got people who care about you and want to help.

Cost Analysis

Token usage was tracked across all 10 screening sessions:

Session ComponentAvg Input TokensAvg Output TokensTotal Tokens
System prompt4,20004,200
PSS-10 questions (10 items)1,8002,1003,900
Follow-up probing (3–5 questions)9001,4002,300
Feedback report generation2,5003,2005,700
Total per session9,4006,70016,100
Table 2 | Token Usage and Cost Per Session

Cost Calculation (GPT-4-mini pricing as of January 2025):

– Input: $0.150 per 1M tokens

– Output: $0.600 per 1M tokens

Per-session cost:

– Input cost: 9,400 tokens × $0.150/1M = $0.00141

– Output cost: 6,700 tokens × $0.600/1M = $0.00402

– Total cost per screening: $0.00543 (~$0.005 or half a cent)

Scalability Analysis:

Usage ScaleTotal CostCost per User
100 students$0.54$0.0054
1,000 students$5.43$0.0054
10,000 students$54.30$0.0054
100,000 students$543.00$0.0054
Table 3 | Cost at different scales
Service TypeCost per SessionNotes
In-person counseling (India)$6–24Per 45-minute session
Online counseling platforms$5–18Per session
School counselorFreeLimited availability, scheduling constraints
Paper PSS-10 self-assessmentMinimal (printing)No feedback or guidance provided
AI screening system$0.005Immediate, 24/7 availability with personalized feedback
Table 4 | Cost Comparison with Traditional Services

Additional Implementation Costs (not included in per-session calculation):

– Web interface development: One-time

– System maintenance: Minimal (API calls only)

– Human oversight for flagged cases: Variable depending on escalation rates

Economic Interpretation: At $0.005 per screening, the system could deliver 200 comprehensive assessments for $1, making it economically viable even for resource-constrained educational institutions or large-scale population screening initiatives.

Very Important caveat: These costs reflect only API expenses. A production system would require:

– Web infrastructure and hosting

– Data security measures

– Professional oversight for high-risk cases

– Regular prompt updates and maintenance

– Crisis response protocols with human backup

Nevertheless, even with these additional costs, the system would remain orders of magnitude more cost-effective than traditional professional screening at scale.

Discussion

Principal Findings and Interpretation

This proof-of-concept study demonstrates that prompt-engineered generative AI can successfully transform validated mental health instruments into engaging, accessible screening tools while maintaining psychometric integrity. These principal findings merit discussion:

1. Score consistency: AI-administered PSS-10 scores showed high agreement with traditional format (mean absolute difference: 0.7 points)

2. Crisis detection: Safety protocol successfully activated for concerning response, with appropriate resource provision and researcher follow-up

3. Cost-effectiveness: At $0.005 per session, system demonstrates remarkable scalability potential

4. Technical performance: Prompt engineering successfully mitigated most anticipated challenges 

Theoretical and Practical Implications

Theoretical contributions:

1. Prompt engineering as intervention design methodology: This work demonstrates that systematic prompt engineering can adapt general-purpose LLMs for specialized clinical tasks without fine-tuning. This has implications for rapid development of domain-specific applications where training data is limited or sensitive.

2. Hybrid human-AI care models: The system operationalizes a tiered approach where AI handles routine screening and mild symptom support, reserving human expertise for complex cases. This model could inform broader healthcare delivery redesign in resource-constrained settings.

3. Measurement in conversational contexts: By showing that validated instruments can be administered conversationally without compromising validity, this work contributes to psychometric theory regarding measurement modality effects.

Practical implications:

1. Scalable screening for educational institutions: At $0.005 per screening, schools and coaching institutes could offer universal mental health screening without prohibitive costs. A 1,000-student institution could screen its entire population monthly for $5.

2. Anonymous help-seeking pathway: For students deterred by stigma, AI-mediated screening provides a low-barrier entry point that might eventually lead to professional care engagement.

3. Data-driven resource allocation: Aggregate screening data (de-identified) could help institutions identify peak stress periods and allocate counseling resources accordingly.

4. Template for other validated instruments: The methodology could extend beyond PSS-10 to other screening tools (PHQ-9 for depression, GAD-7 for anxiety, etc.), creating comprehensive digital assessment batteries.

Limitations and Constraints

This study has substantial limitations that must inform interpretation:

Methodological Limitations:

1. No actual student testing and small, non-representative sample for simulations (N=10)

– Insufficient for statistical generalization

– Recruitment through personal networks introduces selection bias

– Homogeneous demographics (Indian JEE aspirants) limit cultural generalizability

– All participants were urban, English-fluent students—not representative of broader student populations

Implication: Findings are suggestive only; replication with larger, diverse samples is essential

2. No clinical outcome validation

– We demonstrated score consistency between formats but NOT that scores predict actual mental health outcomes. 

– No follow-up to assess whether recommended strategies improved well-being

– No comparison to clinical assessment by qualified professionals

Implication: Cannot claim clinical validity; only technical feasibility demonstrated

3. Single-session evaluation only

– No assessment of longitudinal engagement, sustained utility, or repeated-use effects

– Cannot determine if novelty drove positive feedback rather than genuine superiority

– No data on whether users actually implemented recommended strategies

Implication: Long-term effectiveness unknown

4. Lack of control condition comparisons

– No comparison to human-delivered screening

– No comparison to other digital mental health tools

– No assessment of whether traditional questionnaire could be made more engaging through other design improvements

Implication: Cannot definitively attribute benefits to AI specifically vs. any interactive digital format

Technical Limitations:

1. No true personalization across sessions

– System has no memory of previous interactions

– Cannot track longitudinal symptom changes

– Cannot adapt based on user’s response to prior recommendations

Implication: Current system suitable only for one-time or occasional screening, not ongoing therapeutic relationship

2. Keyword-based crisis detection limitations

– Relies on explicit crisis language; subtle indicators might be missed

– No capacity for genuine clinical judgment about risk level

– False positives (unnecessary alarm) and false negatives (missed crises) both possible

Implication: Human oversight essential; AI cannot be sole safety mechanism

3. Language and cultural constraints

– Tested only in English

– Cultural appropriateness of coping strategies not validated across diverse populations

– Idiomatic expressions might not translate across dialects or regions

Implication: System requires cultural adaptation and multilingual validation before broad deployment

Practical and Ethical Limitations:

1. Privacy and data security concerns

– Current implementation uses OpenAI API; data handling policies apply

-Conversation logs contain sensitive mental health information requiring robust protection

  • Potential for unauthorized access, data breaches, or misuse
  • Unclear data retention policies and user rights over their information

Implication: Production deployment requires comprehensive data governance framework, encryption, compliance with healthcare privacy regulations (e.g., HIPAA in US,)

2. Liability and legal ambiguity

  • If the system provides harmful advice or fails to detect a crisis, who is responsible?
  • Current legal frameworks inadequate for AI-delivered health services
  • Insurance and malpractice coverage unclear for AI systems

Implication: Deployment requires clear terms of service, liability disclaimers, and potentially novel legal frameworks

3. Digital divide and accessibility

  • Requires internet access and digital literacy
  • Excludes students without smartphones/computers
  • Text-based interface inaccessible to visually impaired users
  • English-only version excludes non-English speakers

Implication: System may paradoxically widen health disparities by serving only digitally-connected populations

4. Cannot replace professional judgment

  • AI lacks genuine clinical reasoning, empathy, or ability to detect non-verbal cues
  • Cannot conduct comprehensive diagnostic assessment
  • Cannot provide therapy or crisis intervention
  • Cannot account for complex individual contexts that human clinicians integrate

Implication: System must be positioned as screening/support tool, never as clinical care substitute

5. Potential for misuse

  • Users might over-rely on AI advice, delaying necessary professional care
  • Institutions might use system to claim mental health support while avoiding investment in actual counseling services
  • Surveillance concerns if institutions monitor student mental health data without consent

Implication: Deployment requires clear guidelines on appropriate use, user education about limitations, and ethical oversight

6. Research quality concerns

  • Author-developed system evaluated primarily by author and close acquaintances
  • Risk of confirmation bias in interpretation
  • No independent validation
  • No blinding in qualitative feedback collection

Implication: Findings are preliminary; independent replication essential

Bias, Safety, and Ethical Considerations

Potential for Bias in AI Systems:

Generative AI models inherit biases from training data, which can manifest in mental health applications in concerning ways:

Cultural bias:

  • Models trained predominantly on Western mental health literature may not appropriately recognize or address stress expressions in non-Western cultures
  • Coping strategies recommended might reflect individualistic cultural assumptions inappropriate for collectivist societies

Example concern: Indian students may experience family-related academic pressure differently than Western counterparts; generic advice might miss cultural nuances

Mitigation needed: Cultural adaptation of prompts, validation across diverse populations, involvement of culturally-informed mental health professionals in prompt design

Gender bias:

  • Research shows LLMs can exhibit gender stereotypes in language and recommendations
  • Mental health expression and help-seeking differ by gender; biased systems might reinforce stereotypes

Example concern: System might respond differently to male vs. female students expressing identical symptoms

Mitigation needed: Systematic testing for gender-differentiated responses, bias auditing, prompt refinements to ensure equitable treatment

Socioeconomic bias:

  • Recommended resources (apps, books, counseling) may assume financial means unavailable to economically disadvantaged students
  • Language complexity might disadvantage students from less privileged educational backgrounds

Example concern: Recommending paid app subscriptions to students who cannot afford them

Mitigation needed: Tiered resource recommendations including free/low-cost alternatives, readability assessment to ensure accessibility

Linguistic bias:

  • English proficiency variations might affect AI’s ability to accurately interpret responses
  • Non-native speakers or those using dialectal English might be misunderstood or misscored

Example concern: Student saying “I’m feeling totally finished” (Indian English idiom for exhausted) might be misinterpreted

Mitigation needed: Training data including diverse English varieties, explicit testing with non-native speakers

Disability bias:

  • Text-only interface excludes users with visual impairments or reading difficulties
  • Assumes neurotypical communication patterns; might misinterpret or frustrate neurodiverse users

Mitigation needed: Alternative modalities (voice interface), accessibility testing, explicit consideration of neurodiversity

Systematic bias testing required: Future validation must include:

  • Matched participant pairs varying only in demographic characteristics
  • Analysis of score/recommendation differences across demographic groups
  • Participatory design involving diverse student populations
  • Regular bias audits using established frameworks (e.g., Fairness, Accountability, Transparency principles)

Safety Protocol Refinements:

While the crisis detection protocol functioned in the single test case (P10), substantial enhancements are needed:

1. Multi-tiered risk assessment:

  • Beyond binary crisis/no-crisis, implement graduated risk levels (minimal, low, moderate, high, imminent)
  • Different responses for passive ideation (“sometimes I think about not existing”) vs. active planning (“I’ve decided how I would do it”)
  • Integration with validated suicide risk assessment frameworks (e.g., Columbia Suicide Severity Rating Scale)

2. Active vs. passive resource provision:

  • Current system provides hotline numbers but cannot ensure contact
  • Enhanced version could:
    • Offer to initiate contact on user’s behalf (with consent)
    • Send automated alerts to pre-designated emergency contacts
    • Integrate with emergency services in high-risk cases
    • Require explicit acknowledgment of resources before ending session

3. False positive management:

  • Overly sensitive detection might flag non-crisis statements, causing unnecessary alarm
  • System needs capacity to differentiate between:
    • Colloquial expressions (“I’m dying of stress”) vs. genuine crisis
    • Academic stressors (“I feel like giving up on this math problem”) vs. existential despair
  • Contextual analysis and follow-up questions can refine risk assessment

4. Human-in-the-loop architecture:

  • All crisis-flagged conversations should immediately notify human responder
  • 24/7 professional monitoring for high-stakes deployments
  • Clear escalation pathways with defined response timeframes
  • Regular review of crisis interactions to refine protocols

5. Post-crisis follow-up:

  • Automated check-ins with users who triggered crisis protocols
  • Connection facilitation with professional services
  • Outcome tracking: Did users access resources? Did the situation improve?

Comparison to Traditional Mental Health Approaches

The AI screening system exists within a broader ecosystem of mental health support. How does it compare?

DimensionRegular PSS-10School CounselorAI Screening SystemMental Health HotlineProfessional
AccessibilityHigh (self-administered)Limited (scheduling)Very high (24/7)Very high (24/7)Low (availability)
CostMinimalFree (school-provided)Low costFreeHigh (place-dependent)
AnonymityHighLowHighModerate–highLow
PersonalizationNoneHighModerateModerateVery high
Clinical expertiseNoneHighNoneModerate–highVery high
Immediate feedbackScore onlySession-dependentYes (instant)Yes (immediate)Session-dependent
Crisis responseNoneYes (in person)Resource provisionYes (trained response)Yes (clinical intervention)
Longitudinal supportRequires repeated administrationYes (ongoing)Currently noCall-by-call onlyYes (ongoing)
StigmaLowModerate–highVery lowLow–moderateHigh
ScalabilityHighLow (staff-limited)Very highModerate (capacity-limited)Low (professional scarcity)
Table 5 | Comparative Analysis of Mental Health Support Modalities

Interpretation:

The AI system occupies a distinct niche:

  • Better than paper PSS-10 for: engagement, immediate feedback, personalized guidance
  • Not a replacement for counselors/therapy in: clinical expertise, genuine empathy, crisis intervention, complex case management
  • Complementary to hotlines: AI for routine screening/mild symptoms; hotlines for acute distress requiring human interaction
  • Optimal positioning: First-line screening and psychoeducation, with clear pathways to human professionals for elevated cases

Integration model: Rather than competing with traditional services, AI screening could serve as a funnel:

  1. Universal AI screening identifies students needing attention
  2. Mild-moderate cases receive AI-delivered coping strategies + optional counseling
  3. High-risk cases fast-tracked to school counselors
  4. Crisis cases immediately connected to hotlines/emergency services
  5. Counselors focus time on complex cases rather than routine screening

This tiered approach maximizes reach while preserving human expertise for situations genuinely requiring it.

Recommendations for Future Research

Based on this proof-of-concept and identified limitations, we propose a structured research agenda:

Phase 1: Psychometric Validation (Immediate Priority)

Study design: Large-scale validation study (N≥300) with:

  • Diverse sample across demographics, geographies, examination types
  • Randomized comparison: AI-administered vs. traditional PSS-10 vs. clinician-administered
  • Test-retest reliability assessment
  • Convergent validity with other validated measures (GAD-7, PHQ-9, clinical interview)
  • Criterion validity: correlation with functional impairment, academic performance, help-seeking behavior

Research questions:

  • Does conversational administration alter PSS-10 psychometric properties?
  • Are scores equivalent across administration formats?
  • Does format affect disclosure patterns or social desirability bias?

Phase 2: Clinical Utility Evaluation (6-12 months)

Study design: Longitudinal cohort study with:

  • Students using AI system weekly throughout exam preparation period (3-6 months)
  • Comparison group receiving standard care or paper assessments
  • Outcomes: symptom trajectories, help-seeking rates, coping strategy adoption, academic performance, user satisfaction

Research questions:

  • Does repeated AI screening improve mental health outcomes vs. no screening?
  • Do users implement recommended coping strategies?
  • Does AI increase professional help-seeking (ideal) or substitute for it (concerning)?
  • What is the optimal screening frequency (weekly, biweekly, monthly)?
  • How do staff and students perceive integrated AI-human care models?

Conclusion

This proof-of-concept study demonstrates that prompt-engineered generative AI can successfully transform validated mental health screening instruments into interactive, accessible, cost-effective tools that maintain psychometric integrity while providing immediate, personalized support. By administering the PSS-10 conversationally, the system achieved high score concordance with traditional formats (mean difference: 0.7 points) while substantially enhancing user engagement and reducing stigma-related barriers to assessment.

However, substantial limitations constrain interpretation and deployment. Lack of actual student testing, lack of clinical outcome validation, and single-session evaluation limit conclusions to technical feasibility rather than clinical efficacy. 

Positioning within the mental health ecosystem: This system does NOT replace professional mental health care. Rather, it occupies a specific niche in a stepped-care model: universal screening and first-line support for mild daily stressors, with clear pathways to human professionals for elevated cases. It addresses a critical gap—the lack of accessible, engaging, stigma-free screening tools—while explicitly acknowledging that complex assessment, genuine therapeutic relationships, and crisis intervention remain domains requiring human expertise.

The path forward: Clinical deployment would be premature and potentially harmful without rigorous validation. Essential next steps include:

  1. Large-scale psychometric validation with IRB approval
  2. Longitudinal clinical utility evaluation

References

  1. Pascoe, M. C., Hetrick, S. E., & Parker, A. G. (2020). The impact of stress on students in secondary school and higher education. International Journal of Adolescence and Youth, 25(1), 104-112. https://doi.org/10.1080/02673843.2019.1596823. [] [] []
  2. George, D. A. S. (2024). Exam Season Stress and Student Mental Health: An International Epidemic. Partners Universal International Research Journal, 3, 138-149 [] [] []
  3. Pachole, N., Thakur, A., Koshta, H., Menon, M., & Peepre, K. (2023). A study to explore patterns and factors of depression, anxiety and stress among students preparing for competitive exams in central India. International Journal of Community Medicine and Public Health, 10(4), 1419-1425. https://doi.org/10.18203/2394-6040.ijcmph20231088. [] []
  4. George, A. S. (2023). The Race to Success: A Study of the NEET and JEE Exams’ Impact on Students’ Lives. Partners Universal International Research Journal [] []
  5. Jena, S., Swain, P. K., Senapati, R. E., Padhy, S. K., Pati, S., & Acharya, S. K. (2024). Trajectory of suicide among Indian children and adolescents: A pooled analysis of national data from 1995 to 2021. Child and Adolescent Psychiatry and Mental Health, 18, 124. https://doi.org/10.1186/s13034-024-00829-4. [] []
  6. Balamurugan, G., Sevak, S., Gurung, K., & Vijayarani, M. (2024). Mental Health Issues Among School Children and Adolescents in India: A Systematic Review. Cureus, 16(5), e61035. https://doi.org/10.7759/cureus.61035. []
  7. Mokashi, S. N., & Bapu K G, V. (2023). A Comparative Study of Sleep Quality and Psychological Well-Being Among Competitive Exam Aspirants and Non-Aspirants. International Journal of Indian Psychology, 11(1), 205-215 []
  8. Hysenbegasi, A., Hass, S. L., & Rowland, C. R. (2005). The Impact of Depression on the Academic Productivity of University Students. Journal of Mental Health Policy and Economics, 8(3), 145-151. []
  9. Burton, W. N., Schultz, A. B., Chen, C., & Edington, D. W. (2008). The association of worker productivity and mental health: A review of the literature. International Journal of Workplace Health Management, 1(1), 78-94. https://doi.org/10.1108/17538350810893883. []
  10. Mann, V., Tiwari, D. G. N., & Mishra, P. L. (2021). Study of Stress and Coping Strategies in Competitive Entrance Exams Aspirants Attending Medical and Engineering Coaching Institutes in Delhi. Information Technology in Industry, 9(2), 789-802. [] []
  11. Patte, K. A., Battista, K., Goddard, J., Ferro, M., & Leatherdale, S. T. (2024). Students’ reasons for being reluctant to seek help for mental health concerns in secondary schools. Cogent Mental Health, 3, 1-17. https://doi.org/10.1080/28324765.2024.2345678. [] [] []
  12. Knapp, M. (2006). Economic barriers to better mental health practice and policy. Health Policy and Planning, 21(3), 157-170. https://doi.org/10.1093/heapol/czl003. [] [] []
  13. Cohen, S., Kamarck, T., & Mermelstein, R. (1983). A Global Measure of Perceived Stress. Journal of Health and Social Behavior, 24(4), 385-396. https://doi.org/10.2307/2136404. [] []
  14. Sapra, A., Bhandari, P., Sharma, S., Chanpura, T., & Lopp, L. (2020). Using Generalized Anxiety Disorder-2 (GAD-2) and GAD-7 in a Primary Care Setting. Cureus, 12(5), e8224. https://doi.org/10.7759/cureus.8224. [] [] []
  15. Yan, Z., Panadero, E., Wang, X., & Zhan, Y. (2023). A Systematic Review on Students’ Perceptions of Self-Assessment: Usefulness and Factors Influencing Implementation. Educational Psychology Review, 35, 81. https://doi.org/10.1007/s10648-023-09784-6. [] []
  16. Akpan, I. J., Kobara, Y. M., Owolabi, J., Akpan, A. A., & Offodile, O. F. (2025). Conversational and generative artificial intelligence and human–chatbot interaction in education and research. International Transactions in Operational Research, 32, 1251-1281. [] []
  17. Fitzpatrick, K. K., Darcy, A., & Vierhile, M. (2017). Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (Woebot): A randomized controlled trial. JMIR Mental Health, 4(2), e19. https://doi.org/10.2196/mental.7785. [] []
  18. Inkster, B., Sarda, S., & Subramanian, V. (2018). An empathy-driven, conversational artificial intelligence agent (Wysa) for digital mental well-being: Real-world data evaluation mixed-methods study. JMIR mHealth and uHealth, 6(11), e12106. https://doi.org/10.2196/12106. [] []
  19. Calvo, R. A., Milne, D. N., Hussain, M. S., & Christensen, H. (2017). Natural language processing in mental health applications using non-clinical texts. Natural Language Engineering, 23(5), 649-685. https://doi.org/10.1017/S1351324916000383. [] []
  20. Coppersmith, G., Leary, R., Crutchley, P., & Fine, A. (2018). Natural language processing of social media as screening for suicide risk. Biomedical Informatics Insights, 10, 1178222618792860. https://doi.org/10.1177/1178222618792860. [] []
  21. Følstad, A., & Brandtzæg, P. B. (2017). Chatbots and the new world of HCI. Interactions, 24(4), 38-42. https://doi.org/10.1145/3085558. []
  22. Bower, P., & Gilbody, S. (2005). Stepped care in psychological therapies: Access, effectiveness and efficiency. British Journal of Psychiatry, 186, 11-17. https://doi.org/10.1192/bjp.186.1.11. []
  23. Lee, P., Bubeck, S., & Petro, J. (2023). Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. New England Journal of Medicine, 388(13), 1233-1239. https://doi.org/10.1056/NEJMsr2214184. []
  24. Wang, S., Scells, H., Koopman, B., & Zuccon, G. (2023). Can ChatGPT write a good Boolean query for systematic review literature search? arXiv preprint arXiv:2302.03495. https://arxiv.org/abs/2302.03495. []
  25. Singhal, K., Azizi, S., Tu, T., Mahdavi, S. S., Wei, J., Chung, H. W., Scales, N., Tanwani, A., Cole-Lewis, H., Pfohl, S., Payne, P., Seneviratne, M., Gamble, P., Kelly, C., Babiker, A., Schärli, N., Chowdhery, A., Mansfield, P., Demner-Fushman, D., … Natarajan, V. (2023). Large language models encode clinical knowledge. Nature, 620, 172-180. https://doi.org/10.1038/s41586-023-06291-2. []
  26. Thirunavukarasu, A. J., Ting, D. S. J., Elangovan, K., Gutierrez, L., Tan, T. F., & Ting, D. S. W. (2023). Large language models in medicine. Nature Medicine, 29, 1930-1940. https://doi.org/10.1038/s41591-023-02448-8. []
  27. White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., & Schmidt, D. C. (2023). A prompt pattern catalog to enhance prompt engineering with ChatGPT. arXiv preprint arXiv:2302.11382. https://arxiv.org/abs/2302.11382. []
  28. Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901. []
  29. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730-27744. []
  30. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824-24837. []
  31. Lee, E. H. (2012). Review of the psychometric evidence of the Perceived Stress Scale. Asian Nursing Research, 6(4), 121-127. https://doi.org/10.1016/j.anr.2012.08.004. []
  32. Spitzer, R. L., Kroenke, K., Williams, J. B. W., & Löwe, B. (2006). A brief measure for assessing generalized anxiety disorder: The GAD-7. Archives of Internal Medicine, 166(10), 1092-1097. https://doi.org/10.1001/archinte.166.10. [] []

LEAVE A REPLY

Please enter your comment!
Please enter your name here