HeyHi IELTS Test Prep

Proprietary Small Language Model (SLM) for English Language Learning

Understanding the intricacy of a student’s speech in an IELTS speaking test is crucial for determining the student’s band score. With our speech recognition technology and our language model capability, HeyHi has achieved a system for an accurate IELTS speech marking. In this blog post, we’ll be taking a look at our language models and how we trained these specialized models.

The Challenge in Automatic IELTS Speaking Marking

Globally, there are more than 4 million IELTS test takers a year. With IELTS being one of the biggest certification of the English Language in the world, providing high-quality automated IELTS assessment is important for helping students practice by accurately gauging the student’s skill and providing helpful feedback. To understand a student’s skill sufficiently, AI must be able to understand the differences of each student’s speech details such as filler words and word choices. These small differences matter in determining a student’s band score, and therefore require an AI that is specifically tuned for marking in the English Language.

International English Language Testing System

Proprietary Small Language Model for IELTS Learning

At HeyHi, we’ve been developing an integrated AI system that addresses these challenges, combining language understanding, speech recognition, as well as personalized learning through targeted feedback loops. Our solution centers on a Proprietary Small Language Model (SLM) fine-tuned specifically for IELTS speech learning. Unlike generic LLMs, this model is trained to understand learner English and to evaluate responses using IELTS-specific rubrics.

  • Automatic Speech Recognition: Enable accurate transcriptions of learner’s speech into detailed analysis based on the learner’s pronunciation, fluency patterns, as well as prosodic features.
  • Speech Evaluation and Personalized Feedback: Analyzes the transcript through the lens of IELTS criteria such as grammar, lexical range, fluency, and pronunciation and give personalized feedbacks targeted to each learner.
HeyHi Test Prep for IELTS
HeyHi Test Prep for IELTS

Leveraging NVIDIA’s NeMo and NIM to Fine-Tune AI for IELTS Speaking Marking

In the rapidly evolving landscape of artificial intelligence, NVIDIA’s NeMo and NIM have emerged as powerful tools for bringing enterprise AI to life. With NeMo allowing the training of language models into domain-specific intelligence, and NIM providing us with an easy way to deploy the trained language models, we have trained an IELTS speaking marking AI that is highly accurate in assessing student’s skills. We achieve this by utilizing NVIDIA’s NeMo and NIM to fine-tune Llama 3.1 and Qwen3 models for marking IELTS speaking tests as follows:

    • NeMo Curator:
      We utilized NeMo Curator, NVIDIA’s framework for data curation and filtering, to clean and deduplicate our training data, ensuring high-quality training data and helping to achieve an SLM capable of accurate band prediction.
    • NeMo Framework:
      With numerous customization – from base models to training methods – available in the NeMo framework, we trained Qwen3 models using techniques such as GRPO and LoRA, achieving SLM models that excels in predicting students’ performance in IELTS speaking test.
    • NVIDIA NIM:
      With the help of NVIDIA NIM, we are able to deploy multiple fine-tuned models within one deployment instance with an inference interface that is OpenAI-compatible, making it easy to use.
HeyHi SLM Workflow

 

Training Workflow

We utilized the Group Reward Policy Optimization (GRPO), which operates in a manner similar to Reinforcement Learning. We used a human-evaluated dataset for the training process to ensure consistency and robustness.

Below are an example of speech transcription from students within our training data before being analyzed by the SLM.

IELTS Speaking Part 3 Transcript
Examiner : When is it acceptable to lie?

Student : I think, uhh, the lie is accepted, uhh, when, for example, you are in, uhh, in worst situation and, uhh, for example, dangerous situation. I think everyone can, uhh, tell lie, and, uhh, but, uhh, in total, I don’t like, uhh, lying, and, uhh, I want just, uhh, I have, uhh, I’m a person who, uhh, always tell truth to others.

Examiner : What do you think about the fact that everybody has lied at least once?

Student : I think, uhh, always, uhh, telling the truth can help us, uhh, to solve, uhh, our problems. And, uhh, as I told you, uhh, when, when we are, uhh, for example, in a dangerous situation, and, uhh, we are forced to, uhh, tell a lie, we can, uhh, tell, tell a lie to others.

Examiner : Do you think we can lie to our friends and family?

Student : Not actually, I, I always tell the truth to my family or friends, but it depends, uh, that’s, uh, situations. It depends every, uh, situations.

Examiner : Do you find it acceptable to lie about your feelings to someone because you don’t want to hurt them?

Student : I think, uhh, it’s, uhh, not a… it is a bad… it is the worst, uhh, things in the world to, like, lie, uhh, to friends or someone, uhh, can, for example, broke their, uhh, heart. And, uhh, for example, they, uhh, think about us, uhh, the bad person.

Examiner : Has anybody ever lied to you about their real identities?

Student : I think, I don’t like, I don’t like somebody who is, uhh, unreliable to everyone. I want, I prefer to make a friendship with, uhh, everyone who can be, uhh, truth, for example, a truthful person. And there are a lot of people who, in the world, that they are, for example, fake, and always, uhh, tell us the, for example, the, the lie, lying things. Yes, actually, one, uhh, week ago, uhh, I tell, uhh, to my friend, for example, “Can you, can you hang out with me to RAS Academy for IELTS testing?” And he told me, “Not actually, I’m not, I’m not at home from morning to night.” And, uhh, I think it’s not really, it’s not truth, and he can, he could, uhh, come here with me.

IELTS Speaking Band Score: 4

Model Training
Using this dataset, we trained two open-source language models using NVIDIA A100 80GB GPU, with each LoRA adapter taking around 6-8 hours of training:

  • Llama 3.1 8B:
    As a language model with a small number of parameters while still being powerful, Llama 3.1 8B was a model that can easily be fine-tuned and deployed on the NVIDIA A100 80GB GPU with little to no memory issue. These factors are the reason we decided to fine-tune Llama 3.1 8B.
  • Qwen3 14B:
    Qwen3 14B is also a small language model like Llama 3.1 8B, but it has the capability of reasoning to help enhance its response. Being a newer generation of Qwen model released (April 2025) and having more parameters than Llama 3.1 8B, we decided to fine-tune Qwen3 14B and see if the fine-tuned model is able to get higher scores than fine-tuned Llama 3.1 8B.

Model Deployment and Inferencing

After we fine-tuned these models, we deployed them on NVIDIA A100 80GB GPUs and then we use it in a pipeline as follows:

  1. Speech-to-text:
    To convert the speeches to texts that can be processed by the fine-tuned models, the speeches are passed into two speech AI components, which are:

    • Speech Analysis:
      Speech Analysis AI is used for analyzing the details of the speeches, such as word pronunciation and pauses in the speech.
    • Speech Transcription:
      Speech Transcription AI is used to get the full transcription of the student’s speech, including any filler words within the speech.
  2. Marking + Feedback Generation:
    The result of speech analyses and the transcriptions are used to mark the student’s performance and generate feedback using the fine-tuned models.

Fine-tuned Model Result

With the fine-tuned SLMs, we are able to achieve higher accuracy of IELTS band predictions compared to commercially available LLMs.

Gemini 2.5 Flash GPT-5 DeepSeek-V3.2-Exp Llama 3.1 8B FT Llama 3.1 8B HeyHi IELTS SLM (FT Qwen3 14B)
Accuracy (%) 64 ± 4 77 ± 9 74 ± 3 67 ± 3 83 ± 2 89 ± 2

Model Accuracy Comparison

Accuracy is measured by comparing the differences between the predicted and the actual band score, with a 0.5 band score precision.

 

Ready to Experience the Future of IELTS Preparation?

At HeyHi, we are dedicated to advancing the field of English language education through the ethical and scalable application of artificial intelligence. Our proprietary Small Language Model (SLM) underpins the HeyHi Assessment platform and extends seamlessly into the HeyHi IELTS App, enabling precise, individualized feedback and promoting accelerated progress for every learner. This innovation reflects our ongoing commitment to supporting educators and institutions in delivering measurable outcomes and empowering students to achieve their goals.

To gain a deeper understanding of our technology in action, take a look at the demonstration of HeyHi’s AI-powered IELTS assessment app. If you are ready to experience the benefits of our platform firsthand, access the HeyHi IELTS App now and see how AI can help you boost your band score with personalized practice and real-time evaluation.

 

To discuss potential collaborations, or to learn more about how HeyHi Assessment can support your institution, please contact our team.

For media inquiries or partnerships:

hello@heyhi.sg
www.heyhi.sg

Leave a Comment

Your email address will not be published. Required fields are marked *