ICS.AI

Overcoming the Challenges of Building a Phone-Based AI Voice Solution: Insights and Best Practices

Updated: Aug 23, 2023

Navigating the hurdles of Phone AI development to create effective and engaging conversational experiences

The rapid advancements in voice AI technology have created unprecedented opportunities for organisations. From improving customer experiences to streamlining processes and enhancing accessibility, the benefits are multifaceted. However, developing an effective AI voice solution, particularly for the phone channel, is not without its complexities. As innovation leaders in AI technology for the public sector, with multiple customers now benefiting from phone-based AI, we're excited to share how we worked through these challenges with the SMART: Phone AI platform.

Accurate Speech Recognition and Natural Language Understanding

Arguably one of the biggest hurdles in developing a phone AI solution lies in accurate speech recognition and natural language understanding. This comes into sharp focus when catering to users from diverse linguistic and cultural backgrounds, with the system needing to comprehend a wide range of accents, dialects, speech patterns, and pronunciations. We've tackled this challenge head-on by investing in high-quality training data, involving a wide variety of speech samples, and consistently refining our AI model using cutting-edge machine learning techniques.

Maintaining Contextual Understanding and Conversational Flow

Achieving a natural conversational flow and contextual understanding is another significant part of the puzzle. Users expect a seamless conversation, which requires the AI to interpret intent, grasp nuances, and respond with relevant answers. To address this, we've paired advanced natural language processing algorithms with a robust dialogue management system to create a smooth, coherent conversational flow.

Non-linear Conversations

Conversations in real life often don't follow a linear path, and AI interactions should be no different. The capacity to manage non-linear conversations is crucial in the development of a sophisticated phone AI system. Our advanced dialogue management systems are designed to handle such unexpected shifts in the conversation, ensuring an uninterrupted conversational experience.
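One common way to model this kind of digression is a topic stack: a new topic is pushed on top of the unfinished one, and the earlier topic is resumed once the digression is complete. The sketch below is illustrative only and is not the SMART: Phone AI implementation; the class name `DialogueStack` and the example topics are assumptions made for the example.

```python
from typing import List, Optional


class DialogueStack:
    """Keeps unfinished topics so a caller can digress and then return."""

    def __init__(self) -> None:
        self._topics: List[str] = []

    @property
    def current(self) -> Optional[str]:
        """The topic being discussed right now, or None if there is none."""
        return self._topics[-1] if self._topics else None

    def start(self, topic: str) -> None:
        """Begin a new topic, parking whatever was being discussed."""
        self._topics.append(topic)

    def finish(self) -> Optional[str]:
        """Close the current topic and return the parked topic to resume, if any."""
        if self._topics:
            self._topics.pop()
        return self.current
```

For example, a caller reporting a missed bin collection who suddenly asks about opening hours would push a second topic; calling `finish()` on the digression returns the missed bin topic so the assistant can pick it up again.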

Addressing Conversational Conventions and Input Variations

Conversational conventions in spoken language can differ significantly from written language. Users might include extraneous phrases or spell out words, creating unique challenges for the AI system. We've addressed these by refining our NLP algorithms to filter out unnecessary information and by training our system to recognise spelled-out inputs accurately. Additionally, our platform is designed to handle inputs of different lengths effectively, whether short, ideal, or long.
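To give a feel for what filtering out extraneous phrases can involve, here is a minimal rule-based sketch. It is not how our platform does it; the `FILLERS` list and the function name `strip_fillers` are assumptions chosen for illustration, and a production system would more likely learn these patterns from real call transcripts.

```python
import re

# Hypothetical filler phrases; a real list would be tuned from call transcripts.
FILLERS = ["um", "uh", "er", "hello", "please", "you know", "i mean", "i was wondering"]

# Longest phrases first so multi-word fillers win over their shorter parts.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in sorted(FILLERS, key=len, reverse=True)) + r")\b"
)


def strip_fillers(utterance: str) -> str:
    """Lowercase the utterance, drop punctuation and remove known filler phrases."""
    text = re.sub(r"[^\w\s']", " ", utterance.lower())  # punctuation becomes spaces
    text = _FILLER_RE.sub(" ", text)                    # remove filler phrases
    return " ".join(text.split())                       # collapse repeated whitespace
```

With this list, "Um, hello, I was wondering, when is my bin collection?" is reduced to "when is my bin collection", which is a much easier target for intent matching.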

Multilingual Support and Presentation of Answers

Catering to diverse user groups necessitates multilingual support, which can be resource-intensive and complex. To mitigate this, we've leveraged pre-trained multilingual models, fine-tuned to our specific domain. We've also focused on presenting answers in a clear and concise manner, avoiding long or complicated responses, particularly for voice-based interactions.

The Dead Air Problem and Interruption Management

‘Dead air’, or silence, while the AI processes information can disconcert users. Additionally, the AI must deal with pauses from the user’s end. Therefore, we've used optimisation techniques to reduce processing times, maintaining a steady conversational flow. Our AI is also designed to handle user pauses and interruptions without losing context, enhancing the overall user experience.
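Alongside reducing processing time, a common complementary pattern is to speak a short holding phrase whenever an answer takes longer than a set threshold. The sketch below shows that pattern with Python's `asyncio`; it is an illustration rather than our implementation, and the function name `respond_with_filler`, the `speak` callback, the 1.5 second threshold and the holding phrase are all assumptions.

```python
import asyncio
from typing import Awaitable, Callable


async def respond_with_filler(
    answer_coro: Awaitable[str],
    speak: Callable[[str], Awaitable[None]],
    max_silence: float = 1.5,  # assumed threshold in seconds
    filler: str = "One moment while I check that for you.",
) -> str:
    """Speak the answer, preceded by a holding phrase if it is slow to arrive."""
    task = asyncio.ensure_future(answer_coro)
    # Wait up to max_silence seconds; the task keeps running if the wait times out.
    done, _pending = await asyncio.wait({task}, timeout=max_silence)
    if not done:
        await speak(filler)  # fill the silence while the answer is still being computed
    answer = await task
    await speak(answer)
    return answer
```

Here `speak` stands in for whatever text-to-speech call the telephony layer provides. A fast answer is spoken on its own, while a slow one is preceded by the holding phrase.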

Recognising Spelled-Out Inputs and Keyboard-Based Interaction

Another challenge we faced was the limitations of input understanding. Users often spell out words or use a keyboard for certain inputs, like selecting menu options or providing phone numbers. Our phone-based voice AI initially struggled to accurately comprehend these inputs, often interpreting spelled-out letters as separate words.

To address this, we trained our advanced speech recognition systems to better recognise and handle spelled-out inputs. We also developed a multimodal AI that can handle both voice and text inputs to cater to keyboard-based interactions, enhancing the system's versatility and offering a comprehensive user experience.
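The underlying problem can be pictured with a simple post-processing step on the transcript: a run of single letters such as "s m i t h" is joined back into one token. This is only a sketch of the idea, not the trained approach described above; the function name `join_spelled_letters` and the rule of requiring at least three letters in a row are assumptions, and digits or letters transcribed with punctuation are not handled.

```python
import re

# Three or more single letters separated by single spaces, e.g. "s m i t h".
# Requiring three avoids merging ordinary one-letter words such as "I a".
_SPELLED_RUN = re.compile(r"\b(?:[A-Za-z] ){2,}[A-Za-z]\b")


def join_spelled_letters(transcript: str) -> str:
    """Join runs of spelled-out letters into a single upper-case token."""
    return _SPELLED_RUN.sub(
        lambda match: match.group(0).replace(" ", "").upper(),
        transcript,
    )
```

For instance, "my surname is s m i t h" becomes "my surname is SMITH", while "am I a resident" is left unchanged.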

Handling Short and Ambiguous Inputs

Our journey towards creating an efficient voice AI solution also revealed the challenge of interpreting short, ambiguous inputs. Such inputs, devoid of additional context, are hard for the AI to understand accurately.

To tackle this issue, we developed a service map. This interactive guide maps short inputs to potential operations or services, allowing the AI to make more educated guesses on user intent. As the system continues to learn and adapt, its capacity to correctly interpret such inputs improves, thus ensuring a smooth and engaging user experience.
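In its simplest form, a service map can be thought of as a lookup from short inputs to the services a caller might mean. The sketch below is a simplified illustration and not the actual service map; the keys, the service names, the function name `candidate_services` and the fuzzy-matching cutoff of 0.6 are all hypothetical.

```python
from difflib import get_close_matches
from typing import Dict, List

# Hypothetical map from short inputs to candidate services.
SERVICE_MAP: Dict[str, List[str]] = {
    "bins": ["Report a missed bin collection", "Check bin collection day"],
    "council tax": ["Pay council tax", "Set up a council tax direct debit"],
    "parking": ["Pay a parking fine", "Apply for a parking permit"],
}


def candidate_services(short_input: str, cutoff: float = 0.6) -> List[str]:
    """Return the services a short input could refer to, or an empty list."""
    key = " ".join(short_input.lower().split())
    if key in SERVICE_MAP:
        return SERVICE_MAP[key]
    # Fall back to the closest known key to tolerate small recognition errors.
    close = get_close_matches(key, list(SERVICE_MAP), n=1, cutoff=cutoff)
    return SERVICE_MAP[close[0]] if close else []
```

When more than one candidate comes back, the assistant can ask a short clarifying question ("Do you want to report a missed bin or check your collection day?") rather than guessing.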

Please note that addressing these issues is an ongoing process. With every interaction, our phone-based voice AI system continues to learn, adapt, and improve. Our commitment to innovation and user satisfaction drives us to continuously refine and enhance our technology.

Addressing Various Forms of Input

In developing an effective phone-based voice AI solution, one cannot overlook the diversity of inputs. Inputs could be short, ideal, long, or even take the form of lengthy dialogues. Each kind presents its own unique challenge in terms of comprehension and response generation.

To handle this, we took advantage of technologies like ChatGPT to summarise and generate appropriate responses. Whether the system needs to generate text messages, voice outputs, or other responses, leveraging these advanced technologies ensures that it can effectively process and respond to a broad range of inputs. As the system continues to learn and adapt, it becomes even more efficient at understanding and responding to different forms of user input, thereby making interactions more effective and satisfying for users.

User Privacy and Data Security

In an era where data privacy and security are paramount, we've ensured that our solution adheres to the highest standards. With robust data encryption and secure storage solutions, we prioritise user data protection. Moreover, our system is designed with a privacy-by-default approach and complies with data protection regulations such as GDPR.

Integration with Existing Systems

Integrating a voice AI solution with existing systems can be complex and time-consuming. We've overcome this by developing our voice AI solution with a modular and flexible architecture that can easily integrate with different systems.

Future innovations – harnessing the power of generative AI

As we reflect on our progress, we also eagerly anticipate future enhancements to our phone-based voice AI solution. The fast-paced advancements in generative AI open up numerous avenues to refine and innovate. Here’s a sneak peek at the developments currently in progress:

  • Unlimited Multi-turn Topic Discussions – unbounded multi-turn discussions that remove the need to restart conversations or set up fresh intents, so the conversation flows naturally

  • Generative Answers – a transition from limited curated answers to generative AI-created responses, ensuring users get dynamic, accurate, and contextually relevant answers every time

  • Local User Content Grounding – by rooting our AI’s answers in local user content topics, we provide more personalised and region-specific information, enhancing user trust and engagement

  • Interpreted AI on Local Data Sources – our AI will interpret and pull data directly from local sources, ensuring timely and accurate data-driven responses

  • Enhanced Chit-Chat Capabilities – more advanced chit-chat features, meaning our AI will not just answer queries but also engage users in richer, more human-like conversations

  • Boosting Deflection Rates – increased deflection rates by an additional 20% through continuous optimisation, promising even faster responses and reduced waiting times

  • Sophisticated Use Case Management – handling more complex tasks such as assessments, moving beyond question-answers to more involved interactions

  • Seamless Integration with ChatGPT – full integration with the generative AI ChatGPT framework, tapping into its vast knowledge base and power in a structured, governed and appropriate way


Building an AI voice solution comes with a fair share of challenges, but with innovation, diligence, and best practices, these can be addressed effectively. The SMART: Phone AI platform is a testament to this – a pioneering conversational AI solution designed to transform public sector services. By handling up to 50% of inbound calls and reducing call waiting times by up to 30%, SMART Phone AI is revolutionising customer services, offering prompt and accurate responses to user queries, and providing a more efficient, personalised service experience.

The development of the SMART: Phone AI platform underscores how the power of voice AI can be harnessed to enhance public sector customer services, paving the way for higher resident satisfaction and trust. Discover how the SMART: Phone AI platform can help you transform your local government customer services and leverage the latest advancements in voice AI technology.

Customer Focus: Derby City Council save £200,000 against budget plan with adoption of Phone AI

Hear how Derby City Council have adopted SMART: Phone AI to enhance their resident interactions, answering over 100,000 calls through their digital assistants, deflecting 43% of calls away from human advisors and saving £200,000 against the Council's budget plan. Download our recent webinar to find out more.


