Overcoming Latency Challenges in Real-Time Conversational AI with Digital Avatars
Digital avatars have become one of the most sought-after AI technologies, transforming customer experiences for businesses. These virtual assistants have custom appearances, voices, and controllable body movements, enabling them to interact with clients in entirely new ways. The launch of Tokkio, NVIDIA's AI-powered customer service agent, in November 2021 further fueled interest in digital avatars, and NVIDIA's recent ACE announcement has taken the technology further still, enabling end-to-end interactive avatar solutions.
Conversational Intelligence and the Latency Issue
While digital avatars excel in their conversational capabilities, maintaining real-time, natural conversations presents a unique challenge. Large Language Models (LLMs) like GPT-3 and GPT-J enable generative, open-ended dialogue, but their deployment often introduces significant latency. Response times from OpenAI's API can range from 3 to 15 seconds, and although GPT-J running on an NVIDIA V100 GPU is faster, latencies of up to 4 seconds are still observed. Such delays are impractical for maintaining human-like interactions and holding clients' attention.
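Before optimizing anything, it helps to measure end-to-end response latency systematically. The sketch below (the `generate_reply` stand-in and the 50 ms simulated delay are assumptions, not a real API call) shows one way to record per-request timings and summarize the median and worst case:

```python
import time
import statistics

def generate_reply(prompt: str) -> str:
    """Stand-in for an LLM call (e.g. OpenAI's API or self-hosted GPT-J).
    Replace the body with a real request in production."""
    time.sleep(0.05)  # simulated model latency for illustration
    return "placeholder reply to: " + prompt

def measure_latency(prompts, generate=generate_reply):
    """Time each call end to end and report median and worst-case latency."""
    timings = []
    for p in prompts:
        start = time.perf_counter()
        generate(p)
        timings.append(time.perf_counter() - start)
    return {"median_s": statistics.median(timings), "max_s": max(timings)}

stats = measure_latency(["What are your opening hours?"] * 5)
print(stats)
```

Collecting both median and maximum matters here: a median of 3 seconds and spikes of 15 seconds are two different product problems.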
Addressing the Latency Challenge
To minimize response time while preserving meaningful, high-quality interactions, Data Monsters has tackled the latency challenge directly. Drawing on linguistic studies of fluency and speed in conversational turn-taking, we found that natural dialogue typically involves pauses of 200 to 800 ms, with slight variations across languages. Conversational gaps significantly longer than this are therefore unacceptable for businesses aiming to deliver human-like interaction experiences.
Solutions and Workarounds
At Data Monsters, we have undertaken several projects involving the development of digital avatars, which has given us valuable insights into addressing the latency challenge. We actively search for and test a range of solutions and workarounds, including various conversational tricks. Our most promising approach is creating and optimizing private LLMs that offer control and load-management capabilities. We implement this after careful exploration and analysis of the most frequent questions, which are expected to cover up to 80% of the queries in real dialogues. As a result, latency drops below 1 second for frequently asked questions and stays under 4 seconds when a question is especially specific or uncommon.

For product managers aiming to adopt best AI practices by introducing a digital avatar to their clients, the first step is to understand and specify the expected content of the interaction between avatar and customer in as much detail as possible. At Data Monsters, the process of data collection and avatar training can take up to 3 weeks. By leveraging these techniques, we aim to deliver exceptional value to your brand's avatars, ensuring both high quality and natural fluency in digital interactions.
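The two-tier approach described above can be sketched as a fast lookup against a curated FAQ store, with the private LLM as a fallback. This is a minimal illustration, not our production system: the FAQ entries, the `slow_llm_answer` stand-in, and the fuzzy-match cutoff are all assumptions for the example.

```python
from difflib import get_close_matches

# Hypothetical FAQ store, curated during the data-collection phase to
# cover the most frequent questions (assumed ~80% of traffic).
FAQ_ANSWERS = {
    "what are your opening hours": "We are open 9am-6pm, Monday to Friday.",
    "how do i reset my password": "Use the 'Forgot password' link on the login page.",
}

def slow_llm_answer(question: str) -> str:
    """Stand-in for the private-LLM fallback (latency up to ~4 s)."""
    return "Let me look into that for you..."

def answer(question: str) -> tuple[str, str]:
    """Fast path: fuzzy-match against the FAQ store; otherwise fall back."""
    key = question.lower().strip(" ?!.")
    match = get_close_matches(key, FAQ_ANSWERS, n=1, cutoff=0.8)
    if match:
        return FAQ_ANSWERS[match[0]], "faq"  # sub-second cached path
    return slow_llm_answer(question), "llm"  # slower generative path

print(answer("What are your opening hours?"))   # served from the FAQ tier
print(answer("Can the avatar speak Portuguese?"))  # falls through to the LLM
```

In practice the string match would be replaced by semantic retrieval (e.g. embedding similarity), but the routing logic stays the same: answer the common 80% instantly, and reserve the expensive model for the long tail.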
Overcoming latency challenges in real-time conversational AI is crucial for unlocking the full potential of digital avatars. By continuously exploring innovative solutions and drawing on linguistic studies, Data Monsters is committed to enhancing the performance of digital avatars and delivering seamless, human-like interactions. We believe that by addressing latency issues, we can create remarkable customer experiences and drive success for businesses in the era of conversational AI.