What is NVIDIA Riva?

NVIDIA Riva is a GPU-accelerated SDK for building state-of-the-art Speech AI applications that are customized for your use case and deliver real-time performance.


Why Riva is the best speech technology for your future product

State-of-the-Art AI

Built on a decade of AI innovations by NVIDIA across hardware, model architectures, training techniques, inference optimizations, and deployment solutions.

Fully Customizable

Flexibility at every step, from modifying model architectures to fine-tuning models on your data and customizing pipelines, as well as the ability to deploy on any platform.

Leading Performance

Continuous optimizations across the entire stack, from models to software to hardware, deliver 12X the gain versus the previous generation.

NVIDIA Riva at GPU Technology Conference Keynote

THE LATEST NEWS ABOUT NVIDIA RIVA

January 2023

NVIDIA announced the general availability of Riva 2.8.1.

NVIDIA GTC highlights include:

ASR:

Added support for two more languages; the full list is now English, Spanish, German, French, Mandarin, Hindi, Russian, Korean, Brazilian Portuguese, Japanese, and Italian.
Improved streaming accuracy with Conformer CTC.
Word and profanity filtering.

TTS:

Neural text-to-speech that generates high-quality, human-like voices.
Low-code custom voice creation from 30 minutes of voice input, with adjustable pitch, volume, and pauses in the output.
Compact and simple model with multiple synthetic voices available at inference time.

Embedded:

ASR in 7 languages and two pre-built English TTS voices (one female, one male).
Real-time performance with latency of less than 100 ms.
Compatible with NVIDIA Jetson AGX Xavier, Jetson Xavier NX, Jetson AGX Orin and Jetson Orin NX.

Popular Applications

Riva for Call Centers and Customer Service

Call centers are one of a company's most important customer service touchpoints. NVIDIA Riva delivers minimal latency and 10x higher inference throughput than other speech recognition technologies on the market, making it the best choice for real-time communication in call centers.

To ensure accurate voice processing, every corporate client that outsources its operations to a call center needs custom language and acoustic models that understand the local and professional jargon of its end customers. NVIDIA Riva also scales to call centers that serve many smaller clients at once: it can add new custom ASR models and learn their domain-specific terms on the fly, without additional server deployments.

Here is an example of our project with Minerva (https://www.minervacq.com/), using NVIDIA Riva-based speech recognition for equipment customer service.

An example of Riva speech recognition in action.

Smart devices and self-service kiosks

NVIDIA Riva can run at the edge on NVIDIA Jetson GPUs, even with an unstable or non-existent internet connection, which sets it apart from cloud-based ASR alternatives in commercial settings. Since COVID-19, business owners have become more aware of customer safety and are now taking the opportunity to build voice-controlled solutions. Retailers can use the technology in touch-free interactive digital kiosks for information requests or order placement.

NVIDIA Project Tokkio combines computer vision and Riva-powered speech AI, making it possible to overcome the main barrier for a voice-enabled kiosk: ambient and crosstalk noise. Its 3D avatar is animated and rendered with NVIDIA Omniverse to deliver a visually stunning experience, all in real time.

Education

NVIDIA Riva lets you use state-of-the-art NLP models and adapt them with domain-specific voice and text data. This makes it the perfect tool for learning platforms and universities. Riva's text-processing modules are well suited to domain-specific knowledge management systems, including search, categorization, summarization, and information retrieval over professional literature archives.

The NVIDIA Riva Speech AI SDK makes it possible to build accurate speech capabilities from the voice recordings of hundreds of students with varying accents and proficiency levels. For an entire class, the process takes minutes of a teacher's time rather than hours. Riva also enables text entity highlighting, domain-specific text classification, subject-specific term recognition, and much more. Combined with NVIDIA Omniverse, Riva is a perfect tool for creating AI teaching avatars that personalize learning for each student.

Using NVIDIA Riva automatic speech recognition, Plabook assesses students' skills, such as reading accuracy at the phoneme level, and provides personalized feedback for improvement, saving hours of teaching time.

Enterprise Document Processing

NVIDIA Riva can also be used for text analysis, providing an automated and digitized information management system to companies.

Riva has a number of state-of-the-art NLP modules for document processing. Combined with robotic process automation (RPA), OCR, page clustering, and fact extraction, they extract and score important statements in corporate archives and workflow systems. This saves time for highly paid professionals and accelerates corporate decision-making.

For instance, large manufacturing companies have thousands of PDF documents with technical information about their products, covering installation, usage, maintenance, and troubleshooting. This information is hard to discover, and people spend a lot of time looking for the right specifications. With Riva-based language models, search becomes faster and saves hours of time.

Media and Marketing

NVIDIA Riva enables analysis of user-generated content. This includes recommendations based on identified demographic and speech attributes (gender, estimated age, language, accent, emotion and sentiment, topic, speech patterns, etc.), content classification, and spam moderation. Riva performs in-depth data mining on recorded video, articles, and comments to enable comprehensive customer profiling, enhance brand safety, and protect online communities.

Data Monsters is your best NVIDIA Riva implementation partner

Data Monsters, a Palo Alto-based AI consulting company, is an NVIDIA Elite Partner that helps funded startups and enterprise R&D teams design and implement NVIDIA Riva-based software and hardware solutions and products. With 15 years in AI, hundreds of completed projects, and Elite NVIDIA expertise, we are ready to become your trusted development team and accelerate the releases of your AI product.

As an Elite Partner, Data Monsters has early and extended access to NVIDIA Riva technology. We have the hardware and software components needed to experiment with the latest Riva modules several months before their official public release. Our direct connection with the Riva development team at NVIDIA helps us follow best deployment practices, optimize configuration settings, calibrate deployed pipelines, and adapt real-time streaming to different GPUs.

IF YOU’RE BUILDING AN NVIDIA RIVA-BASED SOFTWARE PRODUCT, DON’T GO IT ALONE :)

NVIDIA Riva development requires a little bit of magic. It's a new and constantly evolving technology. Your team may spend too much time on experimentation and adaptation. It's a good idea to hire Data Monsters and accelerate your release cycles.

Creating prototypes/MVPs of speech systems. If you have an idea for a product that requires speech recognition or text processing, such as a chatbot, an RPA tool, an intelligent kiosk, or a document processing app, we have many pre-built components that can accelerate your development.

Developing scalable solutions. A real-time voice or text processing product requires getting every step right: from choosing the hardware and GPU-qualified servers, through scalable architecture design and data collection and preparation, down to neural network training and deployment.

Here are some types of work you may need assistance with:

Voice data labeling and cleansing

Labeled voice data must be highly accurate for Riva fine-tuning: transcripts have to match the audio files exactly. Every second matters, and misaligned labels make fine-tuning error-prone.
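A quick consistency check before fine-tuning can catch many labeling mistakes. The sketch below is a hypothetical helper, not part of Riva or NeMo. It assumes each recording's labels are a list of (start, end, text) segments in seconds, and it flags segments that are empty, inverted, overlapping, or outside the audio.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One labeled span of a recording: start and end in seconds, plus its transcript."""
    start: float
    end: float
    text: str


def find_label_problems(segments: list[Segment], audio_duration: float) -> list[str]:
    """Return a description of every problem found; segment numbers follow start-time order."""
    problems = []
    ordered = sorted(segments, key=lambda s: s.start)
    for i, seg in enumerate(ordered):
        if seg.end <= seg.start:
            problems.append(f"segment {i}: end {seg.end} is not after start {seg.start}")
        if seg.start < 0 or seg.end > audio_duration:
            problems.append(
                f"segment {i}: [{seg.start}, {seg.end}] lies outside the audio (0 to {audio_duration} s)"
            )
        if not seg.text.strip():
            problems.append(f"segment {i}: empty transcript")
        if i > 0 and seg.start < ordered[i - 1].end:
            problems.append(f"segment {i}: overlaps the previous segment")
    return problems
```

An empty result means the labels are at least structurally consistent. Checking that the words themselves match the audio still requires listening or a forced-alignment pass.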

Converting to the required formats and manifests

To work with Riva, audio should be LPCM-encoded and split into separate channels, with silent segments removed. Text should be normalized, and common contractions and interjections should be handled consistently.
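A minimal sketch of the conversion step, assuming 16-bit PCM WAV input and the JSON-lines manifest layout used by NVIDIA NeMo, with `audio_filepath`, `duration`, and `text` on each line. The contraction and filler tables are hypothetical placeholders. Numbers are not verbalized and silence removal is left out, so real text normalization will be considerably more involved.

```python
import json
import re
import wave
from array import array

# Hypothetical normalization tables; replace them with rules for your domain and language.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is", "i'm": "i am"}
FILLERS = {"uh", "um", "erm"}  # interjections dropped from transcripts


def split_channels(src_path: str, dst_paths: list[str]) -> float:
    """Split a 16-bit PCM WAV into one mono WAV per channel; return the duration in seconds."""
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("expected 16-bit linear PCM")
        channels, rate = src.getnchannels(), src.getframerate()
        if len(dst_paths) != channels:
            raise ValueError(f"expected {channels} output paths, got {len(dst_paths)}")
        # Samples are only de-interleaved, never interpreted, so host byte order is irrelevant.
        samples = array("h", src.readframes(src.getnframes()))
    for ch, path in enumerate(dst_paths):
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(samples[ch::channels].tobytes())
    return len(samples) / channels / rate


def normalize_text(text: str) -> str:
    """Lowercase, expand known contractions, and drop fillers and punctuation."""
    tokens = re.findall(r"[a-z0-9']+", text.lower().replace("\u2019", "'"))
    expanded = (CONTRACTIONS.get(t, t) for t in tokens)
    return " ".join(t for t in expanded if t not in FILLERS)


def manifest_line(audio_path: str, duration: float, transcript: str) -> str:
    """One JSON line of a NeMo-style ASR manifest."""
    record = {
        "audio_filepath": audio_path,
        "duration": round(duration, 3),
        "text": normalize_text(transcript),
    }
    return json.dumps(record)
```

For a stereo call recording, each speaker's channel then gets its own mono file and its own manifest line.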

Custom language model training

When building custom voice processing, you need to create vocabularies for your domain. General language is usually recognized well out of the box, but special terminology and brand names are frequently misrecognized. With thousands of special words, you need to create a balanced vocabulary and train the model the right way.
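One way to start on the vocabulary is to find domain words that a general-purpose vocabulary does not cover. You can then oversample sentences containing rare terms so the language-model corpus sees each one often enough. This is an illustrative sketch with hypothetical function names and a naive tokenizer, not Riva's or NeMo's own tooling.

```python
import re
from collections import Counter


def tokenize(line: str) -> list[str]:
    """Naive lowercase tokenizer that keeps hyphenated part numbers like xr-200 intact."""
    return re.findall(r"[a-z0-9'-]+", line.lower())


def find_oov_terms(domain_corpus: list[str], base_vocab: set[str]) -> Counter:
    """Count words in the domain corpus that the base vocabulary does not contain."""
    return Counter(w for line in domain_corpus for w in tokenize(line) if w not in base_vocab)


def oversample_rare_terms(domain_corpus: list[str], terms: set[str], target: int) -> list[str]:
    """Append copies of sentences carrying under-represented terms until each term
    occurs at least `target` times (terms with no carrier sentence are skipped)."""
    tokenized = [tokenize(line) for line in domain_corpus]
    counts = Counter(w for toks in tokenized for w in toks if w in terms)
    extra = []
    for term in sorted(terms):
        carriers = [line for line, toks in zip(domain_corpus, tokenized) if term in toks]
        i = 0
        while carriers and counts[term] < target:
            line = carriers[i % len(carriers)]
            extra.append(line)
            # A repeated sentence also raises the counts of any other terms it contains.
            counts.update(w for w in tokenize(line) if w in terms)
            i += 1
    return domain_corpus + extra
```

Simple repetition can over-weight a handful of sentences. Generating varied carrier sentences for each term is usually a better-balanced alternative when the data allows it.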

Acoustic model fine-tuning

After preparing acoustic transcriptions of in-domain terms, along with a list of common misspellings, you run fine-tuning and evaluation using NVIDIA NeMo and TAO.
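Evaluation usually reports word error rate (WER). WER is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. NeMo provides its own metrics. The standalone version below is only a sketch for quick sanity checks, such as comparing WER on in-domain test sentences before and after fine-tuning.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference words processed so far
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)
```

Running it over only the sentences that contain in-domain terms shows whether fine-tuning actually helped on the vocabulary you care about.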

Building the ASR model using Riva Service Maker

Your acoustic model, language model, and lexicon should be optimized for streaming or online usage by finding optimal voice activity detection (VAD) settings and adjusting riva-build parameters.
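This sketch does not use Riva's VAD. It swaps in a plain energy-threshold detector to illustrate what VAD tuning involves. An energy `threshold` decides what counts as speech, and a `hangover_frames` window keeps short pauses from splitting an utterance. Both names and their defaults are hypothetical, and in practice they would be tuned on in-domain recordings. Samples are assumed to be floats in [-1, 1].

```python
import math


def frame_rms(samples: list[float], frame_len: int) -> list[float]:
    """Root-mean-square energy of consecutive, non-overlapping frames."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_speech(
    samples: list[float],
    sample_rate: int,
    threshold: float,
    frame_ms: int = 20,
    hangover_frames: int = 5,
) -> list[tuple[float, float]]:
    """Return (start_s, end_s) spans whose frame RMS is at or above `threshold`."""
    frame_len = int(sample_rate * frame_ms / 1000)
    energies = frame_rms(samples, frame_len)
    segments = []
    start = None  # index of the first speech frame in the open segment
    quiet = 0     # consecutive sub-threshold frames inside the open segment
    for idx, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = idx
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet > hangover_frames:
                # Close the segment after its last speech frame (end index is exclusive).
                segments.append((start, idx - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        segments.append((start, len(energies) - quiet))
    frame_s = frame_ms / 1000
    return [(s * frame_s, e * frame_s) for s, e in segments]
```

Raising the threshold clips quiet word onsets. Lengthening the hangover adds trailing silence and latency, which is the trade-off a streaming deployment has to balance.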

Deploying the streaming models on a Triton server

Riva deployment is a multi-step process with many aspects to consider, including the type of server GPU. Triton Inference Server can serve both Riva and non-Riva ASR models for particular languages.
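Once the server is started, deployment scripts typically wait for it to report readiness before sending traffic. The sketch below assumes Triton's HTTP endpoint is reachable on its default port 8000. It uses the `GET /v2/health/ready` route from the KServe v2 inference protocol that Triton implements. Riva's client APIs use gRPC, so treat the host, port, and function names here as assumptions to adapt to your setup.

```python
import time
import urllib.request


def triton_is_ready(host: str = "localhost", port: int = 8000, timeout: float = 2.0) -> bool:
    """True if the server answers GET /v2/health/ready with HTTP 200."""
    url = f"http://{host}:{port}/v2/health/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx), refused connections, and timeouts all derive from OSError.
        return False


def wait_until_ready(
    host: str = "localhost", port: int = 8000, attempts: int = 30, delay_s: float = 2.0
) -> bool:
    """Poll the readiness endpoint until it succeeds or the attempts run out."""
    for _ in range(attempts):
        if triton_is_ready(host, port):
            return True
        time.sleep(delay_s)
    return False
```

Polling with a bounded number of attempts keeps a failed model load from hanging a deployment pipeline indefinitely.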

Performance testing and optimization

After deployment, the system should be monitored and optimized to improve throughput and reduce latency and memory usage.
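Latency is usually reported as percentiles rather than an average, since tail latency is what users notice in real-time streaming. This sketch assumes you already have per-request latencies from a load test, so how they are collected is out of scope. It uses the nearest-rank percentile definition.

```python
import math
import statistics


def percentile(sorted_values: list[float], pct: float) -> float:
    """Nearest-rank percentile of an ascending list, for 0 < pct <= 100."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize_latencies(latencies_ms: list[float], wall_time_s: float) -> dict:
    """Summarize one load-test run: request count, throughput, and latency percentiles."""
    if not latencies_ms:
        raise ValueError("no latencies recorded")
    values = sorted(latencies_ms)
    return {
        "requests": len(values),
        "throughput_rps": len(values) / wall_time_s,
        "mean_ms": statistics.fmean(values),
        "p50_ms": percentile(values, 50),
        "p90_ms": percentile(values, 90),
        "p99_ms": percentile(values, 99),
    }
```

Comparing these summaries across configurations, such as GPU types or batch settings, shows whether an optimization improved throughput without pushing p99 latency past your real-time budget.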

Don't go it alone: this work can take months of tinkering. Data Monsters has the relevant experience to help you design the system and accelerate your product releases.