Improve the efficiency of call centers using Dynamics 365 and Azure Cognitive Services speech recognition


I am fascinated by the sophistication of Azure services and how they help us improve our solutions and extend the ways we can solve customer problems. Recently I had a requirement to implement a Dynamics 365 solution that would enable a call center to capture cases while its operators are offline.
One option was to provide a self-service portal where customers could log cases while call center operators are offline. In this case, however, the customer was looking for something very quick to implement, with the ability to link incoming cases to their call center channel and derive reporting from it.

Approach

I started looking at Azure services to see how Azure Cognitive Services and speech recognition could help me solve this requirement, and as always, Azure did not disappoint. In this post I would like to share my experience with you and walk you through the steps needed to create such a solution. Of course, the possibilities are endless, but this post will give you a starting point to begin your journey.

I have seen solutions where telephony systems send callers' voice recordings as email attachments to a queue in CRM. The CRM then converts the queue item to a case and attaches the voice recording as a note. The challenge with this approach is that call center operators have to open the attachments manually and write the case description after listening to the audio file. Their time is spent on inefficient activities when it could be utilized in better ways.

Another problem with this approach is the size of the attachments. As time goes by, audio attachments increase the database size, which impacts the maintenance of the solution.

Scenario

Our scenario is based on the fact that call center agents do not work 24 hours a day.
While agents are offline, customers should still be able to contact the call center and record voice messages that are turned into cases.
We will use the following components:

  1. Azure Blob Storage to receive recorded audio files from the telephony system.
  2. Azure Cognitive Services (Speech) to listen to the recorded audio files and transcribe the content to text. The audio file stays in Azure Blob Storage (which is cheaper than CRM database storage).
  3. An Azure Function (with an Azure Blob trigger/binding) to run speech recognition on the audio file and extract the case description.
  4. The Dynamics 365 Web API to create a case in CRM using the description extracted by Azure Cognitive Services. We can also map blob metadata, such as the file name, to case properties (a minimal sketch follows this list).
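For the last step, here is a minimal sketch of how the case creation call might look. The organization URL, the API version, the default account GUID, and the CaseClient/CreateCaseAsync names are illustrative assumptions, not taken from the GitHub sample:

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;

public static class CaseClient
{
    // Creates an incident (case) through the Dynamics 365 Web API.
    // The organization URL, API version and the account GUID below are placeholders.
    public static async Task CreateCaseAsync(string title, string description, string accessToken)
    {
        using (var client = new HttpClient { BaseAddress = new Uri("https://yourorg.crm.dynamics.com/api/data/v9.1/") })
        {
            client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
            client.DefaultRequestHeaders.Add("OData-MaxVersion", "4.0");
            client.DefaultRequestHeaders.Add("OData-Version", "4.0");

            var payload = new Dictionary<string, object>
            {
                ["title"] = title,
                ["description"] = description,
                // Incidents require a customer; here every voicemail case is bound
                // to one fixed "call center" account (placeholder GUID).
                ["customerid_account@odata.bind"] = "/accounts(00000000-0000-0000-0000-000000000000)"
            };

            var content = new StringContent(JsonConvert.SerializeObject(payload), Encoding.UTF8, "application/json");
            HttpResponseMessage response = await client.PostAsync("incidents", content);
            response.EnsureSuccessStatusCode();
        }
    }
}

Binding every voicemail case to a single account is just one way to satisfy the incident's mandatory customer field; in a real implementation you would resolve the caller's phone number to an existing contact or account.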

The full source code is available on GitHub.

However, the main code snippet that performs the conversion is below:


public static async Task<string> RecognitionWithPullAudioStreamAsync(string key, string region, Stream myBlob, ILogger log)
{
    // Creates a speech config with the specified subscription key and service region.
    // Replace with your own subscription key and service region (e.g., "westus").
    var config = SpeechConfig.FromSubscription(key, region);

    string finalText = string.Empty;

    var stopRecognition = new TaskCompletionSource<int>();

    // Creates an audio stream from the WAV blob.
    using (var audioInput = Helper.OpenWavFile(myBlob))
    {
        // Creates a speech recognizer using audio stream input.
        using (var recognizer = new SpeechRecognizer(config, audioInput))
        {
            // Subscribes to events.
            recognizer.Recognizing += (s, e) =>
            {
            };

            recognizer.Recognized += (s, e) =>
            {
                if (e.Result.Reason == ResultReason.RecognizedSpeech)
                {
                    // Appends each recognized phrase to the final transcript.
                    finalText += e.Result.Text + " ";
                }
                else if (e.Result.Reason == ResultReason.NoMatch)
                {
                    log.LogInformation("NOMATCH: Speech could not be recognized.");
                }
            };

            recognizer.Canceled += (s, e) =>
            {
                log.LogInformation($"CANCELED: Reason={e.Reason}");

                if (e.Reason == CancellationReason.Error)
                {
                    log.LogInformation($"CANCELED: ErrorCode={e.ErrorCode}");
                    log.LogInformation($"CANCELED: ErrorDetails={e.ErrorDetails}");
                    log.LogInformation("CANCELED: Did you update the subscription info?");
                }

                stopRecognition.TrySetResult(0);
            };

            recognizer.SessionStarted += (s, e) =>
            {
                log.LogInformation("\nSession started event.");
            };

            recognizer.SessionStopped += (s, e) =>
            {
                log.LogInformation("\nSession stopped event.");
                log.LogInformation("\nStop recognition.");
                stopRecognition.TrySetResult(0);
            };

            // Starts continuous recognition. StopContinuousRecognitionAsync() is used below to stop it.
            await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

            // Waits for completion. Task.WaitAny keeps the task rooted.
            Task.WaitAny(new[] { stopRecognition.Task });

            // Stops recognition.
            await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);

            return finalText;
        }
    }
}
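
To connect the pieces, the method above could be invoked from a blob-triggered function roughly like the sketch below. The container name, the connection setting, the app setting names (SpeechKey, SpeechRegion) and the CaseClient/DataverseToken helpers are illustrative assumptions; the recognition method is assumed to live in, or be referenced from, the same class.

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class VoicemailToCaseFunction
{
    [FunctionName("VoicemailToCase")]
    public static async Task Run(
        // The container name and connection setting name are assumptions.
        [BlobTrigger("voicemails/{name}", Connection = "AzureWebJobsStorage")] Stream myBlob,
        string name,
        ILogger log)
    {
        log.LogInformation($"Processing voicemail blob: {name}");

        // Speech key and region are read from app settings (assumed setting names).
        string key = Environment.GetEnvironmentVariable("SpeechKey");
        string region = Environment.GetEnvironmentVariable("SpeechRegion");

        // Transcribe the recording using the method shown above.
        string description = await RecognitionWithPullAudioStreamAsync(key, region, myBlob, log);

        // Acquire a Web API token (see the token sketch after the considerations list)
        // and create the case using the earlier CaseClient sketch.
        string token = await DataverseToken.AcquireAsync();
        await CaseClient.CreateCaseAsync($"Voicemail: {name}", description, token);
    }
}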


Important considerations:


  1. [This point applies only if you use the Web API to create cases in CRM.] You will need a multi-tenant configuration if your Azure Function's tenant and the tenant in which your CRM API is registered are different. If they are the same tenant, you can use a single-tenant configuration (a token-acquisition sketch follows this list).
  2. The input file sent from the telephony system to Azure Blob Storage must be in a specific format. The required format is:

    Property          Value
    File Format       RIFF (WAV)
    Sampling Rate     8000 Hz or 16000 Hz
    Channels          1 (mono)
    Sample Format     PCM, 16-bit integers
    File Duration     0.1 seconds < duration < 60 seconds
    Silence Collar    > 0.1 seconds
  3. You can use the ffmpeg tool to convert your recording to this specific format. For your testing, you can download and use the tool as below:

    Download ffmpeg from this link.

    Use the command: ffmpeg -i "<source>.mp3" -acodec pcm_s16le -ac 1 -ar 16000 "<output>.wav"
  4. My sample on GitHub processes the input as one single chunk of audio. However, if you wish to handle continuous streaming, you will need to build on StartContinuousRecognitionAsync and feed the recognizer a continuous audio stream rather than a completed file.
  5. The Azure Function should be configured with a blob trigger, as in the wrapper sketch above.
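
Regarding consideration 1, whichever configuration you use, the function ultimately needs a token for the CRM Web API. Below is a minimal client-credentials sketch using MSAL; the client ID, secret, tenant ID, and organization URL are placeholders, and the DataverseToken class name is my own.

using System.Threading.Tasks;
using Microsoft.Identity.Client;

public static class DataverseToken
{
    // All identifiers below (client ID, secret, tenant ID, organization URL) are placeholders.
    public static async Task<string> AcquireAsync()
    {
        var app = ConfidentialClientApplicationBuilder
            .Create("<app-registration-client-id>")
            .WithClientSecret("<client-secret>")
            // Request the token from the tenant that hosts the CRM instance; a multi-tenant
            // app registration still targets that tenant at token-request time.
            .WithAuthority("https://login.microsoftonline.com/<crm-tenant-id>")
            .Build();

        AuthenticationResult result = await app
            .AcquireTokenForClient(new[] { "https://yourorg.crm.dynamics.com/.default" })
            .ExecuteAsync();

        return result.AccessToken;
    }
}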
