今是昨非

How to enable voice playback for iOS push notifications

This article was first published on the "Sohu Technology Products" public account as "How to Play Voice Notifications on iOS".

iOS Voice Notification Playback#

1. Background#

The demand for iOS voice notification playback involves playing the content of the notification upon receipt, with the content being variable. This is similar to the voice notifications for payment receipts in Alipay and WeChat.

2. Development Process#

a. Notification Service Extension#

The logic after adding a Notification Service Extension to the project differs from before, as shown below:

1342050-74642172d12a47b5.png

After adding it, when a push notification is received, the system first invokes a method in the Notification Service Extension, where you can modify the notification's title, body, and sound; the modified notification is then displayed.

The lifecycle of the notification bar:

  • From the moment the notification is displayed (triggered by calling self.contentHandler(self.bestAttemptContent);) until the system dismisses it, roughly 6 seconds elapse.
  • If contentHandler is never called, the system invokes serviceExtensionTimeWillExpire after at most 30 seconds, where self.contentHandler(self.bestAttemptContent) is called to present the notification.

It is important to note that the Notification Service Extension and the main project are not the same target, so the files of the main project are not shared with this target.

  • When creating new files, be sure to check the target to which they should be added.
    • For example, when adding a class for playing voice notifications, it should be checked under the Notification Service Extension target;
    • When copying a third-party SDK for voice playback, it should also be checked under the Notification Service Extension target;
    • When creating a new application on a third-party platform, the bundle ID you enter must be the bundle ID of the Notification Service Extension target, not the main project's. This is particularly important because a Baidu test account can add the offline SDK only once; get it wrong and you must register a new account. A painful lesson, 😂.
  • The two targets also do not share a bundle directory, but data can be shared via an App Group.
  • When enabling background playback, it should actually be the background playback of the Notification Service Extension target, which will be explained in detail later.
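The App Group sharing mentioned above can be sketched as follows. This is a hedged illustration, not part of the original project: the group identifier `group.com.example.app` and the key names are placeholders, and the App Group capability must be enabled under Signing & Capabilities for both targets.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical sketch: sharing data between the main app and the extension
// via an App Group ("group.com.example.app" is a placeholder group ID).
static void shareViaAppGroup(void) {
    // Shared key-value storage visible to both targets
    NSUserDefaults *shared =
        [[NSUserDefaults alloc] initWithSuiteName:@"group.com.example.app"];
    [shared setObject:@(YES) forKey:@"voicePlaybackEnabled"]; // assumed key

    // Shared container directory for files (e.g. cached audio)
    NSURL *container = [[NSFileManager defaultManager]
        containerURLForSecurityApplicationGroupIdentifier:@"group.com.example.app"];
    NSLog(@"Shared container: %@", container);
}
```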

The creation steps are as follows:

  • Create a Notification Service Extension target by selecting the Xcode project, clicking File -> New -> Target, and selecting Notification Service Extension target. There are two very similar options, so be careful to select the correct one, as shown below:
    截屏 2021-04-13 下午 3.01.00.png

  • Click Next, enter the Product Name
    截屏 2021-04-13 下午 3.05.43.png

  • Click Finish, then click Activate
    截屏 2021-04-13 下午 3.05.51.png

  • Open the NotificationService.m file, which is the class automatically created after adding the Notification Service Extension. After adding it, all processing for received pushes can be modified in this location.

    • In the didReceiveNotificationRequest:withContentHandler: method, the userInfo of bestAttemptContent carries the detailed push payload. Any changes to the displayed title, body, or sound must be made before the handler is called at the end of this method.
      • When modifying the notification sound, mind the supported audio formats and the 30-second length limit.
      • For handling multiple pushes: calling self.contentHandler(self.bestAttemptContent); in didReceiveNotificationRequest:withContentHandler: displays the corresponding notification; if it is never called, the system calls it after at most 30 seconds. If 10 notifications arrive at once, they neither pop up 10 times nor appear in order, so unhandled concurrent pushes will break voice playback.
        • The AVSpeechSynthesizer delegate has a completion callback, speechSynthesizer:didFinishSpeechUtterance:. Move the self.contentHandler(self.bestAttemptContent) call from didReceiveNotificationRequest:withContentHandler: into that callback so the voices play in order. (Alternatively, queue them in an array or an NSOperationQueue and start the next one when playback finishes.)
@interface NotificationService ()

@property (nonatomic, strong) void (^contentHandler)(UNNotificationContent *contentToDeliver);
@property (nonatomic, strong) UNMutableNotificationContent *bestAttemptContent;

@end

@implementation NotificationService

- (void)didReceiveNotificationRequest:(UNNotificationRequest *)request withContentHandler:(void (^)(UNNotificationContent * _Nonnull))contentHandler {
    self.contentHandler = contentHandler;
    self.bestAttemptContent = [request.content mutableCopy];
    
    // Modify the notification content here...
    // Modify the notification title
    //    self.bestAttemptContent.title = [NSString stringWithFormat:@"%@ [modified]", self.bestAttemptContent.title];
    
    // Modify the notification sound. Custom ringtones support the aiff, wav, and caf formats and must be shorter than 30 seconds; otherwise the system plays the default ringtone.
    //    self.bestAttemptContent.sound = [UNNotificationSound soundNamed:@"a.wav"];

    // Playback processing
    [self playVoiceWithInfo:self.bestAttemptContent.userInfo];
    
    self.contentHandler(self.bestAttemptContent);
}

- (void)serviceExtensionTimeWillExpire {
    // Called just before the extension will be terminated by the system.
    // Use this as an opportunity to deliver your "best attempt" at modified content, otherwise the original push payload will be used.
    self.contentHandler(self.bestAttemptContent);
}

- (void)playVoiceWithInfo:(NSDictionary *)userInfo {
    NSLog(@"NotificationExtension content : %@",userInfo);

    NSString *title = userInfo[@"aps"][@"alert"][@"title"];
    NSString *subTitle = userInfo[@"aps"][@"alert"][@"subtitle"];
    NSString *subMessage = userInfo[@"aps"][@"alert"][@"body"];
    NSString *isRead = userInfo[@"isRead"];
    NSString *isUseBaiDu = userInfo[@"isBaiDu"];

    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayback
                                     withOptions:AVAudioSessionCategoryOptionDuckOthers error:nil];
    [[AVAudioSession sharedInstance] setActive:YES
                                   withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation
                                         error:nil];

    // NB: the following code does not handle multiple concurrent pushes/playback; see the notes above.

    if ([isRead isEqual:@"1"]) {
        // Play voice
        if ([isUseBaiDu isEqual:@"1"]) {
            // Use Baidu offline voice playback
            [[BaiDuTtsUtils shared] playBaiDuTTSVoiceWithContent:title];
        }
        else {
            // Use system voice playback
            [[AppleTtsUtils shared] playAppleTTSVoiceWithContent:title];
        }
    }
    else {
        // No need to play voice
    }

}

@end

The AppleTtsUtils implementation essentially uses AVSpeechSynthesizer for direct playback, setting the volume and speech rate:

#import "AppleTtsUtils.h"
#import <AVFoundation/AVFoundation.h>
#import <AVKit/AVKit.h>

@interface AppleTtsUtils ()<AVSpeechSynthesizerDelegate>

@property (nonatomic, strong) AVSpeechSynthesizer *speechSynthesizer;
@property (nonatomic, strong) AVSpeechSynthesisVoice *speechSynthesisVoice;

@end

@implementation AppleTtsUtils

+ (instancetype)shared {
    static id instance = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        instance = [[self class] new];
    });
    
    return instance;
}

- (BOOL)isNumber:(NSString *)str {
    if (str.length == 0) {
        return NO;
    }
    NSString *regex = @"[0-9]*";
    NSPredicate *pred = [NSPredicate predicateWithFormat:@"SELF MATCHES %@",regex];
    if ([pred evaluateWithObject:str]) {
        return YES;
    }
    return NO;
}

- (void)playAppleTTSVoiceWithContent:(NSString *)content {
    
    if ((content == nil) || (content.length <= 0)) {
        return;
    }
    // With the zh-CN voice, a digit string is read as a whole number (thousands, hundreds, ...).
    // To have the digits read out one by one, walk the content character by character
    // and append a space after each digit.
    NSString *newResult = @"";
    for (int i = 0; i < content.length; i++) {
        NSString *tempStr = [content substringWithRange:NSMakeRange(i, 1)];
        newResult = [newResult stringByAppendingString:tempStr];
        if ([self isNumber:tempStr]) {
            newResult = [newResult stringByAppendingString:@" "];
        }
    }
    // Todo: English to speech
    
    AVSpeechUtterance *utterance = [AVSpeechUtterance speechUtteranceWithString:newResult];
    utterance.voice = self.speechSynthesisVoice;
    utterance.volume = 1.0;
    utterance.rate = AVSpeechUtteranceDefaultSpeechRate;
    [self.speechSynthesizer speakUtterance:utterance];
}

- (AVSpeechSynthesizer *)speechSynthesizer {
    if (!_speechSynthesizer) {
        _speechSynthesizer = [[AVSpeechSynthesizer alloc] init];
        _speechSynthesizer.delegate = self;
    }
    return _speechSynthesizer;
}

- (AVSpeechSynthesisVoice *)speechSynthesisVoice {
    if (!_speechSynthesisVoice) {
        _speechSynthesisVoice = [AVSpeechSynthesisVoice voiceWithLanguage:@"zh-CN"];
    }
    return _speechSynthesisVoice;
}


- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer didStartSpeechUtterance:(AVSpeechUtterance *)utterance {
    NSLog(@"didStartSpeechUtterance");
}

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer didCancelSpeechUtterance:(AVSpeechUtterance *)utterance {
    NSLog(@"didCancelSpeechUtterance");
}

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer didPauseSpeechUtterance:(AVSpeechUtterance *)utterance {
    NSLog(@"didPauseSpeechUtterance");
}

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer didFinishSpeechUtterance:(AVSpeechUtterance *)utterance {
    NSLog(@"didFinishSpeechUtterance");
    [self.speechSynthesizer stopSpeakingAtBoundary:AVSpeechBoundaryWord];

//    // After each utterance finishes playing, call this to present the notification bar.
//    // This can be exposed to the caller via a block callback:
//    self.contentHandler(self.bestAttemptContent);
}

b. Adding Baidu TTS Offline SDK#

  1. Open the Baidu AI Cloud console, go to the application list, and create a new application for testing. After creation, make sure its bundle ID matches the bundle ID of the Notification Service Extension you created, not the main project's bundle ID. Be careful!!! As shown below:

    1618303510485.jpg

  2. Select Offline SDK Management on the left, click Add, then select the newly created application, click Finish, and download the serial number list. Store the AppId, AppKey, SecretKey, and serial number for initializing the offline SDK. As shown below:

    1618303458956.jpg

  3. Still under Offline SDK Management on the left, download the SDK and the development documentation from the right side. Per the SDK's instructions:

    Integration Guide: It is strongly recommended that users first run the Demo project in the SDK package, which details the usage of speech synthesis and provides complete examples. Generally, you only need to refer to the demo project to complete all integration and configuration work.

  4. After downloading the SDK, open the BDSClientSample project, replace APP_ID, API_KEY, SECRET_KEY, and SN in TTSViewController.mm with the values you just applied for, and run it to verify that voice playback works. If it plays, the application is set up correctly and you can integrate it into your project; otherwise, if it fails after integration, you would be left suspecting an SDK issue. 😂 Debugging after integration can truly make one question life.

  5. Drag the BDSClientHeaders, BDSClientLib, and BDSClientResource folders extracted from the SDK into the Notification Service Extension target, making sure the copy option is checked. Then delete the .gitignore file inside the BDSClientLib folder; otherwise the build fails. I'm not kidding, 😂, consider this a pitfall guide.
    1618304109702.jpg

  6. Add the required system libraries, referring to the dependencies in the BDSClientSample project, and ensure they are added to the Notification Service Extension target, as shown below:
    1618304468870.jpg

  7. Done. Compile the Notification Service Extension target, making sure the correct target is selected. One more gotcha: the newly created target's minimum deployment target defaults to the Xcode version's latest SDK (e.g. 14.4), so lower this target's deployment target as well; otherwise the app runs without errors but breakpoints never hit. Surprising, 😂.
    1618304749612.jpg

  8. Add the Baidu voice processing code to the Notification Service Extension target as described above. The BaiDuTtsUtils code is as follows:

    • Note that in the configureOfflineTTS method, the loading of offlineSpeechData and offlineTextData resources should be consistent with what is written in the Demo; it is actually the content in the TTS folder of the BDSClientResource folder. If you have downloaded other voice files, load your downloaded voice files here.
#import "BaiDuTtsUtils.h"
#import "BDSSpeechSynthesizer.h"

// Baidu TTS
NSString* BaiDuTTSAPP_ID = @"Your_APP_ID";
NSString* BaiDuTTSAPI_KEY = @"Your_APP_KEY";
NSString* BaiDuTTSSECRET_KEY = @"Your_SECRET_KEY";
NSString* BaiDuTTSSN = @"Your_SN";

@interface BaiDuTtsUtils ()<BDSSpeechSynthesizerDelegate>

@end

@implementation BaiDuTtsUtils

+ (instancetype)shared {
    static id instance = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        instance = [[self class] new];
    });
    
    return instance;
}

#pragma mark - Baidu TTS

-(void)configureOfflineTTS{
    
    NSError *err = nil;
    NSString* offlineSpeechData = [[NSBundle mainBundle] pathForResource:@"bd_etts_common_speech_m15_mand_eng_high_am-mgc_v3.6.0_20190117" ofType:@"dat"];
    NSString* offlineTextData = [[NSBundle mainBundle] pathForResource:@"bd_etts_common_text_txt_all_mand_eng_middle_big_v3.4.2_20210319" ofType:@"dat"];
//    #error "set offline engine license"
    if (offlineSpeechData == nil || offlineTextData == nil) {
        NSLog(@"Offline synthesis resource files are empty!");
        return;
    }

    err = [[BDSSpeechSynthesizer sharedInstance] loadOfflineEngine:offlineTextData speechDataPath:offlineSpeechData licenseFilePath:nil withAppCode:BaiDuTTSAPP_ID withSn:BaiDuTTSSN];
    if(err){
        NSLog(@"Offline TTS init failed");
        return;
    }
}

- (void)playBaiDuTTSVoiceWithContent:(NSString *)voiceText {
    NSLog(@"TTS version info: %@", [BDSSpeechSynthesizer version]);
    
    [BDSSpeechSynthesizer setLogLevel:BDS_PUBLIC_LOG_VERBOSE];
    // Set delegate object
    [[BDSSpeechSynthesizer sharedInstance] setSynthesizerDelegate:self];
    
    
    [self configureOfflineTTS];

    [[BDSSpeechSynthesizer sharedInstance] setPlayerVolume:10];
    [[BDSSpeechSynthesizer sharedInstance] setSynthParam:[NSNumber numberWithInteger:5] forKey:BDS_SYNTHESIZER_PARAM_SPEED];

    // Start synthesis and playback
    NSError* speakError = nil;
    NSInteger sentenceID = [[BDSSpeechSynthesizer sharedInstance] speakSentence:voiceText withError:&speakError];
    if (speakError) {
        NSLog(@"Error: %ld, %@", (long)speakError.code, speakError.localizedDescription);
    }
}

- (void)synthesizerStartWorkingSentence:(NSInteger)SynthesizeSentence
{
    NSLog(@"Began synthesizing sentence %ld", (long)SynthesizeSentence);
}

- (void)synthesizerFinishWorkingSentence:(NSInteger)SynthesizeSentence
{
    NSLog(@"Finished synthesizing sentence %ld", (long)SynthesizeSentence);
}

- (void)synthesizerSpeechStartSentence:(NSInteger)SpeakSentence
{
    NSLog(@"Began playing sentence %ld", (long)SpeakSentence);
}

- (void)synthesizerSpeechEndSentence:(NSInteger)SpeakSentence
{
    NSLog(@"Finished playing sentence %ld", (long)SpeakSentence);
}


@end

c. Debugging#

Now for the exciting part. If everything compiles, debug with real pushes: run the main project first, then run the Notification Service Extension target, set breakpoints in the didReceiveNotificationRequest:withContentHandler: method, and send yourself a push message. The breakpoint hitting there confirms the target was created correctly.

Then control the push parameters isRead and isBaiDu, which determine whether the notification's voice is played with Baidu's engine. Speaking of push parameters: you must also set the mutable-content field to 1 inside the aps payload, otherwise the Notification Service Extension is never invoked, e.g.:

{
  "aps": {
    "alert": {
      "title": "Title",
      "subtitle": "Subtitle",
      "body": "Content"
    },
    "badge": 1,
    "sound": "default",
    "mutable-content": 1
  }
}

During push debugging, you may find that everything runs normally yet no voice plays, neither the system's nor Baidu's. Frustrating, right? Look closely at the console and you will find the following errors:

Note: on iOS 12.0 and later, playing with AVSpeechSynthesizer from a Notification Service Extension produces this error:

[AXTTSCommon] Failure starting audio queue alp! 
[AXTTSCommon] _BeginSpeaking: couldn't begin playback

Note: on iOS 12.0 and later, playing directly through Baidu's SDK from a Notification Service Extension produces this error:

[ERROR][AudioBufPlayer.mm:1088]AudioQueue start errored error: 561015905 (!pla)
[ERROR][AudioBufPlayer.mm:1099]Can't begin playback while in background!

Both errors indicate that audio cannot be played in the background. The fix: add the background mode capability. Open the main project's Signing & Capabilities, add Background Modes, and check "Audio, AirPlay, and Picture in Picture", as shown below:
1618306139128.jpg
1618306179927.jpg

OK, try again! Push again and you will find it still fails with the same error. Despair, right? Calm down. The capability itself was added correctly, but note:

  1. After configuring the Notification Service Extension, if the sound still does not play when a notification arrives, open the Info.plist under the Extension's target, add the "Required background modes" key, and set item 0 to "App plays audio or streams audio/video using AirPlay". Debug again and Baidu's voice will play.
  2. This approach may not pass App Review, because the Extension target does not officially support background modes (there is no such setting in its Signing & Capabilities). So it is only usable for internal distribution within the company, not for apps shipped to the App Store.
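For reference, step 1 above corresponds to the following entry in the Extension's Info.plist (the raw key behind Xcode's "Required background modes" display name is UIBackgroundModes):

```xml
<key>UIBackgroundModes</key>
<array>
    <!-- Shown in Xcode as "App plays audio or streams audio/video using AirPlay" -->
    <string>audio</string>
</array>
```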

After adding it, push again and Baidu's voice plays; numbers, English, and Chinese all sound fine. Apart from the pricing being a bit worrying, everything else is good.
As for the system voice: if you trigger the system playback first, it fails with the same error; but if you trigger Baidu's playback first and, after it finishes, trigger the system's, the system voice plays too. Note that the system's pronunciation of English letters and digits can be off (listen to how it reads the letter E for yourself). No solution has been found yet; third-party synthesized voices are recommended.

Since this project does not need to ship on the App Store, the story ends here. For apps that will be published to the App Store, this approach is not feasible. The workable alternatives, gathered from the references at the end, are: replace the notification sound with fixed-format audio (or fixed-format synthesized audio), or send a silent remote push and then post multiple local notifications, each with a different sound.
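As a hedged sketch of the "silent push plus local notifications with different sounds" alternative mentioned above (the sound file name and notification text are placeholders, not from the original project):

```objectivec
#import <UserNotifications/UserNotifications.h>

// Hypothetical App-Store-friendly approach: on receiving a silent push,
// schedule local notifications whose sounds are pre-recorded audio segments.
static void postLocalNotificationWithSound(NSString *soundFileName) {
    UNMutableNotificationContent *content = [UNMutableNotificationContent new];
    content.title = @"Payment received"; // placeholder text
    // e.g. @"amount_100.wav" bundled in the app (placeholder file name)
    content.sound = [UNNotificationSound soundNamed:soundFileName];

    // Fire almost immediately; chain several of these for multi-part amounts.
    UNTimeIntervalNotificationTrigger *trigger =
        [UNTimeIntervalNotificationTrigger triggerWithTimeInterval:0.1 repeats:NO];

    UNNotificationRequest *request =
        [UNNotificationRequest requestWithIdentifier:[[NSUUID UUID] UUIDString]
                                             content:content
                                             trigger:trigger];
    [[UNUserNotificationCenter currentNotificationCenter]
        addNotificationRequest:request withCompletionHandler:nil];
}
```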

3. Conclusion#

The organized mind map is below; most of the more complex processing logic is really about handling the iOS 12.0+ behavior.
推送播放语音.png

References#
