Translate speech to text on iOS using the Speech framework

Today we will translate speech into text. First, make sure you have the latest versions of iOS and Xcode installed. The Speech framework requires iOS 10 or later. I created the project with SwiftUI support, which requires iOS 13, but this is not necessary: you can use a Storyboard instead.

If you do not know what SwiftUI is and want a quick overview, here you are.

Create a new project via "File > New > Project...", select "Single View App" and set "User Interface: SwiftUI". The project will look something like this:



Select the file ContentView.swift and change "struct ContentView ..." to:

struct ContentView: View {
    @ObservedObject var speechRec = SpeechRec()
    var body: some View {
        Text(speechRec.recognizedText)
            .onAppear {
                self.speechRec.start()
            }
    }
}

class SpeechRec: ObservableObject {
    @Published private(set) var recognizedText = ""
    func start() {
        recognizedText = "!"
    }
}


ContentView is what we show on the screen. SpeechRec is where we will translate speech into text. We will keep the recognized text in the recognizedText property, and ContentView will display it on the screen.
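
To see why the view updates on its own, here is a minimal sketch of the mechanism outside SwiftUI. It assumes only Combine; DemoModel and update(_:) are hypothetical names used for illustration and are not part of the project.

```swift
import Combine

// Hypothetical stand-in for SpeechRec, used only to illustrate @Published.
final class DemoModel: ObservableObject {
    @Published private(set) var recognizedText = ""

    func update(_ text: String) {
        recognizedText = text
    }
}

let model = DemoModel()
var changeCount = 0

// An ObservableObject announces every @Published change through objectWillChange.
// A view holding the object via @ObservedObject re-renders on these announcements.
let subscription = model.objectWillChange.sink { _ in
    changeCount += 1
}

model.update("hello")
// changeCount is now 1 and model.recognizedText is "hello"
```

This is why SpeechRec only needs to assign to recognizedText: no explicit call to refresh the view is required.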

Permission


First, we need to ask the user for permission. Select the Info.plist file and add two keys there: NSSpeechRecognitionUsageDescription and NSMicrophoneUsageDescription.
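
If you open Info.plist as source code, the two entries might look like the fragment below. The description strings are placeholders I made up for illustration; write your own text explaining why the app needs each permission, since iOS shows it in the permission dialog.

```xml
<key>NSSpeechRecognitionUsageDescription</key>
<string>Speech recognition is used to turn what you say into text.</string>
<key>NSMicrophoneUsageDescription</key>
<string>The microphone is used to capture your speech.</string>
```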



Import Speech and ask for permission:

import Speech
...
class SpeechRec: ObservableObject {
    ...
    func start() {
        SFSpeechRecognizer.requestAuthorization { status in
            
        }
    }
}
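
The closure above is left empty, but it receives the user's decision. As a rough sketch, the status could be handled like this; message(for:) and requestSpeechPermission() are hypothetical helpers, and the message texts are only examples.

```swift
import Speech

// Hypothetical helper: turns the authorization status into a readable message.
func message(for status: SFSpeechRecognizerAuthorizationStatus) -> String {
    switch status {
    case .authorized:
        return "Speech recognition is allowed"
    case .denied:
        return "The user denied speech recognition"
    case .restricted:
        return "Speech recognition is restricted on this device"
    case .notDetermined:
        return "Permission has not been requested yet"
    @unknown default:
        return "Unknown authorization status"
    }
}

// Hypothetical helper: requests permission and reports the outcome.
func requestSpeechPermission() {
    SFSpeechRecognizer.requestAuthorization { status in
        // The handler may run off the main thread, so hop back before touching UI state.
        DispatchQueue.main.async {
            print(message(for: status))
        }
    }
}
```

In a real app you would probably show the message in the UI instead of printing it.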


If you run the app, it will ask for permission:



Translate speech to text


To translate speech into text, we use SFSpeechRecognizer with the locale "ru-RU" to recognize Russian speech. We then need to specify the audio source; in our case it is a stream from the microphone.
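
Note that SFSpeechRecognizer(locale:) is a failable initializer: it returns nil when the locale is not supported. A small sketch of how that case could be surfaced, assuming a hypothetical helper named makeRecognizer(localeIdentifier:):

```swift
import Speech

// Hypothetical helper, not part of the project.
func makeRecognizer(localeIdentifier: String) -> SFSpeechRecognizer? {
    let locale = Locale(identifier: localeIdentifier)
    guard let recognizer = SFSpeechRecognizer(locale: locale) else {
        // List what is supported to help pick another locale.
        let supported = SFSpeechRecognizer.supportedLocales()
            .map { $0.identifier }
            .sorted()
        print("Locale \(localeIdentifier) is not supported. Supported: \(supported)")
        return nil
    }
    return recognizer
}
```

For example, makeRecognizer(localeIdentifier: "ru-RU") gives a recognizer for Russian if the locale is supported, and nil otherwise.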

The final version of our class:

...

class SpeechRec: ObservableObject {
    @Published private(set) var recognizedText = ""

    // The initializer returns nil if the locale is not supported
    let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "ru-RU"))
    var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    var recognitionTask: SFSpeechRecognitionTask?
    let audioEngine = AVAudioEngine()

    func start() {
        self.recognizedText = "..."
        SFSpeechRecognizer.requestAuthorization { status in
            // The handler may be called off the main thread, so hop back to it
            DispatchQueue.main.async {
                self.startRecognition()
            }
        }
    }

    func startRecognition() {
        do {
            // The request receives audio buffers from the microphone
            recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
            guard let recognitionRequest = recognitionRequest else { return }

            // Publish the best transcription each time the result is updated
            recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { result, error in
                if let result = result {
                    self.recognizedText = result.bestTranscription.formattedString
                }
            }

            // Feed microphone buffers into the recognition request
            let recordingFormat = audioEngine.inputNode.outputFormat(forBus: 0)
            audioEngine.inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
                recognitionRequest.append(buffer)
            }

            audioEngine.prepare()
            try audioEngine.start()
        } catch {
            print("Could not start the audio engine: \(error)")
        }
    }
}


Recognition starts as soon as the permission request completes. To keep the example short, I left out some necessary checks.
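
As a rough idea of what such checks could look like, here is a sketch with three free-standing helpers. All three function names are hypothetical, and the audio session settings (record category, measurement mode) are my assumption of a reasonable setup rather than something the project requires.

```swift
import AVFoundation
import Speech

// Hypothetical helper: start only if the user agreed and the recognizer
// exists and is currently available.
func canStartRecognition(status: SFSpeechRecognizerAuthorizationStatus,
                         recognizer: SFSpeechRecognizer?) -> Bool {
    guard status == .authorized, let recognizer = recognizer else {
        return false
    }
    return recognizer.isAvailable
}

// Hypothetical helper: prepare the shared audio session for recording
// before starting the audio engine.
func configureAudioSessionForRecording() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.record, mode: .measurement, options: .duckOthers)
    try session.setActive(true, options: .notifyOthersOnDeactivation)
}

// Hypothetical helper: stop capturing audio and cancel recognition.
func stopRecognition(engine: AVAudioEngine,
                     request: SFSpeechAudioBufferRecognitionRequest?,
                     task: SFSpeechRecognitionTask?) {
    engine.stop()
    engine.inputNode.removeTap(onBus: 0)
    request?.endAudio()   // tell the recognizer no more audio is coming
    task?.cancel()
}
```

In SpeechRec, the first helper could guard the call to startRecognition(), the second could run before audioEngine.prepare(), and the third could back a stop() method.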

Now run the app and say something in Russian. Your speech will be displayed on the screen.



What's next?


Documentation:
developer.apple.com/documentation/speech

WWDC video:
developer.apple.com/videos/all-videos/?q=Speech

GitHub project:
github.com/usenbekov/speech-to-text-demo

Source: https://habr.com/ru/post/undefined/

