Converting Speech to Text on iOS with the Speech Framework

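Today we will convert speech to text. First, make sure you have the latest versions of iOS and Xcode installed. The Speech framework requires at least iOS 10. I created a project with SwiftUI support, which requires iOS 13, but that is optional: you can use a Storyboard instead.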

If you don't know what SwiftUI is and would like a quick overview, start here.

Create a new project via "File > New > Project...", then select "Single View App" and "User Interface: SwiftUI". The project will look like this:



Select the ContentView.swift file and change "struct ContentView ..." to:

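struct ContentView: View {
    // Observe the recognizer so the view refreshes when the text changes
    @ObservedObject var speechRec = SpeechRec()
    var body: some View {
        Text(speechRec.recognizedText)
            .onAppear {
                self.speechRec.start()
            }
    }
}

class SpeechRec: ObservableObject {
    // Published so that SwiftUI redraws ContentView on every change
    @Published private(set) var recognizedText = ""
    func start() {
        // Placeholder until speech recognition is wired in
        recognizedText = "!"
    }
}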


ContentView is what we show on the screen. SpeechRec is where we convert speech to text. We keep the recognized text in recognizedText, and ContentView displays it on the screen.

Permission


First, we need to ask the user for permission. Select the Info.plist file and add two keys to it: NSSpeechRecognitionUsageDescription and NSMicrophoneUsageDescription.
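
For reference, when Info.plist is opened as source code, the two entries might look roughly like the fragment below. The description strings are placeholder wording, not taken from the original project; use text that explains your own app's reason for asking.

```xml
<!-- Placeholder descriptions; the system shows them in the permission alerts -->
<key>NSSpeechRecognitionUsageDescription</key>
<string>Your speech is used to convert what you say into text.</string>
<key>NSMicrophoneUsageDescription</key>
<string>The microphone is used to record your speech.</string>
```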



Import Speech and request permission:

import Speech
...
class SpeechRec: ObservableObject {
    ...
    func start() {
        SFSpeechRecognizer.requestAuthorization { status in
            
        }
    }
}
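
The status passed to the callback above is not used yet. As a minimal sketch of how it could be handled, the helper below maps each authorization status to a message. The function name authorizationMessage and the message strings are hypothetical and not part of the original project; the enum cases are those of SFSpeechRecognizerAuthorizationStatus.

```swift
import Speech

// Hypothetical helper: maps an authorization status to a message for the user.
func authorizationMessage(for status: SFSpeechRecognizerAuthorizationStatus) -> String {
    switch status {
    case .authorized:
        return "Listening..."
    case .denied:
        return "Speech recognition access was denied."
    case .restricted:
        return "Speech recognition is restricted on this device."
    case .notDetermined:
        return "Speech recognition has not been authorized yet."
    @unknown default:
        // Covers any status added in a future SDK version
        return "Unknown authorization status."
    }
}
```

Inside SpeechRec, the callback could assign the result to recognizedText. Because the callback is not guaranteed to run on the main queue, it is safer to do that assignment inside DispatchQueue.main.async.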


If you run the app, it will ask for permission:




Converting speech to text


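To convert speech to text, we need an SFSpeechRecognizer created with the "ru-RU" locale so that it recognizes Russian speech. Then you need to specify the audio source; in our case it is the stream from the microphone. The final version of our class: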



...

class SpeechRec: ObservableObject {
    @Published private(set) var recognizedText = ""
    
    // Recognizer for Russian speech; nil if the locale is not supported
    let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "ru-RU"))
    var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    var recognitionTask: SFSpeechRecognitionTask?
    let audioEngine = AVAudioEngine()
    
    func start() {
        self.recognizedText = "..."
        SFSpeechRecognizer.requestAuthorization { status in
            // Start only if the user granted access
            guard status == .authorized else { return }
            // The callback may arrive on a background queue
            DispatchQueue.main.async {
                self.startRecognition()
            }
        }
    }
    
    func startRecognition() {
        do {
            recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
            guard let recognitionRequest = recognitionRequest else { return }
            
            // Publish each partial transcription as it arrives
            recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { result, error in
                if let result = result {
                    self.recognizedText = result.bestTranscription.formattedString
                }
            }
            
            // Feed microphone buffers into the recognition request
            let recordingFormat = audioEngine.inputNode.outputFormat(forBus: 0)
            audioEngine.inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
                recognitionRequest.append(buffer)
            }
            
            audioEngine.prepare()
            try audioEngine.start()
        } catch {
            // Report the failure instead of silently ignoring it
            print("Could not start the audio engine: \(error)")
        }
    }
}


Recognition starts as soon as permission is granted. I tried to keep the code short, so some necessary checks are omitted.
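
As a rough sketch of the kind of checks a fuller app would probably want, the two helpers below show one way to confirm that a recognizer is usable before starting, and to shut recognition down afterwards. The names makeRecognizer and stopRecognition are hypothetical and not part of the original project; the calls inside them are standard Speech and AVFoundation APIs.

```swift
import Speech
import AVFoundation

// Hypothetical helper: returns a recognizer only if the locale is supported
// and recognition is currently available.
func makeRecognizer(localeIdentifier: String) -> SFSpeechRecognizer? {
    // The initializer is failable and returns nil for unsupported locales
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: localeIdentifier)) else {
        return nil
    }
    // isAvailable can be false, for example, when there is no network connection
    return recognizer.isAvailable ? recognizer : nil
}

// Hypothetical helper: stops the microphone stream and finishes the task.
func stopRecognition(audioEngine: AVAudioEngine,
                     request: SFSpeechAudioBufferRecognitionRequest?,
                     task: SFSpeechRecognitionTask?) {
    audioEngine.stop()
    audioEngine.inputNode.removeTap(onBus: 0)  // remove the tap installed on bus 0
    request?.endAudio()                        // no more audio will be appended
    task?.finish()                             // stop accepting audio, deliver final results
}
```

In SpeechRec, stopRecognition could be called with the class's own audioEngine, recognitionRequest, and recognitionTask, for instance from a stop() method or when the view disappears.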

Now run the app and say something in Russian. Your speech will appear on the screen as text.




What's next?


Documentation:
developer.apple.com/documentation/speech

WWDC videos:
developer.apple.com/videos/all-videos/?q=Speech

GitHub project:
github.com/usenbekov/speech-to-text-demo

Source: https://habr.com/ru/post/undefined/

