Document Scanning and Text Recognition on iOS with VisionKit and the Vision Framework

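Today we will scan a document and display the text recognized in it. No third-party libraries are needed: VisionKit handles the scanning, and Vision handles the text recognition.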



First, make sure you have Xcode 11 and iOS 13 installed, then create a new project with Storyboard support.

We will use the camera for scanning, so we need to add NSCameraUsageDescription to Info.plist. Without this key, the app will crash.
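
Since a missing usage description only surfaces as a crash when the camera is first accessed, it can help to check for the key early in debug builds. The following is a minimal sketch: the helper name `missingUsageKeys` is hypothetical, and the dictionary is passed in as a parameter so the function can be exercised without an app bundle. Treating an empty string as missing is a conservative assumption of this sketch.

```swift
import Foundation

/// Returns the required Info.plist keys that are absent or have an empty string value.
/// `missingUsageKeys` is a hypothetical helper, not part of any Apple framework.
func missingUsageKeys(in info: [String: Any]?, required: [String]) -> [String] {
    return required.filter { key in
        // A key that is absent or not a string counts as missing.
        guard let value = info?[key] as? String else { return true }
        // Assumption: an empty description is treated as missing as well.
        return value.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty
    }
}

// In an app target this might be called early, for example in viewDidLoad:
// let missing = missingUsageKeys(in: Bundle.main.infoDictionary,
//                                required: ["NSCameraUsageDescription"])
// assert(missing.isEmpty, "Add to Info.plist: \(missing)")
```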



Scanning


To scan documents we use the VisionKit framework. To open the scanning screen, create a new instance of VNDocumentCameraViewController and present it:

let scanner = VNDocumentCameraViewController()
scanner.delegate = self
present(scanner, animated: true)
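
The document camera is not available in every environment (the Simulator, for example, has no camera), so it may be worth checking `VNDocumentCameraViewController.isSupported` before presenting. The following is a minimal sketch; the class name `ScanLauncherViewController` and the method name `presentScannerIfPossible` are illustrative and do not come from the original project.

```swift
import UIKit
import VisionKit

final class ScanLauncherViewController: UIViewController, VNDocumentCameraViewControllerDelegate {
    /// Hypothetical helper: presents the scanner only when the device supports it.
    func presentScannerIfPossible() {
        guard VNDocumentCameraViewController.isSupported else {
            // Fall back gracefully instead of presenting an unusable screen.
            print("Document scanning is not supported on this device")
            return
        }
        let scanner = VNDocumentCameraViewController()
        scanner.delegate = self
        present(scanner, animated: true)
    }
}
```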

Add VNDocumentCameraViewControllerDelegate conformance to the ViewController:

class ViewController: UIViewController, VNDocumentCameraViewControllerDelegate {
...

When the user taps Cancel or an error occurs, dismiss the presented screen:

func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController) {
  controller.dismiss(animated: true)
}

func documentCameraViewController(_ controller: VNDocumentCameraViewController, didFailWithError error: Error) {
  controller.dismiss(animated: true)
}

After the user finishes scanning and taps Save, the following method is called:

func documentCameraViewController(_ controller: VNDocumentCameraViewController, didFinishWith scan: VNDocumentCameraScan) {
  for i in 0 ..< scan.pageCount {
    let img = scan.imageOfPage(at: i)
    // recognizeText(inImage: img)
  }
  controller.dismiss(animated: true)
}

Each page can be processed separately.

Text Recognition


We have sorted out the scanning process; now let's extract the text.

To keep the UI responsive, we will run the recognition in the background. For this, create a DispatchQueue:

lazy var workQueue = {
  return DispatchQueue(label: "workQueue", qos: .userInitiated, attributes: [], autoreleaseFrequency: .workItem)
}()
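
A queue created without the `.concurrent` attribute is serial, so pages submitted to it are processed one at a time, in submission order. The sketch below illustrates that property with plain closures standing in for the recognition work. The names `PageLog` and `runInOrder` are hypothetical and exist only for this illustration.

```swift
import Dispatch

/// Hypothetical box for the demo. It is marked @unchecked Sendable because
/// every access below happens on the same serial queue.
final class PageLog: @unchecked Sendable {
    private(set) var pages: [Int] = []
    func record(_ page: Int) { pages.append(page) }
}

/// Submits `pageCount` work items to a serial queue and returns the order in which they ran.
func runInOrder(pageCount: Int) -> [Int] {
    // No `.concurrent` attribute: work items run one at a time, in FIFO order.
    let queue = DispatchQueue(label: "demoQueue", qos: .userInitiated)
    let log = PageLog()

    for page in 0 ..< pageCount {
        // Each closure stands in for recognizing one scanned page.
        queue.async { log.record(page) }
    }

    // `sync` on the same serial queue returns only after all earlier items have finished.
    return queue.sync { log.pages }
}

print(runInOrder(pageCount: 3)) // [0, 1, 2]
```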

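For recognition we need a VNImageRequestHandler created with the image, and a VNRecognizeTextRequest, which supports the options recognitionLevel, customWords, and recognitionLanguages and takes a completion handler that delivers the results as text. When the request finishes, we collect the best text candidates and display them: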

lazy var textRecognitionRequest: VNRecognizeTextRequest = {
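    // Capture self weakly: the request is stored on self, so a strong capture would create a retain cycle.
    let req = VNRecognizeTextRequest { [weak self] (request, error) in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        
        var resultText = ""
        for observation in observations {
            // Skip an observation without candidates instead of discarding all recognized text.
            guard let topCandidate = observation.topCandidates(1).first else { continue }
            resultText += topCandidate.string
            resultText += "\n"
        }
        
        // UIKit views must only be updated on the main thread.
        DispatchQueue.main.async {
            self?.txt.text = resultText
        }
    }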
    return req
}()
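
The core of the completion handler, taking the top candidate of each observation and joining the strings line by line, can be tried out without Vision. The sketch below uses a hypothetical `Candidate` struct as a stand-in for `VNRecognizedText`, which exposes `string` and `confidence`. The `minimumConfidence` threshold is an added assumption for illustration and is not something the code above does. Unlike the code above, this version does not append a trailing newline.

```swift
/// Hypothetical stand-in for VNRecognizedText, which exposes `string` and `confidence`.
struct Candidate {
    let string: String
    let confidence: Float
}

/// Joins the top candidate of each observation, one per line.
/// Observations with no candidates, or whose top candidate is below the threshold, are skipped.
func joinTopCandidates(_ observations: [[Candidate]], minimumConfidence: Float = 0.0) -> String {
    return observations
        .compactMap { $0.first }                       // top candidate, if any
        .filter { $0.confidence >= minimumConfidence } // optional quality gate (assumption)
        .map { $0.string }
        .joined(separator: "\n")
}

let sample: [[Candidate]] = [
    [Candidate(string: "Invoice 42", confidence: 1.0)],
    [],                                                 // no candidates: skipped, not fatal
    [Candidate(string: "T0tal", confidence: 0.3)],
]
print(joinTopCandidates(sample))                         // two lines: "Invoice 42" and "T0tal"
print(joinTopCandidates(sample, minimumConfidence: 0.5)) // Invoice 42
```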

Next, create the VNImageRequestHandler and perform the request on the background queue:

func recognizeText(inImage: UIImage) {
    guard let cgImage = inImage.cgImage else { return }
    
    workQueue.async {
        let requestHandler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        do {
            try requestHandler.perform([self.textRecognitionRequest])
        } catch {
            print(error)
        }
    }
}
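
The request performed above runs with default options. The recognitionLevel, customWords, and recognitionLanguages options mentioned earlier are plain properties on VNRecognizeTextRequest and can be set before the request is performed. The sketch below shows one way to do that; the function name `makeConfiguredTextRequest`, the language code, and the custom words are assumptions chosen for illustration, not values from the original project.

```swift
import Vision

/// Hypothetical factory: builds a text recognition request with explicit options.
func makeConfiguredTextRequest(completion: @escaping VNRequestCompletionHandler) -> VNRecognizeTextRequest {
    let request = VNRecognizeTextRequest(completionHandler: completion)
    // .accurate is slower but usually better suited to documents; .fast favors speed.
    request.recognitionLevel = .accurate
    // Languages in priority order (assumed value; adjust to your documents).
    request.recognitionLanguages = ["en-US"]
    // Domain-specific words that supplement the built-in vocabulary (assumed values).
    request.customWords = ["VisionKit", "Habr"]
    // Custom words are ignored unless language correction is enabled.
    request.usesLanguageCorrection = true
    return request
}
```

The accurate level costs more time per page, which is one more reason to keep the work off the main thread.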

The complete ViewController


import UIKit
import Vision
import VisionKit

class ViewController: UIViewController, VNDocumentCameraViewControllerDelegate {
    @IBOutlet weak var txt: UITextView!
    
    lazy var workQueue = {
        return DispatchQueue(label: "workQueue", qos: .userInitiated, attributes: [], autoreleaseFrequency: .workItem)
    }()
    
    lazy var textRecognitionRequest: VNRecognizeTextRequest = {
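        // Capture self weakly: the request is stored on self, so a strong capture would create a retain cycle.
        let req = VNRecognizeTextRequest { [weak self] (request, error) in
            guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
            
            var resultText = ""
            for observation in observations {
                // Skip an observation without candidates instead of discarding all recognized text.
                guard let topCandidate = observation.topCandidates(1).first else { continue }
                resultText += topCandidate.string
                resultText += "\n"
            }
            
            // Append each page's text on the main thread, where UIKit must be updated.
            DispatchQueue.main.async {
                guard let self = self else { return }
                self.txt.text = self.txt.text + "\n" + resultText
            }
        }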
        return req
    }()

    @IBAction func startScan(_ sender: Any) {
        txt.text = ""
        
        let scanner = VNDocumentCameraViewController()
        scanner.delegate = self
        present(scanner, animated: true)
    }
    
    func recognizeText(inImage: UIImage) {
        guard let cgImage = inImage.cgImage else { return }
        
        workQueue.async {
            let requestHandler = VNImageRequestHandler(cgImage: cgImage, options: [:])
            do {
                try requestHandler.perform([self.textRecognitionRequest])
            } catch {
                print(error)
            }
        }
    }
    
    // MARK: - Document Camera VC Delegate
    
    func documentCameraViewController(_ controller: VNDocumentCameraViewController, didFinishWith scan: VNDocumentCameraScan) {
        for i in 0 ..< scan.pageCount {
            let img = scan.imageOfPage(at: i)
            recognizeText(inImage: img)
        }
        
        controller.dismiss(animated: true)
    }
    
    func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController) {
        controller.dismiss(animated: true)
    }
    
    func documentCameraViewController(_ controller: VNDocumentCameraViewController, didFailWithError error: Error) {
        print(error)
        controller.dismiss(animated: true)
    }
    
}




What's next?


Documentation:
developer.apple.com/documentation/vision
developer.apple.com/documentation/visionkit

WWDC videos on the Vision framework:
developer.apple.com/videos/all-videos/?q=Vision

GitHub project:
github.com/usenbekov /视觉演示

Source: https://habr.com/ru/post/undefined/

