Syllabus
Intro
Business goal
Case 2: the Doctor's Consultation Certificate
Case 1: the Death Certificate
Data pipeline
Mining
Original image
Line segmentation
Baseline
Training set definition
Data quality
Data partitioning: split into text lines
Deep-Learning Model
Ensemble modelling
Model Training Performance
Model accuracy
Prediction confidence levels
Confidence plot
Examples: target population
Examples: false positives
Statistics: false positives
Examples: low confidence
Example: type A
Language detection
Certificate type detection
Preprocessing
Training data
Nomenclature code
Date of Consultation
Comparison: OCR vs. neural network
The Grid (5)
Some grid examples
Approach
Examples: 2 lines
Training on histogram features
Results: number-of-lines prediction
Reading the lines
Examples: high confidence false predictions
Grid summary
Second challenge: different patterns
Fourth challenge: rotations
Fifth challenge: superposition and readability
Step 1: find the stamp
Intersection over Union
Summary: stamp reading
Application Integration
Conclusion
Taught by
Devoxx