Ocr gremlins: Difference between revisions

From SikhiWiki
 
Line 1: Line 1:
When old articles in English are converted into electronic form using {{w|Optical character recognition}} (OCR), various errors are common. This article lists some of the errors that have been detected so far:
When old articles in English are converted into electronic form using {{w|Optical character recognition}} (OCR), various errors are common. This article lists some of the errors that have been detected so far:


{|style="width:70%; background:#f0f0f0; " border="1" cellpadding="7" cellspacing="0"
{|style="width:90%; background:#f0f0f0; " border="1" cellpadding="7" cellspacing="0"
|width=20%| '''Real character '''
|width=20%| '''Real character '''
|width=80%| '''Erroneous ocr character'''
|width=30%| '''Erroneous ocr character'''
|width=50%| '''Examples'''
|-
|-
|m || rn - as small case RN
|m || rn - as small case RN || "him" is recognised as "hirn" also "learn" as "leam"
|-
|-
| this ||  dlis
| th ||  dl || "this" is recognised as "dlis"
|-
|-
| in ||  m
| in ||  m || "insert" is recognised as  "msert"
|-
|-
| ri ||  n
| ri ||  n || "rift" is recognised as "nft"
|-
|-
| u ||  ii
| u ||  ii || "must" is recognised as "miist" also ii). as u).
|-
|-
| l || i
| i || l - small case 'L'|| "missed" is recognised as "mlssed"
|-
|-
| 1 || l - the small case letter 'L'
| 1 || l - the small case letter 'L' || "145" is recognised as "l45" also "learned" as "1earned"
|-
|-
| Z || 2
| Z || 2 || "4325" is recognised as "43Z5" also "Zebra" as "2ebra"
|-
|-
| add new here || on this line
| add new here || on this line || examples here
|}
|}


{{Improve}}
{{Improve}}

Latest revision as of 13:00, 20 March 2010

When old articles in English are converted into electronic form using optical character recognition (OCR), various errors are common. This article lists some of the errors that have been detected so far:

{|style="width:90%; background:#f0f0f0; " border="1" cellpadding="7" cellspacing="0"
|width=20%| '''Real character'''
|width=30%| '''Erroneous OCR character'''
|width=50%| '''Examples'''
|-
| m || rn - lower-case 'RN' || "him" is recognised as "hirn"; also "learn" as "leam"
|-
| th || dl || "this" is recognised as "dlis"
|-
| in || m || "insert" is recognised as "msert"
|-
| ri || n || "rift" is recognised as "nft"
|-
| u || ii || "must" is recognised as "miist"; also "ii)." as "u)."
|-
| i || l - lower-case 'L' || "missed" is recognised as "mlssed"
|-
| 1 || l - lower-case 'L' || "145" is recognised as "l45"; also "learned" as "1earned"
|-
| Z || 2 || "4325" is recognised as "43Z5"; also "Zebra" as "2ebra"
|-
| add new here || on this line || examples here
|}
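
The confusions listed above can be used to suggest corrections for a suspect word. The following is only a minimal sketch in Python, not a tested correction tool: the names `CONFUSIONS`, `candidates` and `suggest` are hypothetical, the word list is a made-up sample, and the sketch undoes only one confusion at one position per candidate.

```python
# Confusion pairs from the table: (what the OCR produced, what was intended).
CONFUSIONS = [
    ("rn", "m"),
    ("dl", "th"),
    ("m", "in"),
    ("n", "ri"),
    ("ii", "u"),
    ("l", "i"),
    ("l", "1"),
    ("2", "Z"),
]


def candidates(word):
    """Return every string obtained by undoing one confusion at one position."""
    found = set()
    for seen, real in CONFUSIONS:
        start = word.find(seen)
        while start != -1:
            found.add(word[:start] + real + word[start + len(seen):])
            start = word.find(seen, start + 1)
    return found


def suggest(word, vocabulary):
    """Return known words that the OCR output could stand for, sorted."""
    known = set(vocabulary)
    if word in known:
        return [word]  # already a known word, nothing to correct
    return sorted(candidates(word) & known)


# Hypothetical sample vocabulary built from the examples in the table.
vocabulary = {"him", "this", "insert", "rift", "must", "missed"}
for ocr_word in ["hirn", "dlis", "msert", "nft", "miist", "mlssed"]:
    print(ocr_word, "->", suggest(ocr_word, vocabulary))
```

With this sample vocabulary each OCR form resolves to the single word given in the table, for example "hirn" to "him". A real word list would be much larger, so several suggestions may come back for one word and a person should still check them.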