Ocr gremlins

From SikhiWiki

Latest revision as of 13:00, 20 March 2010

When old English-language articles are converted into electronic form using optical character recognition (OCR), various errors are common. This article lists some of the errors that have been detected so far:

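{|style="width:90%; background:#f0f0f0; " border="1" cellpadding="7" cellspacing="0"
|width=20%| '''Real character'''
|width=30%| '''Erroneous OCR character'''
|width=50%| '''Examples'''
|-
| m || rn (lower-case "r" followed by "n") || "him" is recognised as "hirn"; conversely, "learn" is recognised as "leam"
|-
| th || dl || "this" is recognised as "dlis"
|-
| in || m || "insert" is recognised as "msert"
|-
| ri || n || "rift" is recognised as "nft"
|-
| u || ii || "must" is recognised as "miist"; conversely, "ii)." is recognised as "u)."
|-
| i || l (lower-case "L") || "missed" is recognised as "mlssed"
|-
| 1 || l (lower-case "L") || "145" is recognised as "l45"; conversely, "learned" is recognised as "1earned"
|-
| Z || 2 || "Zebra" is recognised as "2ebra"; conversely, "4325" is recognised as "43Z5"
|-
| add new here || on this line || examples here
|}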
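
The confusion pairs listed above can be applied mechanically to suggest corrections for a suspicious word. The following Python sketch is illustrative only: the names `CONFUSIONS` and `candidate_corrections` and the sample word list are assumptions made for this example and are not part of any SikhiWiki tool. It undoes a single confusion at a single position, tries each pair in both directions (because the examples above show errors in both directions), and keeps only the results found in a known vocabulary.

```python
# Confusion pairs copied from the table above: (real text, OCR misreading).
CONFUSIONS = [
    ("m", "rn"),
    ("th", "dl"),
    ("in", "m"),
    ("ri", "n"),
    ("u", "ii"),
    ("i", "l"),
    ("1", "l"),
    ("Z", "2"),
]


def candidate_corrections(word, vocabulary):
    """Return the vocabulary words reachable from `word` by swapping
    one confusion pair at one position (hypothetical helper)."""
    candidates = set()
    for real, wrong in CONFUSIONS:
        # Try both directions, e.g. "rn" -> "m" and "m" -> "rn".
        for src, dst in ((wrong, real), (real, wrong)):
            start = word.find(src)
            while start != -1:
                fixed = word[:start] + dst + word[start + len(src):]
                if fixed in vocabulary:
                    candidates.add(fixed)
                start = word.find(src, start + 1)
    return sorted(candidates)


if __name__ == "__main__":
    # A tiny sample vocabulary built from the examples in the table.
    vocabulary = {"him", "this", "insert", "rift", "must", "missed", "learn", "learned"}
    for suspect in ["hirn", "dlis", "msert", "nft", "miist", "mlssed", "leam", "1earned"]:
        print(suspect, "->", candidate_corrections(suspect, vocabulary))
```

Checking candidates against a word list matters because most of these pairs are ordinary letters: replacing every "m" with "in" blindly would damage far more words than it repairs. A word containing several errors would need the substitution applied repeatedly, which this sketch does not attempt.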