Optical Character Recognition from Videography
First Author, Second Author, Third
The goal of this paper is to present a new methodology for detecting and
recognizing text in video frames. The task is divided into three steps (video
frame creation, text detection, and text recognition) that in combination detect
and recognize the text in a video frame. Video frame creation divides the input
video into individual frames, which are grabbed and passed to the remaining two
phases. Text detection is a two-step approach comprising a text localization
phase and a text verification phase. Text recognition comprises the text
verification phase and the optical character recognition phase. The final
outcome is the text detected in the video frames, written to a text or word file.
Index Terms— OCR-Videography; Methodology;
Video Segmentation; NTSC; Text Extraction; Text recognition; Video Frame;
1 Introduction
Nowadays digital video is very popular as an exchange medium, owing to large
improvements in video recording and compression techniques and to increasing
network speed. Audiovisual recordings are therefore used more and more frequently
in e-learning and e-lecturing systems. OCR from videography is a technique that
can locate text inside a digital video file by automatically extracting and
reading the notes and captions [1] that give the actual information about the
video being presented (such as the names of people and places, or descriptions of
objects). Detecting the video content requires many technologies, such as
scanning, pre-processing, search strategies, video segmentation, feature
extraction, and recognition. Reading the extracted notes and captions gives more
appropriate information for understanding the video content. Applying OCR [3,4,5]
to video and combining the results with various detection techniques can improve
the detection result. Integrated character recognition in text-based videos is
therefore greatly needed.
There are various research fields concerned with character recognition from
videos. Character extraction from video is difficult because of differing
conditions of title, character size, and complex backgrounds. In video captions
the character resolution is lower and the background complexity is more severe
than in other OCR research. The first problem is the low resolution of the
characters. The image size is limited by the number of scan lines defined in the
NTSC standard. Therefore, the resolution of characters in video captions is
insufficient for implementing stable and robust OCR-Videography systems. The
second problem is the presence of complex backgrounds: characters superimposed on
videos often have hue and brightness similar to the background. These problems
have opened a new area for research on OCR-Videography systems. OCR [3,4,5] from
videography is a technique that can greatly help to locate topics of interest in
a large digital video through the automatic extraction and reading of captions
and notes.
2 Problem statement
Applying OCR-Videography to digital video and combining the results with other
video analysis techniques will improve text detection in the video content. Text
recognition from digital video is a challenging problem because of rich, dynamic
backgrounds, low resolution, color, and similar factors. A strategy is required
to process the video images into high-resolution binarized text images that
resemble printed text, minimizing the error rate while performing OCR.
The word “OCR-VIDEOGRAPHY” means applying the OCR [3,4,5] technique to a digital
video to extract the text present in it. Algorithms such as video segmentation,
feature extraction, and recognition are used to extract the text. First, the text
blocks present in the digital video are detected by the text detection process;
then, in the segmentation stage, these text blocks are segmented into single
characters to generate the templates.
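The segmentation of a text block into single characters can be illustrated with a vertical projection profile. This is only a minimal sketch of one common technique and is not necessarily the procedure used in this work: it assumes the block is already binarized (1 for text pixels, 0 for background) and that characters are separated by at least one empty column. The function name `split_characters` is hypothetical.

```python
from typing import List, Tuple


def split_characters(block: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges of characters in a binarized block.

    `block` is a list of rows; 1 marks a text pixel, 0 marks background.
    Assumption: a column without text pixels is a gap between characters.
    The `end` index is exclusive.
    """
    if not block:
        return []
    width = len(block[0])
    # Vertical projection profile: number of text pixels in each column.
    profile = [sum(row[x] for row in block) for x in range(width)]

    ranges = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a character begins here
        elif count == 0 and start is not None:
            ranges.append((start, x))      # the character ended at the gap
            start = None
    if start is not None:                  # character touching the right edge
        ranges.append((start, width))
    return ranges


block = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 1],
]
print(split_characters(block))  # [(0, 2), (4, 5), (6, 8)]
```

Each returned column range could then be cropped from the block to form a character template. Touching or overlapping characters would need a more elaborate method.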
4 Methodology of OCR-Videography
It has been shown that two different types of text are present in videos and
images. One is scene text, which appears incidentally as part of different
scenes, for example street names, shop names, and text on T-shirts. The other is
overlay text, which, in contrast, is carefully directed. It is an important
carrier of information and is also suitable for indexing and retrieval.
Text occurrences vary with different fonts and with different font size, type,
style, and color. More research is needed in these specific domains, with limited
variation in these attributes.
4.1 Image Segmentation
Image segmentation is the process of partitioning a digital image into
multiple segments, i.e. sets of pixels, also known as super-pixels. Segmentation
is applied to simplify the representation of an image into something
that is more meaningful and easier to analyze. This is an initial stage of
the methodology.
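As an illustration of partitioning an image into sets of pixels, the following minimal sketch labels 4-connected regions of equal pixel value with a flood fill. It is one simple form of segmentation chosen for illustration, not the specific algorithm of this work, and it assumes the image has already been quantized to a few values. The function name `label_segments` is hypothetical.

```python
from collections import deque
from typing import List


def label_segments(image: List[List[int]]) -> List[List[int]]:
    """Assign a segment label to every pixel.

    Two pixels share a label when they have the same value and are linked
    by a path of 4-connected neighbours with that value. Labels start at 0
    and are assigned in row-major order of first appearance.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    labels = [[-1] * width for _ in range(height)]
    next_label = 0
    for y in range(height):
        for x in range(width):
            if labels[y][x] != -1:
                continue  # pixel already belongs to a segment
            value = image[y][x]
            labels[y][x] = next_label
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and labels[ny][nx] == -1
                            and image[ny][nx] == value):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels


image = [
    [0, 0, 1],
    [2, 0, 1],
    [2, 2, 0],
]
for row in label_segments(image):
    print(row)
```

In the example the bottom-right pixel has the same value as the top-left region but is not connected to it, so it receives its own label.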
4.2 Text Detection
There exist different types of text detection methods, based on color, edge, and
texture [2].
Color-based method: It assumes that the video text has a uniform color. In
this method the red color component is used to obtain high-contrast edges between
text and background.
Edge-based method: It is used to detect overlay text, which contains rich edge
information. The main procedure is to apply an edge detector to the video frames
and then identify the regions with high edge density.
Texture-based method: In this method salient point detection and the wavelet
transform are used to detect the text regions.
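The edge-based idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the detector used in this work: the frame is a grayscale grid of intensities 0 to 255, a plain horizontal intensity difference stands in for a real edge detector, and candidate text is reported per row rather than per region. The function name and both threshold values are hypothetical.

```python
from typing import List


def edge_density_rows(gray: List[List[int]],
                      edge_threshold: int = 50,
                      density_threshold: float = 0.5) -> List[int]:
    """Return indices of rows whose fraction of strong edges is high.

    An "edge" here is a horizontal neighbour pair whose absolute intensity
    difference is at least `edge_threshold` (a stand-in for an edge detector).
    """
    text_rows = []
    for y, row in enumerate(gray):
        if len(row) < 2:
            continue  # no neighbour pairs to compare
        edges = [abs(row[x + 1] - row[x]) >= edge_threshold
                 for x in range(len(row) - 1)]
        density = sum(edges) / len(edges)
        if density >= density_threshold:
            text_rows.append(y)
    return text_rows


frame = [
    [10, 10, 10, 10, 10, 10],       # flat background
    [10, 200, 10, 200, 10, 200],    # stroke-like intensity changes
    [200, 10, 200, 10, 200, 10],    # stroke-like intensity changes
    [10, 10, 10, 10, 10, 12],       # almost flat background
]
print(edge_density_rows(frame))  # [1, 2]
```

Consecutive high-density rows could then be grouped into candidate text bands and passed to a verification step.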
4.3 Image Enhancement
Image enhancement [8] is used to improve the overall quality of an image, so that
the result becomes more appropriate. Sometimes the same text block is present in
different video frames, or the background differs for the same text block.
This is a typical characteristic of text in video frames. The enhanced
image is computed by aligning the different occurrences of a particular text
region across frames and, for each pixel, choosing the color that corresponds to
the minimum intensity value across frames. Other filters, such as mean, median,
and maximum, can also be applied.
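The per-pixel selection across frames can be sketched as below. The sketch assumes that the occurrences of the text region have already been aligned and cropped to the same size, and it works on grayscale intensities rather than full color. The function name `integrate_frames` and its `reducer` parameter are hypothetical.

```python
import statistics
from typing import Callable, List, Sequence

Frame = List[List[int]]


def integrate_frames(frames: Sequence[Frame],
                     reducer: Callable[[List[int]], float] = min) -> List[List[float]]:
    """Combine aligned grayscale frames pixel by pixel.

    `reducer` receives the list of intensities observed at one pixel
    position across all frames; `min` gives the minimum-intensity rule,
    while `max`, `statistics.mean`, or `statistics.median` give the
    alternative filters.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    height = len(frames[0])
    width = len(frames[0][0]) if height else 0
    return [[reducer([frame[y][x] for frame in frames]) for x in range(width)]
            for y in range(height)]


frames = [
    [[20, 200], [180, 30]],
    [[25, 90], [220, 28]],
    [[22, 150], [60, 35]],
]
print(integrate_frames(frames))                      # [[20, 90], [60, 28]]
print(integrate_frames(frames, statistics.median))   # [[22, 150], [180, 30]]
```

Pixels whose intensity stays stable across frames are changed little by any of these reducers, while pixels over a changing background are pulled toward the chosen statistic.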
A pixel image can be converted into a binary image using a binarization [6,7]
method. Different techniques are available for binarization; we need to choose
the most appropriate one to obtain the best result.
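As one example of such a technique, the sketch below uses Otsu's global thresholding, which picks the threshold that maximizes the between-class variance of the intensity histogram. Otsu's method is chosen here only for illustration and is not stated to be the method used in this work. The sketch assumes 8-bit grayscale input, and the function names are hypothetical.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the threshold t (0..255) maximizing between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # intensity sum of the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map each pixel to 1 if it is brighter than the Otsu threshold, else 0."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p > t else 0 for p in row] for row in image]


image = [
    [12, 15, 200],
    [10, 210, 205],
    [14, 11, 198],
]
print(binarize(image))  # [[0, 0, 1], [0, 1, 1], [0, 0, 1]]
```

A global threshold works when text and background intensities are well separated. Frames with uneven backgrounds may need a locally adaptive method instead.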
6 Applications of video OCR
Video OCR is one of the popular research topics. We have to be careful while
creating our own OCR-Videography technique. The technique can be applied only to
a snapshot of the video screen or to a binarized image. First we have to identify
the text present in the video. Next we need to separate foreground and background
during binarization.
Some of the applications are:
Video editing and cataloguing: catalogs video databases on the basis of content.
Digitization: helps to digitize covers of journals, magazines, and different
types of videos using advanced image and video optical character recognition (OCR).
The technique can easily mark degraded text present in video and increases the
ability to recognize it.
Extraction of “Headlines” from a live video in broadcasting media.
Name identification from live video, etc.
7 Conclusion and future work
This research is based on building an efficient system for identifying text of
any greyscale value present in videos. The main reason for building this system
is to fulfill the needs of multimedia indexing, library digitizing, and
management. The technique is divided into subtasks that include recognizing the
text and segmenting the background.
Investigations on other aspects need to be pursued for developing video text
detection and recognition applications and related multimedia retrieval.
One such aspect is the exploitation of contextual text information from text
strings in a video. Future research can focus on mining the relationship between
the content of the corresponding text and the categories of video shots.
Another aspect is that OCR can be used for font encoding, spell checking, and
finding grammatical errors in sentences. A converter could be implemented to
translate text from one language to another by first recognizing the characters
present in the video and then converting them.
References
[1] T. Sato, T. Kanade, E. K. Hughes, and M. A. Smith, “Video OCR for digital news archive,” in Proc. IEEE Workshop Content-Based Access Image Video Database, 1998, pp. 52-60.
[2] H. Li, D. Doermann, and O. Kia, “Automatic text detection and tracking in digital video,” IEEE Trans. on Image Processing, vol. 9, no. 1, pp. 147-156, 2000.
[3] “Optical Character Recognition,” Norsk Regnesentral, Oslo, Norway, Rep. 876, 1993.
[4] C. P. Sumathi, T. Santhanam, and G. Gayathri Devi, “A survey on various approaches of text extraction in images,” International Journal of Computer Science & Engineering Survey (IJCSES), vol. 3, August 2012, pp. 27-42.
[5] Honggang Zhang, Kaili Zhao, Yi-Zhe Song, and Jun Guo, “Text extraction from natural scene image: A survey,” Elsevier journal on Neurocomputing, pp. 310-
[6] I. Pratikakis, K. Ntirogiannis, and B. Gatos, “Performance evaluation methodology for historical document image binarization” (IEEE).
[7] S. N. Srihari and Ying Liu, “Document image binarization based on texture features” (IEEE).
[8] Gurpreet Kaur and Rajdavinder Singh, “Image enhancement and its techniques - A review”