Microsoft OCR Library for Windows Runtime

This article is also available on the Microsoft TechNet Wiki.

Introduction

The Microsoft OCR Library for Windows Runtime was released as a NuGet package last year.
It enables developers to easily add text recognition capabilities to their Windows Phone 8/8.1 and Windows 8.1 Store apps.
It was designed with flexibility and performance in mind: it supports OCR on a wide variety of image types and includes numerous performance optimizations.
Another useful feature is that the image processing is done on the client side.
This article demonstrates how to get started with the Microsoft OCR Library and provides an example where it is used in a Windows Store app.

Using the Microsoft OCR Library

Step 1: Install the NuGet package

Step 2: Create an instance of OcrEngine.


OcrEngine ocrEngine = new OcrEngine(OcrLanguage.English);
The code above initializes a new instance of the OcrEngine class and specifies the language to use for optical character recognition (OCR).
OcrLanguage defines the language of text for OCR to detect in the target image.



Step 3: Select which file to use and open a random-access stream over the file.

var file = await Package.Current.InstalledLocation.GetFileAsync("g.jpg");
using (var stream = await file.OpenAsync(Windows.Storage.FileAccessMode.Read))
{
    // Steps 4 to 7 run inside this block, while the stream is still open.
}

Step 4: Create an instance of the image decoder


var decoder = await BitmapDecoder.CreateAsync(stream);

The code above asynchronously creates a new BitmapDecoder and initializes it using the stream. With this overload, the decoder picks the appropriate bitmap codec for the image automatically.

Step 5: Get the image width and height.


uint width = decoder.PixelWidth;
uint height = decoder.PixelHeight;
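
Before passing the image to the OCR engine, it can be worth checking the dimensions read in this step. The sketch below assumes the limits documented for the preview library (each side between 40 and 2600 pixels); treat these bounds as an assumption and confirm them against the RecognizeAsync documentation listed in the references. OcrImageLimits and IsSupportedSize are hypothetical helper names and are not part of the library.

```csharp
using System;

static class OcrImageLimits
{
    // Assumed bounds for the OCR engine; verify against the library documentation.
    public const uint MinDimension = 40;
    public const uint MaxDimension = 2600;

    // Returns true when both sides fall within the assumed bounds.
    public static bool IsSupportedSize(uint width, uint height)
    {
        return width >= MinDimension && width <= MaxDimension
            && height >= MinDimension && height <= MaxDimension;
    }
}
```

A caller could test OcrImageLimits.IsSupportedSize(width, height) and show a friendly message, or scale the image down, instead of letting recognition fail.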

Step 6: Read the pixel data from the image.

var pixels = await decoder.GetPixelDataAsync(
  BitmapPixelFormat.Bgra8,
  BitmapAlphaMode.Straight,
  new BitmapTransform(),
  ExifOrientationMode.RespectExifOrientation,
  ColorManagementMode.ColorManageToSRgb
);

The method decoder.GetPixelDataAsync takes the following parameters:
a. BitmapPixelFormat: Specifies the pixel format of pixel data. Each enumeration value defines a channel ordering, bit depth, and data type.
b. BitmapAlphaMode: Specifies the alpha mode of pixel data.
c. BitmapTransform: Contains transformations that can be applied to pixel data.
d. ExifOrientationMode: Specifies the EXIF orientation flag behavior when obtaining pixel data.
e. ColorManagementMode: Specifies the color management behavior when obtaining pixel data.
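
To make the Bgra8 format more concrete: each pixel occupies four bytes, in the order blue, green, red, alpha. The sketch below shows how a single pixel could be located in the byte array, assuming the rows are tightly packed (no padding between rows). Bgra8Layout and its methods are hypothetical helpers written for illustration; they are not part of the OCR library or of Windows.Graphics.Imaging.

```csharp
using System;

static class Bgra8Layout
{
    // Bgra8 stores four bytes per pixel: blue, green, red, alpha.
    public const int BytesPerPixel = 4;

    // Byte offset of pixel (x, y) in a tightly packed buffer of the given width.
    public static int OffsetOf(int x, int y, int width)
    {
        return (y * width + x) * BytesPerPixel;
    }

    // Approximate brightness (0 to 255) of pixel (x, y); the alpha byte is ignored.
    public static int GrayAt(byte[] pixels, int x, int y, int width)
    {
        int i = OffsetOf(x, y, width);
        int blue = pixels[i];
        int green = pixels[i + 1];
        int red = pixels[i + 2];
        return (red * 299 + green * 587 + blue * 114) / 1000;
    }
}
```

The OCR engine consumes the byte array directly, so a helper like this is only needed if you want to inspect or pre-process the pixels yourself.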

Step 7: Extract text from image


OcrResult result = await ocrEngine.RecognizeAsync(height, width, pixels.DetachPixelData());

The method RecognizeAsync scans the specified image for text in the language specified by the Language property.

This method returns an object of type OcrResult, which contains a collection of OcrLine objects that you access through the Lines property of the OcrResult.

Step 8: Loop through the lines and retrieve the text.


string recognizedText = "";
// Check whether text is detected.
if (result.Lines != null)
{
    // Collect recognized text.
    foreach (var line in result.Lines)
    {
        foreach (var word in line.Words)
        {
            recognizedText += word.Text + " ";
        }
        recognizedText += Environment.NewLine;
    }
}

Each OcrLine object contains a collection of OcrWord objects, which can be accessed through the Words property of each OcrLine.
Each OcrWord object specifies the text, size, and position information of the word in the image.
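
For images with a lot of text, repeated string concatenation can get slow. A possible alternative is sketched below using a StringBuilder. To keep the sketch independent of the library, each line is represented as a plain sequence of word strings; OcrTextBuilder and Join are hypothetical names introduced for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class OcrTextBuilder
{
    // Each inner sequence holds the words of one recognized line, in reading order.
    public static string Join(IEnumerable<IEnumerable<string>> lines)
    {
        var builder = new StringBuilder();
        foreach (var words in lines)
        {
            // Separate words with single spaces and end each line with a line break.
            builder.Append(string.Join(" ", words));
            builder.Append(Environment.NewLine);
        }
        return builder.ToString();
    }
}
```

With the OCR result, the input could be produced with LINQ, for instance result.Lines.Select(l => l.Words.Select(w => w.Text)). Unlike the loop above, this version does not leave a trailing space at the end of each line.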

Example: Microsoft OCR Library in a Windows Store App.

The example below shows how to extract text from an image, display the text, and make the app "speak" the contents of the image.



The layout consists of the following elements:

    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
        <MediaElement Grid.Row="0" x:Name="media" AutoPlay="True"/>
        <Button x:Name="btnSelectImage" Content="Select Image" HorizontalAlignment="Left" Height="47" Margin="110,435,0,0" VerticalAlignment="Top" Width="136" Click="btnSelectImage_Click"/>
        <Image x:Name="img" HorizontalAlignment="Left" Height="368" Margin="44,47,0,0" VerticalAlignment="Top" Width="447"/>
        <TextBlock x:Name="txtTrasnlatedText" HorizontalAlignment="Left" Height="368" Margin="547,47,0,0" TextWrapping="Wrap" VerticalAlignment="Top" Width="437" FontSize="30" />
        <Button x:Name="btnSpeak" Content="Speak!" HorizontalAlignment="Left" Height="47" Margin="266,435,0,0" VerticalAlignment="Top" Width="137" Click="btnSpeak_Click" Visibility="Collapsed"/>
    </Grid>

Step 1: Select the image

The image is selected with a file picker and displayed, after which the file is passed to the method ReadImage.

        private async void btnSelectImage_Click(object sender, RoutedEventArgs e)
        {
            FileOpenPicker openPicker = new FileOpenPicker();
            openPicker.ViewMode = PickerViewMode.Thumbnail;
            openPicker.SuggestedStartLocation = PickerLocationId.PicturesLibrary; 
            openPicker.FileTypeFilter.Add(".jpg");
            openPicker.FileTypeFilter.Add(".jpeg");
            openPicker.FileTypeFilter.Add(".png");

            StorageFile file = await openPicker.PickSingleFileAsync();
            if (file != null)
            {
                BitmapImage image = new BitmapImage();
                IRandomAccessStream fileStream = await file.OpenAsync(Windows.Storage.FileAccessMode.Read);
                image.SetSource(fileStream);
                img.Source = image;

                string text = await ReadImage(file);
                txtTrasnlatedText.Text = text;

                btnSpeak.Visibility = Visibility.Visible;
            }
            else
            {
                txtTrasnlatedText.Text = "Could not load image";
            }
        }


Step 2: Retrieve the text from the image

The method ReadImage uses the library discussed above to extract the text from the image.


        public async Task<string> ReadImage(StorageFile file)
        {
            var ocrEngine = new OcrEngine(OcrLanguage.English);

            using (var stream = await file.OpenAsync(Windows.Storage.FileAccessMode.Read)) 
            {
                // Create image decoder.
                var decoder = await BitmapDecoder.CreateAsync(stream);

                var width = decoder.PixelWidth;
                var height = decoder.PixelHeight;

                // Get pixels in BGRA format. 
                var pixels = await decoder.GetPixelDataAsync(
                    BitmapPixelFormat.Bgra8,
                    BitmapAlphaMode.Straight,
                    new BitmapTransform(),
                    ExifOrientationMode.RespectExifOrientation,
                    ColorManagementMode.ColorManageToSRgb);

                // Extract text from image.
                OcrResult result = await ocrEngine.RecognizeAsync(height, width, pixels.DetachPixelData());

                string recognizedText = "";
                // Check whether text is detected.
                if (result.Lines != null)
                {
                    // Collect recognized text.
                    foreach (var line in result.Lines)
                    {
                        foreach (var word in line.Words)
                        {
                            recognizedText += word.Text + " ";
                        }
                        recognizedText += Environment.NewLine;
                    }
                }

                return recognizedText;
            }
        }


Step 3: The "speak!" method

This method uses speech synthesis to read aloud the text extracted from the image.


        private void btnSpeak_Click(object sender, RoutedEventArgs e)
        {
            Speak(txtTrasnlatedText.Text);
        }

        public async void Speak(string Text)
        {

            // The media object for controlling and playing audio.
            MediaElement mediaElement = this.media;

            // The object for controlling the speech synthesis engine (voice).
            var synth = new Windows.Media.SpeechSynthesis.SpeechSynthesizer();

            // Generate the audio stream from plain text.
            SpeechSynthesisStream stream = await synth.SynthesizeTextToStreamAsync(Text);

            // Send the stream to the media object.
            mediaElement.SetSource(stream, stream.ContentType);
            mediaElement.Play();
        }

References

a. http://blogs.windows.com/buildingapps/2014/09/18/microsoft-ocr-library-for-windows-runtime/
b. http://msdn.microsoft.com/en-us/library/windowspreview.media.ocr.ocrengine.ocrengine.aspx
c. http://msdn.microsoft.com/en-us/library/windowspreview.media.ocr.ocrengine.language.aspx
d. http://msdn.microsoft.com/en-us/library/windows/apps/br226193
e. http://msdn.microsoft.com/en-us/library/windowspreview.media.ocr.ocrengine.recognizeasync.aspx
f. https://code.msdn.microsoft.com/Uses-the-OCR-Library-to-2a9f5bf4
