Sunday, 4 January 2015

Microsoft OCR Library for Windows Runtime

This article is also available on the Microsoft TechNet Wiki.

Introduction

The Microsoft OCR Library for Windows Runtime was released as a NuGet package last year.
It enables developers to easily add text recognition capabilities to their Windows Phone 8/8.1 and Windows 8.1 Store apps.
It was designed with flexibility and performance in mind: it supports OCR on a wide variety of image types and includes numerous performance optimizations.
Another useful feature is that the image processing is done on the client side.
This article demonstrates how to get started with the Microsoft OCR Library and provides an example in which it is used in a Windows Store app.

Using the Microsoft OCR Library

Step 1: Install the NuGet package.

Step 2: Create an instance of OcrEngine.


OcrEngine ocrEngine = new OcrEngine(OcrLanguage.English);
The code above initializes a new instance of the OcrEngine class and specifies the language to use for optical character recognition (OCR).
OcrLanguage defines the language of text for OCR to detect in the target image.



Step 3: Select which file to use and open a random-access stream over the file.

var file = await Package.Current.InstalledLocation.GetFileAsync("g.jpg");
using (var stream = await file.OpenAsync(Windows.Storage.FileAccessMode.Read))
{
    // Steps 4 to 7 run here, while the stream is still open.
}

Step 4: Create an instance of the image decoder.


var decoder = await BitmapDecoder.CreateAsync(stream);

The code above asynchronously creates a new BitmapDecoder and initializes it using the stream. This overload takes no codec identifier, so the appropriate decoder is chosen from the image data.

Step 5: Get the image width and height.


uint width = decoder.PixelWidth;
uint height = decoder.PixelHeight;

Step 6: Read the pixel data from the image.

var pixels = await decoder.GetPixelDataAsync(
  BitmapPixelFormat.Bgra8,
  BitmapAlphaMode.Straight,
  new BitmapTransform(),
  ExifOrientationMode.RespectExifOrientation,
  ColorManagementMode.ColorManageToSRgb
);

The method decoder.GetPixelDataAsync takes the following parameters:
a. BitmapPixelFormat: Specifies the pixel format of pixel data. Each enumeration value defines a channel ordering, bit depth, and data type.
b. BitmapAlphaMode: Specifies the alpha mode of pixel data.
c. BitmapTransform: Contains transformations that can be applied to pixel data.
d. ExifOrientationMode: Specifies the EXIF orientation flag behavior when obtaining pixel data.
e. ColorManagementMode: Specifies the color management behavior when obtaining pixel data.
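
The BitmapTransform parameter can also be used to scale an image while its pixels are read. The helper below is a minimal sketch of how a target size might be computed; the class name ImageScaling, the method FitWithin and the maxSide parameter are hypothetical names introduced for illustration and are not part of the library.

```csharp
using System;

public static class ImageScaling
{
    // Computes a size that fits within maxSide x maxSide while keeping the
    // aspect ratio. Images that already fit are returned unchanged.
    public static void FitWithin(uint width, uint height, uint maxSide,
                                 out uint scaledWidth, out uint scaledHeight)
    {
        uint longest = Math.Max(width, height);
        if (longest <= maxSide)
        {
            scaledWidth = width;
            scaledHeight = height;
            return;
        }

        double scale = (double)maxSide / longest;

        // Never round a side down to zero pixels.
        scaledWidth = Math.Max(1u, (uint)Math.Round(width * scale));
        scaledHeight = Math.Max(1u, (uint)Math.Round(height * scale));
    }
}
```

The computed values could then be assigned to the ScaledWidth and ScaledHeight properties of the BitmapTransform passed to GetPixelDataAsync. In that case the scaled dimensions, rather than decoder.PixelWidth and decoder.PixelHeight, describe the returned pixel buffer.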

Step 7: Extract text from the image.


OcrResult result = await ocrEngine.RecognizeAsync(height, width, pixels.DetachPixelData());

The RecognizeAsync method scans the specified image for text in the language specified by the Language property. Note that the image height is passed before the width.

This method returns an object of type OcrResult, which contains a collection of OcrLine objects that you access through its Lines property.
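
Before calling RecognizeAsync it can be worth checking the image dimensions. The sketch below assumes the limits of 40 to 2600 pixels per side described in the library's announcement post; the class OcrImageGuard and its members are hypothetical names, and the limits should be verified against the documentation for the version in use.

```csharp
using System;

public static class OcrImageGuard
{
    // Assumed limits (in pixels) taken from the library's announcement post;
    // they are not read from the OCR API at run time.
    public const uint MinDimension = 40;
    public const uint MaxDimension = 2600;

    // Returns true when both sides fall inside the assumed supported range.
    public static bool IsSupportedSize(uint width, uint height)
    {
        return width >= MinDimension && width <= MaxDimension
            && height >= MinDimension && height <= MaxDimension;
    }
}
```

A caller could test OcrImageGuard.IsSupportedSize(decoder.PixelWidth, decoder.PixelHeight) and show a message instead of attempting recognition when it returns false.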

Step 8: Loop through the lines and retrieve the text.


string recognizedText = "";
// Check whether text is detected.
if (result.Lines != null)
{
    // Collect recognized text.
    foreach (var line in result.Lines)
    {
        foreach (var word in line.Words)
        {
            recognizedText += word.Text + " ";
        }
        recognizedText += Environment.NewLine;
    }
}

Each OcrLine object contains a collection of OcrWord objects, which can be accessed through the Words property of each OcrLine.
Each OcrWord object specifies the text, size, and position information of the word in the image.
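
The position information can be read in the same loop. The following is a sketch only: it assumes that OcrWord exposes Left, Top, Width and Height values in pixels, as described in the preview library's documentation, and the class OcrWordReport and method DescribeWords are hypothetical names. It compiles only inside a project that references the OCR library.

```csharp
using System.Collections.Generic;
using WindowsPreview.Media.Ocr;

public static class OcrWordReport
{
    // Returns one description per recognized word, combining its text
    // with its position and size in the image.
    public static List<string> DescribeWords(OcrResult result)
    {
        var descriptions = new List<string>();

        // Lines is null when no text was detected.
        if (result == null || result.Lines == null)
        {
            return descriptions;
        }

        foreach (OcrLine line in result.Lines)
        {
            foreach (OcrWord word in line.Words)
            {
                // Assumed members: Left, Top, Width, Height (pixels).
                descriptions.Add(string.Format("{0} @ ({1}, {2}) {3}x{4}",
                    word.Text, word.Left, word.Top, word.Width, word.Height));
            }
        }

        return descriptions;
    }
}
```

Such descriptions could be used, for example, to draw a highlight rectangle over each word on top of the displayed image.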

Example: Microsoft OCR Library in a Windows Store app

The example below shows how to extract text from an image, display the text and make the app "speak" the contents of the image.



The layout consists of the following elements:

    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
        <MediaElement Grid.Row="0" x:Name="media" AutoPlay="True"/>
        <Button x:Name="btnSelectImage" Content="Select Image" HorizontalAlignment="Left" Height="47" Margin="110,435,0,0" VerticalAlignment="Top" Width="136" Click="btnSelectImage_Click"/>
        <Image x:Name="img" HorizontalAlignment="Left" Height="368" Margin="44,47,0,0" VerticalAlignment="Top" Width="447"/>
        <TextBlock x:Name="txtTrasnlatedText" HorizontalAlignment="Left" Height="368" Margin="547,47,0,0" TextWrapping="Wrap" VerticalAlignment="Top" Width="437" FontSize="30"/>
        <Button x:Name="btnSpeak" Content="Speak!" HorizontalAlignment="Left" Height="47" Margin="266,435,0,0" VerticalAlignment="Top" Width="137" Click="btnSpeak_Click" Visibility="Collapsed"/>
    </Grid>

Step 1: Select the image

The image is loaded using a file picker, after which it is passed to the ReadImage method.

        private async void btnSelectImage_Click(object sender, RoutedEventArgs e)
        {
            // Let the user pick a JPEG or PNG image from the Pictures library.
            FileOpenPicker openPicker = new FileOpenPicker();
            openPicker.ViewMode = PickerViewMode.Thumbnail;
            openPicker.SuggestedStartLocation = PickerLocationId.PicturesLibrary;
            openPicker.FileTypeFilter.Add(".jpg");
            openPicker.FileTypeFilter.Add(".jpeg");
            openPicker.FileTypeFilter.Add(".png");

            StorageFile file = await openPicker.PickSingleFileAsync();
            if (file != null)
            {
                // Display the selected image.
                BitmapImage image = new BitmapImage();
                IRandomAccessStream fileStream = await file.OpenAsync(Windows.Storage.FileAccessMode.Read);
                image.SetSource(fileStream);
                img.Source = image;

                // Run OCR and show the recognized text.
                string text = await ReadImage(file);
                txtTrasnlatedText.Text = text;

                btnSpeak.Visibility = Visibility.Visible;
            }
            else
            {
                // The picker was cancelled or no file was returned.
                txtTrasnlatedText.Text = "Could not load image";
            }
        }


Step 2: Retrieve the text from the image

The method ReadImage uses the library discussed above to extract the text from the image.


        // Requires the namespaces WindowsPreview.Media.Ocr, Windows.Graphics.Imaging,
        // Windows.Storage and System.Threading.Tasks.
        public async Task<string> ReadImage(StorageFile file)
        {
            var ocrEngine = new OcrEngine(OcrLanguage.English);

            using (var stream = await file.OpenAsync(Windows.Storage.FileAccessMode.Read))
            {
                // Create image decoder.
                var decoder = await BitmapDecoder.CreateAsync(stream);

                uint width = decoder.PixelWidth;
                uint height = decoder.PixelHeight;

                // Get pixels in BGRA format.
                var pixels = await decoder.GetPixelDataAsync(
                    BitmapPixelFormat.Bgra8,
                    BitmapAlphaMode.Straight,
                    new BitmapTransform(),
                    ExifOrientationMode.RespectExifOrientation,
                    ColorManagementMode.ColorManageToSRgb);

                // Extract text from image (height is passed before width).
                OcrResult result = await ocrEngine.RecognizeAsync(height, width, pixels.DetachPixelData());

                string recognizedText = "";
                // Check whether text is detected.
                if (result.Lines != null)
                {
                    // Collect recognized text.
                    foreach (var line in result.Lines)
                    {
                        foreach (var word in line.Words)
                        {
                            recognizedText += word.Text + " ";
                        }
                        recognizedText += Environment.NewLine;
                    }
                }

                return recognizedText;
            }
        }


Step 3: The "Speak!" method

This method uses speech synthesis to read aloud the text extracted from the image.


        private void btnSpeak_Click(object sender, RoutedEventArgs e)
        {
            Speak(txtTrasnlatedText.Text);
        }

        public async void Speak(string Text)
        {
            // The media object for controlling and playing audio.
            MediaElement mediaElement = this.media;

            // The object for controlling the speech synthesis engine (voice).
            var synth = new Windows.Media.SpeechSynthesis.SpeechSynthesizer();

            // Generate the audio stream from plain text.
            SpeechSynthesisStream stream = await synth.SynthesizeTextToStreamAsync(Text);

            // Send the stream to the media object.
            mediaElement.SetSource(stream, stream.ContentType);
            mediaElement.Play();
        }

References

a. http://blogs.windows.com/buildingapps/2014/09/18/microsoft-ocr-library-for-windows-runtime/
b. http://msdn.microsoft.com/en-us/library/windowspreview.media.ocr.ocrengine.ocrengine.aspx
c. http://msdn.microsoft.com/en-us/library/windowspreview.media.ocr.ocrengine.language.aspx
d. http://msdn.microsoft.com/en-us/library/windows/apps/br226193
e. http://msdn.microsoft.com/en-us/library/windowspreview.media.ocr.ocrengine.recognizeasync.aspx
f. https://code.msdn.microsoft.com/Uses-the-OCR-Library-to-2a9f5bf4
