Develop a Smart Text Reader App with Unity


In this tutorial, we’re going to create an app that reads text through your phone camera and speaks it aloud.

If you want to follow along with the tutorial, all you need is Unity and an internet connection.

You can download the complete project from here.

Setting up Computer Vision

For this tutorial, we’re going to be using Microsoft’s Azure Cognitive Services, a set of machine learning services provided by Microsoft. The first one we’ll set up is Computer Vision. It lets us send an image to the API and get back a JSON response containing the text found in the image.

Before we continue, note that signing up for this service requires a credit or debit card. The first 30 days are free, and you can choose to cancel or continue afterwards.

To begin, go to the Azure sign up page and click on Start free.

Microsoft Azure homepage

Fill in the sign up form. You’ll need to verify your account with a phone number and card information (you won’t be charged unless you upgrade your account).

Microsoft Azure sign-up page

Once that’s done, you can navigate to the portal. Click on the Create a resource button.

Microsoft Azure dashboard for creating a service

Search for Computer Vision, then click Create.

Microsoft Azure service page for Computer Vision

Here, we can fill in the info for our Computer Vision service.

    • Set the Location to whichever region you prefer
    • Set the Pricing tier to F0 (free)
    • Create a new Resource group

Once that’s done, we can click the Create button.

Computer Vision Create service options in Microsoft Azure

The resource will now begin to deploy. When it’s complete, you should be able to click on the Go to resource button.

Deployment in progress message for Microsoft Azure service

When you get to the resource, go to the Overview tab and copy the Endpoint. This is the URL we’ll use to connect to the API. Then click on the Show access keys… link to see our keys.

Microsoft Azure Image Analyzer page

Here, we want to copy Key 1. This will identify us when calling the API.

ImageAnalyzer Keys page for Microsoft Azure

Setting up Text to Speech

Still in the Azure portal, let’s set up our text to speech service. This is done through the Speech cognitive service.

Microsoft Azure Speech service page

Since we’re using the free tier, this resource is restricted to the West US location. With that selected, choose the F0 pricing tier (the free one). Also make sure to set the Resource group to the same one used for Computer Vision.

Speech service creation options for Microsoft Azure

Like before, wait for the resource to deploy, then click on the Go to resource button. On this page, go to the Overview tab. Here, we want to copy the Endpoint, then click on the Show access keys… link.

Microsoft Azure ImageTextToSpeech service page

Here, we want to copy the Name (this is our resource name) and Key 1.

Microsoft Azure ImageTextToSpeech keys

Project Setup

There’s one asset we need in order for this project to work, and that’s SimpleJSON. SimpleJSON converts raw JSON text into an object structure that we can easily use in a script.

Download SimpleJSON from GitHub.

Simple JSON github page

In Unity, create a new 2D project and make sure the camera’s Projection is set to Orthographic. Since we’re working in 2D, rendering depth is not required.

Unity Main Camera in the Inspector window

Let’s also make the camera’s Background color black so if there’s a problem with the device camera, we’ll just see black.

Unity with Camera background set to black


Now we can work on the UI. Let’s start by creating a canvas (right click Hierarchy > UI > Canvas).

Unity project with UI Canvas added

As a child of the canvas, create a new TextMeshPro – Text object (you may need to import some TMP essentials). This is going to display the text we extract from the image.

    • Set the Anchoring to bottom-stretch
    • Set Height to 200

Unity project with TextMeshPro text added

To make our text easily readable on a mobile display, let’s change some properties:

    • Set Font Style to Bold
    • Set Font Size to 60
    • Set Alignment to center, middle

We can also remove the “New Text” text, since we want nothing there by default.

Unity TextMeshPro text settings in Inspector window

Now we need to add a Raw Image object to the canvas (call it CameraProjection). This is what we’ll apply our WebCamTexture to (the texture that renders what our device camera sees). Make sure that it sits above the text in the Hierarchy, so the text renders in front of it.

Unity Rect Transform component for UI element

That’s all for our UI! Let’s move on to the scripting now.

CameraController Script

Before we make a script, let’s create a new GameObject (right click Hierarchy > Create Empty) and call it _AppManager. This will hold all our scripts.

_AppManager object as seen in the Hierarchy and Unity Inspector

Create a new C# script (right click Project > Create > C# Script) called CameraController and drag it onto the _AppManager object. This script will render what the device camera sees to a WebCamTexture, then onto the UI.

We need to add Unity’s UI library to our using namespaces.

Next, let’s add our variables.

In the Start function, we’ll create a new WebCamTexture, assign it to the UI and start playing.
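The three steps above could be sketched roughly as follows. This is a minimal sketch rather than the tutorial's exact listing: the field name `cameraProjection` and the choice of the screen size as the requested camera resolution are assumptions; only `camTex` and the CameraProjection object are named in the text.

```csharp
using UnityEngine;
using UnityEngine.UI; // needed for RawImage

public class CameraController : MonoBehaviour
{
    // UI RawImage the camera feed is drawn onto (assumed name; assign in the Inspector)
    public RawImage cameraProjection;

    // texture that receives what the device camera sees
    private WebCamTexture camTex;

    void Start()
    {
        // request a feed roughly the size of the screen (assumed resolution)
        camTex = new WebCamTexture(Screen.width, Screen.height);

        // show the feed on the UI and start capturing
        cameraProjection.texture = camTex;
        camTex.Play();
    }
}
```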

The main function is going to be TakePicture. This will be a co-routine, because we need to wait a frame at the beginning. The function converts the pixels of the camTex to a byte array – which we’ll be sending to the Computer Vision API.
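One way TakePicture might be written is shown below, as an excerpt of the CameraController class with other members omitted. PNG encoding and the local variable names are assumptions made for illustration; the text only states that the pixels of `camTex` end up in a byte array.

```csharp
using System.Collections;
using UnityEngine;

public class CameraController : MonoBehaviour
{
    private WebCamTexture camTex; // created and started in Start()

    // copies the current camera frame into an encoded byte array
    IEnumerator TakePicture()
    {
        // wait until the current frame has finished rendering
        yield return new WaitForEndOfFrame();

        // copy the camera pixels into a readable texture
        Texture2D frame = new Texture2D(camTex.width, camTex.height);
        frame.SetPixels(camTex.GetPixels());
        frame.Apply();

        // encode the frame as PNG (an assumed format choice)
        byte[] byteData = frame.EncodeToPNG();

        // the temporary texture is no longer needed
        Destroy(frame);

        // byteData is now ready to be sent to the Computer Vision API
        Debug.Log("Captured " + byteData.Length + " bytes");
    }
}
```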

Then in the Update function, we can trigger this co-routine by either a mouse press (for testing in the editor), or a touch on the screen.
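A sketch of that input check is below. It assumes the project uses Unity's built-in Input Manager (the `Input` class) rather than the newer Input System package, and the TakePicture body is only a stand-in for the co-routine described above.

```csharp
using System.Collections;
using UnityEngine;

public class CameraController : MonoBehaviour
{
    void Update()
    {
        // left mouse button, for testing in the Editor
        if (Input.GetMouseButtonDown(0))
            StartCoroutine(TakePicture());
        // first finger touching the screen on a device
        else if (Input.touchCount > 0 && Input.GetTouch(0).phase == TouchPhase.Began)
            StartCoroutine(TakePicture());
    }

    // stand-in for the real TakePicture co-routine
    IEnumerator TakePicture()
    {
        yield return new WaitForEndOfFrame();
    }
}
```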

We can now go back to the Editor, and drag in the CameraProjection object to the script.

Unity CameraProjection object added to Camera Controller

AppManager Script

Create a new C# script called AppManager and attach it to the _AppManager object too. This script sends the image data to the Computer Vision API and receives a JSON response, from which we then extract the text.

We’ll begin by adding in our using namespaces.

Our first variables are what we need to connect to the API.

Then we need the UI text element we made before.

Since this script will need to be accessed by the text-to-speech one (we’ll make that next), we’re going to create an instance of it.
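The namespaces, variables and instance described so far might be laid out like this. The field names `subKey`, `url`, `uiText` and `instance` are assumptions chosen for this sketch; the text does not name them.

```csharp
using UnityEngine;
using UnityEngine.Networking; // UnityWebRequest
using SimpleJSON;             // JSON parsing
using TMPro;                  // TextMeshPro UI text

public class AppManager : MonoBehaviour
{
    // Computer Vision subscription key (Key 1 from the Azure portal)
    public string subKey;

    // Computer Vision endpoint URL
    public string url;

    // on-screen text that shows what was read from the image
    public TextMeshProUGUI uiText;

    // static reference so other scripts can reach this component
    public static AppManager instance;

    void Awake()
    {
        instance = this;
    }
}
```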

The main function is a co-routine which will do what I mentioned above.

First, let’s make the text show that we’re calculating.

Then we need to create a web request (using Unity’s UnityWebRequest system). Setting the method to POST means that we’re going to be sending data to the server.

A download handler is how we’re going to access the JSON response once the image has been analyzed and we get a result.

Let’s then set up the upload handler.

We also need to add our subscription key to the headers.

With that all done, let’s send the web request and wait for a result.

Now that we have our data, let’s convert it to a JSON object using SimpleJSON.

With this, we want to extract just the readable text (a function we’ll make next), then display the text on screen.
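Put together, the request steps above could look something like the following excerpt. The method name `GetImageData`, the field names, and the status string are assumptions. The error check is an extra guard that the text does not describe, and the one-line GetTextFromJSON here is only a placeholder for the function made next.

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;
using SimpleJSON;
using TMPro;

public class AppManager : MonoBehaviour
{
    public string subKey;
    public string url;
    public TextMeshProUGUI uiText;

    // sends the image to the Computer Vision API and displays the text it finds
    public IEnumerator GetImageData(byte[] imageData)
    {
        uiText.text = "[Calculating...]";

        // POST request: we are sending data to the server
        using (UnityWebRequest webReq = new UnityWebRequest(url, UnityWebRequest.kHttpVerbPOST))
        {
            // receives the JSON response
            webReq.downloadHandler = new DownloadHandlerBuffer();

            // uploads the raw image bytes
            webReq.uploadHandler = new UploadHandlerRaw(imageData);
            webReq.uploadHandler.contentType = "application/octet-stream";

            // identifies us to the API
            webReq.SetRequestHeader("Ocp-Apim-Subscription-Key", subKey);

            yield return webReq.SendWebRequest();

            // extra guard (not described in the text): stop if the request failed
            if (!string.IsNullOrEmpty(webReq.error))
            {
                uiText.text = "Request failed: " + webReq.error;
                yield break;
            }

            // parse the response and show just the readable text
            JSONNode jsonData = JSON.Parse(webReq.downloadHandler.text);
            uiText.text = GetTextFromJSON(jsonData);
        }
    }

    // placeholder only; the real extraction function comes next
    string GetTextFromJSON(JSONNode jsonData)
    {
        return jsonData.ToString();
    }
}
```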

The GetTextFromJSON function takes in a JSON object, extracts just the text that was found in the image, and returns it as a string.
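A possible implementation is sketched below. It assumes the OCR response nests its results as regions, then lines, then words, each word carrying a `text` field. Walking every region and joining words with spaces are choices made for this sketch, not something the text specifies.

```csharp
using System.Text;
using UnityEngine;
using SimpleJSON;

public class AppManager : MonoBehaviour
{
    // assumed response shape:
    // { "regions": [ { "lines": [ { "words": [ { "text": "..." } ] } ] } ] }
    string GetTextFromJSON(JSONNode jsonData)
    {
        StringBuilder text = new StringBuilder();

        // walk regions -> lines -> words and collect each word
        foreach (JSONNode region in jsonData["regions"].Children)
            foreach (JSONNode line in region["lines"].Children)
                foreach (JSONNode word in line["words"].Children)
                    text.Append(word["text"].Value).Append(' ');

        // drop the trailing space
        return text.ToString().Trim();
    }
}
```

If the response contains no regions, the loops simply do not run and the function returns an empty string.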

Let’s now go back to the CameraController script and down to the bottom of the TakePicture function. Here, we’re going to send the image data over to the AppManager script to be analyzed.
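The hand-off might look like this, shown as an excerpt of TakePicture with its final line added. `GetImageData` and `instance` are assumed names for the AppManager co-routine and its static reference, and PNG encoding is likewise an assumption.

```csharp
using System.Collections;
using UnityEngine;

public class CameraController : MonoBehaviour
{
    private WebCamTexture camTex; // created and started in Start()

    IEnumerator TakePicture()
    {
        yield return new WaitForEndOfFrame();

        // copy the camera frame and encode it
        Texture2D frame = new Texture2D(camTex.width, camTex.height);
        frame.SetPixels(camTex.GetPixels());
        frame.Apply();
        byte[] byteData = frame.EncodeToPNG();
        Destroy(frame);

        // new: hand the encoded image to AppManager for analysis
        StartCoroutine(AppManager.instance.GetImageData(byteData));
    }
}
```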

Back in the Editor, add your Computer Vision subscription key and your endpoint URL with /v2.0/ocr at the end. Mine is:

Unity Text Mesh Pro object added to App Manager component

TextToSpeech Script

Create a new C# script called TextToSpeech and attach this to the _AppManager object. This script will take in text, send it to the Speech API and play the TTS voice.

We’ll only be needing the networking namespace for this script.

Since we’re playing an audio clip through this script, we’ll need to make sure there’s an AudioSource attached.

For our first variables, we’re going to keep track of the info we need to connect to the API.

Then we need to keep track of our access token and audio source.

Finally, let’s create an instance and set it in the Awake function.
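The setup described so far for this script could be sketched as below. All field names (`subKey`, `region`, `resourceName`, `accessToken`, `ttsSource`, `instance`) are assumptions for illustration. `RequireComponent` is one way to make sure an AudioSource is attached.

```csharp
using UnityEngine;
using UnityEngine.Networking; // used by the web requests added later

// Unity adds an AudioSource automatically when this script is attached
[RequireComponent(typeof(AudioSource))]
public class TextToSpeech : MonoBehaviour
{
    // Speech subscription key (Key 1 from the Azure portal)
    public string subKey;

    // Speech service region, for example "westus"
    public string region;

    // Speech resource name
    public string resourceName;

    // token needed to call the Speech API
    private string accessToken;

    // plays the returned voice clip
    private AudioSource ttsSource;

    // static reference so other scripts can reach this component
    public static TextToSpeech instance;

    void Awake()
    {
        instance = this;
        ttsSource = GetComponent<AudioSource>();
    }
}
```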

The first thing we’ll need to do is create a function to get an access token. This token is needed in order to use the API.

The first step is to create a new web request and set the URL to be the endpoint with our region included.

Then we can set the request header to contain our subscription key and send it off.

When we get a result, we check for an error and log it if there is one. Otherwise, we store the access token.

With this done, we can call the co-routine in the Start function.
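A minimal sketch of the token request follows. The method name `GetAccessToken` and the field names are assumptions. The URL pattern is the regional token endpoint as I understand it from Azure's documentation, so check it against the Endpoint shown for your own resource.

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class TextToSpeech : MonoBehaviour
{
    public string subKey;
    public string region;
    private string accessToken;

    void Start()
    {
        // nothing else can be requested until we hold a token
        StartCoroutine(GetAccessToken());
    }

    IEnumerator GetAccessToken()
    {
        // token endpoint with our region included (assumed URL pattern)
        string tokenUrl = "https://" + region + ".api.cognitive.microsoft.com/sts/v1.0/issueToken";

        using (UnityWebRequest webReq = new UnityWebRequest(tokenUrl, UnityWebRequest.kHttpVerbPOST))
        {
            // receives the token as plain text
            webReq.downloadHandler = new DownloadHandlerBuffer();

            // identifies us to the API
            webReq.SetRequestHeader("Ocp-Apim-Subscription-Key", subKey);

            yield return webReq.SendWebRequest();

            // log the error and stop if the request failed
            if (!string.IsNullOrEmpty(webReq.error))
            {
                Debug.Log(webReq.error);
                yield break;
            }

            accessToken = webReq.downloadHandler.text;
        }
    }
}
```

Tokens issued this way expire after a short time, so a long-running app would need to request a fresh one periodically.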

Let’s now work on the GetSpeech function (co-routine), which will send the text to the API and return a voice clip.

The first thing we need to do is create the body. This is where we’ll store the info for the API to read.

Then like before, we can create a new web request.

Next, we want to upload the body to the request.

Then we’ll set the headers, which will identify us and also include some info for the returning audio.

Now we can send the request and wait for a result. If we get an error, return.
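The GetSpeech steps above might come together as in this sketch. The field names, the voice name, the output format string and the URL pattern are all assumptions to verify against the current Speech service documentation. The escaping of `&`, `<` and `>` is an added safeguard so that recognized text cannot break the SSML markup, and PlayTTS is only a stand-in here.

```csharp
using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

public class TextToSpeech : MonoBehaviour
{
    public string region;
    public string resourceName;
    public string voiceName = "en-US-JennyNeural"; // assumed voice; pick any supported one
    private string accessToken;                    // set by the token request

    // sends the text to the Speech API and plays the returned audio
    public IEnumerator GetSpeech(string text)
    {
        // escape characters that would break the SSML markup
        string safeText = text.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;");

        // the body tells the API what to say and with which voice
        string body =
            "<speak version='1.0' xml:lang='en-US'>" +
            "<voice name='" + voiceName + "'>" + safeText + "</voice></speak>";

        // text to speech endpoint with our region included (assumed URL pattern)
        string ttsUrl = "https://" + region + ".tts.speech.microsoft.com/cognitiveservices/v1";

        using (UnityWebRequest webReq = new UnityWebRequest(ttsUrl, UnityWebRequest.kHttpVerbPOST))
        {
            webReq.downloadHandler = new DownloadHandlerBuffer();

            // upload the SSML body
            webReq.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(body));
            webReq.uploadHandler.contentType = "application/ssml+xml";

            // identify ourselves and describe the audio we want back
            webReq.SetRequestHeader("Authorization", "Bearer " + accessToken);
            webReq.SetRequestHeader("User-Agent", resourceName);
            webReq.SetRequestHeader("X-Microsoft-OutputFormat", "riff-24khz-16bit-mono-pcm");

            yield return webReq.SendWebRequest();

            // if we get an error, return
            if (!string.IsNullOrEmpty(webReq.error))
                yield break;

            StartCoroutine(PlayTTS(webReq.downloadHandler.data));
        }
    }

    // stand-in for the real PlayTTS co-routine
    IEnumerator PlayTTS(byte[] audioData)
    {
        yield break;
    }
}
```

The AppManager script could then start this co-routine once it has extracted the text, for example with `StartCoroutine(TextToSpeech.instance.GetSpeech(imageText))`, where `imageText` is a hypothetical variable holding the extracted string.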

The PlayTTS function (co-routine) takes in the audio data as a byte array and saves it temporarily as a .wav file. Then we load that in, convert it to an audio clip and play it through the audio source.
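One way to write PlayTTS is sketched below. The file name `tts.wav`, the use of `Application.persistentDataPath` as the temporary location, and the `file://` prefix are assumptions; the prefix in particular may need adjusting on some platforms.

```csharp
using System.Collections;
using System.IO;
using UnityEngine;
using UnityEngine.Networking;

[RequireComponent(typeof(AudioSource))]
public class TextToSpeech : MonoBehaviour
{
    private AudioSource ttsSource;

    void Awake()
    {
        ttsSource = GetComponent<AudioSource>();
    }

    // saves the audio data as a .wav file, loads it as a clip and plays it
    IEnumerator PlayTTS(byte[] audioData)
    {
        // save the audio temporarily (assumed location and file name)
        string tempPath = Application.persistentDataPath + "/tts.wav";
        File.WriteAllBytes(tempPath, audioData);

        // load the file back in as an audio clip
        using (UnityWebRequest loader = UnityWebRequestMultimedia.GetAudioClip("file://" + tempPath, AudioType.WAV))
        {
            yield return loader.SendWebRequest();

            // log the error and stop if the file could not be loaded
            if (!string.IsNullOrEmpty(loader.error))
            {
                Debug.Log(loader.error);
                yield break;
            }

            AudioClip clip = DownloadHandlerAudioClip.GetContent(loader);
            ttsSource.PlayOneShot(clip);
        }
    }
}
```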

Back in the Editor, we can fill in the component’s properties.

Unity Text To Speech component settings

That’s it for the scripting! If you have a webcam, you can try it out right now in the Editor. Otherwise, you can build the app to your device and try it on there.


Congratulations on finishing the tutorial! If you followed along, you now have a complete app with text to speech capabilities. If you wish to make additions, or just have the project ready to use, you can download the project files here.