This month, Doug Steele looks at how you can integrate SAPI (the Speech Application Programming Interface) into your applications.
Is there some way to add Speech Technology into my application?
One of Microsoft’s less publicized features is SAPI (the Speech Application Programming Interface), which has existed as part of Windows for many years. Office 2003 has made the SAPI capabilities a little more obvious. For example, Access 2003 (like most of the other Office products) provides easy interaction with the Speech Recognition capabilities. In addition, Excel 2003 includes a Text to Speech toolbar that allows you to read the contents of worksheet cells (the toolbar is accessible through Tools | Speech). You can configure your SAPI options through the Language tab of the Regional and Language Options applet on the Control Panel. The good news is that it’s also possible to use these capabilities in versions of Access prior to Access 2003.
Peter Vogel discussed using the TextToSpeech ActiveX control that comes with Microsoft Agent in his article “Making Your Application Talk” (Smart Access, September 2003). However, going that route puts a huge pair of Rolling Stones-type lips on your form. I’ll show you how to get the same spoken output without having to add any ActiveX controls to your form.
There are two parts to SAPI: Speech Recognition (SR) and Text-To-Speech (TTS), both of which are provided by separate modules, called “engines.” Together, they provide the ability to recognize human speech as input and create human-like audio output from printed text. Users can select any speech engine they prefer to use, as long as it conforms to the SAPI interface.
I’m not going to go into details about how sophisticated you can get with both. For example, you can fine-tune pronunciation of specific words if the default rendering isn’t to your satisfaction. If you want details, download the SAPI 5.1 SDK from www.microsoft.com/downloads/details.aspx?FamilyID=5e86ec97-40a7-453f-b0ee-6583171b4530&DisplayLang=en. Most of the VB samples can be run in Access as is (at least, in Access 2000 and newer; more about that later).
One warning: This article is aimed at developers working with the English language. While SAPI is supposed to work in other languages, I have no experience using it in anything other than English.
How do I implement Text-To-Speech in Access?
In a nutshell, it’s necessary to create an SpVoice object and use its properties and methods to submit and control speech synthesis.
Instantiating an instance of SpVoice is simple. Using late binding (so as not to have problems with the References collection if the same version of SAPI isn’t present on all machines), two lines of code are required:
Dim objVoice As Object
Set objVoice = CreateObject("SAPI.SpVoice")
To simplify (perhaps too much), there are three main properties of the SpVoice object that you’ll want to use:
- Voice–Gets and sets the currently active member of the Voices collection.
- Rate–Gets and sets the speaking rate of the voice.
- Volume–Gets and sets the base volume (loudness) level of the voice.
If SAPI has been installed on your machine, it will have been installed with a set of predefined voices. The following code will give a list of what voices exist on the workstation:
Dim objVoice As Object
Dim objToken As Object

Set objVoice = CreateObject("SAPI.SpVoice")
For Each objToken In objVoice.GetVoices
    Debug.Print objToken.GetDescription
Next objToken
The GetVoices method lets you be selective about which voices are returned by passing it a criteria string. You can specify, for example, that you only want female voices back by filtering on Gender, as this code does:
Dim objVoice As Object
Dim objToken As Object

Set objVoice = CreateObject("SAPI.SpVoice")
For Each objToken In _
    objVoice.GetVoices("Gender=Female")
    Debug.Print objToken.GetDescription
Next objToken
The documentation states that Voice attributes include Gender, Age, Name, Language, and Vendor but, to be honest, I’ve never gotten Age or Vendor to work.
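The Language attribute, on the other hand, has worked reliably for me. It takes a hexadecimal LANGID rather than a language name (409 is US English), so a sketch of filtering for English voices looks like this:

```vb
Dim objVoice As Object
Dim objToken As Object

Set objVoice = CreateObject("SAPI.SpVoice")
' 409 is the hexadecimal LANGID for US English
For Each objToken In _
    objVoice.GetVoices("Language=409")
    Debug.Print objToken.GetDescription
Next objToken
```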
To find out what the active voice is, you can use the GetDescription method of the Voice object:
Dim objVoice As Object
Set objVoice = CreateObject("SAPI.SpVoice")
Debug.Print objVoice.Voice.GetDescription
To set the voice to an installed voice whose name you know, you set the Voice property of the SpVoice object to one of the Voices returned by the GetVoices method:
Dim objVoice As Object
Set objVoice = CreateObject("SAPI.SpVoice")
Set objVoice.Voice = _
    objVoice.GetVoices("Name=Microsoft Mary")(0)
Note the (0) at the end of the last statement. That (0) is required because GetVoices returns a collection (even if there’s only one voice in the collection). Adding the (0) ensures the first voice in the collection is the one passed to the Voice property. If you wanted to type the command out in full, you’d use this:
Set objVoice.Voice = _
    objVoice.GetVoices("Name=Microsoft Mary").Item(0)

Since Item is the default member of the collection, you can omit it.
The Rate property determines the speaking rate of the voice. It’s a numeric value between -10 and 10. -10 represents the slowest speaking rate, while 10 represents the fastest. The Volume property (hopefully it’s obvious what it determines) is a numeric value between 0 and 100, where 0 is the minimum volume level, and 100 is the maximum.
Armed with that information, here’s how easy it is to get Access to say something to you, using the Speak method of the SpVoice object:
Dim objVoice As Object
Set objVoice = CreateObject("SAPI.SpVoice")
Set objVoice.Voice = _
    objVoice.GetVoices("Name=LH Michael")(0)
objVoice.Rate = 2
objVoice.Volume = 75
objVoice.Speak _
    "Hello, this is Access talking to you!"
As a more practical example, you can try the sample application in the download. I copied the Employees table and the Employees form from the Northwind database that comes with Access, and added a few lines of code to the form to have it speak the name of each employee as you scroll through the list. All that was required was to update the Current event (which fires as you move to a new record) to pass some text to the Speak method:
Dim mobjVoice As Object

Private Sub Form_Current()
    Dim strText As String

    If mobjVoice Is Nothing Then
        Set mobjVoice = CreateObject("SAPI.SpVoice")
    End If
    If Me.NewRecord Then
        strText = "This is a blank record."
    Else
        strText = "This is " & Me.FirstName & " " & _
            Me.LastName & ", " & Me.Title
    End If
    mobjVoice.Speak strText, 1 Or 2
End Sub
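Since mobjVoice is a module-level variable, it’s also good practice to release it when the form closes. A minimal sketch (this cleanup routine isn’t in the sample database, but costs nothing to add):

```vb
Private Sub Form_Close()
    ' Release the voice object when the form closes
    Set mobjVoice = Nothing
End Sub
```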
In this sample code, I’ve used the second parameter of the Speak method to set some options. By default, the Speak method uses the following settings:
- Speak the given text string synchronously (that is, hold up your program until the Speak method finishes).
- Do not purge pending speak requests.
- Parse the text as XML only if the first character is a left-angle-bracket.
- Do not persist global XML state changes across speak calls.
- Do not expand punctuation characters into words.
All of these settings can be overridden, though, by using various flags as the optional second argument to the Speak method as I did in my sample code. My settings cause the speech to be performed asynchronously and for pending requests to be purged. In other words, I’m telling SAPI that it should return control to the program as soon as it gets the text to speak (rather than waiting until the voice finishes speaking) and that SAPI should purge all pending speak requests before it attempts to speak the current phrase. If you don’t use those flags, you’ll find that the form won’t actually show the next employee until it’s finished speaking the employee’s name, and that if you switch from employee to employee too quickly, SAPI won’t be able to keep up. Table 1 lists all of the options.
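Because late binding means the SpeechVoiceSpeakFlags constants from the type library aren’t available, you may prefer to declare your own named constants rather than sprinkling raw numbers like "1 Or 2" through your code. A sketch, assuming the module-level mobjVoice variable from my sample (the constant values come from the SAPI 5.1 type library):

```vb
' SpeechVoiceSpeakFlags values from the
' SAPI 5.1 type library
Private Const SVSFlagsAsync As Long = 1
Private Const SVSFPurgeBeforeSpeak As Long = 2

Private Sub SpeakAsync(strText As String)
    If mobjVoice Is Nothing Then
        Set mobjVoice = CreateObject("SAPI.SpVoice")
    End If
    ' Return control immediately and discard any
    ' phrases still waiting to be spoken
    mobjVoice.Speak strText, _
        SVSFlagsAsync Or SVSFPurgeBeforeSpeak
End Sub
```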
Table 1. SpeechVoiceSpeakFlags values.
- SVSFDefault (0)–Default settings (as defined above).
- SVSFlagsAsync (1)–Specifies that the Speak call should be asynchronous (that is, it will return immediately after the speak request is queued).
- SVSFPurgeBeforeSpeak (2)–Purges all pending speak requests prior to this speak call.
- SVSFIsFilename (4)–The string passed to the Speak method is a file name rather than text, so the file it points to is spoken rather than the string itself.
- SVSFIsXML (8)–The input text will be parsed for XML markup.
- SVSFIsNotXML (16)–The input text will not be parsed for XML markup.
- SVSFPersistXML (32)–Global state changes in the XML markup will persist across speak calls.
- SVSFNLPSpeakPunc (64)–Punctuation characters should be expanded into words (for example, “This is it.” would become “This is it period”).
Since I’m using late binding, if SAPI hasn’t been installed on the workstation where your code is running, you’ll get an error when you try to instantiate SAPI (the line of code that calls the CreateObject method). Make sure you have appropriate error handling to alert the user.
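One way to trap that failure gracefully is to suppress the error around the CreateObject call and then test whether the object was actually created. A minimal sketch:

```vb
Dim objVoice As Object

On Error Resume Next
Set objVoice = CreateObject("SAPI.SpVoice")
On Error GoTo 0

If objVoice Is Nothing Then
    ' CreateObject failed: SAPI isn't installed
    MsgBox "Speech support (SAPI) doesn't appear " & _
        "to be installed on this machine.", _
        vbExclamation
Else
    objVoice.Speak "SAPI is available."
End If
```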
What about Speech Recognition?
Before continuing, let me mention that I haven’t been able to get Speech Recognition to work with late binding, nor have I been able to get it to work with Access 97.
You can’t use late binding with Speech Recognition because you need to be able to react to the Recognition events that are fired when Speech Recognition is working: As a sentence or phrase is recognized, SAPI raises an event that allows you to get the text equivalent of the phrase that was just recognized. In order to capture events, you must declare your variables as a specific object type rather than the generic Object data type; in other words, you must use early binding.
The reason Speech Recognition won’t work with Access 97 is that Access 97 won’t compile the declaration for the Recognition event. To be honest, I have no idea why not, and since Access 97 is no longer officially supported by Microsoft, it’s difficult to get answers anymore.
With those caveats out of the way, here’s a brief look at Speech Recognition. One of the reasons I’ll be brief is that, as with Text-To-Speech, very little code needs to be written.
You need to declare a global object to represent an instance of the SpSharedRecoContext object (which defines a “recognition context”). You also need a second global object to represent an instance of the ISpeechRecoGrammar object (the interface that manages the words and phrases that the SR engine will recognize).
The recognition context is an object that allows you to start and stop recognition and receive recognition results (among other events). But that’s just the tip of the iceberg: The recognition context also controls the collection of available words. When attempting to decode speech, SAPI polls the words available within the context to find matches. One of the benefits of using the SpSharedRecoContext is that you can have different contexts in different parts of your application. Rather than having a single list of words that would, presumably, contain every possible word that might be spoken, you can have dedicated lists that represent just the words that are relevant to a particular part of your application.
While it’s possible to define your own list of words to use for matching purposes, I’m afraid that’s outside of the scope of this column.
You must declare a global object representing the instance of SpSharedRecoContext with the keyword WithEvents, since SpSharedRecoContext is the interface that raises the Recognition events. In the following code, the WithEvents keyword specifies that mobjRecoContext is an object variable that will accept events fired by the SpSharedRecoContext object:
Dim WithEvents mobjRecoContext _
    As SpSharedRecoContext
Dim mobjGrammar As ISpeechRecoGrammar
Once you’ve declared the objects, you then need to add code to the Recognition events. Recognition events occur whenever the speech recognition engine feels that it has matched a spoken word or phrase and has done so with a sufficiently high confidence level to pass the text version of the word on to you. If a spoken word isn’t matched for the specific recognition context or the quality of speech doesn’t meet the minimum confidence score, a FalseRecognition event is raised. Spoken content may not meet the confidence score for several reasons including background interference, inarticulate speech, or an uncommon word or phrase.
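If you want to give the user feedback when recognition fails, you can handle the FalseRecognition event on the same mobjRecoContext object. A sketch (the Debug.Print is just a placeholder for whatever feedback suits your application):

```vb
Private Sub mobjRecoContext_FalseRecognition( _
    ByVal StreamNumber As Long, _
    ByVal StreamPosition As Variant, _
    ByVal Result As ISpeechRecoResult)
    ' The engine heard something but wasn't
    ' confident enough to report a match
    Debug.Print "Didn't catch that - please repeat."
End Sub
```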
The Recognition event accepts four parameters:
- StreamNumber–The stream number owning the recognition.
- StreamPosition–The position within the stream.
- RecognitionType–A SpeechRecognitionType constant that specifies the RecognitionType or the recognition state of the engine.
- Result–An ISpeechRecoResult object containing the recognition results.
(The FalseRecognition event is similar, but doesn’t have the RecognitionType parameter.) For the purposes of this column, only the Result parameter is of interest. The most important property of the ISpeechRecoResult object passed in the Result parameter is the PhraseInfo property, whose GetText method returns the string representing a recognized word.
In the sample database, there’s a form called frmSpeechRecognition that has a textbox txtDictation on it. As each Recognition event is raised, my code in the mobjRecoContext Recognition event adds the most recently recognized text to that textbox:
Private Sub mobjRecoContext_Recognition( _
    ByVal StreamNumber As Long, _
    ByVal StreamPosition As Variant, _
    ByVal RecogType As SpeechRecognitionType, _
    ByVal Result As ISpeechRecoResult)
    Dim strText As String
As mentioned, what the Speech Recognition engine thinks was said is returned by the GetText method of the Result parameter’s PhraseInfo property. The next line in the routine retrieves that string:
strText = Result.PhraseInfo.GetText
Since the spoken word has now been converted into text, I just append the new text to the textbox, and add a space at the end so that it looks better:
    If Len(Me.txtDictation) > 0 Then
        Me.txtDictation = Me.txtDictation & _
            " " & strText
    Else
        Me.txtDictation = strText
    End If
    Me.txtDictation.SelStart = 0
End Sub
That was the good stuff. However, before you can use that code, you need to do some housekeeping to set up the environment. This code creates an SpSharedRecoContext object and creates an ISpeechRecoGrammar object from the context object. Once the ISpeechRecoGrammar object is created, I load it with a word list using the DictationLoad method:
If (mobjRecoContext Is Nothing) Then
    Set mobjRecoContext = New SpSharedRecoContext
    Set mobjGrammar = _
        mobjRecoContext.CreateGrammar(1)
    mobjGrammar.DictationLoad
End If
As you can see, it’s not necessary to specify the particular words that SAPI will try to match. If you were designing an application with specific terms you wanted to be sure to recognize, you’d probably want to provide such a list, but SAPI does a reasonably good job without that level of customization.
There are two last housekeeping tasks. To start speech recognition, you have to turn Dictation on for the grammar object:
mobjGrammar.DictationSetState SGDSActive

When you're done, you can turn Dictation off:

mobjGrammar.DictationSetState SGDSInactive
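A natural way to expose those two calls to the user is a toggle button on the form. Here’s a sketch; cmdDictation is a hypothetical command button, not a control in the sample database:

```vb
Private Sub cmdDictation_Click()
    ' Hypothetical toggle button that starts or
    ' stops dictation for the loaded grammar
    If mobjGrammar Is Nothing Then Exit Sub
    If Me.cmdDictation.Caption = "Start Dictation" Then
        mobjGrammar.DictationSetState SGDSActive
        Me.cmdDictation.Caption = "Stop Dictation"
    Else
        mobjGrammar.DictationSetState SGDSInactive
        Me.cmdDictation.Caption = "Start Dictation"
    End If
End Sub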
I know this has been an extremely cursory look at SAPI, but hopefully it’s enough to get you interested. Good luck!