Right now, the interaction is basic voice recognition, with text-to-speech rendering of the result:
(1) Dial the number.
(2) The caller is greeted and asked to speak his/her choice from four mystery genre options: legal, noir, satirical, political.
(3) The caller speaks his/her choice of setting: US or non-US.
(4) The caller chooses a male or female protagonist.
(5) The caller hears a mash-up of a couple of sentences meeting the selected criteria.
SUCCESS!!! As long as you do not select “non-US” (which renders as “non-us” and so is not recognized in the database) or pronounce “female” with too much “fvvv” (I just got “shemale”, a new variation to add to the other erroneous recognitions I’ve gotten from Google VR’s hearing of “female”).
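The “non-us” miss looks like a plain case-sensitivity problem: the recognizer hands back lowercase text, and the lookup compares it exactly against “non-US”. Here is a minimal sketch of a case-insensitive lookup. It is written in Python against an in-memory SQLite table, which is an assumption for illustration and not the project's actual code; the table name `intros`, its columns, and the sample sentences are all made up.

```python
import sqlite3


def build_demo_db():
    """Create a throwaway in-memory table with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE intros "
        "(genre TEXT, setting TEXT, protagonist TEXT, sentence TEXT)"
    )
    conn.executemany(
        "INSERT INTO intros VALUES (?, ?, ?, ?)",
        [
            ("noir", "non-US", "female", "Rain fell on the harbor as she lit her last match."),
            ("noir", "non-US", "female", "Nobody in the city remembered her real name."),
            ("legal", "US", "male", "He had never lost a case until that Tuesday."),
        ],
    )
    return conn


def pick_sentences(conn, genre, setting, protagonist, limit=2):
    """Return up to `limit` random sentences matching the spoken criteria.

    COLLATE NOCASE makes each comparison ignore ASCII case, so a
    recognizer result of "non-us" still matches a stored "non-US".
    """
    rows = conn.execute(
        "SELECT sentence FROM intros "
        "WHERE genre = ? COLLATE NOCASE "
        "AND setting = ? COLLATE NOCASE "
        "AND protagonist = ? COLLATE NOCASE "
        "ORDER BY RANDOM() LIMIT ?",
        (genre, setting, protagonist, limit),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    # The lowercase "non-us" from the recognizer still finds the "non-US" rows.
    for sentence in pick_sentences(conn, "noir", "non-us", "female"):
        print(sentence)
```

Doing the case folding in the query means the database can keep its original spellings, and the `limit` parameter covers pulling more than one sentence for the mash-up.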
VIDEO, but you can try it yourself. I just fixed my code to pull more than one “intro” from the database. In my enthusiasm last night, I left the single-intro testing version in the code (I had used it because I wanted to make sure that Asterisk would handle at least one sentence). This video shows only the single-sentence version working, but more than one sentence does get pulled now. Code to come: I have to put it on GitHub first.
Maybe I should not dare to do it, but I’m going to try adding a couple of different spellings to the database to see whether MysteryCorpsePhone will return sentences whose criteria include “non-US”/“non-us” or “male” (at this point rendered consistently as “mail”) and speak them correctly.
UPDATE: I have not added “a few different spellings”; that would have been a bridge too far on such short notice. I did, however, just go in and change all the “male” protagonist fields to the spelling that Google VR is rendering: “mail”. I cannot test it right now, because when I called the ITP Asterisk number I got a “bill not paid” error notice. However, I will try again later, and I will try to pull it together on the Rackspace server Marcel now that I know my program works.
The demonstration of the voice interface in class showed me that I would need to let people speak the answers to the option questions directly, rather than speaking number substitutes. I think that numbers are more readily understood by Google voice recognition, but the substitution is very clunky and makes little sense: if you are going to use numbers, why not use keypresses? I had actually noted at least two instances in testing beforehand in which my spoken number options were not recognized correctly, so numbers may be only a little better than throwing caution to the wind and letting users say the options themselves.
MYSTERY on Asterisk.ITP-redial.com Server: The main Asterisk number, my extension.
TWO THINGS I MISSED
(1) Wrong path to the database: the lookup kept missing and missing until I looked closely.
(2) Somehow I started the extensions.conf edits outside of the asterisk_conf folder on the server.
LIMITATIONS OF VOICE RECOGNITION
As I told the class when I demonstrated on Tuesday, I used numbers to implement the options because I thought that Asterisk would have a harder time with multi-syllabic choices. However, as I also mentioned on Tuesday, I had already had errors in voice recognition just using the numbers.
I have removed what seemed to be the system’s voice-learning aspect (the hello-pound element at the beginning), because it did not seem to provide any benefit and it slowed things down.
Given the obvious clunkiness of the interface using numbers, I reworked things to use the options themselves. This works, but somewhat erratically. Because it interests me, I want to continue to investigate, but here are some strange mis-hearings Google voice recognition has produced. These examples come from very clear pronunciations, in quiet surroundings, in a classic, plain, flat American accent (my own). For “female”, which was often mis-recognized, I tried emphasizing the “f” as more of a “fv” sound, as well as saying the word normally and briskly. I have not yet discovered the fine distinctions of pronunciation that may be required for Google VR to recognize “female” consistently and accurately, but I will keep trying:
“female” often gets recognized as “email” or “gmail”
“male” more often than not gets rendered as its homophone, “mail”, which can adversely affect the result, since the database spells it “male”. I may change the .yml file so that it recognizes both.
“US” has been recognized as “you-off”
I was excited when “political” was recognized correctly twice in a row, and “satirical” was also rendered correctly. “Noir” is not well-recognized, or well-pronounced in this version of the text-to-speech script, for some reason.
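One way to absorb the mis-hearings listed above without touching the database is to translate what the recognizer returns before looking anything up. The following is a rough sketch in Python, not the project's code (which may well handle this in the .yml file instead). The `ALIASES` table and the `normalize_choice` helper are hypothetical names, and the alias entries are only the mis-hearings noted in this post.

```python
# Mis-hearings observed so far, mapped to the option the caller meant.
# Keys are lowercase because the recognizer output is lowercased first.
ALIASES = {
    "mail": "male",
    "email": "female",
    "gmail": "female",
    "you-off": "US",
    "you at": "US",
}

GENRES = ["legal", "noir", "satirical", "political"]
SETTINGS = ["US", "non-US"]
PROTAGONISTS = ["male", "female"]


def normalize_choice(heard, valid_options):
    """Map recognizer output onto one of the valid options.

    Returns the option exactly as spelled in `valid_options`,
    or None if nothing matches (so the caller can be re-prompted).
    """
    key = heard.strip().lower()
    # Swap in the intended word if this is a known mis-hearing.
    candidate = ALIASES.get(key, key)
    for option in valid_options:
        if option.lower() == candidate.lower():
            return option
    return None


if __name__ == "__main__":
    print(normalize_choice("mail", PROTAGONISTS))
    print(normalize_choice("non-us", SETTINGS))
    print(normalize_choice("banana", GENRES))
```

Returning `None` for an unknown word leaves room to ask the caller to repeat the choice rather than querying the database with a value that cannot match. New mis-hearings only need one more line in the alias table.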
I’ve included some screenshots of these crazy mis-recognitions.
US = “YOU AT”/ FEMALE = “EMAIL” In the second example, Google VR recognizes “political” but does not recognize “female” again.
Though I initially set up my Rackspace server in April, using all of the scripts provided to add the necessary items, I believe I may have accidentally hit “rebuild” after resetting my password. I surmise this because when I returned in late April, I found no server, no applications, nothing. Loading everything back in was straightforward, but I ran into some difficulties this time as I prepared the server for immediate work.
I eventually resolved most of these difficulties. In the end, however, I produced my final on the ITP server, because I wanted it to work properly there first before moving it to Rackspace.