Voice Squad
Voice command technology is the dream of every Star Trek fan. So far the reality has not lived up to the sci-fi dream, but the Web could change that. Geof Wheelwright and Steve Gold report
It seems that voice command technology has been with us for ever, and has been awful for ever. Those of us with memories longer than we might care to admit can probably still recall the Apricot PCs of the mid-80s with the built-in microphone, and the television ads showing it in use.
In the ads, a suave yuppie with a silky voice would stroll up to the microphone and sternly command his Apricot computer to: "Show me the sales figures." Magically, a set of numbers would appear on the screen.
The commercial so impressed one punter that at a trade show he was seen turning blue as he yelled at one of the computers on the Apricot stand and got nothing in return for his cries of "Show me the f***ing sales figures!"
It has all been downhill from there, at least until the advent of the Web. Over the past 10 years, voice recognition technology has developed to the point where it is actually reasonable to think about "voice-enabling" applications to recognise a series of verbal shortcut commands. That is exactly what recent software offerings from IBM and Motorola have done, letting users surf the Net simply by speaking commands into a microphone.
IBM has funded research into voice recognition technology for many years. It even included voice recognition in version 4 of OS/2 Warp when it was released last year.
Big Blue is not ignoring the mass market for speech recognition, however, and has developed a product aimed at users of Netscape Navigator 3 running on Windows 95: Voice Type Connection NS Edition. This became available in beta form on IBM's Web site in December last year, and it promises to let people navigate the Net by talking.
Using the product, people will be able to control Navigator with their voice alone. IBM suggests they will be able to ask the computer to go to a site on the Web and move around within it just by talking to the computer.
IBM says its "text word spotting" technology will allow people to surf by speaking only a few of the words in a hotlink, creating voice-enabled shortcuts to their favourite Web sites. IBM also promises that although Navigator users will see a few changes in the interface, they should not notice any significant differences.
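IBM has not published how its word spotting works, so the following is only an illustrative sketch of the general idea: once the speech engine has turned an utterance into text, the words are compared with the text of each hotlink on the page, and the link sharing the most words wins. The function name `spot_link`, the sample links and the tie-breaking rule are all assumptions made for this example, not part of any IBM product.

```python
def spot_link(spoken, links):
    """Pick the hotlink whose text shares the most words with the utterance.

    `links` maps visible link text to a URL. Returns the matching URL, or
    None when no word matches or when two links match equally well.
    Hypothetical helper for illustration only.
    """
    spoken_words = set(spoken.lower().split())
    best_url, best_score, tie = None, 0, False
    for text, url in links.items():
        # Score a link by how many spoken words appear in its text.
        score = len(spoken_words & set(text.lower().split()))
        if score > best_score:
            best_url, best_score, tie = url, score, False
        elif score == best_score and score > 0:
            tie = True  # ambiguous: another link matches just as well
    return None if tie else best_url


# Made-up page links for the example.
links = {
    "Latest sales figures": "/sales",
    "Contact the sales team": "/contact",
    "Company news": "/news",
}
```

With these sample links, saying "sales figures" would select the first link, while "sales" alone is ambiguous and selects nothing; a real product would presumably ask the user to say more.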
In addition, people who use Voice Type Connection NS Edition will get the benefit of direct dictation into Netscape email or Web pages if they add any of IBM's dictation products, such as Voice Type Simply Speaking or the recently introduced Voice Type 3.
Of course, this is what it is all about. Voice Type Connection NS Edition is really meant to function as little more than a high-tech add-on to IBM's family of voice recognition products to get people to believe the technology actually works.
The other product that this freebie software is designed to show off is the Voice Type Developers Toolkit, which was used to create Voice Type Connection NS Edition. IBM says Voice Type Connection uses object code insertion techniques to integrate the Voice Type speech recognition technology with third-party software applications to make them speech-aware.
IBM has other such products in development and pledges that each future edition will be fully compatible with other Voice Type Connection editions, as well as the whole Voice Type family.
The company says Voice Type Connection NS Edition works with Netscape Navigator 3 or 3.01 running on most Pentium computers equipped with a microphone and an industry-standard sound card, such as Creative's Sound Blaster or IBM's Mwave media processor. The software needs a 90MHz Pentium PC running Windows 95 with 11MB of additional hard disk space, 256KB of level-2 cache, and a minimum of 5MB of RAM above any other memory requirements.
IBM is not alone in offering voice technology for the Net. Motorola's information systems group threw its hat into the ring when it showed off its SM34DFV V.34 data/fax/voice host-based modem at Comdex Fall in Las Vegas last year.
The demonstration by Motorola showed how users could surf the Web using verbal commands to a Pentium system configured with the modem and loaded with speech recognition software from Lexicus, another Motorola division. By speaking into the system, visitors could initiate a connection to an internet service provider and navigate their way around the Web to various sites, including some using software that supports audio streaming over the link.
The fact that voice recognition and processing technology is firmly established in the PC industry is also illustrated by the Voice Europe 96 show last autumn. Show organiser Advanstar says the event had record attendances, making it the largest computer telephony show of its kind in Europe.
Advanstar representative Paul Stockford says there was a definite increase in the number of internet-based solutions at the show.
"Web-based solutions were apparent throughout the exhibition and the ones I saw made absolute sense from an applications standpoint," he says.
Stockford maintains that text-based information delivery is a perfect complement to traditional telephony-based voice solutions. "These types of innovations will, I believe, positively impact on the growth of the voice processing industry well into the next decade."
Simon Cooper, the organiser of Voice Europe, says the increase in the number of companies exhibiting reflects the show's status in the industry. Each year, he says, the number of exhibiting firms increases substantially.
"We're now seeing exhibitors from call centre, voice recording, and speech and cable industries, in addition to the traditional telecoms companies," he says. "Voice Europe has become the meeting place for the industry to exchange ideas, as well as to showcase new products and technologies.
"Three hundred and eighty delegates attended this year's conference, including 156 who attended the first ever European symposium on speech recognition."
Given the history of voice recognition technology, however, vocal command of Web browsers is probably not the next big thing. But it seems there are now some realistic goals for the technology.
This could create some interesting market opportunities, particularly in developing solutions for anyone who needs to navigate the Web but is unable or unwilling to use a keyboard.
Companies that concentrate on selling systems based on this technology into niche areas are more likely to succeed than those that think voice command will be the way everyone navigates the Web.
Speak Up
Another reason why you might want to speak into the microphone on your PC while surfing the Net is to conduct internet-based telephone calls. Like voice command technology, this has been an imperfect science for quite a while, but it seems to be improving quickly and gaining customer momentum in the process.
A recent report, Internet Telephony: An Alternative Dial-tone, from research company IDC/Link suggests that in 1997 internet telephony will be a key part of the long distance and international telecoms market.
"Real-time telephony over the internet is becoming more than a standalone product," suggests the company. "It is catalysing the development of multimedia telephony applications such as whiteboards, application and document sharing, multiuser data conferencing and ultimately real-time video. This will lead to new revenue streams and opportunities and will account for 12.5 billion minutes of use by 2001."
The reasons behind this have a lot to do with the fact that the quality of internet telephony is finally making it credible as a business tool.
"Multimedia telephony applications appeal to business and consumer market segments," says Rona Shuchat, author of the report and manager of IDC/Link's residential broadband services research programme. "Multimedia telephony will permit live interaction with an operator, or interaction with voice messaging systems, while viewing the products and simultaneously placing orders online. This can be applied to customer service, helpdesk support, electronic shopping and distance learning."
The report further suggests that the biggest impact of multimedia telephony awaits the introduction of voice-over-internet services from the world's telephone carriers.
This could put a lot of pressure on corporates to upgrade the multimedia facilities in their office PCs, in particular so that their sound cards will be capable of handling full duplex communications.
Let's Talk Business
Ovum's report Voice Processing: Business Opportunities in Computing and Telephony, issued last year, notes that market acceptance of voice processing, which allows people to communicate with computers using the telephone, depends on how the technology is applied.
David Bradshaw, the lead author of the report, argues that callers must receive some benefit from calling voice processing systems, otherwise they will refuse to call them.
"It would be commercial suicide for companies to neglect customer service standards and focus solely on using voice processing to achieve higher efficiency and lower costs in-house," he explains.
Bradshaw's report notes that telcos will be major beneficiaries of developments in the voice processing market. The report also predicts that telcos will generate revenues of $20.1 billion within the next five years if they:
- provide their own services for subscribers, such as voice dialling, voicemail and automated directory services;
- provide bureau services for third parties;
- support the use of voice processing by businesses, which will have nearly 21 million ports of voice processing installed worldwide by 2001.
The report claims to identify several factors behind the growth of the voice processing market. These include:
- the merging of three types of voice processing: audiotext, voicemail and interactive voice response services;
- the increase in personal productivity systems which employ voice processing;
- the development of usable speech recognition technology.
Speech recognition, the report argues, is the key to several major European markets with low tone dialling penetration, such as Germany.
According to the report, the US will continue to be the largest market for voice processing, with revenues of $2.9 billion, although its growth rate will be low because of the maturity of the market.
The growth rate will be highest in the European market, the report predicts, with revenues of $2.4 billion by 2001. In Asia Pacific, where the largest voice processing markets are Australia, Japan and South Korea, revenues are predicted to reach $500 million by 2001.
PC Dealer Key Links
- The beta version of Voice Type Connection NS Edition is available at: http://www.software.ibm.com/is/voicetype/vtconn/vtconn.html
- Further details of Voice Europe, together with plans for this year's events, can be found on Advanstar's Web site at http://www.voicevents.com