Kuro5hin.org: technology and culture, from the trenches

Voice recognition and Microsoft

By Yzorderex in News
Thu Jun 15, 2000 at 11:02:23 AM EST
Tags: Software

Hal Plotkin has an interesting article on SF Gate and his major premise is that "Microsoft's hammerlock on the computer industry will be extended for at least another generation or so if the company succeeds with its ambitious plan to build a voice-activated operating system that works with its legacy applications."

Voice recognition is a pretty good trick, and useful. But the training approach described there, "keeping a record of each time the software has been corrected by an individual user and then drawing on those corrections to avoid future mistakes," has a side effect that could be the real killer app: adaptive, programmable voice macros. Instead of if/then/else logic hardwired into an app, you say, "Give me the game menu on holidays, stupid, I'm not working!", where the definition of a holiday for your country is part of the OS localization.

I'm guessing that the correction and adaptation machinery required to do voice will make a very big impact at the application level.
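To make the voice-macro idea concrete, here is a minimal Python sketch, assuming the recognizer has already turned speech into text. `CorrectionLog` stands in for the per-user correction record quoted above, and `HOLIDAYS_BY_LOCALE` stands in for the localization data; all names and the sample holiday dates are hypothetical and not taken from the SF Gate article or any Microsoft product.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Set, Tuple

# Hypothetical localization data: (month, day) pairs treated as holidays.
# A real OS would need richer rules (movable feasts, regional days off).
HOLIDAYS_BY_LOCALE: Dict[str, Set[Tuple[int, int]]] = {
    "en_US": {(1, 1), (7, 4), (12, 25)},
    "en_GB": {(1, 1), (12, 25), (12, 26)},
}


class CorrectionLog:
    """Remembers how one user corrected misrecognized phrases."""

    def __init__(self) -> None:
        # heard phrase -> counts of what the user said they actually meant
        self._fixes: Dict[str, Counter] = defaultdict(Counter)

    def record(self, heard: str, meant: str) -> None:
        self._fixes[heard][meant] += 1

    def apply(self, heard: str) -> str:
        """Return the most frequent past correction, or the input unchanged."""
        fixes = self._fixes.get(heard)
        if not fixes:
            return heard
        return fixes.most_common(1)[0][0]


def is_holiday(day: date, locale: str) -> bool:
    return (day.month, day.day) in HOLIDAYS_BY_LOCALE.get(locale, set())


def menu_for(day: date, locale: str) -> str:
    """The user's macro: game menu on holidays, work menu otherwise."""
    return "game menu" if is_holiday(day, locale) else "work menu"


log = CorrectionLog()
log.record("give me the gain menu", "give me the game menu")
print(log.apply("give me the gain menu"))     # give me the game menu
print(menu_for(date(2000, 12, 26), "en_GB"))  # game menu
print(menu_for(date(2000, 12, 26), "en_US"))  # work menu
```

The user's rule then reduces to a lookup against whichever holiday table the localization provides, rather than an if/then/else baked into a single application.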


Voice recognition and Microsoft | 46 comments (46 topical, 0 editorial, 0 hidden)
The writeup needs work.... (none / 0) (#9)
by adamsc on Thu Jun 15, 2000 at 04:05:22 AM EST

adamsc voted -1 on this story.

The writeup needs work.

What's the point? This is unclear. ... (none / 0) (#12)
by cfe on Thu Jun 15, 2000 at 04:39:52 AM EST

cfe voted -1 on this story.

What's the point? This is unclear. And FUD.

of course blocking such things as '... (5.00 / 1) (#11)
by martin on Thu Jun 15, 2000 at 06:01:14 AM EST

martin voted 1 on this story.

of course blocking such things as 'format c: yes' will have to be done :-)

Re: of course blocking such things as '... (none / 0) (#37)
by mikeyo on Fri Jun 16, 2000 at 12:25:27 AM EST

Yeah, I could just see it now. An employee gets fired and immediately runs down the row of cubicles yelling "FORMAT C COLON... FORMAT C COLON"

Re: of course blocking such things as '... (none / 0) (#41)
by Anonymous Hero on Fri Jun 16, 2000 at 01:11:35 PM EST

How about the new virus making the rounds? You know, the one that's just a WAV file that says "format c colon enter yes". Gee, a WAV file is now an executable!

While the idea of speech-recognitio... (none / 0) (#8)
by tjansen on Thu Jun 15, 2000 at 06:07:50 AM EST

tjansen voted 1 on this story.

While the idea of speech recognition appeals to many people, especially those without computer experience, I don't think that speech recognition systems will do what people expect from them, at least not in the next 10 years. What people imagine when they hear 'speech recognition' is that they won't have to learn how to use the computer, but can instead speak to it as they would to a human. But until the software is advanced enough to understand whole sentences, all that speech recognition can do (outside of word processors) is let you say "open file" instead of clicking the "open file" menu item, or say "scroll down" instead of using the cursor keys. You still have to learn the commands; the only difference is that you learn what to say instead of where to click.

I can just see it now you tell it ... (none / 0) (#17)
by Rich on Thu Jun 15, 2000 at 07:09:32 AM EST

Rich voted 0 on this story.

I can just see it now: you tell it to "Shut Down" and then Windows replies "Sorry, I do not know that command" :)
I expect history will be kind to me, as I intend to write it. -- Winston Churchill

The real direction of OS's is a bit... (none / 0) (#6)
by Neuromancer on Thu Jun 15, 2000 at 08:23:49 AM EST

Neuromancer voted 1 on this story.

The real direction of OSes is a bit farther removed from the desktop paradigm that we have known for so long. There will always be desktop computers, but more common will be uses like "Make my damn coffee" and "How long until I have to leave for work?"

too many. periods in odd. places bu... (none / 0) (#2)
by Pelorat on Thu Jun 15, 2000 at 08:33:07 AM EST

Pelorat voted 0 on this story.

too many. periods in odd. places but otherwise kinda interesting. =)

comments come later, right?... (none / 0) (#10)
by dec on Thu Jun 15, 2000 at 08:51:47 AM EST

dec voted 1 on this story.

comments come later, right?

One thing that I find funny is that... (none / 0) (#3)
by kraant on Thu Jun 15, 2000 at 09:20:43 AM EST

kraant voted 1 on this story.

One thing that I find funny is that it would be a lot easier to make a CLI speech-recognition friendly than a GUI.

So if one of the Linux distros felt like making some sort of voice-activated shell, they could totally leapfrog this (possible) development... Hell, I'm sure there must be some free voice recognition software out there... :)

Anyone plan on a hack?

Daniel - off to search freshmeat now
--
"kraant, open source guru" -- tumeric
Never In Our Names...

Re: One thing that I find funny is that... (none / 0) (#44)
by Yzorderex on Fri Jun 16, 2000 at 11:41:11 PM EST

Actually, this may merge the CLI and the GUI. At the least we'll have tags for common GUI operations (I'm not gonna talk in keystrokes/clicks more than once), and any voice input will be coming to the app as text. So the Windows OS should get much friendlier.
It will bring new ways to crash your computer, sure, but just like with net exploits, the whole thing gets more stable as you plug the holes.

Interesting speculation; deserves s... (none / 0) (#15)
by shadowspar on Thu Jun 15, 2000 at 09:23:04 AM EST

shadowspar voted 1 on this story.

Interesting speculation; deserves some discussion.
-- Drink Canada Dry! You might not succeed, but you'll have fun trying.

Re: Interesting speculation; deserves s... (none / 0) (#46)
by Yzorderex on Fri Jun 16, 2000 at 11:52:28 PM EST

OK :-)
1. Natural language and perfect recognition are not really an issue. A limited subset of any language mapped to an extensible macro language is doable.
2. Microsoft is in the best position to do this on the desktop because, as Plotkin points out, it must be something that the OS does. We'll want to be controlling many legacy programs and I don't see how a third party developer can handle that. Think device driver hell.
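Point 1 can be illustrated with a small Python sketch, assuming speech has already been turned into text: a fixed vocabulary of phrases mapped to actions, plus user-defined macros that expand into sequences of known phrases. `CommandVocabulary` and its methods are hypothetical names for illustration, not any real speech API. Macros may only refer to names that already exist and cannot be redefined, so expansion can never loop.

```python
from typing import Callable, Dict, List, Optional


class CommandVocabulary:
    """Fixed spoken phrases mapped to actions, plus user-defined macros."""

    def __init__(self) -> None:
        self._actions: Dict[str, Callable[[], str]] = {}
        self._macros: Dict[str, List[str]] = {}

    @staticmethod
    def _normalize(phrase: str) -> str:
        # Case- and whitespace-insensitive matching of recognized text.
        return " ".join(phrase.lower().split())

    def _known(self, key: str) -> bool:
        return key in self._actions or key in self._macros

    def register(self, phrase: str, action: Callable[[], str]) -> None:
        self._actions[self._normalize(phrase)] = action

    def define_macro(self, name: str, steps: List[str]) -> None:
        key = self._normalize(name)
        if self._known(key):
            raise ValueError(f"{name!r} is already defined")
        normalized = [self._normalize(s) for s in steps]
        unknown = [s for s in normalized if not self._known(s)]
        if unknown:
            raise ValueError(f"unknown steps: {unknown}")
        self._macros[key] = normalized

    def run(self, utterance: str) -> Optional[List[str]]:
        """Execute an utterance; None means it is outside the vocabulary."""
        key = self._normalize(utterance)
        if key in self._actions:
            return [self._actions[key]()]
        if key in self._macros:
            results: List[str] = []
            for step in self._macros[key]:
                results.extend(self.run(step) or [])
            return results
        return None


vocab = CommandVocabulary()
vocab.register("open file", lambda: "opened")
vocab.register("scroll down", lambda: "scrolled")
vocab.define_macro("start my day", ["Open File", "scroll  down"])
print(vocab.run("Start my day"))         # ['opened', 'scrolled']
print(vocab.run("make my damn coffee"))  # None
```

Anything outside the vocabulary returns `None`, which is where a real system would ask the user to rephrase instead of guessing.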

Microsoft won't be able to pull thi... (none / 0) (#16)
by RichN on Thu Jun 15, 2000 at 09:25:26 AM EST

RichN voted -1 on this story.

Microsoft won't be able to pull this off. They may be able to get some sort of voice recognition going (IBM did this several years ago), but not the adaptable OS stuff. That sounds like a bigger train wreck than the registry...
-- Rich

Won't work in our office (concept n... (none / 0) (#14)
by new500 on Thu Jun 15, 2000 at 09:38:13 AM EST

new500 voted 1 on this story.

Won't work in our office (concept not apps) - too many people shouting at once all day :)

M$ has been talking about this for a while now. I remember something maybe 3 years back when they were plotting out the W98/NT5 end-user technology roadmaps.

I also remember Douglas Adams in The Hitchhiker's Guide to the Galaxy describing the planet Golgafrincham, whose inhabitants had evolved telepathy and had to host huge rock concerts to keep themselves from overhearing every smallest thought of everyone else.

Maybe if the noise levels became intolerable with people dictating into their Win$ boxes, in the UK at least the disturbance would contravene statutes governing workplace environments, e.g. temperature, noise, ability to work.

Moreover, I could imagine civil suits originating from workers claiming unworkable contracts because of an inability to think clearly! IANAL, but I've had enough experience to know that if you could prove the disturbance affected your work, you would have a case under (possibly separately) England and Wales contract and employment law :)

WINDOWS TCO = (HARDWARE + SOFTWARE) * (SUPPORT * YEARS) + SOUNDPROOFING YOUR OFFICE


== Idle Random Thoughts. Usual disclaimers apply. ==
Whups, looks like you spilled yer bold (none / 0) (#18)
by Pelorat on Thu Jun 15, 2000 at 11:32:58 AM EST

Do please preview and double check your tags before you post, ok? =)

Re: Whups, looks like you spilled yer bold (none / 0) (#31)
by new500 on Thu Jun 15, 2000 at 04:11:16 PM EST

What's all this? I closed my tags and previewed at least once. Tags have failed to close on me before: the first time was my own accident, the second time was after three previews and checks. Hey, the *only* thing I put in bold was the last 3 words. What's going on here?

sorry guys but man it PREVIEWED fine and NOTHING SPILLS ON MY SCREEN. what's with it??????


== Idle Random Thoughts. Usual disclaimers apply. ==
Re: Whups, looks like you spilled yer bold (none / 0) (#36)
by rusty on Thu Jun 15, 2000 at 10:11:49 PM EST

I fixed it, is why it's not broken now. I'm pretty sure that my code didn't un-close your bold tag, but hey, anything's possible. Maybe it just looked ok, since it was at the end of the comment?

Anyway, I really need to fix that-- Scoop ought to catch and fix tag-slippage by itself. And it's been kind of prevalent lately, so that issue has gotten moved up in the TODO list. :-)

____
Not the real rusty

Italics? (none / 0) (#45)
by Yzorderex on Fri Jun 16, 2000 at 11:45:01 PM EST

The quotes in the article were originally in italics. They didn't show (unless someone edited the thing).

Re: Won't work in our office (concept n... (none / 0) (#20)
by Anonymous Hero on Thu Jun 15, 2000 at 11:48:57 AM EST

In the interest of un-boldifying the rest of the message board: no more bold. I hope that worked. In regards to your post, I have that concern as well: don't some people get annoyed by the amount of noise that a keyboard makes? I think maybe that it would be much more annoying to have to actually listen to your co-workers talk to their computers all day than it would be to listen to them tapping. But that's just me.

Re: Won't work in our office (concept n... (none / 0) (#24)
by End on Thu Jun 15, 2000 at 12:59:16 PM EST

Good idea, but it only worked for people who don't use threaded mode, since threaded mode does not show the text of your reply in the main view :-)

-JD

Re: Won't work in our office (concept n... (none / 0) (#26)
by slycer on Thu Jun 15, 2000 at 02:03:01 PM EST

We use "quiet" keyboards (à la IBM). I have a hard time hearing MYSELF type - let alone my neighbours. The other side is that I work on the phone: I answer calls and need to use my PC to gather info, etc. A speech recognition thing would simply not work in this environment.

Re: Won't work in our office (concept n... (none / 0) (#40)
by tzanger on Fri Jun 16, 2000 at 11:30:56 AM EST

I have a hard time hearing MYSELF type - let alone my neighbours.

You mean your keyboard doesn't make that "water dripping into a full bucket in a very echoey room" noise like in Hackers?



I can't wait to run thru the progra... (none / 0) (#13)
by the Epopt on Thu Jun 15, 2000 at 09:52:30 AM EST

the Epopt voted 1 on this story.

I can't wait to run thru the programmers' area screaming "delete delete delete!"
-- 
Most people who need to be shot need to be shot soon and a lot.
Very few people need to be shot later or just a little.

K5_Arguing_HOWTO

If Microsoft wrote this though, and... (3.00 / 1) (#1)
by hattig on Thu Jun 15, 2000 at 10:00:24 AM EST

hattig voted 1 on this story.

If Microsoft wrote this, though, and it worked, then surely they would deserve some credit? Yet on Linux et al., we can barely get audio output, never mind input, processing, recognition, etc.

Better hope this technology goes to the Applications company then.

voice activated software would be s... (none / 0) (#5)
by pope nihil on Thu Jun 15, 2000 at 10:20:17 AM EST

pope nihil voted 1 on this story.

voice activated software would be sweet. i just hope someone gets there before M$.

I voted.

I'm not a large fan of MS articles,... (4.30 / 3) (#4)
by Anonymous 242 on Thu Jun 15, 2000 at 10:31:11 AM EST

lee_malatesta voted 1 on this story.

I'm not a large fan of MS articles, nor do I think that the write up for this was sufficient, but the SF Gate's article is good for discussion for what it contained and what it left out.

I was quite surprised that Plotkin did not do his homework on the history of voice recognition. Mac users have been able to navigate by voice since System 7-something-or-other. OS/2 users have had both voice navigation and dictation built into OS/2 v4. I've seen rumors that Palm is aiming to build speech recognition into its next generation of handhelds, which might be built around the StrongARM chip instead of Motorola DragonBalls. Corel has been bundling Dragon Dictate with WordPerfect for quite a long time now. IBM has already ported ViaVoice to Linux. I don't know if it is out of beta yet, but I do know that it is included in the SuSE retail package.

In any case, speech recognition seems to be somewhat of a niche product. This is partly due to the level of inaccuracy in voice dictation. Even at 99% accuracy, it will still be more time-consuming for most (not all) people to dictate and then go back and correct the errors than to simply type. I also don't think that very many voice products account for the fact that more than one person needs to use the product. Consider a family of five using a computer trained to listen to one voice.
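The 99% figure is easier to judge with the arithmetic spelled out. This short Python sketch assumes each word is recognized independently with the same probability, which real recognizers do not strictly satisfy; the word counts are illustrative only.

```python
def expected_errors(words: int, accuracy: float) -> float:
    """Expected number of misrecognized words: words * (1 - accuracy)."""
    return words * (1.0 - accuracy)


def chance_error_free(words: int, accuracy: float) -> float:
    """Probability that a run of `words` words has no errors at all,
    assuming independent per-word errors: accuracy ** words."""
    return accuracy ** words


print(round(expected_errors(1000, 0.99), 6))      # 10.0
print(round(1 - chance_error_free(20, 0.99), 3))  # 0.182
```

Under that assumption, a 1,000-word memo at 99% word accuracy still leaves about 10 errors to find and fix, and roughly 18% of 20-word sentences contain at least one mistake, which is the proofreading overhead described above.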

Another inherent problem with speech recognition is the windowing application paradigm. There is no quick, intuitive way to tell the computer to start at the second paragraph from the top of the screen, cut the dependent clause out of the fourth sentence and paste it into the preceding paragraph. It takes a fraction of a second to do this with a mouse.

Lastly, while I do know that some people would have a much easier time if voice recognition were built into computers, as I am a relatively quick typist, there would only be marginal incentive for me to use such software. Now what would be really hot is a thought interface, so that I could just think input into the machine. I don't know that I'd want a two-way street there, with the machine giving neural feedback, but I'd love to be able to just think code and watch it scroll by on my terminal.

The recent article on using lamprey brain stems to control robots gives me hope that such a neural interface might be functional in my lifetime.



Re: I'm not a large fan of MS articles,... (none / 0) (#19)
by kraant on Thu Jun 15, 2000 at 11:33:38 AM EST

The recent article on using lamprey brain stems to control robots gives me hope that such a neural interface might be functional in my lifetime.

URL Gimme!

daniel
--
"kraant, open source guru" -- tumeric
Never In Our Names...

lamprey robot link (none / 0) (#30)
by Anonymous 242 on Thu Jun 15, 2000 at 03:12:51 PM EST

This link (or a similar one) was on /. a day or two ago: http://newscientist.com/news/news_224233.html


Is voice recognition such a big dea... (none / 0) (#7)
by Gentry on Thu Jun 15, 2000 at 10:54:34 AM EST

Gentry voted 1 on this story.

Is voice recognition such a big deal? You can't use it in modern offices 'cos they're open plan. You can't really use it on the train. Home use, perhaps, but that isn't where the majority of MS revenue comes from. I can't see this being a big thing. Gates is delusional and has somewhat lost the plot.

Re: Is voice recognition such a big dea... (none / 0) (#23)
by abe1x on Thu Jun 15, 2000 at 12:54:44 PM EST

I think you've underestimated the power of both the software and the hardware (mics mainly) associated with voice recognition. I used ViaVoice for a while and it worked fine with music pumping, construction in the background, etc. For about $70 I got both the software and a headset with mic, compressor and headphones. I doubt the headset runs more than $30, and it does a great job stripping your voice from background noise. Great for voice over IP as well. The tech ain't that hard: the person speaking is an inch away from the mic, and everyone else in the office is at least 3 feet away. Think about calling tech support or the phone company: odds are there are hundreds of people squashed next to each other, yet you only hear 1 person talking.
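The mic-distance point can be put in numbers with the inverse-distance law for sound pressure, which gives a level difference of 20 log10(far / near) decibels. This is a sketch assuming ideal point sources of equal loudness in a free field; real offices add reflections, and a close-talk headset mic's directional pickup changes things further, so treat the result as a rough baseline.

```python
import math


def level_drop_db(near: float, far: float) -> float:
    """Sound-pressure level difference in dB between a source at `near` and
    one at `far` (same units), for equal point sources in a free field."""
    if near <= 0 or far <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(far / near)


# Speaker 1 inch from the mic, nearest co-worker about 3 feet (36 inches) away.
print(round(level_drop_db(1, 36), 1))  # 31.1
```

Roughly 31 dB of separation before any noise filtering goes some way toward explaining why a cheap headset can cope with a noisy room.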

Re: Is voice recognition such a big dea... (none / 0) (#32)
by Anonymous Hero on Thu Jun 15, 2000 at 04:26:28 PM EST

It's not the computer that can't handle it, it's me.

Re: Is voice recognition such a big dea... (none / 0) (#28)
by knarf on Thu Jun 15, 2000 at 02:41:18 PM EST

Maybe everyone should start wearing a necklace-cum-throatmike like the Germans used in WW-II. These are microphones which pick up `sound' directly from your throat (through some transducer stuck against your neck). They are not bothered by outside noise. Add a transmitter and presto, instant voice remote control. And while we're at it, add some fancy camera gizmo to your eye and a blinking light and we'll all turn into Borg. But that's another story...

Accessibility Question (none / 0) (#21)
by DontEatTheGlass on Thu Jun 15, 2000 at 12:25:41 PM EST

Does anyone out there currently use a voice recognition (VR) program to access a computer because of a disability? I'd be curious to hear from such a person what they think VR needs to work well. Also, how useful is it currently, and in what situations?
DontEatTheGlass
/**************
If you understand, things are just as they are.
If you do not understand, things are just as they are.
**************/
Voice activation will never work (4.00 / 3) (#22)
by Buck Satan on Thu Jun 15, 2000 at 12:52:29 PM EST

I think it would be cool. I would love to walk in my house at night and say "turn on the coffee pot", or "Turn on the TV so I can watch Harry Enfield" or something like that. It would be excellent.

But what is gonna happen when I say "I wanna hear Motorhead REALLY LOUD". Screw it. The computer is not gonna be able to hear me over the noise. I will have to go and do whatever it is I want done by myself.

Now, this would be at my house. But let's go a step further and think about what happens at our offices.

I hate cubicles. They are the worst thing that the business world ever came up with. When I am trying to code, the last thing I want to hear is the guy in the cube next to me arguing with his girlfriend on the phone. I also don't want to hear the guy next to me using his speaker phone. It drives me nuts, to tell ya the truth. So now MS wants me to start listening to my cow-orkers talk to their computers? I am not a violent man. But this would sure make me one.

Perhaps this is not such a bad idea after all. Let MS "innovate". Let them pour all their energy into it. Let it flop like DIVX. I guarantee it will, for precisely the reasons I outlined above. But please, don't tell them I wrote this.


Think Star Trek (3.20 / 5) (#25)
by jonr on Thu Jun 15, 2000 at 01:55:43 PM EST

I think Star Trek got voice recognition right:
Use voice for simple things ("Computer, cancel my meeting with John Doe") and we will still use keyboard/mouse/whatever for more complex tasks. (Just like Star Trek!).
However, there is one thing that puzzles me:
Why haven't any advanced text input parsers been used in operating systems? I remember that input parsers were getting pretty advanced until Leisure Suit Larry "pioneered" the point-and-drool interface. I still recall fondly writing something like "get the blue key and open the green door with it", and it worked, or at least I got "It doesn't fit" or something. Why can't I get the "Magnetic Scrolls" shell? ("find all emails with subjects in uppercase only and delete them") Remember the film "Outland" with Sean Connery, where he sits in front of a computer, types "number of employees with criminal records" or something, and gets back results? No fancy GUI or voice recognition or SQL to learn, just plain English. Give me that!
Enough rant

J.
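A "Magnetic Scrolls"-style shell command can be sketched as a fixed sentence pattern, much like a text-adventure parser. This Python sketch handles only the one email command quoted above (plus a lowercase variant); the `Email` type and the grammar are hypothetical, and anything else gets an "I don't understand" reply instead of a guess.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Email:
    subject: str


# Hypothetical grammar for one sentence form, in the spirit of a text-adventure parser.
COMMAND = re.compile(
    r"find all emails with subjects in (?P<case>upper|lower)case only"
    r"(?P<delete> and delete them)?",
    re.IGNORECASE,
)


def run_command(command: str, inbox: List[Email]) -> Tuple[List[Email], List[Email]]:
    """Return (matching emails, inbox after the command)."""
    text = " ".join(command.split())  # collapse stray whitespace
    m = COMMAND.fullmatch(text)
    if m is None:
        raise ValueError(f"I don't understand {command!r}")
    upper = m.group("case").lower() == "upper"

    def selected(e: Email) -> bool:
        return e.subject.isupper() if upper else e.subject.islower()

    found = [e for e in inbox if selected(e)]
    if m.group("delete"):
        return found, [e for e in inbox if not selected(e)]
    return found, list(inbox)


inbox = [Email("HELLO"), Email("Meeting at 3"), Email("URGENT!!!")]
found, left = run_command(
    "find all emails with subjects in uppercase only and delete them", inbox
)
print([e.subject for e in found])  # ['HELLO', 'URGENT!!!']
print([e.subject for e in left])   # ['Meeting at 3']
```

Each new phrasing needs another pattern, which is why such parsers grow into the large heuristic tables that real text adventures carry.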

Re: Think Star Trek (none / 0) (#27)
by slycer on Thu Jun 15, 2000 at 02:10:38 PM EST

Cool idea.. I like it! MUDs, of course, make use of this. It would be a LOT of work, but I could see a shell built with at least some basic functionality like this.

Mind you.. I would really HATE having to retype something over and over because I keep getting back "I don't understand 'lick fido'" :-)

Re: Think Star Trek (none / 0) (#29)
by Anonymous Hero on Thu Jun 15, 2000 at 02:45:15 PM EST

Actually, text adventure gaming is still alive and well. I'm at work, so I don't have any URLs to post, but try searching for "text adventure" or "interactive fiction."

A number of languages have been designed specifically for text adventures, with built-in, extendable parsers and such. Inform and TADS are the most popular, though I prefer TADS (it's object-oriented, which is well suited to text adventure games).

Hamshrew


Re: Think Star Trek (none / 0) (#38)
by goonie on Fri Jun 16, 2000 at 03:29:59 AM EST

Writing a complete natural-language parser is an "AI-complete" problem (in the sense that a solution requires human-level intelligence). Therefore, you are left with text-adventure-style parsers (which, clever though they are, are essentially hacks). Any adaptation of such a parser as a shell would need a large set of heuristics to resolve the ambiguities inherent in English, and those heuristics would fail with monotonous regularity. When that happened, the parser would either spend most of its time asking the user for clarification or confirmation, or go ahead and perform potentially disastrous actions.

Read this entry in the jargon file for a simple example of the risk that these "Do What I Mean" interfaces pose.

In essence, nice idea, but I wouldn't recommend it.
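The trade-off described here (ask for clarification, or risk a disastrous action) can be sketched as a guard around whatever a heuristic parser produces. This is a minimal Python sketch: the parser itself is assumed, `DESTRUCTIVE_VERBS` is a hypothetical list, and `confirm` stands in for whatever prompt the shell would show.

```python
from typing import Callable, List, Optional

# Hypothetical verbs treated as dangerous enough to need a second "yes".
DESTRUCTIVE_VERBS = {"delete", "erase", "format", "remove"}


def choose_action(readings: List[str], confirm: Callable[[str], bool]) -> Optional[str]:
    """Pick what to execute from a parser's candidate readings (best first).

    Returns the reading to run, or None if the user declined or nothing fit.
    """
    if len(readings) == 1:
        chosen: Optional[str] = readings[0]
    else:
        # Ambiguous (or empty): ask about each reading rather than guess.
        chosen = next((r for r in readings if confirm(f"Did you mean: {r}?")), None)
    if chosen is None:
        return None
    words = chosen.split()
    if words and words[0].lower() in DESTRUCTIVE_VERBS and not confirm(f"Really {chosen}?"):
        return None
    return chosen


def says_yes_to_dear_sir(question: str) -> bool:
    # Simulated user who only accepts one particular reading.
    return question == "Did you mean: type dear sir?"


print(choose_action(["delete server", "type dear sir"], says_yes_to_dear_sir))  # type dear sir
print(choose_action(["format c:"], says_yes_to_dear_sir))                      # None
```

Every extra question this guard asks is exactly the overhead the comment warns about; removing the questions brings back the risk.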


Re: Think Star Trek (none / 0) (#39)
by HiQ on Fri Jun 16, 2000 at 04:04:05 AM EST

In my spare time I'm working on an OS shell, which is sort of inspired by the computers used in Star Trek. It will not use windows, though you can run multiple applications. One of the main features is a simple language interface, but as said in another post, you cannot use language for all tasks; for some tasks a mouse is better. Right now I'm not focusing on speech input, just keyboard (but once that is working, I imagine it will not be so hard to add speech as well).

But it is funny that you mention the Star Trek computers, because I think they are a very well-thought-out element of the series. That series inspired me to start working on a computer interface with similar features (though less advanced).
How to make a sig
without having an idea
just made a HiQ

Unpredictability (none / 0) (#33)
by Anonymous Hero on Thu Jun 15, 2000 at 06:18:08 PM EST

One of the things I like about Unix is its predictability. Windows has all sorts of horrible features to make things friendly but they don't actually work very reliably and so they merely serve to make things unpredictable. Voice recognition macros are a prime example. We all know they'll fail a lot of the time. We all know it will try to guess (wrongly) what you really mean. I like simple, unfriendly but repeatable and predictable.

hack the phone (none / 0) (#34)
by mercenary on Thu Jun 15, 2000 at 07:44:09 PM EST

I was just playing with a voice recognition app builder on my telephone. Check out http://studio.tellme.com/. Basically, it runs VoiceXML applications for you over the telephone: you give Studio a URL, call the phone number, and the server slurps the VXML and runs it on the phone. I might put this into the story queue, since it's a lot of fun to play with.

Yet another MS scheme that markets well but won't (4.50 / 2) (#35)
by daninja on Thu Jun 15, 2000 at 08:28:01 PM EST

MS has been touting this "wave of the future" for some time. It perfectly illustrates that their core competency lies in marketing, not technology (and certainly not "innovation").

Voice is just another medium for language. If MS has developed a language with which a layperson and a computer can converse, there's no reason it need be restricted to an audio medium. Being able to type "Give me the game menu on holidays, stupid, I'm not working!", and have the computer understand it, is almost as useful as being able to shout it (although not quite as satisfying). But the computer cannot understand it (typed or shouted), and it won't be able to for 10 years or so. MS is promoting this silver-bullet UI direction because they've failed so miserably at all feasible UIs, so they paint a picture in which their silver bullet solution is just around the corner. It isn't.

Voice is a minor issue. Sure, to some it may be more efficient than typing, and sure, it is an enabling technology for the digitally impaired. But in the big picture of CHI it is minor. The real issue is language. Computers aren't close to understanding natural language (on a useful scale).

MS has long been the joke of the CHI world, and this voice-controlled computer initiative is just another joke.

It ain't gonna happen for 10 years.

MS Past Record (none / 0) (#42)
by Itsik on Fri Jun 16, 2000 at 02:14:16 PM EST

From past experience we all know how Microsoft has been releasing numerous half-baked applications and OSes. It scares me to think that we will find ourselves pulling our hair out when "Dear Sir" is confused with "Down Server".

Fits with the current campaign (none / 0) (#43)
by error 404 on Fri Jun 16, 2000 at 03:49:36 PM EST

The voice interface is part of the "see what wonderful innovation we were going to do, but now we can't because of that Communist Grinch, Jackson?" package.

And what, exactly, is the integrated version supposed to do that Dragon Dictate and a bunch of others can't?

If it's just "Oh, it's Saturday - 'Half Life' is a game not a field in a spreadsheet like on weekdays" then that's pretty sad.

..................................
Electrical banana is bound to be the very next phase
- Donovan
