Speech Recognition Applications - Jan Šedivý


Presentation from World Usability Day, 8 November 2012


  1. Voice: The New UI for Mobile Devices. Jan Šedivý, World Usability Day 2012
  2. Fred Jelinek (1932-2010): During 21 years at IBM Research and nearly two decades at Johns Hopkins, he pioneered the statistical methods that enable modern computers to understand spoken language. “He envisioned applying the mathematics of probability to the problem of processing speech and language,” said Sanjeev Khudanpur, a Johns Hopkins associate
  3. WHY SPEECH RECOGNITION?
  4. Speech reco benefits. Speech is rich: • Speech is much richer than two mouse buttons • Disambiguation, dialog • “Show me all emails from David about Linux server” • “Call David”: David Smith or Stone? Home or cell? Text: • Speech covers not only text entry but also command and control (C&C), search, and URI entry • Speech entry is part of the keyboard entry • “Command box”, general source of information. WYSIWYG == What You Say Is What You Get
  5. Elements of success: • Access to huge content: Internet, YouTube, maps, music, pictures, SMS, email… Best accuracy: • Train on all available data: contacts, location names, addresses, email, document content, history, personalization, and other sensors (GPS, accelerometers, camera, compass) • Computationally expensive: huge clusters of computers to speed up training. Great UI design: • Speech reco must not introduce any friction to the interface • Keyboard, touch screen, multi-touch, speaker, microphone • OS control, part of the OS, noise reduction, AD converter • Use all sensors available on the phone to inject extra information into the app
  6. WHERE IS SPEECH RECOGNITION USEFUL?
  7. Speech recognition areas: • Command control, digit dictation • Creation of texts, dictation • Telephony IVR • Automotive • Mobile devices • Voice search. Speech is the most natural way we communicate
  8. The main areas in time perspective: PC (C&C, dictation), Telephony, Automotive, Mobile devices UI. Timeline axis: 1995, 2000, 2005
  9. A little more history: • 1993 IBM Personal Dictation System (IBM PC, audio adapter card) • 1996 VoiceType (Win 95, dictation, isolated words, email, …) • 1996 Nuance deploys its first commercial speech application • 1997 Dragon Systems unveils NaturallySpeaking • 1999 VoiceXML • 2000 Telephony applications, IVR • 2002 Car control (control car equipment, make a phone call, select music, dictate an address to the navigation) • 2003 Microsoft includes speech in Office 2003 • 2007 Growth of mobile phones/devices • 2008 Google launches speech input for Search on iPhone • 2009 Nuance acquires rights to IBM's speech technology patents • 2011 iOS 5, Siri
  10. HOW SPEECH RECOGNITION WORKS
  11. Speech recognition, high level. Front End (feature extraction): • Digitize audio (AD converter) • FFT, non-linearity, DFFT. Back End (classification): • Labeling: triphones, prototypes • Search: LM, HMM, Viterbi. The text output is passed to the application through the API
  12. APPLICATIONS DEVELOPMENT CHRONOLOGICALLY
  13. IBM speech recognition: the early days. • Large vocabulary, dictation (1990…) • Office correspondence task: Tangora • Written in Fortran • IBM RISC System/6000, AIX. Tangora: Albert Tangora (July 2, 1903 – April 7, 1978) set the world speed record for sustained typing on a manual keyboard for one hour, 147 words per minute, on October 22, 1923.
  14. How to get reco running on a PC (1994). Front End: • Add-on board with ASIC • Integer version on CPU. Hierarchical labeler: • Input: 39-dimensional cepstrum coefficient feature vector every 10 ms • Output: the 100 most likely prototypes out of 30k, diagonal Gaussians. Search: • Statistical LM: high compression, log • Viterbi search, Hidden Markov Models
  15. How to get reco running on embedded devices (1999). Easy port to embedded: • Resource-efficient speech recognition engine • Written in C/C++ • Integer implementation, GCC compiler • Simple API to customize for any platform. Basic reco: • Grammar support for command control applications • Special emphasis on digit recognition • Robust front end for noisy environments. Car applications: • Command control • Digit and name dialing • Navigation control • On-board entertainment control
  16. MOBILE DEVICES
  17. 7 billion people: over 5.3 billion people, or 77% of the world’s population, are now on mobile, according to WIPRO
  18. Mobile operating system preferences (Sparkwiz 2012)
  19. ECSS 2010, 10/12/2010
  20. Mobile Internet Access
  21. Factors accelerating better mobile apps: • Basic phone • More powerful CPU, more memory • Connectivity, Internet • Much better UI, multi-touch screen. Rapid growth of mobile phones/devices is driving the adoption of speech recognition
  22. Why is reco so important for mobile? • Small screen • Limited keyboard • Difficult text entry • Difficult to navigate • Slow, unreliable connectivity (latency). Speech is fundamentally changing the mobile user experience
  23. LATEST APPLICATIONS
  24. Google Now, Google search. Some Android phones: two mics
  25. iOS Siri
  26. Poor performance in the Czech Rep.
  27. iOS Siri versus Google search: • Siri are "natural language processing" apps that use statistical • Siri is deep in iOS: start apps, make calls, set meetings • Google is deep in the search engine • You can't launch apps with Google, but you can dictate an email or a text message • Google is faster (much faster) • Future: a combination of AI and different UI
  28. FUTURE
  29. Future challenges: • Better recognition, robustness (noisy conditions, dictation) • Better UI integration (speech button) • Multiple languages (how would a German native speaker search for an address in France?) • Switching between multiple languages • UI combining multiple modalities (voice, text, video, sensors) • Work on dictated text correction • Better integration of speech reco into special applications
  30. QUESTIONS & THANK YOU
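
The front-end chain named on slide 11 (FFT, non-linearity, DFFT) can be illustrated with a minimal cepstrum sketch. It assumes a Hamming window, a logarithm as the non-linearity, and a DCT-II as the final transform; the slides do not specify any of these. The function name `cepstrum`, the 13-coefficient default, and the 8 kHz sampling rate are hypothetical, and a production front end would typically add a mel filterbank and delta features, which are left out here.

```python
import cmath
import math


def cepstrum(frame, n_coeffs=13):
    """Return n_coeffs cepstral coefficients for one frame of audio samples."""
    n = len(frame)
    # Hamming window (an assumption) to reduce spectral leakage.
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    # Magnitude spectrum via a direct DFT; a real system would use an FFT.
    magnitudes = [
        abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)))
        for k in range(n // 2 + 1)
    ]
    # Non-linearity: log compression, with a small floor to avoid log(0).
    log_spec = [math.log(m + 1e-10) for m in magnitudes]
    # DCT-II of the log spectrum yields the cepstral coefficients.
    bins = len(log_spec)
    return [
        sum(log_spec[j] * math.cos(math.pi * q * (j + 0.5) / bins)
            for j in range(bins))
        for q in range(n_coeffs)
    ]


# Toy input: 10 ms of audio at a hypothetical 8 kHz sampling rate (80 samples).
frame = [math.sin(2 * math.pi * 440 * t / 8000)
         + 0.5 * math.sin(2 * math.pi * 1300 * t / 8000)
         for t in range(80)]
print(cepstrum(frame)[:3])
```

One property worth noting: scaling the input by a constant gain shifts only the first coefficient, because the log turns the gain into a constant offset that the higher-order cosine terms cancel out.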
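
The labeler on slide 14 takes a feature vector and outputs the most likely prototypes, each modeled as a diagonal Gaussian. The sketch below shows that scoring step under simplifying assumptions: it scores every prototype exhaustively instead of using the hierarchical shortlist the slide mentions, and the names `diag_gaussian_loglik` and `top_prototypes` are hypothetical, not taken from the IBM engine.

```python
import heapq
import math


def diag_gaussian_loglik(x, mean, var):
    """Log-likelihood of vector x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def top_prototypes(x, prototypes, n_best=100):
    """Return indices of the n_best highest-scoring prototypes, best first.

    prototypes is a list of (mean, variance) pairs, one per prototype.
    """
    scored = (
        (diag_gaussian_loglik(x, mean, var), idx)
        for idx, (mean, var) in enumerate(prototypes)
    )
    return [idx for _, idx in heapq.nlargest(n_best, scored)]


# Toy example with three 2-dimensional prototypes (the slide uses 39 dimensions
# and about 30k prototypes).
prototypes = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([5.0, 5.0], [1.0, 1.0]),
    ([1.0, 1.0], [1.0, 1.0]),
]
print(top_prototypes([0.9, 1.2], prototypes, n_best=2))
```

With a diagonal covariance the score is a sum over dimensions, which is what makes an integer implementation of the kind listed on the slide practical.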
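
The search stage on slides 11 and 14 combines Hidden Markov Models with a Viterbi search. The following is a minimal sketch of Viterbi decoding over a discrete-output HMM in the log domain. The two-state model, its probabilities, and the labels `x` and `y` are invented for illustration; a real recognizer would also fold language model scores into the search, which this sketch omits.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best state path, its log-probability) for a discrete HMM."""
    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


def lg(p):
    """Log of a probability, mapping 0 to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


# Hypothetical left-to-right model: state "a" may move on to "b", never back.
states = ["a", "b"]
log_start = {"a": lg(0.9), "b": lg(0.1)}
log_trans = {"a": {"a": lg(0.5), "b": lg(0.5)}, "b": {"a": lg(0.0), "b": lg(1.0)}}
log_emit = {"a": {"x": lg(0.8), "y": lg(0.2)}, "b": {"x": lg(0.2), "y": lg(0.8)}}
path, score = viterbi(["x", "x", "y", "y"], states, log_start, log_trans, log_emit)
print(path, math.exp(score))
```

Working with log-probabilities turns products into sums, which avoids underflow on long utterances and fits the "log" note on slide 14.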
