Voice: The New UI for Mobile Devices

                                        Jan Šedivý
                        WORLD USABILITY DAY – 2012




1
Fred Jelinek (1932-2010)

    During 21 years at IBM Research and nearly two decades at Johns Hopkins, he pioneered the statistical methods that enable modern computers to understand spoken language.




     “He envisioned applying the mathematics of
     probability to the problem of processing speech
     and language,” said Sanjeev Khudanpur, a Johns
     Hopkins associate
2
WHY SPEECH RECOGNITION?


                          3
Speech reco benefits

Speech is rich
    • Speech is much richer than two mouse buttons
    • Disambiguation, dialog
    • “Show me all emails from David about Linux server”
    • “Call David”: David Smith or Stone? Home or cell?

Text entry
    • Speech covers not only text entry but also C&C (command and control), search, and URI entry
    • Speech entry is part of the keyboard
    • A “command box”, a general source of information

WYSIWYG == What You Say Is What You Get
4
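The “Call David” example is a small dialog problem: a spoken name can match several contacts, and one contact can have several numbers. The sketch below shows one hypothetical way an app might resolve the command or fall back to a clarifying question. The `Contact` type, the `resolve_call` function, and its matching rules are illustrative assumptions, not the behavior of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    first: str
    last: str
    numbers: dict = field(default_factory=dict)  # label ("home", "cell", ...) -> number


def resolve_call(utterance, contacts):
    """Hypothetical resolver: returns ("dial", number), ("ask", question) or ("error", reason)."""
    words = utterance.lower().split()
    if not words or words[0] != "call":
        return ("error", "not a call command")
    spoken = set(words[1:])
    # Narrow by first name, then by last name if one was spoken.
    matches = [c for c in contacts if c.first.lower() in spoken]
    by_last = [c for c in matches if c.last.lower() in spoken]
    if by_last:
        matches = by_last
    if not matches:
        return ("error", "no matching contact")
    if len(matches) > 1:
        # Ambiguous person: ask which one, e.g. "David Smith or Stone?"
        return ("ask", matches[0].first + " " + " or ".join(c.last for c in matches) + "?")
    contact = matches[0]
    if not contact.numbers:
        return ("error", "no number stored")
    labels = [label for label in contact.numbers if label in spoken]
    if len(labels) == 1:
        return ("dial", contact.numbers[labels[0]])
    if len(contact.numbers) == 1:
        return ("dial", next(iter(contact.numbers.values())))
    # Ambiguous number: ask which one, e.g. "Home or cell?"
    return ("ask", (" or ".join(contact.numbers) + "?").capitalize())


if __name__ == "__main__":
    book = [Contact("David", "Smith", {"home": "555-0101", "cell": "555-0102"}),
            Contact("David", "Stone", {"cell": "555-0199"})]
    print(resolve_call("Call David", book))             # ('ask', 'David Smith or Stone?')
    print(resolve_call("Call David Smith", book))       # ('ask', 'Home or cell?')
    print(resolve_call("Call David Smith cell", book))  # ('dial', '555-0102')
```

Each answer from the user (“Smith”, then “cell”) is simply added to the utterance and resolved again, which keeps the dialog state trivial; a real assistant would also have to handle misrecognized names.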
Elements of success

Best accuracy:
    • Access to huge content: Internet, YouTube, maps, music, pictures, SMS, email…
    • Train on all available data: contacts, location names, addresses, email, document content, history, personalization, and data from other sensors (GPS, accelerometers, camera, compass)
    • Computationally expensive: huge clusters of computers to speed up training

Great UI design:
    • Speech reco must not introduce any friction into the interface
    • Keyboard, touch screen, multi-touch, speaker, microphone
    • OS control: part of the OS, noise reduction, A/D converter
    • Use all sensors available on the phone to inject extra information into the app
5
WHERE IS SPEECH RECOGNITION USEFUL?

                              6
Speech recognition areas
    • Command control, digit dictation
    • Creation of texts, dictation
    • Telephony IVR
    • Automotive
    • Mobile devices
    • Voice search

Speech is the most natural way we communicate
7
The main areas in time perspective
Areas in order of emergence on a 1995–2005 timeline:
    • PC: C&C, dictation
    • Telephony
    • Automotive
    • Mobile devices UI
8
A little more history
       1993 IBM Personal Dictation System (IBM PC, audio adapter card)
       1996 VoiceType (Windows 95, dictation, isolated words, email, …)
       1996 Nuance deploys its first commercial speech application
       1997 Dragon Systems unveils Dragon NaturallySpeaking
       1999 VoiceXML
       2000 Telephony applications, IVR
       2002 Car control (control car equipment, make phone calls, select music, dictate addresses to navigation)
       2003 Microsoft includes speech in Office 2003
       2007 Growth of mobile phones/devices
       2008 Google launches voice search on the iPhone
       2009 Nuance acquires IBM's speech technology patents and rights
       2011 iOS 5, Siri




    9
HOW SPEECH RECOGNITION WORKS

                         10
Speech recognition – high level
    Front End (feature extraction)
        • Digitize audio: A/D converter
        • FFT, non-linearity, DFFT
    Back End (classification)
        • Labeling: triphones, prototypes
        • Search: LM, HMM, Viterbi
    Application
        • API, text output

11
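The front end turns digitized audio into one short feature vector per frame. The pure-Python sketch below illustrates the idea under assumptions the slides do not state: 16 kHz audio, non-overlapping 10 ms frames, a naive DFT standing in for the FFT, a log as the non-linearity, and a DCT-II producing 13 cepstral coefficients. Appending first and second time differences gives 39 values per frame, which is one common way to arrive at the 39-dimensional vectors mentioned for the 1994 PC system; the deck does not say how IBM computed them. Production front ends typically also use overlapping windows and a mel filterbank, both omitted here. All function names are hypothetical.

```python
import cmath
import math


def split_frames(signal, frame_len):
    """Cut the digitized signal into non-overlapping frames; a partial last frame is dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, frame_len)]


def power_spectrum(frame):
    """Power spectrum via a naive DFT; an FFT computes the same values in O(n log n)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def cepstrum(frame, n_coeffs=13, floor=1e-10):
    """Log compression (the non-linearity) followed by a DCT-II yields cepstral coefficients."""
    log_spec = [math.log(p + floor) for p in power_spectrum(frame)]
    m = len(log_spec)
    return [sum(v * math.cos(math.pi * c * (j + 0.5) / m) for j, v in enumerate(log_spec))
            for c in range(n_coeffs)]


def deltas(seq):
    """Simple first-order time differences, clamped at the first and last frame."""
    last = len(seq) - 1
    return [[(seq[min(t + 1, last)][d] - seq[max(t - 1, 0)][d]) / 2 for d in range(len(seq[t]))]
            for t in range(len(seq))]


def extract_features(signal, sample_rate=16000):
    """One 39-dim vector per 10 ms frame: 13 cepstra + 13 deltas + 13 delta-deltas."""
    frame_len = sample_rate // 100  # samples in 10 ms
    static = [cepstrum(f) for f in split_frames(signal, frame_len)]
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


if __name__ == "__main__":
    # 0.1 s of a 440 Hz tone at 16 kHz -> 10 frames of 39 values each
    tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
    feats = extract_features(tone)
    print(len(feats), len(feats[0]))  # 10 39
```

At 100 frames per second and 39 values per frame, the labeler receives 3,900 numbers per second of speech.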
APPLICATIONS DEVELOPMENT CHRONOLOGICALLY

                           12
IBM speech recognition – the early days
    Large vocabulary, dictation (1990…)
    Office correspondence task – Tangora
    Written in Fortran
    IBM RISC System/6000, AIX, Tangora

    Albert Tangora (July 2, 1903 – April 7, 1978) set the world speed record for sustained typing on a manual keyboard for one hour, 147 words per minute, on October 22, 1923.
13
How to get reco running on a PC (1994)

Front End
    • Add-on board with ASIC
    • Integer version on CPU

Hierarchical labeler
    • Input: 39-dimensional cepstral coefficient feature vector every 10 ms
    • Output: 100 most likely prototypes out of 30k (diagonal Gaussians)

Search
    • Statistical LM: high compression, log
    • Viterbi search, Hidden Markov Models
14
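The labeler and search steps can be illustrated at toy scale. In the hypothetical sketch below, each prototype is a diagonal Gaussian, the labeler keeps the `n` best-scoring prototypes for a frame (100 out of 30k on the slide; 1 out of 2 here), and a Viterbi search finds the most likely state sequence through a two-state HMM. All scores stay in the log domain, which avoids numeric underflow when many small probabilities are multiplied. The state names, transition probabilities, and one-dimensional features are made-up assumptions; a real decoder also adds language-model scores, uses triphone HMMs, and prunes its search.

```python
import heapq
import math


def diag_gaussian_loglik(x, mean, var):
    """Log-likelihood of vector x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def top_prototypes(x, prototypes, n):
    """Labeler step: keep only the n best-scoring (log-likelihood, name) pairs for frame x."""
    scored = ((diag_gaussian_loglik(x, mean, var), name)
              for name, (mean, var) in prototypes.items())
    return heapq.nlargest(n, scored)


def viterbi(frames, states, log_start, log_trans, emit):
    """Most likely HMM state sequence and its log score, using log probabilities only."""
    # best[s] = best log score of any path ending in state s at the current frame
    best = {s: log_start.get(s, -math.inf) + emit(s, frames[0]) for s in states}
    backptrs = []
    for x in frames[1:]:
        prev_best, best, ptr = best, {}, {}
        for s in states:
            # Predecessor maximizing (previous score + transition log-probability).
            p = max(states, key=lambda q: prev_best[q] + log_trans.get((q, s), -math.inf))
            best[s] = prev_best[p] + log_trans.get((p, s), -math.inf) + emit(s, x)
            ptr[s] = p
        backptrs.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=best.get)
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return list(reversed(path)), best[last]


if __name__ == "__main__":
    prototypes = {"sil": ([0.0], [1.0]), "speech": ([3.0], [1.0])}
    states = ["sil", "speech"]
    log_start = {s: math.log(0.5) for s in states}
    log_trans = {(a, b): math.log(0.8 if a == b else 0.2) for a in states for b in states}
    frames = [[0.1], [0.0], [2.9], [3.1], [0.2]]
    print(top_prototypes([2.9], prototypes, 1)[0][1])  # speech
    path, score = viterbi(frames, states, log_start, log_trans,
                          lambda s, x: diag_gaussian_loglik(x, *prototypes[s]))
    print(path)  # ['sil', 'sil', 'speech', 'speech', 'sil']
```

Switching states costs log(0.8/0.2), about 1.39, which is smaller than the emission gain on these frames, so the best path follows the data.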
How to get reco running on embedded devices (1999)

Easy port to embedded
    • Resource-efficient speech recognition engine
    • Written in C/C++
    • Integer implementation, GCC compiler
    • Simple API to customize for any platform

Basic reco
    • Grammar support for command-and-control applications
    • Special emphasis on digit recognition
    • Robust front end for noisy environments

Car applications
    • Command control
    • Digit and name dialing
    • Navigation control
    • On-board entertainment control
15
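Grammar support lets a command-and-control application restrict recognition to the phrases it can act on, which keeps the search small. The sketch below writes a hypothetical in-car grammar (digit dialing, name dialing, a few entertainment controls) as plain data and checks a recognized word string against it. In a grammar-based engine the grammar usually constrains the search itself rather than being applied afterwards; the vocabulary, the phonebook entries, and `parse_command` are illustrative assumptions.

```python
DIGITS = {"zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}
PHONEBOOK = {("david", "smith"), ("david", "stone")}  # hypothetical name-dialing entries
LOCATIONS = {"home", "cell", "work"}
CONTROLS = {("radio", "on"), ("radio", "off"), ("next", "track"),
            ("volume", "up"), ("volume", "down")}


def parse_command(utterance):
    """Return an action tuple if the utterance is in the grammar, else None (out of grammar)."""
    w = utterance.lower().split()
    # <dial> ::= "dial" <digit>+
    if len(w) >= 2 and w[0] == "dial" and all(x in DIGITS for x in w[1:]):
        return ("dial", "".join(DIGITS[x] for x in w[1:]))
    # <call> ::= "call" <first> <last> [<location>]
    if len(w) in (3, 4) and w[0] == "call" and (w[1], w[2]) in PHONEBOOK:
        if len(w) == 3:
            return ("call", w[1] + " " + w[2], None)
        if w[3] in LOCATIONS:
            return ("call", w[1] + " " + w[2], w[3])
        return None
    # <control> ::= "radio" ("on" | "off") | "next" "track" | "volume" ("up" | "down")
    if tuple(w) in CONTROLS:
        return ("control", " ".join(w))
    return None


if __name__ == "__main__":
    print(parse_command("Dial five five five one two one two"))  # ('dial', '5551212')
    print(parse_command("Call David Smith cell"))  # ('call', 'david smith', 'cell')
    print(parse_command("Play something nice"))    # None
```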
MOBILE DEVICES


                 16
7 billion people

     Over 5.3 billion people, or 77% of the world’s population, are now on mobile (according to WIPRO).
17
Mobile operating system preferences




                               Sparkwiz 2012
18
19   ECSS 2010, 10/12/2010
Mobile Internet Access




20
Factors accelerating better mobile apps
    • Basic phone
    • More powerful CPU, more memory
    • Connectivity, Internet
    • Much better UI, multi-touch screen

    Rapid growth of mobile phones/devices is driving the adoption of speech recognition.
21
Why is reco so important for mobile?
Small screen

Limited keyboard

Difficult text entry

Difficult to navigate

Slow, unreliable connectivity (latency)
                         Speech is fundamentally changing the mobile user experience
22
LATEST APPLICATIONS


                      23
Google Now, Google search




                            Some Android phones: two mics
24
iOS Siri




25
Poor performance in the Czech Rep.




26
iOS Siri versus Google search
    Siri and Google search are "natural language processing" apps that use statistical methods
    Siri is deep in iOS: it starts apps, makes calls, sets meetings
    Google is deep in the search engine
    Google cannot launch apps, but you can dictate an email or a text message
    Google is faster (much faster)

                                 Future: a combination of AI and different UI
    27
FUTURE


         28
Future challenges
Better recognition, ROBUSTNESS (noisy conditions, dictation)
Better UI integration (speech button)
Multiple languages (how would a native German speaker search for an address in France?)
Switching between multiple languages
UI combining multiple modalities (voice, text, video, sensors)
Work on dictated text correction

Better integration of speech reco into specialized applications

29
QUESTIONS & THANK YOU


                        30
