

Speech Recognition patents

      

This page is updated frequently with new Speech Recognition-related patent applications.

Systems and methods for manipulating electronic content based on speech recognition
Systems and methods are disclosed for displaying electronic multimedia content to a user. One computer-implemented method for manipulating electronic multimedia content includes generating, using a processor, a speech model and at least one speaker model of an individual speaker.

Management layer for multiple intelligent personal assistant services
Performing speech recognition in a multi-device system includes receiving a first audio signal that is generated by a first microphone in response to a verbal utterance, and a second audio signal that is generated by a second microphone in response to the verbal utterance; dividing the first audio signal into a first sequence of temporal segments; dividing the second audio signal into a second sequence of temporal segments; comparing a sound energy level associated with a first temporal segment of the first sequence to a sound energy level associated with a first temporal segment of the second sequence; based on the comparing, selecting, as a first temporal segment of a speech recognition audio signal, one of the first temporal segment of the first sequence and the first temporal segment of the second sequence; and performing speech recognition on the speech recognition audio signal.
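
The segment-by-segment selection described above is easy to picture in code. The sketch below is illustrative only, not the claimed implementation: the 25 ms segment length, the mean-squared-amplitude energy measure, and the NumPy-based interface are all assumptions.

    # Hypothetical sketch of per-segment selection between two microphone signals.
    # Assumptions: both signals are mono NumPy arrays at the same sample rate, and
    # "sound energy" is taken to be the mean squared amplitude of each segment.
    import numpy as np

    def select_segments(mic1: np.ndarray, mic2: np.ndarray, segment_len: int) -> np.ndarray:
        """Build a speech-recognition signal by picking, per temporal segment,
        whichever microphone's segment has the higher energy."""
        n = min(len(mic1), len(mic2))
        out = np.empty(n, dtype=mic1.dtype)
        for start in range(0, n, segment_len):
            end = min(start + segment_len, n)
            seg1, seg2 = mic1[start:end], mic2[start:end]
            e1 = np.mean(seg1.astype(np.float64) ** 2)
            e2 = np.mean(seg2.astype(np.float64) ** 2)
            out[start:end] = seg1 if e1 >= e2 else seg2
        return out

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        a = rng.normal(0, 0.1, 16000)   # quieter capture
        b = rng.normal(0, 0.5, 16000)   # louder capture
        merged = select_segments(a, b, segment_len=400)  # 25 ms segments at 16 kHz
        print(merged.shape)

The merged signal would then be handed to whatever recognizer the system uses.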

Speech recognition method, device and system based on artificial intelligence
The present disclosure provides a speech recognition method, device and system based on artificial intelligence. The method includes: collecting speech data to be recognized in a speech recognition process; sending uplink data stream to a server via an uplink connection to the server, in which the uplink data stream includes the speech data; and receiving downlink data stream sent by the server via a downlink connection to the server in parallel with sending the uplink data stream to the server, in which the downlink data stream includes result data, and the result data is obtained by the server performing speech recognition according to the speech data..

Electronic devices with voice command and contextual data processing capabilities
An electronic device may capture a voice command from a user. The electronic device may store contextual information about the state of the electronic device when the voice command is received.

Automatic learning of language models
Techniques and systems are disclosed for context-dependent speech recognition. The techniques and systems described enable accurate recognition of speech by accessing sub-libraries associated with the context of the speech to be recognized.

Terminal, unlocking method, and program
A terminal comprises: a speech receiving unit that receives speech in a locked state; a voiceprint authentication unit that performs voiceprint authentication based on the speech received in the locked state and determines whether or not a user is legitimate; a speech recognition unit that performs speech recognition of the speech received in the locked state; and an execution unit that executes an application using a result of the speech recognition.

Spoken language understanding based on buffered keyword spotting and speech recognition
Techniques are provided for spoken language understanding based on keyword spotting and speech recognition. A methodology implementing the techniques according to an embodiment includes detecting a user spoken keyword or key-phrase embedded in an initial segment of a received audio signal, which is stored in a buffer.

Systems and methods for energy efficient and low power distributed automatic speech recognition on wearable devices
Methods, apparatus, systems and articles of manufacture are disclosed for distributed automatic speech recognition. An example apparatus includes a detector to process an input audio signal and identify a portion of the input audio signal including a sound to be evaluated, the sound to be evaluated organized into a plurality of audio features representing the sound.

Communication device
Provided is a technology which improves reliability of the interaction between devices in a system where the devices communicate. In an information appliance system, multiple information appliances and a communication device such as a smartphone are in M2M communication.

Hotword detection on multiple devices
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving audio data that corresponds to an utterance.

Call control system and call control method

An information processor requests a recognition result manager to transmit recording information about a call including a keyword and a recognition result of speech recognition using an extension number as a key. The manager transmits the recording information about the call including the keyword corresponding to the extension number and the recognition result of the speech recognition to the processor.

System and method for correlating mouth images to input commands

A system for automated speech recognition utilizes computer memory, a processor executing imaging software and audio processing software, and a camera transmitting images of a physical source of speech input. Audio processing software includes an audio data stream of audio samples derived from at least one speech input.

Method for operating speech recognition service, electronic device and system supporting the same

An electronic device is provided. The electronic device includes a communication module, a microphone receiving a voice input according to user speech, a memory storing information about an operation of the speech recognition service, a display, and a processor electrically connected with the communication module, the microphone, the memory, and the display.

Method for operating speech recognition service and electronic device supporting the same

An electronic device includes a communication module, a sensor module, a microphone, a memory, a display, and a processor. The processor is configured to determine whether a user is in proximity to the electronic device, to transmit at least one of voice input information or information on the proximity of the user to the external device, to receive at least one action associated with the execution of the function of the electronic device corresponding to a recognition result from the external device, based on voice input recognition of the external device, to output content associated with execution and/or processing of each of the at least one action, when the user is spaced apart from the electronic device, and to prevent at least a portion of the at least one content from being output in a state that the user is within the specified proximity to the electronic device..

Object authentication device and object authentication method

An object authentication device includes a speech recognition unit configured to obtain candidates for a speech recognition result for an input speech and a likelihood of the speech as a speech likelihood and an image model generation unit configured to obtain image models of a predetermined number of candidates for the speech recognition result in descending order of speech likelihoods, wherein the image model generation unit initially performs retrieval from an image model database storing the image models when the image models for the candidates for the speech recognition result are generated and generates an image model from information acquired from a network if the image model is not stored in the image model database.

Object authentication device and object authentication method

An object authentication device includes a speech recognition unit configured to obtain candidates for a speech recognition result for an input speech and a likelihood of the speech as a speech likelihood, an image model generation unit configured to obtain image models of a predetermined number of candidates for the speech recognition result in descending order of speech likelihoods, an image likelihood calculation unit configured to obtain an image likelihood based on an image model of an input image, and an object authentication unit configured to perform object authentication using the image likelihood, wherein vocabularies predicted through speech recognition are categorized and the image model is formed in association with a category.

Speech recognition devices and speech recognition methods

The present disclosure provides a speech recognition method and a speech recognition device. The speech recognition method includes receiving a voice instruction of a user.

Method and system for predicting speech recognition performance using accuracy scores

A system and method are presented for predicting speech recognition performance using accuracy scores in speech recognition systems within the speech analytics field. A keyword set is selected.

Method for operating speech recognition service and electronic device supporting the same

An electronic device is provided. The electronic device includes a processor, and a memory, wherein the memory stores instructions that, when executed, cause the processor to receive a user input including a request for performing a first task that requires at least one parameter for execution and not including the entire at least one parameter, transmit first data related to the user input to an external server, receive a first sequence of states of the electronic device for performing the first task from the external server, perform not all but some of the first sequence of the states while displaying at least some of states changed in the first sequence, and after the performing of the some of the first sequence, display a gui that is required for performing the first task and represents that a user is requested to provide at least one parameter omitted in the user input..

Information processing apparatus, information processing method, and computer program product

According to an embodiment, an information processing apparatus includes one or more processors. The one or more processors are configured to acquire target sentence data including a plurality of morphemes obtained by speech recognition and speech generation time of each morpheme from the plurality of morphemes; and assign display time according to a difference between a confirmed sentence of which a user's correction for the target sentence data is confirmed and a second confirmed sentence of a previous speech generation time..
Kabushiki Kaisha Toshiba

Network access method and apparatus for speech recognition service based on artificial intelligence

The present disclosure discloses a network access method and a network access apparatus for speech recognition service based on artificial intelligence. The network access method includes: judging whether there is available IP address information in an IP buffer module when a speech recognition request is received, in which the IP buffer module is configured to buffer IP address information used for a speech recognition performed successfully last time; performing an identity authentication on the available IP address information when there is the available IP address information in the IP buffer module; and accessing the speech recognition service via the available IP address information passing the identity authentication, in which the speech recognition service is configured to recognize a speech in the speech recognition request.
Baidu Online Network Technology (beijing) Co., Ltd.

Speech dialogue device and speech dialogue method

A correspondence relationship between keywords for instructing the start of a speech dialogue and modes of a response is defined in a response-mode correspondence table. A response-mode selecting unit selects a mode of a response corresponding to a keyword included in the recognition result of a speech recognition unit using the response-mode correspondence table.
Mitsubishi Electric Corporation

Speech recognition device, speech recognition method, non-transitory recording medium, and robot

A feature extractor extracts feature quantities from a digitized speech signal and outputs the feature quantities to a likelihood calculator. A distance determiner determines the distance between a user providing speech and a speech input unit.
Casio Computer Co., Ltd.

Execution of voice commands in a multi-device system

Performing speech recognition in a multi-device system includes receiving a first audio signal that is generated by a first microphone in response to a verbal utterance, and a second audio signal that is generated by a second microphone in response to the verbal utterance; dividing the first audio signal into a first sequence of temporal segments; dividing the second audio signal into a second sequence of temporal segments; comparing a sound energy level associated with a first temporal segment of the first sequence to a sound energy level associated with a first temporal segment of the second sequence; based on the comparing, selecting, as a first temporal segment of a speech recognition audio signal, one of the first temporal segment of the first sequence and the first temporal segment of the second sequence; and performing speech recognition on the speech recognition audio signal.
Harman International Industries, Inc.

Sound identification utilizing periodic indications

A computer-implemented method is provided. The computer-implemented method is performed by a speech recognition system having at least a processor.
International Business Machines Corporation

Constructing speech decoding network for numeric speech recognition

The embodiments of the present disclosure disclose a method for constructing a speech decoding network in digital speech recognition. The method comprises acquiring training data obtained by digital speech recording, the training data comprising a plurality of speech segments, and each speech segment comprising a plurality of digital speeches; performing acoustic feature extraction on the training data to obtain a feature sequence corresponding to each speech segment; performing progressive training starting from a mono-phoneme acoustic model to obtain an acoustic model; acquiring a language model, and constructing a speech decoding network by the language model and the acoustic model obtained by training..
Tencent Technology (shenzhen) Company Limited

System and method for optimizing speech recognition and natural language parameters with user feedback

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for assigning saliency weights to words of an ASR model. The saliency values assigned to words within an ASR model are based on human perception judgments of previous transcripts.
Nuance Communications, Inc.

Relative excitation features for speech recognition

Described herein is a major breakthrough for explaining and simulating the human auditory perception and its robustness.

Methods for the automated generation of speech sample asset production scores for users of a distributed language learning system, automated accent recognition and quantification and improved speech recognition

Methods for automated generation of speech sample asset production scores for users of a distributed language learning system, automated accent recognition and quantification and improved speech recognition. Utilising a trained supervised machine learning module which is trained utilising a training set comprising a plurality of production speech sample asset recordings, associated production scores generated by system users performing perception exercises and user background information.
Vendome Consulting Pty Ltd

Training method and apparatus for speech recognition

A training method and apparatus for speech recognition is disclosed, where an example of the training method includes determining whether a current iteration for training a neural network is performed by an experience replay iteration using an experience replay set, selecting a sample from at least one of the experience replay set and a training set based on a result of the determining, and training the neural network based on the selected sample.
Samsung Electronics Co., Ltd.
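
The per-iteration choice in the abstract above reduces to deciding whether this step is an experience-replay iteration and then sampling from the corresponding set. A minimal sketch follows; the replay probability and uniform random sampling are assumptions, since the abstract does not specify how the decision is made.

    # Hypothetical sketch: choosing between an experience-replay set and the
    # ordinary training set on each iteration. The replay probability is assumed.
    import random

    def pick_training_sample(train_set, replay_set, replay_prob=0.3, rng=random):
        """Return (sample, is_replay_iteration) for one training step."""
        use_replay = bool(replay_set) and rng.random() < replay_prob
        source = replay_set if use_replay else train_set
        return rng.choice(source), use_replay

    # Samples drawn this way would then be fed to the usual forward/backward
    # pass that updates the acoustic neural network.
    sample, from_replay = pick_training_sample(
        train_set=["utt_001", "utt_002", "utt_003"],
        replay_set=["utt_hard_017"],
    )
    print(sample, from_replay)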

Speech recognition device, speech recognition method and storage medium

In a speech recognition device according to one embodiment, a microphone detects sound and generates an audio signal corresponding to the sound, an adjustment processor adjusts a threshold to be a value less than a first volume level of first input audio signal generated by the microphone, and registers the adjusted threshold, a recognition processor reads the registered threshold, compares the registered threshold with a second input audio signal, discards the second input audio signal when a second volume level of the second input audio signal is less than the registered threshold, and performs a recognition process as the audio signal of a user to be recognized when the second volume level of the second input audio signal is greater than or equal to the registered threshold.
Kabushiki Kaisha Toshiba
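
The register-then-compare flow in the Toshiba abstract can be summarized in a few lines. In the sketch below, the RMS volume measure and the margin used to set the threshold below the first signal's level are assumptions.

    # Hypothetical sketch of the two-phase threshold logic: register a threshold
    # slightly below the level of a first (enrollment) signal, then accept or
    # discard later signals against that registered threshold.
    import numpy as np

    class VolumeGate:
        def __init__(self, margin: float = 0.8):
            self.margin = margin          # assumed: threshold = margin * first level
            self.threshold = None

        @staticmethod
        def _level(signal: np.ndarray) -> float:
            return float(np.sqrt(np.mean(signal.astype(np.float64) ** 2)))  # RMS

        def register(self, first_signal: np.ndarray) -> None:
            self.threshold = self.margin * self._level(first_signal)

        def accept(self, signal: np.ndarray) -> bool:
            """True -> pass to the recognizer; False -> discard as too quiet."""
            return self.threshold is not None and self._level(signal) >= self.threshold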

Cognitive journey companion system

A system and method for cognitive risk mitigation are presented. Embodiments comprise journey prediction, parsing of data sources, risk assessment and mitigation, and natural-language user interaction by a cognitive processor.
University College Dublin

Semiautomated relay method and apparatus

A relay for captioning a hearing user's (HU's) voice signal during a phone call between an HU and a hearing assisted user (AU), the HU using an HU device and the AU using an AU device where the HU voice signal is transmitted from the HU device to the AU device, the relay comprising a display screen, a processor linked to the display and programmed to perform the steps of receiving the HU voice signal from the AU device, transmitting the HU voice signal to a remote automatic speech recognition (ASR) server running ASR software that converts the HU voice signal to ASR generated text, the remote ASR server located at a remote location from the relay, receiving the ASR generated text from the ASR server, presenting the ASR generated text for viewing by a call assistant (CA) via the display, and transmitting the ASR generated text to the AU device.
Ultratec, Inc.

Systems, methods and devices for intelligent speech recognition and processing

Systems, methods, and devices for intelligent speech recognition and processing are disclosed. According to one embodiment, a method for improving intelligibility of a speech signal may include (1) at least one processor receiving an incoming speech signal comprising a plurality of sound elements; (2) the at least one processor recognizing a sound element in the incoming speech signal to improve the intelligibility thereof; (3) the at least one processor processing the sound element by at least one of modifying and replacing the sound element; and (4) the at least one processor outputting the processed speech signal comprising the processed sound element..
Audimax Llc

Communication terminal, communication method, and computer program product

A communication terminal includes circuitry. The circuitry receives audio data collected by an audio collecting device, and transmits the audio data to a speech recognition apparatus via a network.
Ricoh Company, Ltd.

Information processing apparatus, information processing method, and computer program product

An information processing apparatus is communicable with a speech recognition apparatus via a network includes circuitry. The circuitry implements a speech recognition function that performs speech recognition on audio data collected by an audio collecting device.
Ricoh Company, Ltd.

Quality feedback on user-recorded keywords for automatic speech recognition systems

Systems and methods are provided for an automated speech recognition system. A microphone records a keyword spoken by a user, and a front end divides the recorded keyword into a plurality of subunits, each containing a segment of recorded audio, and extracts a set of features from each of the plurality of subunits.
Texas Instruments Incorporated

System and method for rapid customization of speech recognition models

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating domain-specific speech recognition models for a domain of interest by combining and tuning existing speech recognition models when a speech recognizer does not have access to a speech recognition model for that domain of interest and when available domain-specific data is below a minimum desired threshold to create a new domain-specific speech recognition model. A system configured to practice the method identifies a speech recognition domain and combines a set of speech recognition models, each speech recognition model of the set of speech recognition models being from a respective speech recognition domain.
Nuance Communications, Inc.

Electronic apparatus, speech recognition method thereof, and non-transitory computer readable recording medium

An electronic apparatus is provided. The electronic apparatus according to an embodiment includes an audio input unit configured to receive sound sources from different positions and generate a plurality of voice signals, a pre-processor configured to perform pre-processing of the plurality of voice signals, and a voice recognition unit configured to perform voice recognition using the plurality of voice signals pre-processed by the pre-processor, and in response to a predetermined trigger being detected as a result of the voice recognition, generate trigger information, wherein the pre-processor is further configured to receive feedback on the trigger information generated by the voice recognition unit, change a pre-processing method according to the trigger information, process the plurality of voice signals using the changed pre-processing method, and generate enhanced voice signals..
Samsung Electronics Co., Ltd.

Systems and methods for diagnosing and analyzing concussions

A mobile device is programmed with an application that uses the mobile device's camera, accelerometer and microphone to enable a parent, coach or player to use it as a tool to diagnose a concussion. The tool may diagnose concussion on the basis of one or multiple factors that are scored, for example the player's balance, eye movement, speech responses to questions, button pressing response time, and other information about the location of the impact.
Apptek, Inc.

System and method for multichannel end-to-end speech recognition

A speech recognition system includes a plurality of microphones to receive acoustic signals including speech signals, an input interface to generate multichannel inputs from the acoustic signals, one or more storages to store a multichannel speech recognition network, wherein the multichannel speech recognition network comprises mask estimation networks to generate time-frequency masks from the multichannel inputs, a beamformer network trained to select a reference channel input from the multichannel inputs using the time-frequency masks and generate an enhanced speech dataset based on the reference channel input and an encoder-decoder network trained to transform the enhanced speech dataset into a text. The system further includes one or more processors, using the multichannel speech recognition network in association with the one or more storages, to generate the text from the multichannel inputs, and an output interface to render the text..

Convolutional recurrent neural networks for small-footprint keyword spotting

Described herein are systems and methods for creating and using convolutional recurrent neural networks (CRNNs) for small-footprint keyword spotting (KWS) systems. Inspired by the large-scale state-of-the-art speech recognition systems, in embodiments, the strengths of convolutional layers to utilize the structure in the data in time and frequency domains are combined with recurrent layers to utilize context for the entire processed frame.

Speech recognition device, speech recognition method, and computer program product

A speech recognition device includes one or more processors configured to calculate a score vector sequence on the basis of a speech signal, search a search model to detect a path following the input symbol from which a likely acoustic score in the score vector sequence is obtained, and output an output symbol allocated to the detected path. The symbol set includes a symbol representing a phonetic unit to be recognized, and an additional symbol representing at least one of a filler, a disfluency, and a non-speech sound.

Speech recognition using an operating system hooking component for context-aware recognition models

Inputs provided into user interface elements of an application are observed. Records are made of the inputs and the state(s) the application was in while the inputs were provided.

Speech recognition

The present disclosure provides a speech recognition method and device. The method includes: receiving a speech signal; decoding the speech signal according to an acoustic model, a language model and a decoding network established in advance, and dynamically adding a blank unit in a decoding process to obtain an optimum decoding path with the added blank unit, in which the acoustic model is obtained based on connectionist temporal classification training, the acoustic model includes basic pronunciation units and the blank unit, and the decoding network includes a plurality of decoding paths consisting of the basic pronunciation units; and outputting the optimum decoding path as a recognition result of the speech signal..
Baidu Online Network Technology (beijing) Co., Ltd.
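
The role of a dynamically added blank unit is easiest to see in the standard CTC collapse step, where repeated units are merged and blanks are dropped when a best path is turned into output. The snippet below is a generic CTC illustration, not Baidu's decoder; the blank symbol and example path are invented.

    # Generic CTC-style post-processing: collapse repeats, then drop blank units.
    # The blank symbol and the example path are illustrative only.
    BLANK = "<b>"

    def collapse_ctc_path(path):
        collapsed = []
        prev = None
        for unit in path:
            if unit != prev:              # merge consecutive repeats
                collapsed.append(unit)
            prev = unit
        return [u for u in collapsed if u != BLANK]   # remove blank units

    # A best path over basic pronunciation units interleaved with blanks:
    print(collapse_ctc_path(["h", "h", BLANK, "e", BLANK, "l", "l", BLANK, "l", "o"]))
    # -> ['h', 'e', 'l', 'l', 'o']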

System and method for personalization in speech recognition

Systems, methods, and computer-readable storage devices are for identifying a user profile for speech recognition. The user profile is selected from one of several user profiles which are all associated with a speaker, and can be selected based on the identity of the speaker, the location of the speaker, the device the speaker is using, or other relevant parameters.
At&t Intellectual Property I, L.p.

Speech recognition method and apparatus

A speech recognition method comprises: generating, based on a preset speech knowledge source, a search space comprising preset client information and for decoding a speech signal; extracting a characteristic vector sequence of a to-be-recognized speech signal; calculating a probability at which the characteristic vector corresponds to each basic unit of the search space; and executing a decoding operation in the search space by using the probability as an input to obtain a word sequence corresponding to the characteristic vector sequence.
Alibaba Group Holding Limited

Hyperarticulation detection in repetitive voice queries using pairwise comparison for improved speech recognition

Automatic speech recognition systems can benefit from cues in user voice such as hyperarticulation. Traditional approaches typically attempt to define and detect an absolute state of hyperarticulation, which is very difficult, especially on short voice queries.
Microsoft Technology Licensing, Llc

Speech recognition program medium, speech recognition apparatus, and speech recognition method

A speech recognition method to be performed by a computer, the method including: detecting a first keyword uttered by a user from an audio signal representing voice of the user; detecting a term indicating a request of the user from sections that follow the first keyword in the audio signal; and determining a type of speech recognition processing applied to the following sections in accordance with the detected term indicating the request of the user.
Fujitsu Limited

Vehicle and control method thereof

A vehicle includes: an input unit configured to receive an execution command for speech recognition; a calculator configured to calculate a time in which the vehicle is expected to arrive at an obstacle existing on a road on which the vehicle travels; and a speech recognition controller configured to compare the calculated time in which the vehicle is expected to arrive at the obstacle to a time in which a voice command input is expected to be completed to determine whether to perform dynamic noise removal pre-processing.
Kia Motors Corporation
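
The comparison in the Kia abstract comes down to weighing time-to-obstacle against the expected duration of the voice command. A sketch of one plausible reading follows; the decision direction (pre-process when the obstacle will be reached while the command is still being spoken) and all numeric inputs are assumptions.

    # Hypothetical sketch of the decision in the abstract above. All parameters
    # (obstacle distance, vehicle speed, expected command duration) are assumed
    # inputs; the real system derives them from sensors and the dialogue state.
    def should_preprocess(distance_to_obstacle_m: float,
                          vehicle_speed_mps: float,
                          expected_command_s: float) -> bool:
        if vehicle_speed_mps <= 0:
            return False                     # not approaching: no extra noise expected
        time_to_obstacle_s = distance_to_obstacle_m / vehicle_speed_mps
        # Assumed reading: the obstacle (e.g. a speed bump) will be reached while
        # the command is still being spoken, so enable dynamic noise removal.
        return time_to_obstacle_s <= expected_command_s

    print(should_preprocess(120.0, 16.7, 3.0))   # ~7.2 s to obstacle -> False
    print(should_preprocess(20.0, 16.7, 3.0))    # ~1.2 s to obstacle -> True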

Vehicle control using multi-intent queries input by voice

An infotainment system of a vehicle includes: a primary intent module configured to determine a primary intent included in voice input using automated speech recognition (ASR); and an execution module configured to, via a first hardware output device of the vehicle, execute the primary intent. A secondary intent module is configured to: based on the primary intent, determine a first domain of the primary intent; based on the first domain of the primary intent, determine a second domain; and based on the voice input and the second domain, determine a secondary intent included in the voice input using ASR.
Gm Global Technology Operations Llc

Enhanced voice recognition task completion

A method for recognizing speech in a vehicle includes receiving speech at a microphone installed to a vehicle, and determining whether the speech includes a navigation instruction. If the speech includes a navigation instruction, the speech may be sent to a remote facility.
Gm Global Technology Operations Llc

WFST decoding system, speech recognition system including the same, and method of storing WFST data

A weighted finite-state transducer (WFST) decoding system is provided. The WFST decoding system includes a memory that stores WFST data and a WFST decoder including a data fetch logic.
Samsung Electronics Co., Ltd.

Language model biasing system

Methods, systems, and apparatus for receiving audio data corresponding to a user utterance and context data, identifying an initial set of one or more n-grams from the context data, generating an expanded set of one or more n-grams based on the initial set of n-grams, adjusting a language model based at least on the expanded set of n-grams, determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, adjusting a score for a particular speech recognition candidate determined to be included in the expanded set of n-grams, determining a transcription of user utterance that includes at least one of the one or more speech recognition candidates, and providing the transcription of the user utterance for output.
Google Inc.
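
A toy version of the biasing pipeline described above: expand an initial set of context n-grams, then boost the scores of recognition candidates that fall in the expanded set. The expansion rule (adding the individual words of each n-gram) and the fixed additive boost are illustrative assumptions, not Google's method.

    # Hypothetical sketch of context-based language-model biasing. The expansion
    # rule and the fixed score boost are assumptions for illustration.
    def expand_ngrams(initial_ngrams):
        expanded = set(initial_ngrams)
        for ngram in initial_ngrams:
            expanded.update(ngram.split())          # also bias the individual words
        return expanded

    def rescore(candidates, expanded_ngrams, boost=2.0):
        """candidates: dict of hypothesis text -> base score. Higher is better."""
        return {
            text: score + (boost if text in expanded_ngrams else 0.0)
            for text, score in candidates.items()
        }

    context = expand_ngrams({"call mom", "play jazz"})
    scores = rescore({"call mom": 4.1, "call tom": 4.3, "play jazz": 3.0}, context)
    print(max(scores, key=scores.get))   # biased toward "call mom"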

Enhanced automatic speech recognition

Devices, systems, and methods of enhanced automatic speech recognition. An acoustic microphone senses or captures acoustic signals that are uttered by a human speaker.
Vocalzoom Systems Ltd.

Enhanced speech generation

In a particular aspect, an apparatus includes an audio sensor configured to receive an input audio signal. The apparatus also includes speech generative circuitry configured to generate a synthesized audio signal based at least partly on automatic speech recognition (ASR) data associated with the input audio signal and based on one or more parameters indicative of state information associated with the input audio signal.
Qualcomm Incorporated

Configurable phone with interactive voice response engine

A land-based or mobile phone and methods are provided for receiving inbound communications as either voice or text, and then based on the user's configuration settings, the inbound communication is provided to the user as it was received or is automatically converted into a format that is desired by the user. The phone also takes voice or text that is input by the user of the phone and converts the user's input to either voice or text based on the configuration settings stored in the user's contact list or otherwise.

Method and apparatus for speech recognition

A speech recognition method includes receiving a sentence generated through speech recognition, calculating a degree of suitability for each word in the sentence based on a relationship of each word with other words in the sentence, detecting a target word to be corrected among the words in the sentence based on the degree of suitability for each word, and replacing the target word with any one of candidate words corresponding to the target word.
Samsung Electronics Co., Ltd.
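
The detect-and-replace loop in the Samsung abstract might look roughly like the sketch below, with the suitability scorer and candidate generator left as hypothetical stand-in callables, since those are the parts the patent actually covers.

    # Hypothetical sketch of correcting one word in a recognized sentence.
    # `suitability` and `candidates_for` are stand-in callables: the real system
    # scores each word against its surrounding words and proposes replacements.
    def correct_sentence(words, suitability, candidates_for, threshold=0.5):
        scores = [suitability(w, words) for w in words]
        target_idx = min(range(len(words)), key=lambda i: scores[i])
        if scores[target_idx] >= threshold:
            return words                      # nothing looks wrong enough to fix
        best = max(candidates_for(words[target_idx]),
                   key=lambda c: suitability(c, words))
        return words[:target_idx] + [best] + words[target_idx + 1:]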

System and method for three-way call detection

A system for detecting three-way calls in a monitored telephone conversation includes a speech recognition processor that transcribes the monitored telephone conversation and associates characteristics of the monitored telephone conversation with a transcript thereof, a database to store the transcript and the characteristics associated therewith, and a three-way call detection processor to analyze the characteristics of the conversation and to detect therefrom the addition of one or more parties to the conversation. The system preferably includes at least one domain-specific language model that the speech recognition processor utilizes to transcribe the conversation.
Dsi-iti, Llc

Input generation for classifier

A computer-implemented method for generating an input for a classifier. The method includes obtaining n-best hypotheses which are an output of automatic speech recognition (ASR) for an utterance, combining the n-best hypotheses horizontally in a predetermined order with a separator between each pair of hypotheses, and outputting the combined n-best hypotheses as a single text input to a classifier.
International Business Machines Corporation
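
The transformation in the IBM abstract is small: join the n-best ASR hypotheses, in their given order, into one separator-delimited string for the classifier. In the sketch below the separator token is an assumption, not the patent's choice.

    # Hypothetical sketch: combine n-best ASR hypotheses into a single classifier
    # input. The separator token " [SEP] " is an assumption.
    def combine_nbest(hypotheses, separator=" [SEP] "):
        return separator.join(hypotheses)    # keeps the given (e.g., score) order

    nbest = ["turn on the lights", "turn of the lights", "turn on the light"]
    print(combine_nbest(nbest))
    # -> "turn on the lights [SEP] turn of the lights [SEP] turn on the light"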

Speech recognition involving a mobile device

A system and method of speech recognition involving a mobile device. Speech input is received (202) on a mobile device (102) and converted (204) to a set of phonetic symbols.
Apple Inc.

System and method for speech-based interaction resolution

A method for automatically retrieving documents based on customer speech received at a contact center of an organization includes: receiving, by a processor, at the contact center of the organization, speech from a customer; performing, by the processor, automatic speech recognition on the received speech to generate recognized text; generating, by the processor, a search query from the recognized text; searching, by the processor, a knowledge base specific to the organization for one or more documents relevant to the search query; and returning, by the processor, the one or more documents relevant to the search query.
Interactive Intelligence Group, Inc.

Control method of translation device, translation device, and non-transitory computer-readable recording medium storing a program

A translation device includes a microphone, a sensor that detects an attitude of the translation device, and a display. In a control method of the translation device, audio signals indicating audio from a first user are generated by the microphone, and change in the attitude of the translation device, detected by the sensor, is detected, and second text, generated by translation processing performed on first text obtained by speech recognition of the audio signals generated until detection of change in attitude of the translation device, is displayed on the display..
Panasonic Intellectual Property Management Co., Ltd.

Announcement system and speech-information conversion apparatus

An announcement system includes: a sound-pickup apparatus for receiving a speech expressing a fixed-form sentence; a conversion apparatus for generating a translation of the fixed-form sentence based on the speech received; and output apparatus for presenting information indicating the translation. The conversion apparatus includes: a storage unit for storing first-information indicating a predetermined sentence in a mode, and second-information indicating the predetermined sentence in another mode; an audio-input unit for receiving speech-information indicating the fixed-form sentence; speech recognition unit for generating text-information based on the speech-information; conversion processing unit for identifying the first-information corresponding to the fixed-form sentence, based on the text-information and a part of the first-information, before the sound pickup apparatus finishes receiving the speech expressing the whole fixed-form sentence; and transmission unit for transmitting the second-information corresponding to the identified first-information.
Panasonic Intellectual Property Management Co., Ltd.

Speech recognition based on context and multiple recognition engines

Using many speech recognition engines, one can select which one is best at any given iteration of sending a command to a device to be interpreted and carried out. Depending on the context, a different result of many results received from speech recognition engines is chosen.
Essence, Inc

Reduced latency speech recognition system using multiple recognizers

Method and apparatus for providing visual feedback on an electronic device in a client/server speech recognition system comprising the electronic device and a network device remotely located from the electronic device. The method comprises processing, by an embedded speech recognizer of the electronic device, at least a portion of input audio comprising speech to produce local recognized speech, sending at least a portion of the input audio to the network device for remote speech recognition, and displaying, on a user interface of the electronic device, visual feedback based on at least a portion of the local recognized speech prior to receiving streaming recognition results from the network device..
Nuance Communications, Inc.

Voice control of an integrated room automation system

A voice controlled room automation system that includes a speaker device situated in a guest room, a hotel automation controller operatively coupled to one or more components in the guest room, and a web service operatively coupled to the speaker device and the hotel automation controller. The web service is configured to receive voice commands from the speaker device, process the voice command using speech recognition, interpret the voice command to determine a corresponding command for the hotel automation controller, and transmit the corresponding command to the hotel automation controller.
Honeywell International Inc.

Speech recognition apparatus with cancellation period

A speech recognition apparatus includes a recognition unit configured to perform, in response to audio data, a recognition process with respect to a first word registered in advance and a recognition process with respect to a second word registered in advance, the recognition process with respect to the second word being performed during a cancellation period associated with the first word upon the first word being recognized, and a control unit configured to perform a control operation associated with the recognized first word upon the first word being recognized by the recognition unit, and to cancel the control operation upon the second word being recognized by the recognition unit.
Alpine Electronics, Inc.
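
The Alpine abstract is essentially a small state machine: recognizing the first word triggers an action and opens a cancellation window, during which recognizing the second word undoes it. A sketch follows; the window length, the monotonic-clock timing, and the printed stand-in actions are assumptions.

    # Hypothetical sketch of the cancellation-period logic. The window length and
    # monotonic-clock timing are assumptions; the control actions are stand-ins.
    import time

    class CancelableCommand:
        def __init__(self, cancel_window_s: float = 3.0):
            self.cancel_window_s = cancel_window_s
            self.deadline = None

        def on_first_word(self):
            print("performing control operation")          # stand-in action
            self.deadline = time.monotonic() + self.cancel_window_s

        def on_second_word(self):
            if self.deadline is not None and time.monotonic() <= self.deadline:
                print("cancelling control operation")       # stand-in undo
                self.deadline = None
            # outside the cancellation period the second word is ignored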

Speech recognition method and apparatus

A speech recognition method includes generating pieces of candidate text data from a speech signal of a user, determining a decoding condition corresponding to an utterance type of the user, and determining target text data among the pieces of candidate text data by performing decoding based on the determined decoding condition.
Samsung Electronics Co., Ltd.

Altering audio to improve automatic speech recognition

Techniques for altering audio being output by a voice-controlled device, or another device, to enable more accurate automatic speech recognition (ASR) by the voice-controlled device. For instance, a voice-controlled device may output audio within an environment using a speaker of the device.
Amazon Technologies, Inc.

Speech recognition device and computer program

A speech recognition device includes: an acoustic model 308 implemented by an RNN (recurrent neural network) for calculating, for each state sequence, the posterior probability of a state sequence in response to an observed sequence consisting of prescribed speech features obtained from a speech; a WFST 320 based on S⁻¹HCLG calculating, for each word sequence, the posterior probability of a word sequence in response to a state sequence; and a hypothesis selecting unit 322, performing speech recognition of the speech signal based on a score calculated for each hypothesis of a word sequence corresponding to the speech signal, using the posterior probabilities calculated by the acoustic model 308 and the WFST 320 for the input observed sequence.

Automatic language model update

A method for generating a speech recognition model includes accessing a baseline speech recognition model, obtaining information related to recent language usage from search queries, and modifying the speech recognition model to revise probabilities of a portion of a sound occurrence based on the information. The portion of a sound may include a word.
Google Llc

Method and device for updating language model and performing speech recognition based on language model

A method of updating a grammar model used during speech recognition includes obtaining a corpus including at least one word, obtaining the at least one word from the corpus, splitting the at least one obtained word into at least one segment, generating a hint for recombining the at least one segment into the at least one word, and updating the grammar model by using at least one segment comprising the hint.
Samsung Electronics Co., Ltd.
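
One common way to realize "split a word into segments and keep a hint for recombining them" is a subword continuation marker, as in BPE-style tokenizations; the sketch below follows that reading. The fixed segment length and the "+" marker are assumptions, not necessarily Samsung's scheme.

    # Hypothetical sketch: split corpus words into segments and mark non-final
    # segments with a continuation hint so they can be recombined later.
    def split_with_hints(word, seg_len=3, hint="+"):
        segments = [word[i:i + seg_len] for i in range(0, len(word), seg_len)]
        return [s + hint if i < len(segments) - 1 else s
                for i, s in enumerate(segments)]

    def recombine(segments, hint="+"):
        word, out = "", []
        for seg in segments:
            if seg.endswith(hint):
                word += seg[:-len(hint)]
            else:
                out.append(word + seg)
                word = ""
        return out

    segs = split_with_hints("recognition")      # ['rec+', 'ogn+', 'iti+', 'on']
    print(segs, recombine(segs))                # -> ['recognition']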

System and method for improving speech recognition accuracy using textual context

Disclosed herein are systems, methods, and computer-readable storage media for improving speech recognition accuracy using textual context. The method includes retrieving a recorded utterance, capturing text from a device display associated with the spoken dialog and viewed by one party to the recorded utterance, and identifying words in the captured text that are relevant to the recorded utterance.
Nuance Communications, Inc.

Mobile phone having temporary activation of voice command interface after receiving a message

The embodiments provided herein are directed to a system and method of message-triggered voice command interface in portable electronic devices. The voice command interface is normally not activated until a message (e.g., an e-mail, a text message, or a voice mail) has been received by a portable electronic device.

Methods and apparatus for hybrid speech recognition processing

Methods and apparatus for selectively performing speech processing in a hybrid speech processing system. The hybrid speech processing system includes at least one mobile electronic device and a network-connected server remotely located from the at least one mobile electronic device.
Nuance Communications, Inc.

Mixed model speech recognition

In one aspect, a method comprises accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances. The method further comprises generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer that employs a language model based on user-specific data.
Google Llc

Method of and system for providing adaptive respondent training in a speech recognition application

A system for conducting a telephonic speech recognition application includes an automated telephone device for making telephonic contact with a respondent and a speech recognition device which, upon the telephonic contact being made, presents the respondent with at least one introductory prompt for the respondent to reply to; receives a spoken response from the respondent; and performs a speech recognition analysis on the spoken response to determine a capability of the respondent to complete the application. If the speech recognition device, based on the spoken response to the introductory prompt, determines that the respondent is capable of completing the application, the speech recognition device presents at least one application prompt to the respondent.
Eliza Corporation

Adaptive audio enhancement for multichannel speech recognition

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance.
Google Llc

Timer apparatus and method

A multi-channel timing device is described that can be completely controlled by a user's voice for hands free operation, and has a wireless communications link to a mobile device. Preferably, the timing device comprises a processor, a memory, a display, a microphone, a speaker, a transceiver, and an audio chip.
Peter C. Salmon, Llc

Real-time script for live broadcast

In one embodiment, a method includes retrieving, from one or more data stores, a script including multiple text strings, where the script is associated with a user of a social-networking system. The method also includes capturing an incoming media stream including audio data corresponding to vocal expression by the user, where the media stream is transmitted to the social-networking system for broadcast and identifying, using a speech recognition process, one or more words in the vocal expression corresponding to a text string of the script.
Facebook, Inc.

Voice interface and vocal entertainment system

A system and method that enhances spoken utterances and provides entertainment by capturing one or more microphone signals containing echo and decomposing the one or more microphone signals into a plurality of signal paths through a synthesizer that adds or makes non-linear modifications to some of the captured one or more microphone signals. The system and method estimate multiple echo paths from each of the one or more microphones.
Blackberry Limited

Method of providing voice command and electronic device supporting the same

An electronic device, a method, and a chip set are provided. The electronic device includes a memory configured to store at least one of audio feature data of audio data and speech recognition data obtained by speech recognition of audio data; and a control module connected to the memory, wherein the control module is configured to update a voice command that is set to execute a function through voice, the function being selected based on at least one of the audio feature data, the speech recognition data, and function execution data executed in relation to the audio data..
Samsung Electronics Co., Ltd.

System and method for performing automatic speech recognition using local private data

A method of providing hybrid speech recognition between a local embedded speech recognition system and a remote speech recognition system relates to receiving speech from a user at a device communicating with a remote speech recognition system. The system recognizes a first part of speech by performing a first recognition of the first part of the speech with the embedded speech recognition system that accesses private user data, wherein the private user data is not available to the remote speech recognition system.
Nuance Communications, Inc.

Pronunciation guided by automatic speech recognition

Speech synthesis chooses pronunciations of words with multiple acceptable pronunciations based on an indication of a personal, class-based, or global preference or an intended non-preferred pronunciation. A speaker's words can be parroted back on personal devices using preferred pronunciations for accent training.
Soundhound, Inc.

System and method for neural network based feature extraction for acoustic model development

A system and method are presented for neural network based feature extraction for acoustic model development. A neural network may be used to extract acoustic features from raw MFCCs or the spectrum, which are then used for training acoustic models for speech recognition systems.
Interactive Intelligence Group, Inc.

Conference word cloud

Various disclosed implementations involve processing and/or playback of a recording of a conference involving a plurality of conference participants. Some implementations disclosed herein involve receiving speech recognition results data, including a plurality of speech recognition lattices and a word recognition confidence score for each of a plurality of hypothesized words of the speech recognition lattices, for a conference recording.
Dolby Laboratories Licensing Corporation

Multiple input multiple output (MIMO) audio signal processing for speech de-reverberation

Audio signal processing for adaptive de-reverberation uses a least mean squares (LMS) filter that has improved convergence over conventional LMS filters, making embodiments practical for reducing the effects of reverberation for use in many portable and embedded devices, such as smartphones, tablets, laptops, and hearing aids, for applications such as speech recognition and audio communication in general. The LMS filter employs a frequency-dependent adaptive step size to speed up the convergence of the predictive filter process, requiring fewer computational steps compared to a conventional LMS filter applied to the same inputs.
Synaptics Incorporated

Control method for control device, apparatus control system, and control device

Provided is a control method for a control device including: acquiring a user instruction to control a control target apparatus by a user, generating control speech information in response to the user instruction, the control speech information being speech information representing content of control on the control target apparatus and including auxiliary speech information which is different information from the user instruction, and outputting the generated control speech information to a speech recognition server which executes speech recognition processing.
Yamaha Corporation

Multi-speaker speech recognition correction system

The present invention relates to a multi-speaker speech recognition correction system for determining a speaker of an utterance with a simple method and easily correcting speech-recognized text during speech recognition for a plurality of speakers. According to the present invention, when speech signals are input to a multi-speaker speech recognition system from a plurality of microphones which are each provided to a corresponding one of a plurality of speakers, the multi-speaker speech recognition correction system may detect a speech session from a time point at which input of each of the speech signals is started to a time point at which the input of the speech signal is stopped, and a speech recognizer may convert only the detected speech sessions into text so that a speaker of an utterance can be identified by a simple method and speech recognition can be carried out at a low cost..
Sorizava Co., Ltd.
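
The per-microphone speech-session detection described above can be approximated with simple energy-based voice activity detection: a session opens when a speaker's microphone rises above a level threshold and closes once it stays below it, and only those spans are transcribed under that speaker's name. The frame size, threshold, and hangover count below are assumptions.

    # Hypothetical sketch: energy-based detection of speech sessions on one
    # speaker's microphone. Frame size, threshold, and hangover are assumptions.
    import numpy as np

    def detect_sessions(signal, frame_len=400, threshold=0.01, hangover_frames=10):
        """Return (start_sample, end_sample) pairs of detected speech sessions."""
        sessions, start, quiet = [], None, 0
        for i in range(0, len(signal) - frame_len + 1, frame_len):
            energy = float(np.mean(signal[i:i + frame_len].astype(np.float64) ** 2))
            if energy >= threshold:
                if start is None:
                    start = i                 # session opens for this speaker
                quiet = 0
            elif start is not None:
                quiet += 1
                if quiet >= hangover_frames:  # input stopped long enough: close
                    sessions.append((start, i))
                    start, quiet = None, 0
        if start is not None:
            sessions.append((start, len(signal)))
        return sessions

    # Each returned span would be transcribed and attributed to the microphone's owner.
    tone = np.concatenate([np.zeros(8000), 0.3 * np.ones(8000), np.zeros(8000)])
    print(detect_sessions(tone))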

Identification of taste attributes from an audio signal

A system, method and computer product are provided for processing audio signals. An audio signal of a voice and background noise is input, and speech recognition is performed to retrieve speech content of the voice.
Spotify Ab

Security enhanced speech recognition

A security-enhanced speech recognition method and electronic device are provided. The electronic device includes an input device configured to receive a speech signal, and a processor configured to perform speech recognition, wherein the processor determines whether to perform speech recognition based on whether the input device has been activated.
Samsung Electronics Co., Ltd.

Electronic device and speech recognition method therefor

Provided are an electronic device and speech recognition method therefor. The electronic device may include a communication interface to receive speech data from an external electronic device, a memory to store a common language model used by default for speech recognition, a first language model designated for each user, a second language model associated with context information of each user, and a third language model associated with words collected by the electronic device for a preset period of time from the reception time of the speech data; and a processor to perform a procedure of combining at least one of the first language model, the second language model, and the third language model with the common language model to construct an integrated language model, performing speech recognition on the basis of the speech data and the integrated language model, and outputting a speech recognition result corresponding to the speech data..
Samsung Electronics Co., Ltd.

Natural language grammar enablement by speech characterization

Either or both of voice speaker identification or utterance classification such as by age, gender, accent, mood, and prosody characterize speech utterances in a system that performs automatic speech recognition (ASR) and natural language processing (NLP). The characterization conditions NLP, either through application to interpretation hypotheses or to specific grammar rules.
Soundhound, Inc.

Dialogue processing apparatus, a vehicle having same, and a dialogue processing method

A dialogue processing apparatus and method monitor an intensity of an acoustic signal that is input in real time and determine that speech recognition has started, when the intensity of the input acoustic signal is equal to or greater than a reference value, allowing a user to start speech recognition by an utterance without an additional trigger. A vehicle can include the apparatus and method.
Kia Motors Corporation

System and method for detecting phonetically similar imposter phrases

A system and method for detecting phonetically similar imposter phrases may include using automatic speech recognition (ASR) to search for a first phrase in a set of objects; producing a list of references by searching for the first phrase in the set of objects using phonetic search; using output produced by the ASR to determine whether or not a reference in the list points to a phrase that is the same as the first phrase; and if it is determined that the reference points to a second phrase that is different from the first phrase then marking the second phrase as a potential cause for a phrase search false positive.
Nice Ltd.

Method and device for extracting speech feature based on artificial intelligence

Embodiments of the present disclosure provide a method and a device for extracting a speech feature based on artificial intelligence. The method includes performing a spectrum analysis on a speech to be recognized, to obtain a spectrogram of the speech; and extracting features of the spectrogram by using an inception convolution structure of an image recognition algorithm, to obtain the speech feature of the speech.
Baidu Online Network Technology (beijing) Co., Ltd

Rank-reduced token representation for automatic speech recognition

The present disclosure generally relates to processing speech or text using rank-reduced token representation. In one example process, speech input is received.
Apple Inc.

Method and user device for providing context awareness service using speech recognition

A method for providing a context awareness service is provided. The method includes defining a control command for the context awareness service depending on a user input, triggering a playback mode and the context awareness service in response to a user selection, receiving external audio through a microphone in the playback mode, determining whether the received audio corresponds to the control command, and executing a particular action assigned to the control command when the received audio corresponds to the control command..
Samsung Electronics Co., Ltd.

System and method for assessing expressive language development of a key child

A method of assessing expressive language development of a key child. The method can include processing an audio recording taken in a language environment of the key child to identify segments of the audio recording that correspond to vocalizations of the key child.
Lena Foundation

Speech recognition method and apparatus

Disclosed is a speech recognition method and apparatus, the method including two recognition processes, a first recognition process being performed using an acoustic model and a language model and a second recognition process being performed without distinguishing between the acoustic model and the language model in response to an accuracy of a result of the first recognition process not meeting a threshold. The apparatus includes a processor configured to acquire a first text from a speech sequence using an acoustic model and a language model, determine whether an accuracy of the first text meets a threshold, and acquire a second text from the first text based on a parameter generated in acquiring the first text, in response to the accuracy of the first text being below the threshold..
Samsung Electronics Co., Ltd.
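A schematic sketch of the two-pass flow described above, with hypothetical recognizer callbacks standing in for the acoustic/language-model pass and the second rescoring pass; the threshold and callback signatures are assumptions.

```python
# Illustrative sketch with hypothetical recognizer hooks: run a first pass with
# separate acoustic and language models, and fall back to a second pass only
# when the first-pass confidence misses the threshold.

ACCURACY_THRESHOLD = 0.85   # assumed value

def recognize(speech_sequence, first_pass, second_pass):
    """first_pass returns (text, confidence, decoder_state); second_pass refines it."""
    text, confidence, decoder_state = first_pass(speech_sequence)
    if confidence >= ACCURACY_THRESHOLD:
        return text
    # The second recognition process reuses parameters produced by the first pass.
    return second_pass(text, decoder_state)

def dummy_first_pass(seq):
    return "recognized draft", 0.6, {"lattice": seq}

def dummy_second_pass(draft, state):
    return draft + " (rescored)"

print(recognize([0.1, 0.2], dummy_first_pass, dummy_second_pass))
```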

Methods and apparatus for post-processing speech recognition results of received radio voice messages onboard an aircraft

A method for displaying received radio voice messages onboard an aircraft is provided. The method post-processes, by at least one processor onboard the aircraft, a set of speech recognition (sr) hypothetical data to increase accuracy of an associated sr system by: obtaining, by the at least one processor, secondary source data from a plurality of secondary sources; comparing, by the at least one processor, the set of sr hypothetical data to the secondary source data; identifying, by the at least one processor, an aircraft tail number using the set of sr hypothetical data and the secondary source data; identifying, by the at least one processor, a subset of the received radio voice messages including the tail number; and presenting, via a display device onboard the aircraft, the subset using distinguishing visual characteristics..
Honeywell International Inc.

Speech recognition using depth information

An example apparatus for detecting speech includes an image receiver to receive depth information corresponding to a face. The apparatus also includes a landmark detector to detect the face comprising lips and track a plurality of descriptor points comprising lip descriptor points located around the lips.
Intel Corporation

Methods and apparatus for reducing latency in speech recognition applications

The method comprises receiving first audio comprising speech from a user of a computing device, detecting an end of speech in the first audio, generating an asr result based, at least in part, on a portion of the first audio prior to the detected end of speech, determining whether a valid action can be performed by a speech-enabled application installed on the computing device using the asr result, and processing second audio when it is determined that a valid action cannot be performed by the speech-enabled application using the asr result.. .
Nuance Communications, Inc.

Speech recognition method and apparatus

A speech recognition method and a speech recognition apparatus which pre-download a speech recognition model predicted to be used and use the speech recognition model in speech recognition are provided. The speech recognition method, performed by the speech recognition apparatus, includes determining a speech recognition model based on user information, downloading the speech recognition model, performing speech recognition based on the speech recognition model, and outputting a result of performing the speech recognition..
Samsung Electronics Co., Ltd.

Word hash language model

A language model may be used in a variety of natural language processing tasks, such as speech recognition, machine translation, sentence completion, part-of-speech tagging, parsing, handwriting recognition, or information retrieval. A natural language processing task may use a vocabulary of words, and a word hash vector may be created for each word in the vocabulary.
Asapp, Inc
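One way a word hash vector could be formed, sketched here with character n-gram feature hashing; the dimension, hashing function, and n-gram scheme are assumptions for illustration, not the patent's method.

```python
# Illustrative sketch: building a fixed-size hash vector for each vocabulary word
# from its character n-grams, so rare or unseen words still map to a usable vector.
import hashlib
import numpy as np

HASH_DIM = 1024   # assumed vector size

def word_hash_vector(word: str, n: int = 3) -> np.ndarray:
    vec = np.zeros(HASH_DIM)
    padded = f"#{word}#"
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        digest = hashlib.md5(gram.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % HASH_DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

v1, v2 = word_hash_vector("recognize"), word_hash_vector("recognise")
print(float(v1 @ v2))   # high cosine similarity for near-identical spellings
```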

Acoustic-to-word neural network speech recognizer

Methods, systems, and apparatus, including computer programs encoded on computer storage media for large vocabulary continuous speech recognition. One method includes receiving audio data representing an utterance of a speaker.
Google Llc

Complex linear projection for acoustic modeling

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance.
Google Llc

Visually-impaired-accessible building safety system

Building safety systems, methods, and mediums are provided. A method includes receiving a voice input by the building safety system.
Siemens Industry, Inc.

Speech recognition method and apparatus

A speech recognition method and apparatus for performing speech recognition in response to an activation word determined based on a situation are provided. The speech recognition method and apparatus include an artificial intelligence (ai) system and its application, which simulates functions such as recognition and judgment of a human brain using a machine learning algorithm such as deep learning..
Samsung Electronics Co., Ltd.

Information input method, apparatus and computing device

The present disclosure discloses an information input method and device, and a computing apparatus. The information input method comprises receiving a voice input of a user, acquiring a recognition result on the received voice input, and enabling editing of the acquired recognition result in a text format.
Guangzhou Shenma Mobile Information Technology Co. Ltd.

Method and apparatus for recognizing speech

A method performed by a speech recognizing apparatus to recognize speech includes: obtaining a distance from the speech recognizing apparatus to a user generating a speech signal; determining a normalization value for the speech signal based on the distance; normalizing a feature vector extracted from the speech signal based on the normalization value; and performing speech recognition based on the normalized feature vector.. .
Samsung Electronics Co., Ltd.
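A toy sketch of the distance-based normalization described above; the linear mapping from distance to a normalization value is an assumption made for the example, not the mapping specified in the patent.

```python
# Illustrative sketch (assumed mapping): scaling the extracted feature vector by a
# normalization value derived from the estimated speaker-to-device distance before
# running recognition.
import numpy as np

def normalization_value(distance_m: float, reference_m: float = 1.0) -> float:
    """Assumed inverse-distance compensation relative to a reference distance."""
    return distance_m / reference_m

def normalize_features(features: np.ndarray, distance_m: float) -> np.ndarray:
    return features * normalization_value(distance_m)

features = np.array([0.4, 0.1, 0.7])   # placeholder feature vector
print(normalize_features(features, distance_m=2.5))   # boosted for a far speaker
```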

Facilitating creation and playback of user-recorded audio

Methods, apparatus, and computer readable media are described related to recording, organizing, and making audio files available for consumption by voice-activated products. In various implementations, in response to receiving an input from a first user indicating that the first user intends to record audio content, audio content may be captured and stored.
Google Inc.

Speech recognition without interrupting the playback audio

Systems, methods, and devices for capturing speech input from a user are disclosed herein. A system includes a playback audio component, an audio rendering component, a capture component, a filter component, and a speech recognition component.
Ford Global Technologies, Llc

Method of automatically classifying speaking rate and speech recognition system using the same

Provided are a method of automatically classifying a speaking rate and a speech recognition system using the method. The speech recognition system using automatic speaking rate classification includes a speech recognizer configured to extract word lattice information by performing speech recognition on an input speech signal, a speaking rate estimator configured to estimate word-specific speaking rates using the word lattice information, a speaking rate normalizer configured to normalize a word-specific speaking rate into a normal speaking rate when the word-specific speaking rate deviates from a preset range, and a rescoring section configured to rescore the speech signal whose speaking rate has been normalized..
Electronics And Telecommunications Research Institute

System and method of mobile automatic speech recognition

A system and method of updating automatic speech recognition parameters on a mobile device are disclosed. The method comprises storing user account-specific adaptation data associated with asr on a computing device associated with a wireless network, generating new asr adaptation parameters based on transmitted information from the mobile device when a communication channel between the computing device and the mobile device becomes available, and transmitting the new asr adaptation data to the mobile device when a communication channel between the computing device and the mobile device becomes available.
Nuance Communications, Inc.

Speech recognition system and method thereof, vocabulary establishing method and computer program product

A speech recognition system and method thereof, a vocabulary establishing method and a computer program product are provided. The speech recognition method includes: storing a speech recognition model including speech-units and basic components of acoustic models, wherein each of the speech-units includes at least one state and each state corresponds to one of the basic components of acoustic models; receiving first and second speech signals; obtaining a speech-unit sequence of a native/non-native vocabulary from a speech-analysis and unit-expansion module; recognizing the first speech signal according to the speech recognition model and the speech-unit sequence of the native/non-native vocabulary and further outputting a recognition result; and selecting an optimal component from the basic components of acoustic models according to the speech recognition model, the second speech signal, and the word corresponding to the second speech signal, and further updating the speech-units according to the best basic component of acoustic model..
Industrial Technology Research Institute

Cloud and name optimized speech recognition

A name file service is described that optimizes speech recognition in the cloud environment. The name file service monitors changes of users associated with tenant accounts and automatically updates a name file (or dictionary of names) for generating a grammar file used by speech recognition services.
Microsoft Technology Licensing, Llc

Using recurrent neural network for partitioning of audio data into segments that each correspond to a speech feature cluster identifier

Audio features, such as perceptual linear prediction (plp) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (lda). The frames are clustered into k-means clusters using distance measures, such as mahalanobis distance measures, of means and variances of the extracted audio features of the frames.
International Business Machines Corporation
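A minimal k-means sketch over per-frame feature vectors, assigning each frame a cluster identifier. It uses Euclidean distance for brevity where the abstract mentions Mahalanobis-style measures, and the feature dimension, number of clusters, and synthetic data are assumptions.

```python
# Illustrative sketch: clustering per-frame audio feature vectors into k groups so
# each frame receives a cluster identifier, standing in for the k-means step above.
import numpy as np

def kmeans(frames: np.ndarray, k: int = 4, iterations: int = 20, seed: int = 0):
    rng = np.random.default_rng(seed)
    centers = frames[rng.choice(len(frames), size=k, replace=False)]
    for _ in range(iterations):
        # Euclidean assignment; the patent abstract mentions Mahalanobis-style measures.
        distances = np.linalg.norm(frames[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = frames[labels == j].mean(axis=0)
    return labels, centers

# Synthetic 13-dimensional features drawn around four offsets
frames = np.vstack([np.random.randn(50, 13) + offset for offset in (0, 3, -3, 6)])
labels, _ = kmeans(frames)
print(np.bincount(labels))   # number of frames per cluster identifier
```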

Using long short-term memory recurrent neural network for speaker diarization segmentation

Speaker diarization is performed on audio data including speech by a first speaker, speech by a second speaker, and silence. The speaker diarization includes segmenting the audio data using a long short-term memory (lstm) recurrent neural network (rnn) to identify change points of the audio data that divide the audio data into segments.
International Business Machines Corporation

User dedicated automatic speech recognition

A multi-mode voice controlled user interface is described. The user interface is adapted to conduct a speech dialog with one or more possible speakers and includes a broad listening mode which accepts speech inputs from the possible speakers without spatial filtering, and a selective listening mode which limits speech inputs to a specific speaker using spatial filtering.
Nuance Communications, Inc.

Speech recognition device and method thereof

A speech recognition device and a method thereof are disclosed. The speech recognition method includes detecting an error word of a recognized received speech, separately encoding a right part and a left part with respect to the detected error word, and decoding while correcting the error word based on a vector of the encoded right and left parts, a last character of the encoded right part, and a last character of the left part..
Postech Academy-industry Foundation

Motion adaptive speech recognition for enhanced voice destination entry

A method or associated system for motion adaptive speech processing includes dynamically estimating a motion profile that is representative of a user's motion based on data from one or more resources, such as sensors and non-speech resources, associated with the user. The method includes effecting processing of a speech signal received from the user, for example, while the user is in motion, the processing taking into account the estimated motion profile to produce an interpretation of the speech signal.
Nuance Communications, Inc.

Speech recognition apparatus and speech recognition method

A speech recognition apparatus according to an embodiment includes a microphone that acquires an audio stream in which speech vocalized by a person is recorded, a camera that acquires an image data in which at least a mouth of the person is captured, and an operation element that recognizes speech including a consonant vocalized by the person, based on the audio stream, estimates the consonant vocalized by the person, based on the shape of the mouth of the person in the image data, and specifies the consonant based on the estimated consonant and the speech-recognized consonant.. .
Olympus Corporation

Detecting potential significant errors in speech recognition results

In some embodiments, recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential errors. In some embodiments, the indications of potential errors may include discrepancies between recognition results that are meaningful for a domain, such as medically-meaningful discrepancies.
Nuance Communications, Inc.

Systems and methods for recognizing, classifying, recalling and analyzing information utilizing ssm sequence models

A biologically-inspired model for sequence representation, method of construction and application of such models, and systems incorporating same are provided. The model captures the statistical nature of sequences and uses that for sequence encoding, recognition, and recall.
Suphatchatwong Innovation Co., Ltd.

System and method for multi-factor authentication using voice biometric verification

A system and method are presented for multi-factor authentication using voice biometric verification. When a user requests access to a system or application, voice identification may be triggered.
Interactive Intelligence Group, Inc.

Speech recognition system and method using an adaptive incremental learning approach

The present disclosure relates to speech recognition systems and methods using an adaptive incremental learning approach. More specifically, the present disclosure relates to adaptive incremental learning in a self-taught vocal user interface..
Katholieke Universiteit Leuven

Generating structured text content using speech recognition models

Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence..
Google Inc.

Information processing device, information processing method, and program

A technology is provided that allows a user to find out whether speech is uttered at a volume at which speech recognition can be performed. Provided is an information processing device including: a determination portion configured to determine a user-uttered speech volume on the basis of input speech; and a display controller configured to control a display portion so that the display portion displays a display object.
Sony Corporation

Speech recognition apparatus and method

A speech recognition apparatus and method. The speech recognition apparatus includes one or more processors configured to reflect a final recognition result for a previous audio signal in a language model, generate a first recognition result of an audio signal, in a first linguistic recognition unit, by using an acoustic model, generate a second recognition result of the audio signal, in a second linguistic recognition unit, by using the language model reflecting the final recognition result for the previous audio signal, and generate a final recognition result for the audio signal in the second linguistic recognition unit based on the first recognition result and the second recognition result.
Samsung Electronics Co., Ltd.

Controlling a user interface console using speech recognition

Disclosed are examples of systems, apparatus, methods, and computer program products for controlling a user interface console using speech recognition. In some implementations, user interface consoles and speech commands are maintained.
Salesforce.com, Inc.

Methods and systems for locating the end of the keyword in voice sensing

Systems and methods for locating the end of a keyword in voice sensing are provided. An example method includes receiving an acoustic signal that includes a keyword portion immediately followed by a query portion.
Knowles Electronics, Llc

Dynamic pitch adjustment of inbound audio to improve speech recognition

Aspects of the present disclosure relate to dynamic pitch adjustment of inbound audio to improve speech recognition. Inbound audio may be received.
Sphero, Inc

Multi-microphone speech recognition systems and related techniques

A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates.
Apple Inc.

Speech recognition apparatus, speech recognition method, and computer program product

According to an embodiment, a speech recognition apparatus includes a calculation unit that calculates, based on a speech signal, a score vector sequence including score vectors including an acoustic score for each of input symbols, a search unit that generates an input symbol string by searching for a path of the input symbol tracing the acoustic score having a high likelihood in the score vector sequence and that generates an output symbol representing a recognition result of the speech signal based on a recognition target symbol representing linguistic information as a recognition target among the input symbols, an additional symbol acquisition unit that obtains an additional symbol representing paralinguistic information and/or non-linguistic information from among the input symbols included in a range corresponding to the output symbol, and an output unit that outputs the output symbol and the obtained additional symbol in association with each other.. .
Kabushiki Kaisha Toshiba

Information processing apparatus, information processing method, and program

The present disclosure relates to an information processing apparatus, an information processing method, and a program that are capable of providing better user experience. An information processing apparatus includes an activation word setting unit that sets, on the basis of a detection result of detecting a user operation, a word used as an activation word for activating a predetermined function, the activation word being uttered by a user, the number of activation words being increased or decreased by the setting; and an activation word recognition unit that performs speech recognition on speech uttered by the user and recognizes that the word set by the activation word setting unit to be used as the activation word is uttered.
Sony Corporation

System and method for ranking of hybrid speech recognition results with neural networks

A method for ranking candidate speech recognition results includes generating, with a controller, a plurality of feature vectors for the candidate speech recognition results, each feature vector including one or more of trigger pair features, a confidence score feature, and word-level features. The method further includes providing the plurality of feature vectors as inputs to a neural network, generating a plurality of ranking scores corresponding to the plurality of feature vectors for the plurality of candidate speech recognition results based on an output layer of the neural network, and operating the automated system using the candidate speech recognition result in the plurality of candidate speech recognition results corresponding to a highest ranking score in the plurality of ranking scores as input..
Robert Bosch Gmbh

Sensor enhanced speech recognition

A system for sensor enhanced speech recognition is disclosed. The system may obtain visual content or other content associated with a user and an environment of the user.
At&t Intellectual Property I, L.p.

Methodology for automatic multilingual speech recognition

A method and device are provided for multilingual speech recognition. In one example, a speech recognition method includes receiving a multilingual input speech signal, extracting a first phoneme sequence from the multilingual input speech signal, determining a first language likelihood score indicating a likelihood that the first phoneme sequence is identified in a first language dictionary, determining a second language likelihood score indicating a likelihood that the first phoneme sequence is identified in a second language dictionary, generating a query result responsive to the first and second language likelihood scores, and outputting the query result..
The Charles Stark Draper Laboratory, Inc.
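A toy illustration of scoring one phoneme sequence against two language dictionaries and answering with the better-matching language; the dictionaries and the likelihood measure are invented for the example and only approximate the comparison described above.

```python
# Illustrative sketch: comparing per-language dictionary match scores for a single
# phoneme sequence and reporting the more likely language.

def language_likelihood(phonemes, dictionary):
    """Fraction of dictionary entries whose pronunciation appears in the sequence."""
    joined = " ".join(phonemes)
    matches = sum(1 for pron in dictionary.values() if pron in joined)
    return matches / max(len(dictionary), 1)

english = {"hello": "HH AH L OW", "world": "W ER L D"}   # made-up dictionaries
spanish = {"hola": "O L A", "mundo": "M U N D O"}
phoneme_sequence = ["HH", "AH", "L", "OW", "W", "ER", "L", "D"]

score_en = language_likelihood(phoneme_sequence, english)
score_es = language_likelihood(phoneme_sequence, spanish)
print({"english": score_en, "spanish": score_es},
      "->", "english" if score_en >= score_es else "spanish")
```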

Electronic device and method for controlling electronic device using speech recognition

According to an embodiment of the present disclosure, an electronic device may comprise a microphone, a display, and a processor, wherein the processor may be configured to control the display to display, on the display, content including at least one object and at least one text corresponding to the at least one object, the at least one text obtained based on a resource comprising the content, to determine a first text from among the at least one text corresponding to a voice received using the microphone, and to execute a command corresponding to the received voice on a first object corresponding to the first text from among the at least one object based on at least one command registered to control the at least one object.. .
Samsung Electronics Co., Ltd.

Method for determining hearing thresholds in the absence of pure-tone testing

A method for determining hearing thresholds in the absence of pure-tone testing includes an adaptive speech recognition test which includes a calibrated item pool, each item having a difficulty value (di) and associated standard error (se). Test items are administered to a group of individuals having mild to moderately severe hearing loss to obtain responses.
Rochester Institute Of Technology

Mobile terminal and control method therefor

Provided are a mobile terminal and a control method therefor. The mobile terminal includes: a touchscreen disposed on one side of a main body of the terminal; a push key mounted on the main body to receive a push input; and a controller that displays virtual keys associated with the settings or control of the terminal on the touchscreen when the push key is pushed, and executes functions associated with a speech recognition mode when a set period of time elapses without at least one touch input on the virtual keys being sensed..
Lg Electronics Inc.

Methods and apparatus for biometric authentication in an electronic device

Embodiments of the disclosure provide methods and apparatus in which a biometric authentication score generated as the result of a biometric authentication algorithm is compared to a threshold value that can be dynamically varied as required to provide a variable level of security. For example, the threshold value may be varied in dependence on the semantic content of a voice signal, and/or the context in which the voice signal was acquired.
Cirrus Logic International Semiconductor Ltd.

Speech recognition with acoustic models

Methods, systems, and apparatus, including computer programs encoded on computer storage media for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a sequence of multiple frames of acoustic data at each of a plurality of time steps; stacking one or more frames of acoustic data to generate a sequence of modified frames of acoustic data; processing the sequence of modified frames of acoustic data through an acoustic modeling neural network comprising one or more recurrent neural network (rnn) layers and a final ctc output layer to generate a neural network output, wherein processing the sequence of modified frames of acoustic data comprises: subsampling the modified frames of acoustic data; and processing each subsampled modified frame of acoustic data through the acoustic modeling neural network..
Google Llc
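A small sketch of the frame stacking and subsampling step mentioned above, with assumed stack and subsample factors and random placeholder features; the recurrent layers and CTC output are not shown.

```python
# Illustrative sketch (stack and subsample factors are assumptions): stacking
# neighboring acoustic frames and then subsampling the stacked sequence before it
# is passed to the recurrent acoustic model.
import numpy as np

def stack_and_subsample(frames: np.ndarray, stack: int = 3, subsample: int = 3):
    """frames: (time, features). Returns roughly (time / subsample, features * stack)."""
    stacked = []
    for t in range(len(frames)):
        # Concatenate each frame with its following frames, clamped at the end.
        window = [frames[min(t + offset, len(frames) - 1)] for offset in range(stack)]
        stacked.append(np.concatenate(window))
    stacked = np.array(stacked)
    return stacked[::subsample]

acoustic = np.random.randn(100, 40)            # 100 frames of 40-dim features
modified = stack_and_subsample(acoustic)
print(acoustic.shape, "->", modified.shape)    # (100, 40) -> (34, 120)
```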

In-vehicle speech recognition device and in-vehicle equipment

A speech recognition unit recognizes speech within a preset period. A determination unit determines whether the number of utterers in a vehicle is singular or plural.
Mitsubishi Electric Corporation

Domestic appliance and method for operating a domestic appliance

A domestic appliance includes a user interface for a user to input commands, a camera for taking an image of an operating area from which the user interface can be operated by the user, a speech recognition device for detecting a speech command, and a control device configured to determine a level of security depending on the image that was taken by the camera and to execute the speech command detected by the speech recognition device depending on the level of security that has been determined.. .
Bsh Hausgerate Gmbh

Apparatus and method for correcting pronunciation by contextual recognition

Disclosed is an apparatus and method for correcting pronunciation by contextual recognition. The apparatus may include an interface configured to receive, from a speech recognition server, first text data obtained by converting speech data to a text, and a processor configured to extract a keyword from the received first text data, calculate a suitability of a word in the first text data in association with the extracted keyword, and update the first text data to second text data by replacing, with an alternative word, a word in the first text data having a suitability less than a preset reference value..
Linearhub

Splitting utterances for quick responses

Methods, a system, and a classifier are provided. A method includes preparing, by a processor, pairs for an information retrieval task.
International Business Machines Corporation

Information processing device and information processing method

Included are: a speech recognition result obtainer that obtains a speech recognition result, which is text data obtained by speech recognition processing; a priority obtainer that obtains a priority corresponding to each of a plurality of tasks that are each identified by a plurality of dialog processing based on the speech recognition result; and a dialog processing controller that causes a plurality of devices to perform the distributed execution of the plurality of dialog processing mutually different from each other. The dialog processing controller provides, based on the priority, control information in accordance with a task identified by the distributed execution to an executer that operates based on the control information..
Panasonic Intellectual Property Corporation Of America

Distinguishable open sounds

Systems for speech enabling devices perform methods of configuring distinct open sounds for different devices to indicate to users when each device is recognizing speech. Open sounds are stored both on computer-readable media within a device and on server systems to which devices interface over networks.
Soundhound, Inc.

System and method for parameterization of speech recognition grammar specification (srgs) grammars

A method includes: loading, by a processor, a grammar specification defining at least one parameterizable grammar including a plurality of rules; setting, by the processor, an initial state of a grammar processor as a current state, the current state including parameters supplied to the rules; selecting, by the processor, a rule of the plurality of rules matching the parameters of the current state of the grammar processor; applying, by the processor, the selected rule to the audio and updating the current state; determining, by the processor, whether termination conditions have been met; in response to determining the termination conditions are not met, selecting, by the processor, from the plurality of rules in accordance with parameters of the updated state; and in response to determining the termination conditions are met, outputting, by the processor, a recognizer result of the current state.. .
Interactive Intelligence Group, Inc.

Recurrent neural network training method, computer program therefor and speech recognition device

The training method includes a step 220 of initializing the rnn, and a training step 226 of training the rnn by designating a certain vector as a start position and optimizing various parameters to minimize an error function. The training step 226 includes: an updating step 250 of updating rnn parameters through truncated bptt using consecutive n (n≥3) vectors having a designated vector as a start point and using a reference value of a tail vector as a correct label; and a first repetition step 240 of repeating the process of executing the training step by newly designating a vector at a position satisfying a prescribed relation with the tail of the n vectors used at the updating step until an end condition is satisfied.

Privacy-preserving training corpus selection

The present disclosure relates to training a speech recognition system. A system includes an automated speech recognizer and receives data from a client device.
Google Llc

Identifying contacts using speech recognition

A system receives candidate strings from a speech recognition engine. Where the speech recognition indicates success, the candidate string may be reported or otherwise used.
Ford Global Technologies, Llc

Method and apparatus for discovering trending terms in speech requests

Systems and processes are disclosed for discovering trending terms in automatic speech recognition. Candidate terms (e.g., words, phrases, etc.) not yet found in a speech recognizer vocabulary or having low language model probability can be identified based on trending usage in a variety of electronic data sources (e.g., social network feeds, news sources, search queries, etc.).
Apple Inc.

Information processing device, control method, and program

There is provided an information processing device, control method, and program that can improve convenience of a speech recognition system by deciding an appropriate response output method in accordance with a current surrounding environment. A response to a speech from a user is generated, a response output method is decided in accordance with a current surrounding environment, and control is performed such that the generated response is output by using the decided response output method..
Sony Corporation

Speech recognition power management

Power consumption for a computing device may be managed by one or more keywords. For example, if an audio input obtained by the computing device includes a keyword, a network interface module and/or an application processing module of the computing device may be activated.
Amazon Technologies, Inc.

Electronic speech recognition name directory prognostication system

A speech recognizer performs speech recognition on a spoken name supplied by a user, producing a list of possible matches and corresponding confidence scores. If the top scoring match for a spoken name does not correctly identify the spoken name or if the spoken name's confidence score is below a first threshold, the user name is flagged to the system administrator as having a potential speech recognition problem.
Avaya Inc.

System and method for speech recognition

A method for automated speech recognition includes generating first and second pluralities of candidate speech recognition results corresponding to audio input data using a first general-purpose speech recognition engine and a second domain-specific speech recognition engine, respectively. The method further includes generating a third plurality of candidate speech recognition results including a plurality of words included in one of the first plurality of speech recognition results and at least one word included in another one of the second plurality of speech recognition results, ranking the third plurality of candidate speech recognition results using a pairwise ranker to identify a highest ranked candidate speech recognition result, and operating the automated system using the highest ranked speech recognition result as an input from the user..
Robert Bosch Gmbh

Input support apparatus and computer program product

An input support apparatus of an embodiment includes a template storage unit configured to store a form template that is a template for form data having one or more slots to which item values are input in correspondence with item names, the form template describing item names of the respective slots and alternatives of an alternative type slot in which an item value is selected from a plurality of alternatives together with respective readings thereof; an acquisition unit configured to acquire recognition result data obtained by speech recognition performed on utterance of a user, the recognition result data containing a transcription and a reading; and a determination unit configured to determine the item values to be input to the slots of the form data based on the reading of the recognition result data and the readings of the item names and the alternatives described in the form template.. .
Toshiba Digital Solutions Corporation

Automated software execution using intelligent speech recognition

Methods and apparatuses are described for automated execution of computer software using intelligent speech recognition techniques. A server captures a digitized voice segment from a remote device, the digitized voice segment corresponding to speech submitted by a user of the remote device during a voice call.
Fmr Llc

Utilization of location and environment to improve recognition

A portable terminal has a network interface that receives a set of instructions having a sequence of at least one location and audio properties associated with the at least one location from a server. An audio circuit receives audio signals picked up by a microphone and processes the audio signals in a manner defined by the audio properties associated with the at least one location.
Vocollect, Inc.

Technologies for improved keyword spotting

Technologies for improved keyword spotting are disclosed. A compute device may capture speech data from a user of the compute device, and perform automatic speech recognition on the captured speech data.
Intel Corporation

System and method for personalization of acoustic models for automatic speech recognition

Disclosed herein are methods, systems, and computer-readable storage media for automatic speech recognition. The method includes selecting a speaker independent model, and selecting a quantity of speaker dependent models, the quantity of speaker dependent models being based on available computing resources, the selected models including the speaker independent model and the quantity of speaker dependent models.
Nuance Communications, Inc.

System and method for speech personalization by need

Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for speaker recognition personalization. The method recognizes speech received from a speaker interacting with a speech interface using a set of allocated resources, the set of allocated resources including bandwidth, processor time, memory, and storage.
Nuance Communications, Inc.

Discriminating ambiguous expressions to enhance user experience

Methods and systems are provided for discriminating ambiguous expressions to enhance user experience. For example, a natural language expression may be received by a speech recognition component.
Microsoft Technology Licensing, Llc

Audio signal emulation method and apparatus

Embodiments of the present disclosure provide techniques and configurations for an apparatus for audio signal emulation, based on a vibration signal generated in response to a user's voice. In some embodiments, the apparatus may include at least one sensor disposed on the apparatus to generate a sensor signal indicative of vibration induced by a user's voice in a portion of a user's head.
Intel Corporation

Speaker-dependent voice-activated camera system

A voice-activated camera system for a computing device. The voice-activated camera system includes a processor, a camera module, a speech recognition module and a microphone for accepting user voice input.

Suitability score based on attribute scores

A plurality of attribute scores are calculated for data including audio based on a plurality of acoustic attributes. Each of the attribute scores relate to a detection of one of the acoustic attributes in the data including audio.
Longsand Limited

Distributed environmental microphones to minimize noise during speech recognition

A device, system, and method whereby a speech-driven system used in an industrial environment distinguishes speech obtained from users of the system from other background sounds. In one aspect, the present system and method provides for a first audio stream from a user microphone collocated with a source of human speech (that is, a user) and a second audio stream from an environmental microphone which is proximate to the source of human speech but more remote than the user microphone.
Vocollect, Inc.

Information providing system, information providing method, and computer-readable recording medium

An information providing system (1) has a sound receiver (22) that receives a guidance voice and generates a sound signal (sg); a text identifier (114) that identifies, from among registered texts representing contents of utterances of different guidance voices, a registered text that is similar to an uttered text (l) that represents a content of an utterance of one of the guidance voices (v), the uttered text having been obtained by analyzing the sound signal (sg) by use of speech recognition, and a sound outputter (26) that transmits distribution information (d) that indicates the registered text identified by the text identifier (114) to a terminal device capable of presenting to a user (u) guidance information (g) corresponding to the distribution information (d) from among pieces of guidance information (g) that correspond to the respective guidance voices.. .
Yamaha Corporation

Speaker recognition in the call center

Utterances of at least two speakers in a speech signal may be distinguished and the associated speaker identified by use of diarization together with automatic speech recognition of identifying words and phrases commonly found in the speech signal. The diarization process clusters turns of the conversation while recognized special form phrases and entity names identify the speakers.
Pindrop Security, Inc.

Speech recognition method and speech recognition apparatus

A speech recognition method in a system is provided that controls one or more devices by using speech recognition. The method includes obtaining speech information representing speech spoken by a user, and determining whether the speech is spoken to the one or more devices.
Panasonic Intellectual Property Corporation Of America

Networked audible and visual alarm light system and method with voice command control and base station having alarm for smoke, carbon monoxide and gas

A networked visual and audible alarm light system and method with voice command control and base station having alarm for smoke, carbon monoxide, and gas provides illuminating leds, audible alerts, a base control, and a voice command control. The system detects and alerts to smoke, carbon monoxide, and gas.

Syntactic re-ranking of potential transcriptions during automatic speech recognition

A system and method for syntactic re-ranking of possible transcriptions generated by automatic speech recognition are disclosed. A computer system accesses acoustic data for a recorded spoken language and generates a plurality of potential transcriptions for the acoustic data.
Intel Corporation

Centered, left- and right-shifted deep neural networks and their combinations

Deep neural networks (dnn) are time shifted relative to one another and trained. The time-shifted networks may then be combined to improve recognition accuracy.
Apptek, Inc.

Detecting customers with low speech recognition accuracy by investigating consistency of conversation in call-center

Methods and a system are provided for estimating automatic speech recognition (asr) accuracy. A method includes obtaining transcriptions of utterances in a conversation over two channels.
International Business Machines Corporation

Secure nonscheduled video visitation system

Described are methods and systems in which the censorship and supervision tasks normally performed by secured facility personnel are augmented or automated entirely by a secure nonscheduled video visitation system. In embodiments, the secure nonscheduled video visitation system performs voice biometrics, speech recognition, non-verbal audio classification, fingerprint and other biometric authentication, image object classification, facial recognition, body joint location determination analysis, and/or optical character recognition on the video visitation data.
Global Tel*link Corporation

Speech recognition method and system for determining the status of an answered telephone during the course of an outbound telephone call

A system for determining the status of an answered telephone during the course of an outbound telephone call includes an automated telephone calling device for placing a telephone call to a location having a telephone number at which a target person is listed, upon the telephone call being answered, initiating a prerecorded greeting which asks for the target person and receiving a spoken response from an answering person and a speech recognition device for performing a speech recognition analysis on the spoken response to determine a status of the spoken response. If the speech recognition device determines that the answering person is the target person, the speech recognition device initiates a speech recognition application with the target person..
Eliza Corporation

Smart phone with a text recognition module

A portable device can transmit information through one of a mobile phone network and an internet, wherein the portable device includes a text-based communication module to allow a user to synchronously transmit or receive data through a local area network, wherein the data is text, audio, video, or a combination thereof. The text-based communication module of the portable device includes a text-to-speech recognition module used to convert text data for vocal output, and a read determination module for determining read target terminals and unread target terminals when a user of the portable phone device activates the read determination module..

Speech recognition

An optical microphone arrangement comprises: an array of optical microphones (4) on a substrate (8), each of said optical microphones (4) providing a signal indicative of displacement of a respective membrane (24) as a result of an incoming audible sound; a first processor (12) arranged to receive said signals from said optical microphones (4) and to perform a first processing step on said signals to produce a first output; and a second processor (14) arranged to receive at least one of said signals or said first output; wherein at least said second processor (14) determines presence of at least one element of human speech from said audible sound.. .
Sintef Tto As

Method for microphone selection and multi-talker segmentation with ambient automated speech recognition (asr)

Disclosed methods and systems are directed to determining a best microphone pair and segmenting sound signals. The methods and systems may include receiving a collection of sound signals comprising speech from one or more audio sources (e.g., meeting participants) and/or background noise.
Nuance Communications, Inc.

Speech recognition system and method

A speech recognition method capable of automatic generation of phones according to the present invention includes: unsupervisedly learning a feature vector of speech data; generating a phone set by clustering acoustic features selected based on an unsupervised learning result; allocating a sequence of phones to the speech data on the basis of the generated phone set; and generating an acoustic model on the basis of the sequence of phones and the speech data to which the sequence of phones is allocated.. .
Electronics And Telecommunications Research Institute

Remote speech recognition at a vehicle

A system and method of using remote speech recognition at a vehicle includes: receiving speech at the vehicle from a vehicle occupant; determining a wireless quality of service between the vehicle and a speech processing remote facility; transmitting the received speech to the remote speech processing facility when the wireless quality of service is above a threshold; and processing the received speech at the vehicle when the wireless quality of service is below the threshold.. .
Gm Global Technology Operations Llc
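A compact sketch of the quality-of-service routing decision described above, with an assumed normalized QoS score, an assumed threshold, and placeholder recognizers standing in for the remote facility and the on-vehicle engine.

```python
# Illustrative sketch (QoS metric and threshold are assumptions): send the utterance
# to a remote speech processing facility when wireless quality of service is good
# enough, otherwise process the speech at the vehicle.

QOS_THRESHOLD = 0.7   # assumed normalized quality-of-service score

def route_utterance(audio, qos_score, recognize_remote, recognize_local):
    if qos_score > QOS_THRESHOLD:
        return recognize_remote(audio)   # transmit speech to the remote facility
    return recognize_local(audio)        # process speech at the vehicle

result = route_utterance(
    audio=b"...pcm bytes...",
    qos_score=0.4,
    recognize_remote=lambda a: "remote result",
    recognize_local=lambda a: "local result",
)
print(result)   # "local result" because QoS is below the threshold
```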

Technologies for end-of-sentence detection using syntactic coherence

Technologies for detecting an end of a sentence in automatic speech recognition are disclosed. An automatic speech recognition device may acquire speech data, and identify phonemes and words of the speech data.
Intel Corporation

Food recognition using visual analysis and speech recognition

A method and system for analyzing at least one food item on a food plate is disclosed. A plurality of images of the food plate is received by an image capturing device.
Sri International

Information processing device, control method, and program

There is provided an information processing device, control method, and program that can improve convenience of a speech recognition system by outputting appropriate responses to respective users when the plurality of users are talking, the information processing device including: a response generation unit configured to generate responses to speeches from a plurality of users; a decision unit configured to decide methods of outputting the responses to the respective users on the basis of priorities according to order of the speeches from the plurality of users; and an output control unit configured to perform control such that the generated responses are output by using the decided methods of outputting the responses.. .
Sony Corporation

Enhanced multi-channel acoustic models

This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal.
Google Inc.

Systems and methods for adaptive proper name entity recognition and understanding

Various embodiments contemplate systems and methods for performing automatic speech recognition (asr) and natural language understanding (nlu) that enable high accuracy recognition and understanding of freely spoken utterances which may contain proper names and similar entities. The proper name entities may contain or be comprised wholly of words that are not present in the vocabularies of these systems as normally constituted.
Promptu Systems Corporation

Method and system of automatic speech recognition using posterior confidence scores

A system, article, and method include techniques of automatic speech recognition using posterior confidence scores.. .
Intel Ip Corporation

Apparatus and method for training a neural network language model, speech recognition apparatus and method

According to one embodiment, an apparatus trains a neural network language model. The apparatus includes a calculating unit and a training unit.
Kabushiki Kaisha Toshiba

System and method for automated evaluation of transcription quality

Systems and methods automatically evaluate transcription quality. Audio data is obtained.
Verint Systems Ltd.

Information presentation method, non-transitory recording medium storing thereon computer program, and information presentation system

Provided are an information presentation method, a non-transitory recording medium storing thereon a computer program, and an information presentation system. A speech recognition unit performs speech recognition on speech pertaining to a dialogue and thereby generates dialogue text, a translation unit translates the dialogue text and thereby generates translated dialogue text, and a speech waveform synthesis unit performs speech synthesis on the translated dialogue text and thereby generates translated dialogue speech.
Panasonic Intellectual Property Management Co., Ltd.

Generalized phrases in automatic speech recognition systems

A method for generating a suggested phrase having a similar meaning to a supplied phrase in an analytics system includes: receiving, on a computer system comprising a processor and memory storing instructions, the supplied phrase, the supplied phrase including one or more terms; identifying, on the computer system, a term of the phrase belonging to a semantic group; generating the suggested phrase using the supplied phrase and the semantic group; and returning the suggested phrase.. .
Genesys Telecommunications Laboratories, Inc.
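A toy version of the semantic-group substitution described above, using an invented lexicon of groups; the group contents and the suggested phrases are examples only.

```python
# Illustrative sketch: generating suggested phrases by swapping a term in the
# supplied phrase for other members of its semantic group.

SEMANTIC_GROUPS = {            # assumed example groups, not from the patent
    "upset": {"upset", "angry", "frustrated"},
    "cancel": {"cancel", "terminate", "close"},
}

def suggest_phrases(supplied_phrase: str):
    words = supplied_phrase.lower().split()
    suggestions = []
    for i, word in enumerate(words):
        for group in SEMANTIC_GROUPS.values():
            if word in group:
                for alternative in sorted(group - {word}):
                    suggestions.append(" ".join(words[:i] + [alternative] + words[i + 1:]))
    return suggestions

print(suggest_phrases("I want to cancel my account"))
# ['i want to close my account', 'i want to terminate my account']
```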

Hotword detection on multiple devices

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving audio data that corresponds to an utterance.
Google Inc.

System and method for voice actuated configuration of a controlling device

A speech recognition engine is provided voice data indicative of at least a brand of a target appliance. The speech recognition engine uses the voice data indicative of at least a brand of the target appliance to identify within a library of codesets at least one codeset that is cross-referenced to the brand of the target appliance.
Universal Electronics Inc.

Speech recognition method and apparatus based on speaker recognition

A speech recognition method and an apparatus which recognize speech, based on speaker recognition, and output a result of the speech recognition are provided. The speech recognition method includes activating a session for receiving an input of an audio signal, performing speech recognition on a speech signal detected from the input audio signal while the session is maintained, determining whether a speaker of the speech signal is a registered speaker based on speaker information generated from the speech signal, determining whether to maintain the session based on a result of the determination, and outputting a result of performing the speech recognition..
Samsung Electronics Co., Ltd.

Automatic speech recognition (asr) utilizing gps and sensor data

An automatic speech recognition (asr) system is disclosed that compensates for different noise environments and types of speech. The asr system may be implemented as part of an action camera that collects status data, such as geographic location data and/or sensor data.
Garmin Switzerland Gmbh

Spoken utterance stop event other than pause or cessation in spoken utterances stream

Speech recognition of a stream of spoken utterances is initiated. Thereafter, a spoken utterance stop event to stop the speech recognition is detected, such as in relation to the stream.
Lenovo Enterprise Solutions (singapore) Pte. Ltd.

Speech recognition method and apparatus

The present application discloses speech recognition methods and apparatuses. An exemplary method may include extracting, via a first neural network, a vector containing speaker recognition features from speech data.
Alibaba Group Holding Limited

Apparatus and method for training a neural network auxiliary model, speech recognition apparatus and method

According to one embodiment, an apparatus trains a neural network auxiliary model used to calculate a normalization factor of a neural network language model. The apparatus includes a calculating unit and a training unit.
Kabushiki Kaisha Toshiba

System, method and computer program product for extracting user profiles and habits based on speech recognition and calling history for telephone system advertising

A system, method and computer program product for providing targeted messages to a person using telephony services by generating user profile information from telephony data and using the user profile information to retrieve targeted messages.. .
Iii Holdings 1, Llc

User interface for dictation application employing automatic speech recognition

In an automatic speech recognition (asr) dictation application, a user interface may be provided for informing a user how to dictate desired text. Input may be received from the user of the dictation application, specifying a desired text sequence.
Nuance Communications, Inc.

Intelligent switch and intelligent home system using the same

Disclosed is an intelligent switch and an intelligent home system using the same. The intelligent switch includes a main control chip, a relay, a wi-fi communication module, an ac-dc power module, a sound collection module and an audio playback module.
Shenzhen Xinguodu Technology Co., Ltd

Interaction and management of devices using gaze detection

User gaze information, which may include a user line of sight, user point of focus, or an area that a user is not looking at, is determined from user body, head, eye and iris positioning. The user gaze information is used to select a context and interaction set for the user.
Microsoft Technology Licensing, Llc

Method and apparatus to improve speech recognition in a high audio noise environment

A method improves speech recognition using a device located in proximity to a machine emitting high levels of audio noise. The microphone of the device receives the audio noise emitted by the machine and the speech emitted by a user and generates a composite signal.
Vocollect, Inc.

Method and apparatus for speech recognition using device usage pattern of user

A method and apparatus for improving the performance of voice recognition in a mobile device are provided. The method of recognizing a voice includes: monitoring the usage pattern of a user of a device for inputting a voice; selecting predetermined words from among words stored in the device based on the result of monitoring, and storing the selected words; and recognizing a voice based on an acoustic model and predetermined words.
Samsung Electronics Co., Ltd.
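A minimal sketch of the usage-pattern word selection described above, assuming a simple frequency count over a usage log; the log contents and cutoff are invented for the example.

```python
# Illustrative sketch: select frequently used words from the device usage log and
# keep them as a personalized word list for the recognizer.
from collections import Counter

def select_personal_words(usage_log, top_n: int = 5):
    """usage_log: iterable of words the user typed, dialed, or searched for."""
    counts = Counter(word.lower() for word in usage_log)
    return [word for word, _ in counts.most_common(top_n)]

log = ["Jazz", "jazz", "Gangnam", "weather", "jazz", "weather", "Mom", "Mom", "Mom"]
print(select_personal_words(log, top_n=3))   # ['jazz', 'mom', 'weather']
```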

Speech recognition system, speech recognition device, speech recognition method, and control program

A voice recognition device includes a storage, a voice recognizer, and a reject information generator. The storage stores reject information for use in specifying a voice.
Panasonic Intellectual Property Management Co., Ltd.

Language models using domain-specific model components

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained.
Google Inc.

Multi-accent speech recognition

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a hierarchical recurrent neural network (hrnn) having a plurality of parameters on a plurality of training acoustic sequences to generate phoneme representations of received acoustic sequences. One method includes, for each of the received training acoustic sequences: processing the received acoustic sequence in accordance with current values of the parameters of the hrnn to generate a predicted grapheme representation of the received acoustic sequence; processing an intermediate output generated by an intermediate layer of the hrnn during the processing of the received acoustic sequence to generate one or more predicted phoneme representations of the received acoustic sequence; and adjusting the current values of the parameters of the hrnn based on (i) the predicted grapheme representation and (ii) the one or more predicted phoneme representations.
Google Inc.

Training of front-end and back-end neural networks

Methods, systems, and computer programs are provided for training a front-end neural network (“front-end nn”) and a back-end neural network (“back-end nn”). The method includes: combining the back-end nn with the front-end nn so that an output layer of the front-end nn is also an input layer of the back-end nn to form a joint layer to thereby generate a combined nn; and training the combined nn for speech recognition with a set of utterances as training data, a plurality of specific units in the joint layer being dropped during the training and the plurality of specific units corresponding to one or more common frequency bands.
International Business Machines Corporation
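The abstract does not specify how joint-layer units map to frequency bands. Assuming the joint layer is laid out as contiguous per-band groups, a band-structured dropout mask might look like the following sketch (illustrative only, not IBM's training code).

```python
# Hypothetical sketch: drop joint-layer units band by band during training,
# assuming the joint layer is organized as (n_bands x units_per_band).
import numpy as np

rng = np.random.default_rng(0)

def band_dropout(joint_activations: np.ndarray, n_bands: int, drop_prob: float) -> np.ndarray:
    """Zero out all units belonging to randomly chosen frequency bands."""
    total_units = joint_activations.shape[-1]
    units_per_band = total_units // n_bands
    keep = rng.random(n_bands) >= drop_prob           # one keep/drop decision per band
    mask = np.repeat(keep, units_per_band).astype(joint_activations.dtype)
    return joint_activations * mask

acts = np.ones((4, 40))                               # batch of 4, 8 bands x 5 units
print(band_dropout(acts, n_bands=8, drop_prob=0.25)[0])
```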

Conversational chatbot for translated speech conversations

A server includes a processor and memory, a network interface, and a first application executed by the processor and memory. The first application is configured to receive an input in a first language based on a call received via the network interface by a voice over internet protocol (voip) application executed by the server.
Microsoft Technology Licensing, Llc

Scripting support for data identifiers, voice recognition and speech in a telnet session

Methods of adding data identifiers and speech/voice recognition functionality are disclosed. A telnet client runs one or more scripts that add data identifiers to data fields in a telnet session.
Crimson Corporation

Training deep neural network for acoustic modeling in speech recognition

A method is provided for training a deep neural network (dnn) for acoustic modeling in speech recognition. The method includes reading central frames and side frames as input frames from a memory.
International Business Machines Corporation
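As a rough illustration of the central-plus-side-frame idea (not IBM's actual training pipeline), the sketch below stacks each central frame with its side frames to form one DNN input vector; the function name and edge-padding choice are assumptions.

```python
# Hypothetical sketch: build DNN input vectors by stacking a central frame
# with its side (context) frames, padding at the sequence edges.
import numpy as np

def stack_context(frames: np.ndarray, n_side: int) -> np.ndarray:
    """frames: (T, D) acoustic features -> (T, (2*n_side+1)*D) stacked inputs."""
    T, D = frames.shape
    padded = np.pad(frames, ((n_side, n_side), (0, 0)), mode="edge")
    windows = [padded[t:t + 2 * n_side + 1].reshape(-1) for t in range(T)]
    return np.stack(windows)

feats = np.random.randn(100, 40)          # 100 frames of 40-dim features
inputs = stack_context(feats, n_side=5)   # each input sees 11 frames
print(inputs.shape)                       # (100, 440)
```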

Voice print identification portal

Systems and methods providing for a dual use voice analysis system are disclosed herein. Speech recognition is achieved by comparing characteristics of words spoken by a speaker to one or more templates of human language words.

Modeling a class posterior probability of context dependent phonemes in a speech recognition system

What is disclosed is a system and method for modeling a class posterior probability of context dependent phonemes in a speech recognition system. A representation network is trained by projecting an n-dimensional feature vector into g intermediate layers of nodes.
Conduent Business Services, Llc

Hybrid phoneme, diphone, morpheme, and word-level deep neural networks

An approach of hybrid frame, phone, diphone, morpheme, and word-level deep neural networks (dnn) in model training and applications is described. The approach can be applied to many applications.
Apptek, Inc.

Data processing method and live broadcasting method

Data processing methods, live broadcasting methods and devices are disclosed. An example data processing method may comprise converting audio and video data into broadcast data in a predetermined format, performing speech recognition on audio data in the audio and video data, and adding the text information obtained from the speech recognition to the broadcast data.
Alibaba Group Holding Limited

Method and apparatus for identifying acoustic background environments based on time and speed to enhance automatic speech recognition

Disclosed are systems, methods, and computer readable media for identifying an acoustic environment of a caller. The method embodiment comprises analyzing acoustic features of a received audio signal from a caller, receiving meta-data information based on a previously recorded time and speed of the caller, classifying a background environment of the caller based on the analyzed acoustic features and the meta-data, selecting an acoustic model matched to the classified background environment from a plurality of acoustic models, and performing speech recognition on the received audio signal using the selected acoustic model.
Nuance Communications, Inc.

Neural network based acoustic models for speech recognition by grouping context-dependent targets

Methods and systems for training a neural network include identifying weights in a neural network between a final hidden neuron layer and an output neuron layer that correspond to state matches between a neuron of the final hidden neuron layer and a respective neuron of the output neuron layer. The identified weights are initialized to a predetermined non-zero value, and the other weights between the final hidden neuron layer and the output neuron layer are initialized to zero.
International Business Machines Corporation
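A small illustration of the described initialization, under the assumption that each hidden and output neuron carries a known HMM-state label; the helper name and the 0.1 value are placeholders, not values from the patent.

```python
# Hypothetical sketch: initialize the last weight matrix so each output neuron
# starts connected only to hidden units sharing its (assumed) HMM state.
import numpy as np

def init_output_weights(hidden_states: list[int], output_states: list[int],
                        value: float = 0.1) -> np.ndarray:
    """Return an (n_hidden x n_output) matrix: `value` where states match, 0 elsewhere."""
    W = np.zeros((len(hidden_states), len(output_states)))
    for i, hs in enumerate(hidden_states):
        for j, os_ in enumerate(output_states):
            if hs == os_:
                W[i, j] = value
    return W

W = init_output_weights(hidden_states=[0, 1, 2, 1], output_states=[0, 1, 2])
print(W)
```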

Knowledge sharing based on meeting information

Features are disclosed for automatically facilitating knowledge sharing using information collected during meetings. Collected information may include both the content and context of a meeting.
Audible, Inc.

System and method for speech-enabled access to media content by a ranked normalized weighted graph using speech recognition

Disclosed herein are systems, methods, and computer-readable storage media for generating a speech recognition model for a media content retrieval system. The method causes a computing device to retrieve information describing media available in a media content retrieval system, construct a graph that models how the media are interconnected based on the retrieved information, rank the information describing the media based on the graph, and generate a speech recognition model based on the ranked information.
Nuance Communications, Inc.

Distinguishing user speech from background speech in speech-dense environments

A device, system, and method whereby a speech-driven system can distinguish speech obtained from users of the system from other speech spoken by background persons, as well as from background speech from public address systems. In one aspect, the present system and method prepares, in advance of field-use, a voice-data file which is created in a training environment.
Vocollect, Inc.

Speech recognition method, speech wakeup apparatus, speech recognition apparatus, and terminal

Embodiments of the present invention provide a speech recognition method and a terminal. The method includes: listening, by a speech wakeup apparatus, to speech information in a surrounding environment; when determining that the speech information obtained by listening matches a speech wakeup model, buffering, by the speech wakeup apparatus, speech information, of first preset duration, obtained by listening, and sending a trigger signal for triggering enabling of a speech recognition apparatus, where the trigger signal is used to instruct the speech recognition apparatus to read and recognize the speech information buffered by the speech wakeup apparatus; and recognizing first speech information buffered by the speech wakeup apparatus and second speech information obtained by listening, to obtain a recognition result.
Huawei Technologies Co., Ltd.
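As an illustrative sketch of the buffering-and-trigger flow (not Huawei's implementation), a wakeup front end could keep a short ring buffer and hand its contents to the recognizer when the wake model fires; chunk sizes and names are assumptions.

```python
# Hypothetical sketch: a wakeup front end keeps a short ring buffer of audio and,
# on a wake-word match, hands the buffered audio to the speech recognizer.
from collections import deque

class WakeupBuffer:
    def __init__(self, max_chunks: int = 50):           # ~ the "first preset duration"
        self.buffer = deque(maxlen=max_chunks)

    def listen(self, chunk: bytes, matches_wake_model: bool) -> bytes | None:
        """Buffer audio; return buffered speech when the wake model fires."""
        self.buffer.append(chunk)
        if matches_wake_model:
            buffered = b"".join(self.buffer)
            self.buffer.clear()
            return buffered                              # trigger: recognizer reads this
        return None

wakeup = WakeupBuffer()
wakeup.listen(b"\x01\x02", matches_wake_model=False)
print(wakeup.listen(b"\x03\x04", matches_wake_model=True))  # b'\x01\x02\x03\x04'
```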

Information processing system and information processing method

[Object] It is desirable to provide a technology capable of flexibly starting speech recognition processing in accordance with a situation. [Solution] Provided is an information processing system including: an output controller that causes an output portion to output a start condition for speech recognition processing to be performed by a speech recognition portion on sound information input from a sound collecting portion, in which the output controller dynamically changes the start condition for the speech recognition processing to be output from the output portion.
Sony Corporation

Speech recognition transformation system

A speech recognition method may include preprocessing a first signal to generate a second signal, where the first signal corresponds to an audio signal that includes at least one voice audio signal generated by a speaker; extracting a feature point associated with the second signal and converting the second signal into a third signal by converting the feature point using a transformation model; applying a recognition model to the third signal to recognize a voice language corresponding to the at least one voice audio signal; and generating a recognition result output including information indicating the recognized language.
Samsung Electronics Co., Ltd.

Acoustic model training using corrected terms

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One of the methods includes receiving first audio data corresponding to an utterance; obtaining a first transcription of the first audio data; receiving data indicating (i) a selection of one or more terms of the first transcription and (ii) one or more replacement terms; determining that one or more of the replacement terms are classified as a correction of one or more of the selected terms; in response to determining that the one or more of the replacement terms are classified as a correction of the one or more of the selected terms, obtaining a first portion of the first audio data that corresponds to one or more terms of the first transcription; and using the first portion of the first audio data that is associated with the one or more terms of the first transcription to train an acoustic model for recognizing the one or more of the replacement terms.
Google Inc.
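The patent presumably uses a trained classifier to decide whether a replacement is a correction; the sketch below substitutes a simple string-similarity test purely for illustration, and the audio-span placeholder stands in for the aligned portion of the original audio.

```python
# Hypothetical sketch: decide whether a user's replacement term looks like a
# correction of the selected term (rather than a rewrite), so the matching
# audio span can be reused as training data for the replacement term.
from difflib import SequenceMatcher

def is_correction(selected: str, replacement: str, threshold: float = 0.5) -> bool:
    """Treat similarly spelled replacements as corrections of a misrecognition."""
    similarity = SequenceMatcher(None, selected.lower(), replacement.lower()).ratio()
    return similarity >= threshold

training_pairs = []
selected, replacement = "call tom", "call mom"
if is_correction(selected, replacement):
    # audio_span would be the portion of the original audio aligned to `selected`
    training_pairs.append(("audio_span_placeholder", replacement))
print(training_pairs)   # [('audio_span_placeholder', 'call mom')]
```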

Validating provided information in a conversation

For validating information provided in a conversation, apparatus, methods, and program products are disclosed. The apparatus includes an association module that associates a plurality of items of caller identification data with a caller, an information module that identifies, using a speech recognition application, caller information from speech of the caller during a telephonic conversation with a call recipient, a comparison module that compares the plurality of items of caller identification data with the caller information, and a validation module that calculates a confidence score based on the comparison of the plurality of items of caller identification data with the caller information and presents, to the call recipient, the confidence score..
Lenovo Enterprise Solutions (singapore) Pte. Ltd.
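A minimal sketch of one way such a confidence score could be computed, assuming caller-ID data and recognized caller statements are both available as key-value fields; the field names and equal-weight scoring are assumptions.

```python
# Hypothetical sketch: compare caller-ID fields against details the caller
# states during the call and turn the agreement into a confidence score.
def confidence_score(caller_id_data: dict, spoken_info: dict) -> float:
    """Fraction of caller-ID items confirmed by the caller's own statements."""
    checked = [k for k in caller_id_data if k in spoken_info]
    if not checked:
        return 0.0
    matches = sum(
        caller_id_data[k].strip().lower() == spoken_info[k].strip().lower()
        for k in checked
    )
    return matches / len(checked)

caller_id = {"name": "Alex Smith", "city": "Austin", "number": "555-0100"}
spoken = {"name": "alex smith", "city": "Dallas"}
print(confidence_score(caller_id, spoken))   # 0.5 -> shown to the call recipient
```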

Circuit and method for speech recognition

The invention concerns a circuit for speech recognition comprising: a voice detection circuit configured to detect, based on at least one input parameter, the presence of a voice signal in an input audio signal and to generate an activation signal on each voice detection event; a speech recognition circuit configured to be activated by the activation signal and to perform speech recognition on the input audio signal, the speech recognition circuit being further configured to generate an output signal indicating, based on the speech recognition, whether each voice detection event is true or false; and an analysis circuit configured to generate, based on the output signal of the speech recognition circuit, a control signal for modifying one or more of said input parameters.. .
Dolphin Integration

Audio-visual speech recognition with scattering operators

Aspects described herein are directed towards methods, computing devices, systems, and computer-readable media that apply scattering operations to extracted visual features of audiovisual input to generate predictions regarding the speech status of a subject. Visual scattering coefficients generated according to one or more aspects described herein may be used as input to a neural network operative to generate the predictions regarding the speech status of the subject.
Nuance Communications, Inc.

System and method for enhancing speech recognition accuracy using weighted grammars based on user profile including demographic, account, time and date information

Disclosed herein are systems, computer-implemented methods, and computer-readable media for enhancing speech recognition accuracy. The method includes dividing a system dialog turn into segments based on timing of probable user responses, generating a weighted grammar for each segment, exclusively activating the weighted grammar generated for a current segment of the dialog turn during the current segment of the dialog turn, and recognizing user speech received during the current segment using the activated weighted grammar generated for the current segment.
Nuance Communications, Inc.
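An illustrative sketch, not Nuance's implementation: segment the dialog turn by expected response timing and activate only the grammar built for the current segment. The times, phrases and weights below are made up.

```python
# Hypothetical sketch: split a system prompt into timed segments and activate
# only the weighted grammar built for the segment the user is responding to.
segment_grammars = [
    # (segment end time in seconds, {phrase: weight})
    (2.0, {"yes": 0.6, "no": 0.4}),
    (5.0, {"checking": 0.5, "savings": 0.5}),
    (9.0, {"transfer": 0.7, "balance": 0.3}),
]

def active_grammar(elapsed_s: float) -> dict:
    """Return the single grammar that is exclusively active at this point in the turn."""
    for end_time, grammar in segment_grammars:
        if elapsed_s <= end_time:
            return grammar
    return segment_grammars[-1][1]

print(active_grammar(3.2))   # {'checking': 0.5, 'savings': 0.5}
```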

Automatic speech recognition using multi-dimensional models

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for automatic speech recognition using multi-dimensional models. In some implementations, audio data that describes an utterance is received.
Google Inc.

Optimizations to decoding of wfst models for automatic speech recognition

A method in a computing device for decoding a weighted finite state transducer (wfst) for automatic speech recognition is described. The method includes sorting a set of one or more wfst arcs based on their arc weight in ascending order.
Intel Corporation
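A toy illustration of the arc-sorting idea (not Intel's decoder): with a state's outgoing arcs kept in ascending weight order, a beam check can stop scanning as soon as one arc exceeds the beam. The Arc fields and values are assumptions.

```python
# Hypothetical sketch: keep each state's outgoing WFST arcs sorted by weight so
# the decoder can visit cheap arcs first and prune the rest early.
from dataclasses import dataclass

@dataclass
class Arc:
    ilabel: int      # input label (e.g., HMM state / phone id)
    olabel: int      # output label (e.g., word id)
    weight: float    # negative log probability (smaller = better)
    next_state: int

arcs = [Arc(3, 7, 2.4, 11), Arc(1, 5, 0.3, 12), Arc(2, 9, 1.1, 13)]
arcs.sort(key=lambda a: a.weight)          # ascending weight

beam = 1.5
survivors = []
for arc in arcs:                           # early exit once arcs exceed the beam
    if arc.weight > beam:
        break
    survivors.append(arc)
print([a.weight for a in survivors])       # [0.3, 1.1]
```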

Led light bulb, lamp fixture with self-networking intercom, system and method therefore

A networked light for illumination and an intercom for communications in a single housing, with hands-free voice command and control. The system is in a housing configured as a conventional-looking lamp, bulb, fixture, or other lighting device, suitable for direct replacement of conventional illuminating devices typically found in homes or buildings.
Athena Patent Development Llc.

Call forwarding to unavailable party based on artificial intelligence

A called party indicates that he or she is unavailable to receive a call. However, by determining any one or a combination of aspects such as who the caller is, where the caller is located, and what he or she is speaking about, and comparing these to prior calls, the call might nonetheless be routed to the called party.
Circle River, Inc.

Techniques to provide a standard interface to a speech recognition platform

Techniques and systems to provide speech recognition services over a network using a standard interface are described. In an embodiment, a technique includes accepting a speech recognition request that includes at least audio input, via an application program interface (api).
Microsoft Technology Licensing, Llc

Systems and methods for automatic repair of speech recognition engine output

Text output of speech recognition engines tends to be erroneous when spoken data has domain-specific terms. The present disclosure facilitates automatic correction of errors in speech-to-text conversion using abstractions of evolutionary development and artificial development.
Tata Consultancy Services Limited

Language merge

Systems and methods are described for processing and interpreting audible commands spoken in one or more languages. Speech recognition systems disclosed herein may be used as a stand-alone speech recognition system or comprise a portion of another content consumption system.
Comcast Cable Communications, Llc

Method and system for providing captioned telephone service with automated speech recognition

Embodiments of the present invention are directed to methods for providing captioned telephone service. One method includes initiating a first captioned telephone service call.
Clearcaptions, Llc

Phonetic posteriorgrams for many-to-one voice conversion

A method for converting speech using phonetic posteriorgrams (ppgs). A target speech is obtained and a ppg is generated based on acoustic features of the target speech.
The Chinese University Of Hong Kong

System and methods for pronunciation analysis-based non-native speaker verification

A system and method for non-native speaker verification based on n-best speech recognition results.

System and methods for pronunciation analysis-based speaker verification

A system and method for speaker verification based on n-best speech recognition results.

Call management system and its speech recognition control method

A speech recognition server has a speech recognition engine, and a mode control table to hold a speech recognition mode for each call. The speech recognition engine has a mode management unit to designate a speech recognition mode for a decoder, and an output analysis unit to analyze the recognition result data produced by the speech-to-text conversion.
Hitachi Information & Telecommunication Engineering, Ltd.

Architecture for multi-domain natural language processing

Features are disclosed for processing a user utterance with respect to multiple subject matters or domains, and for selecting a likely result from a particular domain with which to respond to the utterance or otherwise take action. A user utterance may be transcribed by an automatic speech recognition (“asr”) module, and the results may be provided to a multi-domain natural language understanding (“nlu”) engine.
Amazon Technologies, Inc.

Keyword detection modeling using contextual information

Features are disclosed for detecting words in audio using contextual information in addition to automatic speech recognition results. A detection model can be generated and used to determine whether a particular word, such as a keyword or “wake word,” has been uttered.
Amazon Technologies, Inc.

Selecting alternates in speech recognition

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting alternates in speech recognition. In some implementations, data is received that indicates multiple speech recognition hypotheses for an utterance.
Google Inc.

Enhanced speech endpointing

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data including an utterance, obtaining context data that indicates one or more expected speech recognition results, determining an expected speech recognition result based on the context data, receiving an intermediate speech recognition result generated by a speech recognition engine, comparing the intermediate speech recognition result to the expected speech recognition result for the audio data based on the context data, determining whether the intermediate speech recognition result corresponds to the expected speech recognition result for the audio data based on the context data, and setting an end of speech condition and providing a final speech recognition result in response to determining the intermediate speech recognition result matches the expected speech recognition result, the final speech recognition result including the one or more expected speech recognition results indicated by the context data.. .
Google Inc.
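A minimal sketch of the endpointing rule described above, assuming the expected results derived from context are available as plain strings; real systems would match against richer hypotheses, so this is illustrative only.

```python
# Hypothetical sketch: end the utterance early once an intermediate recognition
# result already matches one of the results expected from the dialog context.
def check_endpoint(intermediate_result: str, expected_results: set[str]) -> bool:
    """True -> set the end-of-speech condition and emit the final result now."""
    return intermediate_result.strip().lower() in expected_results

expected = {"yes", "no", "repeat that"}          # derived from the dialog context
for partial in ["y", "ye", "yes"]:               # streaming partial hypotheses
    if check_endpoint(partial, expected):
        print(f"endpoint set, final result: {partial}")
        break
```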

Speech recognition

A speech recognition method includes clustering feature vectors of training data to obtain clustered feature vectors of training data, performing interpolation calculation on feature vectors of data to be recognized using the clustered feature vectors of training data, and inputting the feature vectors of data to be recognized after the interpolation calculation into a speech recognition model to adaptively adjust the speech recognition model. The techniques of the present disclosure improve speech recognition accuracy and adaptive processing efficiency.
Alibaba Group Holding Limited
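A rough sketch of the cluster-then-interpolate adaptation, using a small hand-rolled k-means and a simple linear blend toward the nearest centroid; the interpolation weight and clustering details are assumptions, not values from the patent.

```python
# Hypothetical sketch: cluster training feature vectors, then nudge each test
# vector toward its nearest centroid before feeding it to the recognizer.
import numpy as np

def kmeans_centroids(X: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Very small k-means, returning k centroids of the training features."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
        centroids = np.stack([X[labels == j].mean(0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return centroids

def interpolate(x: np.ndarray, centroids: np.ndarray, alpha: float = 0.3) -> np.ndarray:
    """Blend a test vector with its nearest training-cluster centroid."""
    nearest = centroids[np.argmin(((centroids - x) ** 2).sum(-1))]
    return (1 - alpha) * x + alpha * nearest

train = np.random.randn(500, 13)          # training feature vectors
test_vec = np.random.randn(13)            # one feature vector to be recognized
adapted = interpolate(test_vec, kmeans_centroids(train, k=8))
print(adapted.shape)                      # (13,) -> fed to the recognition model
```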

Device including speech recognition function and method of recognizing speech

A device including a speech recognition function which recognizes speech from a user includes: a loudspeaker which outputs speech to a space; a microphone which collects speech in the space; a first speech recognition unit which recognizes the speech collected by the microphone; a command control unit which issues a command for controlling the device, based on the speech recognized by the first speech recognition unit; and a control unit which prohibits the command control unit from issuing the command, based on the speech to be output from the loudspeaker.
Socionext Inc.