

Speech Recognition patents

      

This page is updated frequently with new Speech Recognition-related patent applications.




Electronic device and method of voice command processing therefor
Provided are an electronic device and method of voice command processing therefor. The electronic device may include: a housing having a surface; a display disposed in the housing and exposed through the surface; an audio input interface comprising audio input circuitry disposed in the housing; an audio output interface comprising audio output circuitry disposed in the housing; at least one wireless communication circuit disposed in the housing and configured to select one of plural communication protocols for call setup; a processor disposed in the housing and electrically connected with the display, the audio input interface, the audio output interface, the at least one wireless communication circuit, and a codec; and a memory electrically connected with the processor.
Samsung Electronics Co., Ltd.


Re-recognizing speech with external data sources
Methods, including computer programs encoded on a computer storage medium, are provided for improving speech recognition based on external data sources. In one aspect, a method includes obtaining an initial candidate transcription of an utterance using an automated speech recognizer and identifying, based on a language model that is not used by the automated speech recognizer in generating the initial candidate transcription, one or more terms that are phonetically similar to one or more terms that do occur in the initial candidate transcription.
Google Inc.
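
As a rough sketch of the idea above (not Google's actual method), the following Python example rescores an initial transcription by swapping in phonetically similar terms and keeping whichever variant an external language model prefers; the similarity table and scoring function are invented stand-ins.

# Hypothetical sketch: re-scoring an initial ASR transcription with an
# external language model that the recognizer itself never used.
# The phonetic-similarity table and LM scores below are illustrative only.

PHONETICALLY_SIMILAR = {
    "four": ["for", "fore"],
    "night": ["knight"],
}

def external_lm_score(words):
    """Stand-in for an external language model; returns a log-probability."""
    # A real system would query an n-gram or neural LM here.
    common = {"for", "the", "a", "to"}
    return sum(-1.0 if w in common else -3.0 for w in words)

def rerecognize(initial_transcription):
    """Try swapping each term for phonetically similar ones and keep the
    variant the external LM prefers."""
    best_words = initial_transcription.split()
    best_score = external_lm_score(best_words)
    for i, word in enumerate(best_words):
        for alt in PHONETICALLY_SIMILAR.get(word, []):
            candidate = best_words[:i] + [alt] + best_words[i + 1:]
            score = external_lm_score(candidate)
            if score > best_score:
                best_words, best_score = candidate, score
    return " ".join(best_words)

print(rerecognize("thanks four the ride"))  # -> "thanks for the ride"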


Hybridized client-server speech recognition
A recipient computing device can receive a speech utterance to be processed by speech recognition and segment the speech utterance into two or more speech utterance segments, each of which can be provided to one of a plurality of available speech recognizers. A first one of the plurality of available speech recognizers can be implemented on a separate computing device accessible via a data network.
Speak With Me, Inc.


Information processing device, information processing method, and program
There is provided an information processing device that enables improved precision of sound recognition processing based on collected sound information, the information processing device including: a recognition controller that causes a speech recognition processing portion to execute sound recognition processing based on collected sound information obtained by a sound collecting portion; and an output controller that generates an output signal to output a recognition result obtained through the sound recognition processing. The output controller causes an output portion to output an evaluation result regarding a type of sound based on the collected sound information prior to the recognition result.
Sony Corporation


Motor vehicle operating device with a correction strategy for voice recognition
The invention relates to a method for operating a motor vehicle, wherein a first speech input of a user is received, at least one recognition result (a-d) is determined by means of a speech recognition system, the at least one recognition result (a-d) is output to an output device of the motor vehicle as a result list, and a second speech input of the user is received. The objective of the invention is to avoid a double input due to false recognition results.
Audi Ag


Method of searching for multimedia image
The present invention provides a method of searching for a multimedia image. When a camera and a microphone are used to record an image file, speech recognition software converts the speech recorded by the microphone into a text file, and the image file and the text file are then combined into a folder for storage in a database.
National Taipei University Of Technology


Speech recognition for automated driving
Methods and systems are provided for processing speech for a vehicle having at least one autonomous vehicle system. In one embodiment, a method includes: receiving, by a processor, context data generated by an autonomous vehicle system; receiving, by a processor, a speech utterance from a user interacting with the vehicle; processing, by a processor, the speech utterance based on the context data; and selectively communicating, by a processor, at least one of a dialog prompt to the user and a control action to the autonomous vehicle system based on the context data.
Gm Global Technology Operations Llc


System and method for improving speech recognition using context
A system and method are provided for improving speech recognition accuracy. Contextual information about user speech may be received, and then speech recognition analysis can be performed on the user speech using the contextual information.
Paypal, Inc.


Method and system for training language models to reduce recognition errors
A method and system for training a language model to reduce recognition errors, wherein the language model is a recurrent neural network language model (RNNLM), by first acquiring training samples. An automatic speech recognition (ASR) system is applied to the training samples to produce recognized words and probabilities of the recognized words, and an n-best list is selected from the recognized words based on the probabilities.
Mitsubishi Electric Research Laboratories, Inc.
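
A minimal illustration of the n-best selection step mentioned in the abstract; the hypotheses and probabilities are made up, and a real RNNLM trainer would consume this list rather than print it.

# Illustrative sketch only: selecting an n-best list of ASR hypotheses by
# probability, the first step the abstract describes before the RNNLM is
# trained to discriminate among competing hypotheses.

def select_n_best(hypotheses, n=5):
    """hypotheses: list of (word_sequence, probability) pairs."""
    return sorted(hypotheses, key=lambda h: h[1], reverse=True)[:n]

asr_output = [
    ("recognize speech", 0.61),
    ("wreck a nice beach", 0.22),
    ("recognize beach", 0.12),
    ("wreck an ice peach", 0.05),
]

n_best = select_n_best(asr_output, n=3)
for words, prob in n_best:
    print(f"{prob:.2f}  {words}")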


Improved fixed point integer implementations for neural networks
Techniques related to implementing neural networks for speech recognition systems are discussed. Such techniques may include processing a node of the neural network by determining a score for the node as a product of weights and inputs such that the weights are fixed point integer values, applying a correction to the score based on a correction value associated with at least one of the weights, and generating an output from the node based on the corrected score.
Intel Corporation
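
The sketch below shows one plausible reading of the fixed-point scheme (quantized weights, integer dot product, correction term); it is an assumption-laden example, not Intel's implementation.

# Minimal sketch (not Intel's implementation): scoring one neural-network
# node with 8-bit fixed-point weights, then applying a correction term to
# compensate for the quantization, as the abstract outlines.

import numpy as np

def quantize(weights, scale=127.0):
    """Map float weights into signed 8-bit integers plus per-weight error."""
    q = np.clip(np.round(weights * scale), -128, 127).astype(np.int32)
    correction = weights - q / scale          # quantization error per weight
    return q, correction

def node_score(inputs, q_weights, correction, scale=127.0):
    raw = int(np.dot(q_weights, np.round(inputs * scale)))   # integer-style math
    score = raw / (scale * scale)                            # back to float range
    score += float(np.dot(correction, inputs))               # apply correction
    return max(score, 0.0)                                   # ReLU output

weights = np.array([0.31, -0.58, 0.07])
inputs = np.array([0.9, 0.2, -0.4])
q, corr = quantize(weights)
print(node_score(inputs, q, corr))   # close to the full-precision score 0.135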


Method for launching web search on handheld computer

A method for remotely launching a web search on a smartphone is disclosed. The method includes the steps of: a) wirelessly connecting a remote control with a microphone to the smartphone; b) opening a search text input box; c) enabling the search text input box with a voice IME (input method editor); d) speaking a word to be searched to the remote control; e) sending the voiced word to the handheld computer; f) transmitting the voiced word to a search engine with a speech recognition function through the Internet; and g) the search engine transmitting a search result to the smartphone.
I/o Interconnect, Ltd.

Vehicle and method for controlling the vehicle

A vehicle connected to a terminal of a user to perform dialing includes a speech input unit that receives a speech of the user, a speech recognition unit that recognizes a command included in the received speech, a control unit that determines a preparation state for performing the dialing, and a display unit that displays information about the preparation state when preparation for performing the dialing is not completed.
Hyundai Motor Company

Electronic device and speech recognition method thereof

An electronic device and a speech recognition method that is capable of adjusting an end-of-utterance detection period dynamically are disclosed. The electronic device includes a microphone, a display, an input device formed as a part of the display or connected to the electronic device as a separate device, a processor electrically connected to the microphone, the display, and the input device, and a memory electrically connected to the processor.
Samsung Electronics Co., Ltd.

Methods and apparatus for speech segmentation using multiple metadata

Methods and apparatus to process microphone signals by a speech enhancement module to generate an audio stream signal including first and second metadata for use by a speech recognition module. In an embodiment, speech recognition is performed using endpointing information including transitioning from a silence state to a maybe speech state, in which data is buffered, based on the first metadata, and transitioning to a speech state, in which speech recognition is performed, based upon the second metadata.
Nuance Communications, Inc.

Acoustic and domain based speech recognition for vehicles

A processor of a vehicle speech recognition system recognizes speech via domain-specific language and acoustic models. The processor further, in response to the acoustic model having a confidence score for recognized speech falling within a predetermined range defined relative to a confidence score for the domain-specific language model, recognizes speech via the acoustic model only.
Ford Global Technologies, Llc

Dynamic acoustic model switching to improve noisy speech recognition

An automatic speech recognition system for a vehicle includes a controller configured to select an acoustic model from a library of acoustic models based on ambient noise in a cabin of the vehicle and operating parameters of the vehicle. The controller is further configured to apply the selected acoustic model to noisy speech to improve recognition of the speech.
Ford Global Technologies, Llc
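
A hedged illustration of acoustic-model switching as described above: pick a model from a small library using cabin noise level and vehicle speed. The thresholds and model names are invented for the example.

# Illustrative only: select an acoustic model from a library based on
# ambient cabin noise and a vehicle operating parameter (speed).

ACOUSTIC_MODEL_LIBRARY = {
    "quiet_cabin": {"max_noise_db": 55, "max_speed_kph": 60},
    "highway":     {"max_noise_db": 70, "max_speed_kph": 130},
    "noisy_hvac":  {"max_noise_db": 85, "max_speed_kph": 130},
}

def select_acoustic_model(noise_db, speed_kph):
    for name, limits in ACOUSTIC_MODEL_LIBRARY.items():
        if noise_db <= limits["max_noise_db"] and speed_kph <= limits["max_speed_kph"]:
            return name
    return "noisy_hvac"   # fall back to the most noise-robust model

print(select_acoustic_model(noise_db=48, speed_kph=30))   # quiet_cabin
print(select_acoustic_model(noise_db=66, speed_kph=110))  # highway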

System and method for personalization in speech recognition

Systems, methods, and computer-readable storage devices are provided for identifying a user profile for speech recognition. The user profile is selected from one of several user profiles which are all associated with a speaker, and can be selected based on the identity of the speaker, the location of the speaker, the device the speaker is using, or other relevant parameters.
At&t Intellectual Property I, L.p.
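
The profile lookup below is an illustrative sketch only; the profile names, keys, and fallback order are assumptions, not AT&T's method.

# Sketch: choose one of several recognition profiles that all belong to the
# same speaker, keyed on device and location, with a default fallback.

PROFILES = {
    ("alice", "car", None):    "alice_car_profile",
    ("alice", None, "office"): "alice_office_profile",
    ("alice", None, None):     "alice_default_profile",
}

def select_profile(speaker, device=None, location=None):
    for key in [(speaker, device, location),
                (speaker, device, None),
                (speaker, None, location),
                (speaker, None, None)]:
        if key in PROFILES:
            return PROFILES[key]
    return None

print(select_profile("alice", device="car"))        # alice_car_profile
print(select_profile("alice", location="office"))   # alice_office_profile
print(select_profile("alice", device="phone"))      # alice_default_profile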

Systems and methods for engaging an audience in a conversational advertisement

A system and method are described for engaging an audience in a conversational advertisement. A conversational advertising system converses with an audience using spoken words.
Nuance Communications, Inc.

Encoding and adaptive, scalable accessing of distributed models

Systems, methods, and apparatus for accessing distributed models in automated machine processing, including using large language models in machine translation, speech recognition, and other applications.
Google Inc.

Semiautomated relay method and apparatus

A captioning system comprising a processor and a memory having stored thereon software such that, when the software is executed by the one or more processors, the system generates text captions from speech data by at least the following: receiving, from a hearing user's (HU's) device, the HU's speech data; generating, at the one or more hardware processors, first text captions from the speech data using a speech recognition algorithm; automatically determining, at the one or more processors, whether the generated first text captions meet a first accuracy threshold; when the first text captions meet the first accuracy threshold, sending the first text captions to an assisted user's (AU's) device for display; and when the first text captions do not meet the first accuracy threshold, generating, at the one or more processors, second text captions from the speech data based on user input to the speech recognition algorithm from a call assistant and sending the second text captions to the AU's device for display.
Ultratec, Inc.

Speech recognition method and apparatus using device information

A speech recognition method includes: storing at least one acoustic model (AM); obtaining, from a device located outside the ASR server, a device ID for identifying the device; obtaining speech data from the device; selecting an AM based on the device ID; performing speech recognition on the speech data by using the selected AM; and outputting a result of the speech recognition.
Samsung Electronics Co., Ltd.

User configurable speech commands

A speech recognition method and system enables user-configurable speech commands. For a given speech command, the speech recognition engine provides a mechanism for the end-user to select speech command terms to use in substitution for the given speech command.
Kopin Corporation

Systems and methods for assisting automatic speech recognition

Systems and methods for assisting automatic speech recognition (asr) are provided. An example method includes generating, by a mobile device, a plurality of instantiations of a speech component in a captured audio signal, each instantiation of the plurality of instantiations being in support of a particular hypothesis regarding the speech component.
Knowles Electronics, Llc

Apparatus and method for recognizing speech

A speech recognition apparatus based on a deep neural network (DNN) sound model includes a memory and a processor. As the processor executes a program stored in the memory, the processor generates sound-model state sets corresponding to a plurality of pieces of set training speech data included in multi-set training speech data, generates a multi-set state cluster from the sound-model state sets, and sets the multi-set training speech data as an input node and the multi-set state cluster as output nodes so as to learn a DNN-structured parameter.
Electronics And Telecommunications Research Institute

Speaker-adaptive speech recognition

(b) providing the test-speaker-specific adaptive system comprising the input network component, the trained test-speaker-specific adaptive model component, and the speaker-adaptive output network.

Material selection for language model customization in speech recognition for speech analytics

A method for extracting, from non-speech text, training data for a language model for speech recognition includes: receiving, by a processor, non-speech text; selecting, by the processor, text from the non-speech text; converting, by the processor, the selected text to generate converted text comprising a plurality of phrases consistent with speech transcription text; training, by the processor, a language model using the converted text; and outputting, by the processor, the language model.. .
Genesys Telecommunications Laboratories, Inc.

Language model customization in speech recognition for speech analytics

A method for generating a language model for an organization includes: receiving, by a processor, organization-specific training data; receiving, by the processor, generic training data; computing, by the processor, a plurality of similarities between the generic training data and the organization-specific training data; assigning, by the processor, a plurality of weights to the generic training data in accordance with the computed similarities; combining, by the processor, the generic training data with the organization-specific training data in accordance with the weights to generate customized training data; training, by the processor, a customized language model using the customized training data; and outputting, by the processor, the customized language model, the customized language model being configured to compute the likelihood of phrases in a medium.. .
Genesys Telecommunications Laboratories, Inc.
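
A minimal sketch of the weighting idea, assuming a simple word-overlap similarity; a production system would use a proper similarity measure and an actual language-model trainer.

# Generic training sentences are weighted by how similar they are to the
# organization-specific data before the two sets are combined.

def similarity(sentence, org_vocabulary):
    words = set(sentence.lower().split())
    return len(words & org_vocabulary) / max(len(words), 1)

def build_customized_corpus(org_sentences, generic_sentences, threshold=0.3):
    org_vocab = set(w for s in org_sentences for w in s.lower().split())
    weighted = [(s, similarity(s, org_vocab)) for s in generic_sentences]
    # keep organization data at full weight, generic data weighted by similarity
    corpus = [(s, 1.0) for s in org_sentences]
    corpus += [(s, w) for s, w in weighted if w >= threshold]
    return corpus

org = ["reset my router password", "upgrade my broadband plan"]
generic = ["what is the weather today", "please reset the password", "book a flight"]
for sentence, weight in build_customized_corpus(org, generic):
    print(f"{weight:.2f}  {sentence}")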

Predicting recognition quality of a phrase in automatic speech recognition systems

A method for predicting a speech recognition quality of a phrase comprising at least one word includes: receiving, on a computer system including a processor and memory storing instructions, the phrase; computing, on the computer system, a set of features comprising one or more features corresponding to the phrase; providing the phrase to a prediction model on the computer system and receiving a predicted recognition quality value based on the set of features; and returning the predicted recognition quality value.. .
Genesys Telecommunications Laboratories, Inc.
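
In the sketch below, a toy feature set and hand-picked linear weights stand in for the trained prediction model the abstract refers to; the coefficients are illustrative only.

# Sketch under stated assumptions: compute simple phrase features and map
# them to a 0..1 predicted recognition quality score.

import math

def phrase_features(phrase):
    words = phrase.split()
    return {
        "num_words": len(words),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "has_digits": any(ch.isdigit() for ch in phrase),
    }

# Illustrative coefficients; a real model would be fit on labeled data.
WEIGHTS = {"num_words": 0.15, "avg_word_len": 0.10, "has_digits": -0.8, "bias": -1.0}

def predicted_recognition_quality(phrase):
    f = phrase_features(phrase)
    z = (WEIGHTS["bias"]
         + WEIGHTS["num_words"] * f["num_words"]
         + WEIGHTS["avg_word_len"] * f["avg_word_len"]
         + WEIGHTS["has_digits"] * f["has_digits"])
    return 1.0 / (1.0 + math.exp(-z))   # squash to a 0..1 quality score

print(round(predicted_recognition_quality("cancel my subscription"), 3))
print(round(predicted_recognition_quality("code 7g4k9"), 3))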

Safety system and method

A system and method are described. The system utilizes data entry devices commonly found in some workplaces, such as warehouses, to generate an emergency signal.
Hand Held Products, Inc.

Apparatus and method for verifying an utterance in a speech recognition system

An apparatus and method for verifying an utterance based on multi-event detection information in a natural language speech recognition system. The apparatus includes a noise processor configured to process noise of an input speech signal, a feature extractor configured to extract features of speech data obtained through the noise processing, an event detector configured to detect events of the plurality of speech features occurring in the speech data using the noise-processed data and data of the extracted features, a decoder configured to perform speech recognition using a plurality of preset speech recognition models for the extracted feature data, and an utterance verifier configured to calculate confidence measurement values in units of words and sentences using information on the plurality of events detected by the event detector and a preset utterance verification model and perform utterance verification according to the calculated confidence measurement values..
Electronics And Telecommunications Research Institute

System and method for providing generated speech via a network

A system and method of operating an automatic speech recognition application over an Internet protocol network is disclosed. The ASR application communicates over a packet network such as an Internet protocol network or a wireless network.
Nuance Communications, Inc.

Data augmentation method based on stochastic feature mapping for automatic speech recognition

A method of augmenting training data includes converting a feature sequence of a source speaker determined from a plurality of utterances within a transcript to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set using the mapping function to multiple selected target speakers in the training set.. .
International Business Machines Corporation

Method and apparatus for annotating video content with metadata generated using speech recognition technology

A method and apparatus is provided for annotating video content with metadata generated using speech recognition technology. The method begins by rendering video content on a display device.
Google Technology Holdings Llc

Wireless security system

A wireless doorbell having a housing, the housing having a rear portion and a front portion, the rear portion configured to be secured to a support and the front portion configured to be secured to the rear portion. The wireless doorbell having a sensor configured to detect an object in a vicinity of the wireless doorbell, a camera configured to be activated and obtain at least one image, and a microphone configured to obtain audio signals.
Advanced Wireless Innovations Llc

Microphone circuit assembly and system with speech recognition

The present invention relates in one aspect to a microphone circuit assembly for an external application processor such as a programmable digital signal processor. The microphone circuit assembly comprises a microphone preamplifier and an analog-to-digital converter configured to generate microphone signal samples at a first predetermined rate.
Analog Devices Global

Speech recognition device and speech recognition method

A speech recognition device: transmits an input voice to a server; receives a first speech recognition result that is a result from speech recognition by the server on the transmitted input voice; performs speech recognition on the input voice to obtain a second speech recognition result; refers to speech rules each representing a formation of speech elements for the input voice, to determine the speech rule matched to the second speech recognition result; determines from the correspondence relationships among presence/absence of the first speech recognition result, presence/absence of the second speech recognition result and presence/absence of the speech element that forms the speech rule, a speech recognition state indicating the speech element whose speech recognition result is not obtained; generates according to the determined speech recognition state, a response text for inquiring about the speech element whose speech recognition result is not obtained; and outputs that text.. .
Mitsubishi Electric Corporation

Recognizing accented speech

Techniques (300, 400, 500) and apparatuses (100, 200, 700) for recognizing accented speech are described. In some embodiments, an accent module recognizes accented speech using an accent library based on device data, uses different speech recognition correction levels based on an application field into which recognized words are set to be provided, or updates an accent library based on corrections made to incorrectly recognized speech.
Google Technology Holdings Llc


System and method for neural network based feature extraction for acoustic model development

A system and method are presented for neural network based feature extraction for acoustic model development. A neural network may be used to extract acoustic features from raw MFCCs or the spectrum, which are then used for training acoustic models for speech recognition systems.
Interactive Intelligence Group, Inc.

Speech recognition

This patent disclosure relates to voice technology and discloses a voice recognition method and electronic device. In some embodiments of this disclosure, a soft clustering calculation is performed in advance according to n Gaussians obtained by model training, to obtain m soft-clustering Gaussians; when voice recognition is performed, voice is converted to obtain an eigenvector, and the top l soft-clustering Gaussians with the highest scores are calculated according to the eigenvector, wherein l is less than m; and the member Gaussians among the l soft-clustering Gaussians are used as the Gaussians that need to participate in calculation in an acoustic model in the voice recognition process, to calculate the likelihood of the acoustic model.
Le Shi Zhi Xin Electronic Technology (tianjin) Limited

Automated equalization

Techniques for improving speech recognition are described. An example of an electronic device includes an extracting unit to extract a reference spectral profile from a reference signal and a device spectral profile from a device signal.
Intel Corporation

Speech recognition with selective use of dynamic language models

This document describes, among other things, a computer-implemented method for transcribing an utterance. The method can include receiving, at a computing system, speech data that characterizes an utterance of a user.
Google Inc.

Method for detecting driving noise and improving speech recognition in a vehicle

The disclosure concerns a method for recognizing driving noise in a sound signal that is acquired by a microphone disposed in a vehicle. The sound signal originates from the surface structure of the road.
Ford Global Technologies, Llc

Fast out-of-vocabulary search in automatic speech recognition systems

A method including: receiving, on a computer system, a text search query, the query including one or more query words; generating, on the computer system, for each query word in the query, one or more anchor segments within a plurality of speech recognition processed audio files, the one or more anchor segments identifying possible locations containing the query word; post-processing, on the computer system, the one or more anchor segments, the post-processing including: expanding the one or more anchor segments; sorting the one or more anchor segments; and merging overlapping ones of the one or more anchor segments; and searching, on the computer system, the post-processed one or more anchor segments for instances of at least one of the one or more query words using a constrained grammar.. .
Genesys Telecommunications Laboratories, Inc.
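
The expand/sort/merge post-processing named in the abstract can be pictured roughly as follows; segment times and padding are example values, not the patented procedure.

# Hedged sketch: anchor segments (start, end times in seconds) are expanded,
# sorted, and overlapping segments merged before a constrained-grammar search.

def postprocess_anchors(segments, padding=0.5):
    # expand each anchor segment by a little context on each side
    expanded = [(max(0.0, s - padding), e + padding) for s, e in segments]
    expanded.sort()                       # sort by start time
    merged = [expanded[0]]
    for start, end in expanded[1:]:
        last_start, last_end = merged[-1]
        if start <= last_end:             # overlap: merge into one segment
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged

anchors = [(12.0, 12.6), (12.4, 13.1), (40.2, 40.9)]
print(postprocess_anchors(anchors))
# [(11.5, 13.6), (39.7, 41.4)]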

Efficient empirical determination, computation, and use of acoustic confusability measures

Efficient empirical determination, computation, and use of an acoustic confusability measure comprises: (1) an empirically derived acoustic confusability measure, comprising a means for determining the acoustic confusability between any two textual phrases in a given language, where the measure of acoustic confusability is empirically derived from examples of the application of a specific speech recognition technology, where the procedure does not require access to the internal computational models of the speech recognition technology, and does not depend upon any particular internal structure or modeling technique, and where the procedure is based upon iterative improvement from an initial estimate; (2) techniques for efficient computation of empirically derived acoustic confusability measure, comprising means for efficient application of an acoustic confusability score, allowing practical application to very large-scale problems; and (3) a method for using acoustic confusability measures to make principled choices about which specific phrases to make recognizable by a speech recognition application.. .
Promptu Systems Corporation
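
As a simple stand-in (not Promptu's empirically derived measure), phoneme-sequence edit distance can be normalized into a confusability score:

# Toy confusability: identical pronunciations score 1.0, very different ones
# score near 0.0. Phone symbols below are illustrative ARPAbet-style labels.

def edit_distance(a, b):
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    return dp[len(a)][len(b)]

def confusability(phones_a, phones_b):
    dist = edit_distance(phones_a, phones_b)
    return 1.0 - dist / max(len(phones_a), len(phones_b))

# "write" vs "right" share a pronunciation; "write" vs "ride" differ by one phone
print(confusability(["R", "AY", "T"], ["R", "AY", "T"]))   # 1.0
print(confusability(["R", "AY", "T"], ["R", "AY", "D"]))   # ~0.67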

System and method for dynamic ASR based on social media

System and method to adjust an automatic speech recognition (ASR) engine, the method including: receiving social network information from a social network; data mining the social network information to extract one or more characteristics; inferring a trend from the extracted one or more characteristics; and adjusting the ASR engine based upon the inferred trend. Embodiments of the method may further include: receiving a speech signal from a user; and recognizing the speech signal by use of the adjusted ASR engine.
Avaya Inc.

Semantic word affinity automatic speech recognition

Systems and techniques for direct motion sensor input to a rendering pipeline are described herein. A ranked list of ASR hypotheses may be obtained.

Technologies for end-of-sentence detection using syntactic coherence

Technologies for detecting an end of a sentence in automatic speech recognition are disclosed. An automatic speech recognition device may acquire speech data, and identify phonemes and words of the speech data.

System and method for user-specified pronunciation of words for speech synthesis and recognition

The method is performed at an electronic device with one or more processors and memory storing one or more programs for execution by the one or more processors. A first speech input including at least one word is received.
Apple Inc.

Method for interaction with terminal and electronic apparatus for the same

The present application discloses a method for interaction with a terminal and an electronic apparatus for the same. The method includes: determining whether a downward acceleration of a gesture is greater than a default threshold value when the gesture is detected in the state of a displayed interface, wherein the displayed interface comprises a replying information and recognition result interface, a replying information full-screen interface, or a replying information full-screen extension interface after recording of speech in a speech recognition interface is detected to be finished; determining an operation type corresponding to the gesture, according to the determination of whether the downward acceleration of the gesture is greater than the default threshold value; and executing an interaction corresponding to the operation type.
Le Shi Zhi Xin Electronic Technology (tianjin) Limited

Natural human-computer interaction for virtual personal assistant systems

Technologies for natural language interactions with virtual personal assistant systems include a computing device configured to capture audio input, distort the audio input to produce a number of distorted audio variations, and perform speech recognition on the audio input and the distorted audio variants. The computing device selects a result from a large number of potential speech recognition results based on contextual information.
Intel Corporation

Multimodal speech recognition for real-time video audio-based display indicia application

Aspects relate to computer implemented methods, systems, and processes to automatically generate audio-based display indicia of media content including receiving, by a processor, a plurality of media content categories including at least one feature, receiving a plurality of categorized speech recognition algorithms, each speech recognition algorithm being associated with a respective one or more of the plurality of media content categories, determining a media content category of a current media content based on at least one feature of the current media content, selecting one speech recognition algorithm from the plurality of categorized speech recognition algorithms based on the determination of the media content category of the current media content, and applying the selected speech recognition algorithm to the current media content.. .
International Business Machines Corporation

Motor vehicle device operation with operating correction

A method for operating a motor vehicle operating device to carry out two operating steps with voice control. A first vocabulary, which is provided for the first operating step, is set in a speech recognition device.
Audi Ag

System and methods for adapting neural network acoustic models

Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.. .
Nuance Communications, Inc.

Text rule based multi-accent speech recognition with single acoustic model and automatic accent detection

Embodiments are disclosed for recognizing speech in a computing system. An example speech recognition method includes receiving metadata at a generation unit that includes a database of accented substrings, generating, via the generation unit, accent-corrected phonetic data for words included in the metadata, the accent-corrected phonetic data representing different pronunciations of the words included in the metadata based on the accented substrings stored in the database, receiving, at a voice recognition engine, extracted speech data derived from utterances input by a user to the speech recognition system, and receiving, at the voice recognition engine, the accent-corrected phonetic data.
Harman International Industries, Incorporated
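
A sketch of the substring-substitution idea, with an invented two-entry accent table; it operates on spellings here purely for readability, whereas the patent works on phonetic data.

# Illustrative only: generate extra pronunciation variants for vocabulary
# words by substituting accented substrings from a small table, so a
# recognizer can match accented renderings of the same word.

ACCENTED_SUBSTRINGS = {
    "th": ["t", "d"],     # e.g. "think" -> "tink" / "dink"
    "w":  ["v"],          # e.g. "water" -> "vater"
}

def accent_variants(word):
    variants = {word}
    for substring, replacements in ACCENTED_SUBSTRINGS.items():
        for existing in list(variants):
            if substring in existing:
                for repl in replacements:
                    variants.add(existing.replace(substring, repl))
    return sorted(variants)

print(accent_variants("weather"))
# ['veader', 'veater', 'veather', 'weader', 'weater', 'weather']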

Discriminative training of automatic speech recognition models with natural language processing dictionary for spoken language processing

Methods and systems for language processing includes training one or more automatic speech recognition models using an automatic speech recognition dictionary. A set of n automatic speech recognition hypotheses for an input is determined, based on the one or more automatic speech recognition models, using a processor.
International Business Machines Corporation

Systems and methods for a multi-core optimized recurrent neural network

Systems and methods for a multi-core optimized recurrent neural network (RNN) architecture are disclosed. The various architectures affect communication and synchronization operations according to the multi-bulk-synchronous-parallel (MBSP) model for a given processor.
Baidu Usa Llc

Incorporating an exogenous large-vocabulary model into rule-based speech recognition

Incorporation of an exogenous large-vocabulary model into rule-based speech recognition is provided. An audio stream is received by a local small-vocabulary rule-based speech recognition system (SVSRS), and is streamed to a large-vocabulary statistically-modeled speech recognition system (LVSRS).
Microsoft Technology Licensing, Llc

Applying neural network language models to weighted finite state transducers for automatic speech recognition

Systems and processes for converting speech-to-text are provided. In one example process, speech input can be received.
Apple Inc.

Method of and system for providing adaptive respondent training in a speech recognition application

A system for conducting a telephonic speech recognition application includes an automated telephone device for making telephonic contact with a respondent and a speech recognition device which, upon the telephonic contact being made, presents the respondent with at least one introductory prompt for the respondent to reply to; receives a spoken response from the respondent; and performs a speech recognition analysis on the spoken response to determine a capability of the respondent to complete the application. If the speech recognition device, based on the spoken response to the introductory prompt, determines that the respondent is capable of completing the application, the speech recognition device presents at least one application prompt to the respondent.
Eliza Corporation

Semi-supervised system for multichannel source enhancement through configurable adaptive transformations and deep neural network

Various techniques are provided to perform enhanced automatic speech recognition. For example, a subband analysis may be performed that transforms time-domain signals of multiple audio channels into subband signals.
Conexant Systems, Inc.

Prioritized content loading for vehicle automatic speech recognition systems

A method of loading content items for accessibility by a vehicle automatic speech recognition (ASR) system. The method tracks content items requested by one or more users and prioritizes the loading of requested content items and/or selectively loads requested content items at least partially based on the interaction history of one or more users.
Gm Global Technology Operations Llc

Method and system for joint training of hybrid neural networks for acoustic modeling in automatic speech recognition

Systems and methods for training networks are provided. A method for training networks comprises receiving an input from each of a plurality of neural networks differing from each other in at least one of architecture, input modality, and feature type, connecting the plurality of neural networks through a common output layer, or through one or more common hidden layers and a common output layer, to result in a joint network, and training the joint network.
International Business Machines Corporation

Expansion of a question and answer database

A system and method for expanding a question and answer (Q&A) database. The method includes preparing a set of Q&A documents and speech recognition results of an agent's utterances in conversations between an agent and a customer, each Q&A document in the set having an identifier, and each speech recognition result having an identifier common with the identifier of a relevant Q&A document, and adding one or more repetition parts extracted from the speech recognition results of the agent's utterances to a corresponding Q&A document in the set.
International Business Machines Corporation

Electronic device, computer-implemented method and computer program

An electronic device comprising a processor which is configured to perform speech recognition on an audio signal, linguistically analyze the output of the speech recognition for named entities, perform an Internet or database search for the recognized named entities to obtain query results, and display, on a display of the electronic device, information obtained from the query results on a timeline.
Sony Corporation

Method and system for role dependent context sensitive spoken and textual language understanding with neural networks

A method and system process utterances that are acquired either from an automatic speech recognition (ASR) system or from text. The utterances have associated identities of each party, such as role A utterances and role B utterances.
Mitsubishi Electric Research Laboratories, Inc.

Method for using a human-machine interface device for an aircraft comprising a speech recognition unit

The general field of the invention is that of methods for using a human-machine interface device for an aircraft comprising at least one speech recognition unit, one display device with a touch interface, one graphical interface computer and one electronic computing unit, the set being designed to graphically present a plurality of commands, each command being classed in at least a first category, referred to as the critical category, and a second category, referred to as the non-critical category, each non-critical command having a plurality of options, each option having a name, said names assembled in a database called a “lexicon”. The method according to the invention comprises steps of recognizing displayed commands, activating the speech recognition unit, comparing the touch and voice information and a validation step..
Thales

Pronunciation learning through correction logs

A new pronunciation learning system for dynamically learning new pronunciations assisted by user correction logs. The user correction logs provide a record of speech recognition events and subsequent user behavior that implicitly confirms or rejects the recognition result and/or shows the user's intended words via subsequent input.
Microsoft Technology Licensing, Llc.

Accent correction in speech recognition systems

A method comprising receiving an audio input signal comprising speech, determining an accent class corresponding to the speech, identifying an accented phone pattern within the speech, replacing the accented phone pattern with an unaccented phone pattern, and generating an unaccented output signal from the unaccented phone pattern.
International Business Machines Corporation

Speech recognition apparatus and method

A speech recognition apparatus includes a predictor configured to predict a word class of a word following a word sequence that has been previously searched for based on the word sequence that has been previously searched for; and a decoder configured to search for a candidate word corresponding to a speech signal, extend the word sequence that has been previously searched for using the candidate word that has been searched for, and adjust a probability value of the extended word sequence based on the predicted word class.. .
Samsung Electronics Co., Ltd.

Call context metadata

A computer detects a connected voice or video call between participants and records a brief media sample. Speech recognition is utilized to determine when the call is connected as well as to transcribe the content of the audio portion of the media sample.

Terminal device and communication of speech signals

A reception unit receives a speech signal from another terminal device. A reproduction unit reproduces the speech signal received in the reception unit.

Generating call context metadata from speech, contacts, and common names in a geographic area

A computer detects a connected voice or video call between participants and records a brief media sample. Speech recognition is utilized to determine when the call is connected as well as to transcribe the content of the audio portion of the media sample.

Speech recognition method and speech recognition apparatus to improve performance or response of speech recognition

In a speech recognition method, a criteria value is determined for determining the length of a silent section included in a processing section, and a processing mode to use is determined in accordance with the criteria value. The criteria value is used to obtain audio information of the processing section.

Speech processing system and terminal

Receiving a speech utterance, the speech processing system performs speech recognition and displays a text 158 of the recognition result. Further, the speech processing system translates the recognition result in accordance with settings into a text 176 of another language and displays and synthesizes speech of the translated result.

Deployed end-to-end speech recognition

Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech, including noisy environments, accents, and different languages.

System and method for supporting automatic speech recognition of regional accents based on statistical information and user corrections

Disclosed herein is a system for compensating for dialects and accents comprising an automatic speech recognition system comprising an automatic speech recognition device that is operative to receive an utterance in an acoustic format from a user with a user interface; a speech to text conversion engine that is operative to receive the utterance from the automatic speech recognition device and to prepare a textual statement of the utterance; and a correction database that is operative to store textual statements of all utterances; where the correction database is operative to secure a corrected transcript of the textual statement of the utterance from the speech to text conversion engine and adds it to the corrections database if the corrected transcript of the textual statement of the utterance is not available.. .

End-to-end speech recognition

Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech, including noisy environments, accents, and different languages.

Systems and methods for speech-based searching of content repositories

According to some aspects, a method of searching for content in response to a user voice query is provided. The method may comprise receiving the user voice query, performing speech recognition to generate n best speech recognition results comprising a first speech recognition result, performing a supervised search of at least one content repository to identify one or more supervised search results using one or more classifiers that classify the first speech recognition result into at least one class that identifies previously classified content in the at least one content repository, performing an unsupervised search of the at least one content repository to identify one or more unsupervised search results, wherein performing the unsupervised search comprises performing a word search of the at least one content repository, and generating combined results from among the one or more supervised search results and the one or more unsupervised search results..

Content analysis to enhance voice search

Methods and apparatus for improving speech recognition accuracy in media content searches are described. An advertisement for a media content item is analyzed to identify keywords that may describe the media content item.

Methods and systems for interfacing a speech dialog with new applications

Methods and systems are provided for interfacing a speech system with a new application. In one embodiment, a method includes: maintaining a registration data datastore that stores registration data from the new application and one or more other applications; receiving, at a router module associated with the speech system, a result from a speech recognition module; processing, by the router module, the result and the registration data to determine a possible new application; and providing the possible new application to the speech system.

Method and tuning speech recognition systems to accommodate ambient noise

A system includes a head and torso simulation (HATS) system configured to play back pre-recorded audio commands while simulating a driver head location as an output location. The system also includes a vehicle speaker system and a processor configured to engage a vehicle heating, ventilation, and air-conditioning (HVAC) system.
Ford Global Technologies, Llc

Automatic speaker identification using speech recognition features

Features are disclosed for automatically identifying a speaker. Artifacts of automatic speech recognition (“ASR”) and/or other automatically determined information may be processed against individual user profiles or models.
Amazon Technologies, Inc.

Confidence features for automated speech recognition arbitration

The described technology provides arbitration between speech recognition results generated by different automatic speech recognition (ASR) engines, such as ASR engines trained according to different language or acoustic models. The system includes an arbitrator that selects between a first speech recognition result representing an acoustic utterance as transcribed by a first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by a second ASR engine.
Microsoft Technology Licensing, Llc

Multiple speech locale-specific hotword classifiers for selection of a speech locale

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for recognizing speech in an utterance. The methods, systems, and apparatus include actions of receiving an utterance and obtaining acoustic features from the utterance.
Google Inc.

Method and device of speech recognition

A method of speech recognition includes the following steps: receiving a first speech input, and converting the first speech input into a first digital signal; transmitting the first digital signal to a cloud server; receiving a first post-processing result generated according to the first digital signal; receiving a second speech input, and converting the second speech input into a second digital signal; performing a first speech recognition to the second digital signal to obtain a recognition result by using a first speech recognition model; and comparing the first post-processing result with the recognition result to determine a speech recognition result.. .
Shenzhen Raisound Technology Co. Ltd.
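
One plausible way to picture the final comparison step, assuming both results arrive with confidence scores; the decision rule is an assumption for the sketch, not the patented comparison.

# Sketch: keep the cloud post-processing result when it agrees with or
# outscores the local recognizer, otherwise fall back to the local result.

def decide_final_result(cloud_result, local_result):
    """Each argument is a (text, confidence) tuple."""
    cloud_text, cloud_conf = cloud_result
    local_text, local_conf = local_result
    if cloud_text == local_text:          # both recognizers agree
        return cloud_text
    return cloud_text if cloud_conf >= local_conf else local_text

print(decide_final_result(("turn on the radio", 0.91),
                          ("turn on the radio", 0.74)))
print(decide_final_result(("call mom", 0.55),
                          ("call tom", 0.80)))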

Method and device for speech recognition

An embodiment of the present disclosure discloses a method and a system for speech recognition. The method comprises steps of intercepting a first speech segment from a monitored speech signal, analyzing the first speech segment to determine an energy spectrum; extracting characteristics of the first speech segment according to the energy spectrum, determining speech characteristics; analyzing the energy spectrum of the first speech segment according to the speech characteristics, intercepting a second speech segment; recognizing the speech of the second speech segment, and obtaining a speech recognition result.
Le Shi Zhi Xin Electronic Technology (tianjin) Limited

Method and system for reading fluency training

A non-transitory processor-readable medium stores code representing instructions to be executed by a processor. The code causes the processor to receive a request from a user of a client device to initiate a speech recognition engine for a web page displayed at the client device.
Rosetta Stone Ltd.

Computer speech recognition and semantic understanding from activity patterns

A user activity pattern may be ascertained using signal data from a set of computing devices. The activity pattern may be used to infer user intent with regards to a user interaction with a computing device or to predict a likely future action by the user.
Microsoft Technology Licensing, Llc

Method and keyword speech recognition

Phoneme images are created for keywords and audio files. The keyword images and audio file images are used to identify keywords within the audio file when the phoneme images match.
Apptek, Inc.

Systems, methods and devices for intelligent speech recognition and processing

Systems, methods, and devices for intelligent speech recognition and processing are disclosed. According to one embodiment, a method for improving intelligibility of a speech signal may include (1) at least one processor receiving an incoming speech signal comprising a plurality of sound elements; (2) the at least one processor recognizing a sound element in the incoming speech signal to improve the intelligibility thereof; (3) the at least one processor processing the sound element by at least one of modifying and replacing the sound element; and (4) the at least one processor outputting the processed speech signal comprising the processed sound element..
Audimax Llc

Speech recognition candidate selection based on non-acoustic input

A method includes the following steps. A speech input is received.
International Business Machines Corporation

Method and context-augmented speech recognition

A system includes a processor configured to receive speech-input. The processor is further configured to receive at least one location-identification.

Systems and methods for adaptive proper name entity recognition and understanding

Various embodiments contemplate systems and methods for performing automatic speech recognition (ASR) and natural language understanding (NLU) that enable high-accuracy recognition and understanding of freely spoken utterances which may contain proper names and similar entities. The proper name entities may contain or be comprised wholly of words that are not present in the vocabularies of these systems as normally constituted.
Promptu Systems Corporation

Neural network training apparatus and method, and speech recognition apparatus and method

A neural network training apparatus includes a primary trainer configured to perform a primary training of a neural network model based on clean training data and target data corresponding to the clean training data; and a secondary trainer configured to perform a secondary training of the neural network model on which the primary training has been performed based on noisy training data and an output probability distribution of an output class for the clean training data calculated during the primary training of the neural network model.. .
Samsung Electronics Co., Ltd.

System and method for broadcasting audio tweets

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for broadcasting audio tweets. A system broadcasting audio tweets receives tweets via telephone devices, wherein each listener hears a telephone call of a broadcast on the telephone devices.
Audionow Ip Holdings, Llc

Transfer function to generate Lombard speech from neutral speech

A controller may be programmed to create a speech utterance set for speech recognition training by, in response to receiving data representing a neutral utterance and parameter values defining signal noise, generating data representing a Lombard-effect version of the neutral utterance using a transfer function associated with the parameter values and defining distortion between neutral and Lombard-effect versions of a same utterance due to the signal noise.
Ford Global Technologies, Llc

Electronic device and method for recognizing speech

An electronic device and a method for recognizing a speech are provided. The method for recognizing a speech by an electronic device includes: receiving sounds generated from a sound source through a plurality of microphones; calculating power values from a plurality of audio signals generated by performing signal processing on each sound input through the plurality of microphones and calculating direction information on the sound source based on the calculated power values and storing the calculated direction information; and performing the speech recognition on a speech section included in the audio signal based on the direction information on the sound source.
Samsung Electronics Co., Ltd.

Sound envelope deconstruction to identify words and speakers in continuous speech

A speech recognition capability in which speakers of spoken text are identified based on the contour of sound waves representing the spoken text. Variations in the contour of the sound waves are identified, features are assigned to those variations, and parameters of those features are grouped into predefined characteristics.
International Business Machines Corporation

Methods and apparatus for joint stochastic and deterministic dictation formatting

Methods and apparatus for speech recognition on user-dictated words to generate a dictation, using a discriminative statistical model derived from a deterministic formatting grammar module and user-formatted documents to extract features and estimate scores from the formatting graph. The processed dictation can be output as formatted text based on a formatting selection to provide integrated stochastic and deterministic formatting of the dictation.
Nuance Communications, Inc.

Techniques for updating an automatic speech recognition system using finite-state transducers

Techniques are described for updating an automatic speech recognition (ASR) system that, prior to the update, is configured to perform ASR using a first finite-state transducer (FST) comprising a first set of paths representing recognizable speech sequences. A second FST may be accessed, comprising a second set of paths representing speech sequences to be recognized by the updated ASR system.
Nuance Communications, Inc.

Using word confidence score, insertion and substitution thresholds for selected words in speech recognition

A method and system for improving the accuracy of a speech recognition system using word confidence score (WCS) processing is introduced. Parameters in a decoder are selected to minimize a weighted total error rate, such that deletion errors are weighted more heavily than substitution and insertion errors.
Adacel, Inc.
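
The weighting idea can be illustrated as follows; the weights shown are arbitrary example values, not those used by the patent.

# Sketch of a weighted total error rate in which deletion errors count more
# than substitutions and insertions.

def weighted_error_rate(substitutions, insertions, deletions, reference_words,
                        w_sub=1.0, w_ins=1.0, w_del=2.0):
    weighted_errors = (w_sub * substitutions
                       + w_ins * insertions
                       + w_del * deletions)
    return weighted_errors / reference_words

# 100 reference words: 4 substitutions, 2 insertions, 3 deletions
print(weighted_error_rate(4, 2, 3, 100))   # 0.12 instead of an unweighted 0.09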

Mobile phone

A mobile phone including a first touch screen and a second touch screen respectively coupled to a processor; a transmitter and a receiver respectively coupled to the processor; a memorizer coupled to the processor, which stores a speech recognition database and a contacts database; a mobile communication unit and a video communication unit both coupled to the processor, the mobile communication unit associated with the first touch screen and including a customer recognition module, a baseband processing chip, and an RF module; and the video communication unit associated with the second touch screen and including a front camera, an image processing chip, and a wireless communication module. The mobile phone of the disclosure provides two touch screens dedicated to calls, which is very convenient for the elderly because of the phone's very simple interaction levels.
Lecloud Computing Co., Ltd.

Speech recognition and transcription among users having heterogeneous protocols

A system is disclosed for facilitating free form dictation, including directed dictation and constrained recognition and/or structured transcription among users having heterogeneous native (legacy) protocols for generating, transcribing, and exchanging recognized and transcribed speech. The system includes at least one system transaction manager having a “system protocol,” to receive a verified, streamed speech information request from at least one authorized user employing a first legacy user protocol.
Advanced Voice Recognition Systems, Inc.

Source-based automatic speech recognition

Recognizing a user's speech is a computationally demanding task. If a user calls a destination server, little may be known about the user or the user's speech profile.
Avaya Inc.

Visual confirmation for a recognized voice-initiated action

Techniques described herein provide a computing device configured to provide an indication that the computing device has recognized a voice-initiated action. In one example, a method is provided for outputting, by a computing device and for display, a speech recognition graphical user interface (gui) having at least one element in a first visual format.
Google Inc.

Electronic device and method for executing a function using speech recognition thereof

An electronic device and a method for using speech recognition are provided. The electronic device includes an input device, a touch screen display, a processor, and a memory.
Samsung Electronics Co., Ltd.

Architecture for multi-domain natural language processing

Features are disclosed for processing a user utterance with respect to multiple subject matters or domains, and for selecting a likely result from a particular domain with which to respond to the utterance or otherwise take action. A user utterance may be transcribed by an automatic speech recognition (“asr”) module, and the results may be provided to a multi-domain natural language understanding (“nlu”) engine.
Amazon Technologies, Inc.

Method and system for adjusting user speech in a communication session

A system that incorporates the subject disclosure may, for example, receive user speech captured at a second end user device during a communication session between the second end user device and a first end user device, apply speech recognition to the user speech, identify an unclear word in the user speech based on the speech recognition, adjust the user speech to generate adjusted user speech by replacing all or a portion of the unclear word with replacement audio content, and provide the adjusted user speech to the first end user device during the communication session. Other embodiments are disclosed.
At&t Intellectual Property I, L.p.

Apparatuses and methods for enhanced speech recognition in variable environments

Systems, apparatuses, and methods are described to increase a signal-to-noise ratio difference between a main channel and reference channel. The increased signal-to-noise ratio difference is accomplished with an adaptive threshold for a desired voice activity detector (dvad) and shaping filters.
Kopin Corporation

Speech recognition circuit using parallel processors

A speech recognition circuit comprises an input buffer for receiving processed speech parameters. A lexical memory contains lexical data for word recognition.
Zentian Limited

Mixed speech recognition

The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample.
Microsoft Technology Licensing, Llc

Apparatus and method for normalizing input data of an acoustic model, and speech recognition apparatus

An apparatus for normalizing input data of an acoustic model includes a window extractor configured to extract windows of frame data to be input to an acoustic model from frame data of a speech to be recognized, and a normalizer configured to normalize the frame data to be input to the acoustic model in units of the extracted windows.
Samsung Electronics Co., Ltd.
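
A minimal sketch of window extraction and per-window normalization as described above, assuming simple mean/variance normalization; the window size, stride, and feature dimensions are illustrative, not taken from the patent.

    import numpy as np

    def extract_windows(frames, window_size, stride):
        """Slice a [num_frames, feat_dim] array into (possibly overlapping) windows of frames."""
        windows = []
        for start in range(0, len(frames) - window_size + 1, stride):
            windows.append(frames[start:start + window_size])
        return windows

    def normalize_window(window, eps=1e-8):
        """Zero-mean, unit-variance normalization computed per window, per feature dimension."""
        mean = window.mean(axis=0)
        std = window.std(axis=0)
        return (window - mean) / (std + eps)

    frames = np.random.randn(100, 40)          # e.g. 100 frames of 40-dim filterbank features
    normed = [normalize_window(w) for w in extract_windows(frames, window_size=20, stride=10)]
    print(len(normed), normed[0].shape)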

Speech interaction apparatus and method

According to one embodiment, a speech interaction apparatus for performing an interaction with a user based on a scenario includes a speech recognition unit, a determination unit, a selection unit and an execution unit. The speech recognition unit recognizes a speech of the user and generates a recognition result text.
Kabushiki Kaisha Toshiba

Information processing system, and vehicle-mounted device

This invention can enhance the convenience of a user. An information processing system 1 includes: a vehicle-mounted device 3 which has a sound pickup unit 36 that picks up a speech sound, and a transmitting unit that transmits speech data that is generated based on the speech sound that is picked up to a control server 8; and the control server 8 which has a server storage unit 82 that stores a pictogram correspondence table 82a in which recognition keywords and pictogram ids indicating a plurality of pictograms that correspond to the recognition keywords are associated, and a server control unit 81 which executes pictogram processing that selects a recognition keyword that corresponds to text representing a speech sound that is generated by speech recognition based on speech data from among the recognition keywords included in the pictogram correspondence table 82a, and in accordance with a predetermined condition, selects a single pictogram id from among a plurality of pictogram ids that are associated with the selected recognition keyword..
Clarion Co., Ltd.

Flexible schema for language model customization

The customization of language modeling components for speech recognition is provided. A list of language modeling components may be made available by a computing device.
Microsoft Technology Licensing, Llc

Dynamically adding or removing functionality to speech recognition systems

A system and method of changing features of an existing automatic speech recognition (asr) system includes: monitoring speech received from a vehicle occupant for one or more keywords identifying a feature to remove from or add to the asr system; detecting the keywords in the monitored speech; and adding the identified feature to or removing the identified feature from the asr system.
Gm Global Technology Operations Llc

Techniques to provide a standard interface to a speech recognition platform

Techniques and systems to provide speech recognition services over a network using a standard interface are described. In an embodiment, a technique includes accepting a speech recognition request that includes at least audio input, via an application program interface (api).
Microsoft Technology Licensing, Llc

Speech recognition apparatus and method with acoustic modelling

Provided is a speech recognition apparatus. The apparatus includes a preprocessor configured to extract select frames from all frames of a first speech of a user, and a score calculator configured to calculate an acoustic score of a second speech, made up of the extracted select frames, by using a deep neural network (dnn)-based acoustic model, and to calculate an acoustic score of frames, of the first speech, other than the select frames based on the calculated acoustic score of the second speech.
Samsung Electronics Co., Ltd.
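
One way to read the select-frame idea above is frame skipping: score only a subset of frames with the expensive model and reuse those scores for the remaining frames. The sketch below assumes a fixed skip factor and a stand-in scoring function; it is not the patented interpolation scheme.

    import numpy as np

    def score_with_frame_skipping(frames, acoustic_score_fn, skip=2):
        """Run the (expensive) scorer only on every `skip`-th frame and reuse that
        score for the intervening frames."""
        num_frames = len(frames)
        scores = [None] * num_frames
        for i in range(0, num_frames, skip):
            scores[i] = acoustic_score_fn(frames[i])
        for i in range(num_frames):
            if scores[i] is None:
                scores[i] = scores[i - (i % skip)]   # copy from the last scored frame
        return scores

    def fake_dnn_score(frame):
        """Hypothetical stand-in for a DNN acoustic model: returns a log-probability vector."""
        logits = frame[:5]
        return logits - np.log(np.sum(np.exp(logits)))

    frames = np.random.randn(10, 40)
    print(len(score_with_frame_skipping(frames, fake_dnn_score, skip=2)))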

Voice language communication device and system

A voice language communication device and system that includes: a speaker; a microphone; a display panel; a control panel; a power button; a record button; software stored on a hard drive; a language database, where software accesses the language database during operation; a plurality of languages stored on the language database; speech recognition functions related to the software, where the speech recognition functions recognize a user's language as an input language; and an output language, where the output language is a translation of the input language and the output language is instantaneously emitted through the speaker.

Streamlined navigational speech recognition

A system and method of performing automatic speech recognition (asr) includes: receiving speech at a vehicle microphone; communicating the received speech to an asr system; measuring an amount of time that elapses while speech is received; selecting a point-of-interest (poi) context or an address context based on the measured amount of time; and processing the received speech using a poi context-based grammar when a poi context is selected or an address-based grammar when an address context is selected.
Gm Global Technology Operations Llc

Speech recognition system and gain setting system

When an instruction to start voice input is received from the user, a gain controller acquires, from a gain table which defines a correspondence between vehicle speed ranges and gains, a gain corresponding to a vehicle speed range including the vehicle speed of a vehicle detected by a vehicle speed detector, and sets the acquired gain as the gain of an input amplifier that amplifies an input audio signal output by a microphone. As a gain corresponding to each vehicle speed range, the gain table records a gain of the input amplifier corresponding, in an experimentally determined frequency distribution of peak values in the vehicle speed range, to a maximum frequency in the range of magnitude of voice output as an input audio signal by the microphone and to be input to a speech recognition engine as voice having a magnitude within the input range of the speech recognition engine..
Alpine Electronics, Inc.
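
The gain table described above is essentially a lookup from vehicle-speed range to input-amplifier gain. A minimal sketch, with made-up speed ranges and gain values:

    # Hypothetical gain table: (low_kmh, high_kmh) speed range -> input-amplifier gain in dB.
    GAIN_TABLE = [
        ((0, 30), 0.0),
        ((30, 60), 3.0),
        ((60, 100), 6.0),
        ((100, 999), 9.0),
    ]

    def gain_for_speed(speed_kmh):
        """Look up the amplifier gain for the speed range containing the current vehicle speed."""
        for (low, high), gain_db in GAIN_TABLE:
            if low <= speed_kmh < high:
                return gain_db
        return GAIN_TABLE[-1][1]

    print(gain_for_speed(72))   # -> 6.0 with this made-up table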

Incremental utterance decoder combination for efficient and accurate decoding

An incremental speech recognition system. The incremental speech recognition system incrementally decodes a spoken utterance using an additional utterance decoder only when the additional utterance decoder is likely to add significant benefit to the combined result.
Microsoft Technology Licensing, Llc

System and method for determining the recipient of a spoken command in a control system

Disclosed is an apparatus and method for determining which controllable device an audible command is directed towards, the method comprising: receiving at each of two or more controlling devices the audible command signal, the audible command being directed to control at least one of two or more controllable devices controlled by a respective one of the two or more controlling devices; digitizing each of the received audible command signals; attaching a unique identifier to each digitized audible command so as to uniquely correlate it to a respective controlling device; determining a magnitude of each of the digitized audible command; determining a digitized audible command with the greatest magnitude, and further determining to which controlling device the audible command is directed to on the basis of the unique identifier associated with the digitized audible command with the greatest magnitude; performing speech recognition on the digitized audible command with the greatest magnitude; and forwarding a command to the controlling device corresponding to the digitized audible command with the greatest magnitude, the command corresponding to the audible command that can be implemented on the controllable device controlled by the controlling device.. .
Crestron Electronics, Inc.
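
The core selection step above reduces to: each controlling device digitizes the command, tags it with its identifier, and the copy with the greatest magnitude wins. A minimal sketch using RMS magnitude and hypothetical device names:

    import numpy as np

    def pick_target_controller(digitized_commands):
        """digitized_commands: {controller_id: samples}. Return the controller whose copy
        of the command has the greatest magnitude, presumably the one the speaker addressed."""
        def magnitude(samples):
            return float(np.sqrt(np.mean(np.asarray(samples, dtype=float) ** 2)))
        return max(digitized_commands, key=lambda cid: magnitude(digitized_commands[cid]))

    commands = {
        "thermostat": [0.01, -0.02, 0.015],
        "tv":         [0.20, -0.18, 0.22],   # loudest copy
        "lights":     [0.05, -0.04, 0.06],
    }
    print(pick_target_controller(commands))   # -> "tv"

Speech recognition would then be run only on the winning copy, and the resulting command forwarded to that controller.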

Semiconductor device, system, electronic device, and speech recognition method

A semiconductor device is provided with a data storage unit configured to store speech reproduction data that includes transition destination information or speech recognition option data that includes transition destination information, and a processor configured to perform processing for generating an output speech signal using speech reproduction data read out from the data storage unit or perform speech recognition processing on an input speech signal using speech recognition option data read out from the data storage unit, and to read out, based on the transition destination information included in speech reproduction data or speech recognition option data used in the processing, speech recognition option data or speech reproduction data to be used in the next processing from the data storage unit.. .
Seiko Epson Corporation

Methods for speech enhancement and speech recognition using neural networks

The present invention relates to implementing a system and method to improve speech recognition and speech enhancement of noisy speech. The present invention discloses a way to improve the noise robustness of a speech recognition system by providing additional input to a neural network speech classifier.

Dynamic adaptation of language models and semantic tracking for automatic speech recognition

Generally, this disclosure provides systems, devices, methods and computer readable media for adaptation of language models and semantic tracking to improve automatic speech recognition (asr). A system for recognizing phrases of speech from a conversation may include an asr circuit configured to transcribe a user's speech to a first estimated text sequence, based on a generalized language model.
Intel Corporation

Multichannel raw-waveform neural networks

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using neural networks. One of the methods includes receiving, by a neural network in a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal, the first raw audio signal and the second raw audio signal for the same period of time, generating, by a spatial filtering convolutional layer in the neural network, a spatial filtered output using the first data and the second data, generating, by a spectral filtering convolutional layer in the neural network, a spectral filtered output using the spatial filtered output, and processing, by one or more additional layers in the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
Google Inc.

Hotword detection on multiple devices

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance.
Google Inc.

Apparatus and method for speech recognition, and for training a transformation parameter

Provided are a method and an apparatus for speech recognition, and a method and an apparatus for training a transformation parameter. A speech recognition apparatus includes an acoustic score calculator configured to use an acoustic model to calculate an acoustic score of a speech input, an acoustic score transformer configured to transform the calculated acoustic score into an acoustic score corresponding to standard pronunciation by using a transformation parameter, and a decoder configured to decode the transformed acoustic score to output a recognition result of the speech input.
Samsung Electronics Co., Ltd.

Automatic speech recognition confidence classifier

The described technology provides normalization of speech recognition confidence classifier (cc) scores that maintains the accuracy of acceptance metrics. A speech recognition cc score quantitatively represents the correctness of decoded utterances in a defined range (e.g., [0,1]).
Microsoft Technology Licensing, Llc
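
One way to normalize classifier scores into [0, 1] while leaving accept/reject decisions unchanged is a piecewise-linear map that pins the operating threshold to a fixed point. The sketch below is an assumption about how such a normalization could look, not the method claimed above.

    def normalize_confidence(raw_score, raw_min, raw_max, raw_threshold, mapped_threshold=0.5):
        """Piecewise-linear map of a raw confidence score into [0, 1] that pins the
        acceptance threshold to a fixed point, so accept/reject decisions are unchanged."""
        if raw_score <= raw_threshold:
            span = raw_threshold - raw_min
            return mapped_threshold * (raw_score - raw_min) / span if span > 0 else 0.0
        span = raw_max - raw_threshold
        return mapped_threshold + (1.0 - mapped_threshold) * (raw_score - raw_threshold) / span if span > 0 else 1.0

    # Raw scores in [-3, 4] with an operating threshold of 1.2 map into [0, 1] with threshold 0.5.
    for s in (-3.0, 0.0, 1.2, 2.6, 4.0):
        print(s, round(normalize_confidence(s, -3.0, 4.0, 1.2), 3))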

Automatic speech recognition with detection of at least one contextual element, and application management and maintenance of aircraft

An automatic speech recognition device with detection of at least one contextual element, and its application to aircraft flying and maintenance, are provided. The automatic speech recognition device comprises a unit for acquiring an audio signal, a device for detecting the state of at least one contextual element, and a language decoder for determining an oral instruction corresponding to the audio signal.
Dassault Aviation

Apparatus and method for generating an acoustic model, and speech recognition

Described are an apparatus and method for generating an acoustic model. The apparatus and method include a processor configured to calculate a noise representation that represents noise data by using a noise model, and generate the acoustic model through training using training noisy speech data, which comprises speech data and the noise data, a string of phonemes corresponding to the speech data, and the noise representation.
Samsung Electronics Co., Ltd.

Methods and apparatus for speech recognition using a garbage model

Methods and apparatus for performing speech recognition using a garbage model. The method comprises receiving audio comprising speech and processing at least some of the speech using a garbage model to produce a garbage speech recognition result.
Nuance Communications, Inc.

Microphone placement for sound source direction estimation

Architectures specifying the number of microphones and their positioning in a device for sound source direction estimation and source separation are presented. The directions of sources are front, back, left, right, top, and bottom of the device, and can be determined by amplitude and phase differences of microphone signals with proper microphone positioning.
Microsoft Technology Licensing, Llc

Method and device for speech recognition

Embodiments of the present disclosure provide a method and device for speech recognition. The solution comprises: receiving a first speech signal issued by a user; performing analog to digital conversion on the first speech signal to generate a first digital signal after the analog to digital conversion; extracting a first speech parameter from the first digital signal, the first speech parameter describing a speech feature of the first speech signal; if the first speech parameter coincides with a first prestored speech parameter in a sample library, executing control signalling instructed by the first digital signal, the sample library prestoring speech parameters of n users, n≧1.
Beijing Boe Multimedia Technology Co., Ltd.
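
A minimal sketch of matching an extracted speech parameter against a prestored sample library, assuming a simple Euclidean-distance comparison and a hypothetical tolerance; the actual comparison used by the device is not specified here.

    import numpy as np

    def matches_enrolled_user(feature, sample_library, tolerance=1.5):
        """Compare an extracted speech-parameter vector against each user's prestored
        parameters; return the matching user id, or None if nothing is close enough."""
        feature = np.asarray(feature, dtype=float)
        best_user, best_dist = None, float("inf")
        for user_id, stored in sample_library.items():
            dist = float(np.linalg.norm(feature - np.asarray(stored, dtype=float)))
            if dist < best_dist:
                best_user, best_dist = user_id, dist
        return best_user if best_dist <= tolerance else None

    library = {"user_1": [1.0, 0.2, -0.5], "user_2": [-0.8, 1.1, 0.3]}
    print(matches_enrolled_user([0.9, 0.25, -0.4], library))   # -> "user_1"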

Speech recognition apparatus and method

An apparatus includes a language model group identifier configured to identify a language model group based on determined characteristic data of a user, and a language model generator configured to generate a user-based language model by interpolating a general language model for speech recognition based on the identified language model group.
Samsung Electronics Co., Ltd.

Method and system for remotely training and commanding the speech recognition system on a cockpit via a carry-on-device in a connected aircraft

A method for implementing a speaker-independent speech recognition system with reduced latency is provided. The method includes capturing voice data at a carry-on-device from a user during a pre-flight check-in performed by the user for an upcoming flight; extracting features associated with the user from the captured voice data at the carry-on-device; uplinking the extracted features to the speaker-independent speech recognition system onboard the aircraft; and adapting the extracted features with an acoustic feature model of the speaker-independent speech recognition system..
Honeywell International Inc.

Adapting a speech system to user pronunciation

A system and method of adapting a speech system includes the steps of: receiving confirmation of a phonetic transcription of one or more names, receiving confirmation of a selected stored text result, and storing the phonetic transcription with the selected stored text result using an automatic speech recognition (asr) system, a text-to-speech (tts) system, or both.
Gm Global Technology Operations Llc

Enhanced speech endpointing

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data including an utterance, obtaining context data that indicates one or more expected speech recognition results, determining an expected speech recognition result based on the context data, receiving an intermediate speech recognition result generated by a speech recognition engine, comparing the intermediate speech recognition result to the expected speech recognition result for the audio data based on the context data, determining whether the intermediate speech recognition result corresponds to the expected speech recognition result for the audio data based on the context data, and setting an end of speech condition and providing a final speech recognition result in response to determining the intermediate speech recognition result matches the expected speech recognition result, the final speech recognition result including the one or more expected speech recognition results indicated by the context data.. .
Google Inc.
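
The endpointing idea above can be reduced to comparing each intermediate recognition result against the context's expected results and declaring end of speech on a match. A minimal sketch with exact string matching, which is a simplification of whatever comparison the system actually uses:

    def should_end_of_speech(intermediate_result, expected_results):
        """Set the end-of-speech condition as soon as an intermediate recognition result
        matches one of the context-provided expected results."""
        normalized = intermediate_result.strip().lower()
        return any(normalized == exp.strip().lower() for exp in expected_results)

    expected = ["yes", "no", "cancel"]          # e.g. from a confirmation-dialog context
    for partial in ["y", "ye", "yes"]:
        print(partial, should_end_of_speech(partial, expected))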

Systems, computer-implemented methods, and tangible computer-readable storage media for transcription alignment

Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for captioning a media presentation. The method includes receiving automatic speech recognition (asr) output from a media presentation and a transcription of the media presentation.
At&t Intellectual Property I, L.p.

Audio-visual speech recognition with scattering operators

Aspects described herein are directed towards methods, computing devices, systems, and computer-readable media that apply scattering operations to extracted visual features of audiovisual input to generate predictions regarding the speech status of a subject. Visual scattering coefficients generated according to one or more aspects described herein may be used as input to a neural network operative to generate the predictions regarding the speech status of the subject.
Nuance Communications, Inc.

Building of n-gram language model for automatic speech recognition (asr)

A method, a system, and a computer program product for building an n-gram language model for an automatic speech recognition. The method includes reading training text data and additional text data both for the n-gram language model from a storage, and building the n-gram language model by a smoothing algorithm having discount parameters for n-gram counts.
International Business Machines Corporation
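
As a toy illustration of an n-gram model with discount parameters, the sketch below counts bigrams and applies absolute discounting; the discount value and the omission of backoff mass are simplifications, not the method built in the patent.

    from collections import Counter

    def bigram_counts(sentences):
        """Collect bigram and history counts from whitespace-tokenized training text."""
        counts, unigrams = Counter(), Counter()
        for sent in sentences:
            tokens = ["<s>"] + sent.split() + ["</s>"]
            unigrams.update(tokens[:-1])
            counts.update(zip(tokens[:-1], tokens[1:]))
        return counts, unigrams

    def discounted_prob(bigram, counts, unigrams, discount=0.75):
        """Absolute-discounted bigram probability (backoff mass ignored for brevity)."""
        history = bigram[0]
        if counts[bigram] == 0 or unigrams[history] == 0:
            return 0.0
        return max(counts[bigram] - discount, 0.0) / unigrams[history]

    train = ["recognize speech", "recognize the utterance", "transcribe speech"]
    c, u = bigram_counts(train)
    print(discounted_prob(("recognize", "speech"), c, u))   # 0.125 with this toy data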

Method and apparatus for improving a neural network language model, and speech recognition method and apparatus

According to one embodiment, an apparatus for improving a neural network language model of a speech recognition system includes a word classifying unit, a language model training unit and a vector incorporating unit. The word classifying unit classifies words in a lexicon of the speech recognition system.
Kabushiki Kaisha Toshiba

Method and apparatus for improving a language model, and speech recognition method and apparatus

According to one embodiment, an apparatus for improving a language model of a speech recognition system includes an extracting unit, a classifying unit, and a setting unit. The extracting unit extracts user words from a user document provided by a user.
Kabushiki Kaisha Toshiba

Topic shift detector

Aspects detect or recognize shifts in topics in computer implemented speech recognition processes as a function of mapping keywords to non-verbal cues. An initial topic is mapped to one or more keywords extracted from a first spoken query within a user keyword ontology mapping.
International Business Machines Corporation

Speech recognition apparatus and method

A speech recognition apparatus and method. The speech recognition apparatus includes a first recognizer configured to generate a first recognition result of an audio signal, in a first linguistic recognition unit, by using an acoustic model, a second recognizer configured to generate a second recognition result of the audio signal, in a second linguistic recognition unit, by using a language model, and a combiner configured to combine the first recognition result and the second recognition result to generate a final recognition result in the second linguistic recognition unit and to reflect the final recognition result in the language model.
Samsung Electronics Co., Ltd.

Speech recognition apparatus, vehicle having the speech recognition apparatus, and method for controlling the vehicle

Disclosed herein are speech recognition apparatuses, vehicles having the speech recognition apparatuses, and methods for controlling vehicles. According to an aspect, a speech recognition apparatus includes a speech input unit configured to receive a speech command from a user, a communication unit configured to receive the result of processing for speech recognition acquired by at least one user terminal located near the user, and a controller configured to compare the result of processing for speech recognition acquired from the speech command received by the speech input unit to the result of processing for speech recognition acquired by the at least one user terminal, thus processing the speech command according to the result of the comparison..
Hyundai Motor Company

Speech recognition system with abbreviated training

A method of adapting a speech recognition system to its user includes gathering information about a user of a speech recognition system, selecting at least a part of a speech model reflecting estimated speech attributes of the user based on the information about the user, running, in the speech recognition system, a speech model including the selected at least a part of a speech model, and training, in the speech recognition system, other parts of the speech model to reflect identified speech attributes of the user.
Toyota Motor Engineering & Manufacturing North America, Inc.

Order statistic techniques for neural networks

According to some aspects, a method of classifying speech recognition results is provided, using a neural network comprising a plurality of interconnected network units, each network unit having one or more weight values, the method comprising using at least one computer, performing acts of providing a first vector as input to a first network layer comprising one or more network units of the neural network, transforming, by a first network unit of the one or more network units, the input vector to produce a plurality of values, the transformation being based at least in part on a plurality of weight values of the first network unit, sorting the plurality of values to produce a sorted plurality of values, and providing the sorted plurality of values as input to a second network layer of the neural network.. .
Nuance Communications, Inc.

Adaptation of speech recognition

A method, computer program product, and system for adapting speech recognition of a user's speech is provided. The method includes receiving a first utterance from a user having a duration below a predetermined threshold, identifying at least one further utterance from the user that provides additional information, generating a concatenated utterance by concatenating the first utterance with the at least one further utterance, transmitting the concatenated utterance to a speech recognition server, receiving a transcription of the concatenated utterance from the speech recognition server that includes a transcription of the first utterance, and extracting the transcription of the first utterance from the transcription of the concatenated utterance.
International Business Machines Corporation

Computer-implemented system and method for performing distributed speech recognition

A computer-implemented system and method for performing distributed speech recognition is provided. Audio data is collected.
Intellisist, Inc.

Information processing apparatus, control method, and program

There is provided an information processing apparatus, control method, and program capable of notifying a user of a candidate for a response, from the middle of a speech, through a voice u1, the information processing apparatus including: a semantic analysis unit configured to perform semantic analysis on speech text recognized by a speech recognition unit in the middle of a speech; a score calculation unit configured to calculate a score for a response candidate on the basis of a result of the analysis performed by the semantic analysis unit; and a notification control unit configured to perform control to notify of the response candidate, in the middle of the speech, according to the score calculated by the score calculation unit.. .
Sony Corporation

Speech recognition using an operating system hooking component for context-aware recognition models

Inputs provided into user interface elements of an application are observed. Records are made of the inputs and the state(s) the application was in while the inputs were provided.
Mmodal Ip Llc

Data augmentation method based on stochastic feature mapping for automatic speech recognition

A method of augmenting training data includes converting a feature sequence of a source speaker determined from a plurality of utterances within a transcript to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set using the mapping function to multiple selected target speakers in the training set.. .
International Business Machines Corporation

Speech recognition support for remote applications and desktops

An application may be hosted for utilization by a remote computing platform. User interface (ui) elements of a ui generated by the hosted application may be identified.
Citrix Systems, Inc.

Frequency warping in a speech recognition system

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for receiving a sequence representing an utterance, the sequence comprising a plurality of audio frames; determining one or more warping factors for each audio frame in the sequence using a warping neural network; applying, for each audio frame, the one or more warping factors for the audio frame to the audio frame to generate a respective modified audio frame, wherein the applying comprises using at least one of the warping factors to scale a respective frequency of the audio frame to a new respective frequency in the respective modified audio frame; and decoding the modified audio frames using a decoding neural network, wherein the decoding neural network is configured to output a word sequence that is a transcription of the utterance.. .
Google Inc.

Computer-implemented system and method for efficient voice transcription

A computer-implemented system and method for efficient voice transcription is provided. A verbal message is processed by splitting the verbal message into segments and generating text for each of the segments via automated speech recognition.
Intellisist, Inc.

Insertion of characters in speech recognition

One embodiment provides a method, including: receiving, from an audio capture device, speech input; converting, using a processor, the speech input to machine text; receiving, from an alternate input source, an input comprising at least one character; identifying, using a processor, a location associated with the machine text to insert the at least one character; and inserting, using a processor, the at least one character at the location identified. Other aspects are described and claimed.
Lenovo (singapore) Pte. Ltd.

System and method for learning alternate pronunciations for speech recognition

A system and method for learning alternate pronunciations for speech recognition is disclosed. Through pronunciation learning, alternative name pronunciations that have not previously been covered in a general pronunciation dictionary may be covered.
Interactive Intelligence Group, Inc.

Method and device for updating language model and performing speech recognition based on language model

A method of updating a grammar model used during speech recognition includes obtaining a corpus including at least one word, obtaining the at least one word from the corpus, splitting the at least one obtained word into at least one segment, generating a hint for recombining the at least one segment into the at least one word, and updating the grammar model by using at least one segment comprising the hint.
Samsung Electronics Co., Ltd.

Communication method for a smart phone with a text recognition module

A portable device can transmit information through one of a mobile phone network and the internet, wherein the portable device includes a text-based communication module to allow a user to synchronously transmit or receive data through a local area network, wherein the data is text, audio, video or a combination thereof. The text-based communication module of the portable device includes a text-to-speech recognition module used to convert text data so that it can be output vocally, and a read determination module for determining read target terminals and unread target terminals when a user of the portable phone device activates the read determination module.

Business listing search

A method of operating a voice-enabled business directory search system includes receiving category-business pairs, each category-business pair including a business category and a specific business, and establishing a data structure having nodes based on the category-business pairs. Each node of the data structure is associated with one or more business categories and a speech recognition language model for recognizing specific businesses associated with the one or more business categories.
Google Inc.
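
A minimal sketch of the node structure described above: each category node collects its specific businesses and, as a stand-in for the per-category recognition language model, a vocabulary built from their names. The names and fields are hypothetical.

    # Hypothetical category-business pairs feeding the data structure.
    category_business_pairs = [
        ("pizza", "Mario's Pizzeria"),
        ("pizza", "Slice House"),
        ("pharmacy", "Corner Drugs"),
    ]

    def build_category_nodes(pairs):
        """One node per category, holding its businesses and a recognition vocabulary."""
        nodes = {}
        for category, business in pairs:
            node = nodes.setdefault(category, {"businesses": [], "vocabulary": set()})
            node["businesses"].append(business)
            node["vocabulary"].update(business.lower().split())
        return nodes

    nodes = build_category_nodes(category_business_pairs)
    print(sorted(nodes["pizza"]["vocabulary"]))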

Speech recognition method and mobile terminal

A speech recognition method and a mobile terminal relate to the field of electronic and information technologies, and can flexibly perform speech collection and improve a speech recognition rate. The method includes acquiring, by a mobile terminal, an orientation/motion status of the mobile terminal, and determining, according to the orientation/motion status, a voice collection apparatus for voice collection; acquiring, by the mobile terminal, a speech signal from the voice collection apparatus; and recognizing, by the mobile terminal, the speech signal.
Huawei Technologies Co., Ltd.

Apparatus and method for acoustic score calculation and speech recognition

An apparatus for calculating acoustic score, a method of calculating acoustic score, an apparatus for speech recognition, a method of speech recognition, and an electronic device including the same are provided. An apparatus for calculating acoustic score includes a preprocessor configured to sequentially extract audio frames into windows and a score calculator configured to calculate an acoustic score of a window by using a deep neural network (dnn)-based acoustic model.
Samsung Electronics Co., Ltd.

Unsupervised training method, training apparatus, and training program for an n-gram language model based upon recognition reliability

A computer-based, unsupervised training method for an n-gram language model includes reading, by a computer, recognition results obtained as a result of speech recognition of speech data; acquiring, by the computer, a reliability for each of the read recognition results; referring, by the computer, to the recognition result and the acquired reliability to select an n-gram entry; and training, by the computer, the n-gram language model about one or more of the selected n-gram entries using all recognition results.
International Business Machines Corporation
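
A minimal sketch of reliability-based n-gram selection as described above: n-gram entries are kept only from recognition results whose reliability clears a threshold. The threshold and confidence values are illustrative.

    from collections import Counter

    def select_ngram_entries(recognition_results, min_reliability=0.8, n=2):
        """Keep n-grams only from recognition results whose reliability (e.g. confidence)
        clears a threshold; recognition_results is a list of (text, reliability) pairs."""
        selected = Counter()
        for text, reliability in recognition_results:
            if reliability < min_reliability:
                continue
            tokens = text.split()
            selected.update(zip(*(tokens[i:] for i in range(n))))
        return selected

    results = [("turn on the radio", 0.93), ("turn of the radio", 0.41), ("call home", 0.88)]
    print(select_ngram_entries(results).most_common(3))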

Speech recognition apparatus and method

A speech recognition apparatus includes a processor configured to recognize a user's speech using any one or combination of two or more of an acoustic model, a pronunciation dictionary including primitive words, and a language model including primitive words; and correct word spacing in a result of speech recognition based on a word-spacing model.
Samsung Electronics Co., Ltd.

System and method for natural language driven search and discovery in large data sources

In some natural language understanding (nlu) applications, results may not be tailored to the user's query. In an embodiment of the present invention, a method includes tagging elements of automated speech recognition (asr) data based on an ontology stored in a memory.
Nuance Communications, Inc.

Vehicle and control method thereof

A vehicle includes: an input unit configured to receive an execution command for speech recognition; a calculator configured to calculate a time in which the vehicle is expected to arrive at an obstacle existing on a road on which the vehicle travels; and a speech recognition controller configured to compare the calculated time in which the vehicle is expected to arrive at the obstacle to a time in which a voice command input is expected to be completed to determine whether to perform dynamic noise removal pre-processing.. .
Hyundai Motor Company

Real-time adaptation of in-vehicle speech recognition systems

A system and method of controlling an automatic speech recognition (asr) system includes: detecting changes in ambient noise via a microphone in a vehicle equipped with the asr system; determining an environmental noise compensation value and a channel bias compensation value based on the detected changes; and applying the environmental noise compensation value and the channel bias compensation value to speech received by the asr system.
Gm Global Technology Operations Llc

Interest notification apparatus and method

An apparatus for notification of speech of interest to a user includes a voice analyzer configured to recognize speech, evaluate a relevance between a result of the speech recognition and a determined user's topic of interest, and determine whether to provide a notification; and an outputter configured to, in response to the voice analyzer determining to provide the notification, generate and output a notification message.
Samsung Electronics Co., Ltd.

Speech recognition apparatus and method

A speech recognition apparatus includes a converter configured to convert a captured user speech signal into a standardized speech signal format, one or more processing devices configured to apply the standardized speech signal to an acoustic model, and recognize the user speech signal based on a result of application to the acoustic model.
Samsung Electronics Co., Ltd.

Layered contextual configuration management system and method and minimized input speech recognition user interface interactions experience

In an effort to customize or enhance software applications, configuration data is often used. Configuration settings that are editable by users need not be limited to a simple flat entry that can be taken out of context anymore.

Multiple parallel dialogs in smart phone applications

An arrangement is described for conducting natural language dialogs with a user on a mobile device using automatic speech recognition (asr) and multiple different dialog applications. A user interface provides for user interaction with the dialogue applications in natural language dialogs.
Nuance Communications, Inc.

Using word confidence score, insertion and substitution thresholds for selected words in speech recognition

A method and system for improving the accuracy of a speech recognition system using word confidence score (wcs) processing is introduced. Parameters in a decoder are selected to minimize a weighted total error rate, such that deletion errors are weighted more heavily than substitution and insertion errors.
Adacel, Inc.

Speech recognition system and method

A speech recognition system and method that enable the spoken language to be identified automatically while a speaker's speech is being recognized, so that multilingual speech recognition can be handled effectively without a separate step for user registration or language setting (such as a button for manually selecting the language to be spoken), and so that speech recognition for each language is performed automatically even when speakers of different languages use a single terminal, thereby increasing user convenience.
Electronics And Telecommunications Research Institute

Methods employing phase state analysis for use in speech synthesis and recognition

A computer-implemented method for automatically analyzing, predicting, and/or modifying acoustic units of prosodic human speech utterances for use in speech synthesis or speech recognition. Possible steps include: initiating analysis of acoustic wave data representing the human speech utterances, via the phase state of the acoustic wave data; using one or more phase state defined acoustic wave metrics as common elements for analyzing, and optionally modifying, pitch, amplitude, duration, and other measurable acoustic parameters of the acoustic wave data, at predetermined time intervals; analyzing acoustic wave data representing a selected acoustic unit to determine the phase state of the acoustic unit; and analyzing the acoustic wave data representing the selected acoustic unit to determine at least one acoustic parameter of the acoustic unit with reference to the determined phase state of the selected acoustic unit.
Lessac Technologies, Inc.

System and method for three-way call detection

A system for detecting three-way calls in a monitored telephone conversation includes a speech recognition processor that transcribes the monitored telephone conversation and associates characteristics of the monitored telephone conversation with a transcript thereof, a database to store the transcript and the characteristics associated therewith, and a three-way call detection processor to analyze the characteristics of the conversation and to detect therefrom the addition of one or more parties to the conversation. The system preferably includes at least one domain-specific language model that the speech recognition processor utilizes to transcribe the conversation.
Dsi-iti, Llc

Corrective feedback loop for automated speech recognition

A method for facilitating the updating of a language model includes receiving, at a client device, via a microphone, an audio message corresponding to speech of a user; communicating the audio message to a first remote server; receiving, at the client device, a result, transcribed at the first remote server using an automatic speech recognition system (“asr”), from the audio message; receiving, at the client device from the user, an affirmation of the result; storing, at the client device, the result in association with an identifier corresponding to the audio message; and communicating, to a second remote server, the stored result together with the identifier.
Amazon Technologies, Inc.

Method for controlling operation of an agricultural machine and system thereof

A method for controlling operation of an agricultural machine and system thereof are disclosed. The method may comprise providing a portable device that has an input device, a processing unit, a storage unit, an output device, and a transceiver device configured for wireless data transmission; receiving a voice control command over a microphone device of the input device of the portable device; determining command text data from the voice control command by processing the voice control command by a speech recognition application running on the processing unit of the portable device; providing machine control signals assigned to a machine control function in a control device of an agricultural machine located remotely from the portable device; and controlling the operation of the agricultural machine according to the machine control signals..
Kverneland Group Mechatronics B.v.

Speech recognition apparatus, speech recognition method, and electronic device

A speech recognition apparatus includes a probability calculator configured to calculate phoneme probabilities of an audio signal using an acoustic model; a candidate set extractor configured to extract a candidate set from a recognition target list; and a result returner configured to return a recognition result of the audio signal based on the calculated phoneme probabilities and the extracted candidate set.
Samsung Electronics Co., Ltd.

Testing words in a pronunciation lexicon

A method for testing words defined in a pronunciation lexicon used in an automatic speech recognition (asr) system is provided. The method includes: obtaining test sentences which can be accepted by a language model used in the asr system.
International Business Machines Corporation

Methods for using simultaneous speech inputs to determine an electronic competitive challenge winner

An electronic competitive challenge is disclosed. A prompt is presented to a group of participants.
John Nicholas And Kristin Gross Trust U/a/d April 13, 2010

Training deep neural network for acoustic modeling in speech recognition

A method is provided for training a deep neural network (dnn) for acoustic modeling in speech recognition. The method includes reading central frames and side frames as input frames from a memory.
International Business Machines Corporation

Privacy-preserving training corpus selection

The present disclosure relates to training a speech recognition system. A system includes an automated speech recognizer and receives data from a client device.
Google Inc.

Language model speech endpointing

An automatic speech recognition (asr) system detects an endpoint of an utterance using the active hypotheses under consideration by a decoder. The asr system calculates the amount of non-speech detected by a plurality of hypotheses and weights the non-speech duration by the probability of each hypothesis.
Amazon Technologies, Inc.
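
The weighting described above amounts to an expected non-speech duration over the active hypotheses. A minimal sketch, with hypothetical probabilities and silence durations:

    def expected_nonspeech_ms(hypotheses):
        """Probability-weighted trailing non-speech duration across active decoder hypotheses.
        Each hypothesis is a (posterior_probability, trailing_silence_ms) pair."""
        total_prob = sum(p for p, _ in hypotheses)
        if total_prob == 0:
            return 0.0
        return sum(p * silence_ms for p, silence_ms in hypotheses) / total_prob

    def endpoint_detected(hypotheses, threshold_ms=600):
        return expected_nonspeech_ms(hypotheses) >= threshold_ms

    # Two active hypotheses: the likelier one has already seen 700 ms of silence.
    active = [(0.7, 700.0), (0.3, 200.0)]
    print(expected_nonspeech_ms(active), endpoint_detected(active))   # 550.0 False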

Speech recognition services

Various systems and methods for providing speech recognition services are described herein. A user device for providing speech recognition services includes a speech module to maintain a speech recognition model of a user of the user device; a user interaction module to detect an initiation of an interaction between the user and a target device; and a transmission module to transmit the speech recognition model to the target device, the target device to use the speech recognition model to enhance a speech recognition process executed by the target device during the interaction between the user and the target device..
Intel Corporation

Method and system of automatic speech recognition with dynamic vocabularies

A system, article, and method of automatic speech recognition with dynamic vocabularies is described herein.
Intel Corporation

Language model modification for local speech recognition systems using remote sources

A language model is modified for a local speech recognition system using remote speech recognition sources. In one example, a speech utterance is received.

Recognition result output device, recognition result output method, and computer program product

According to an embodiment, a speech recognition result output device includes a storage and processing circuitry. The storage is configured to store a language model for speech recognition.
Kabushiki Kaisha Toshiba

Memory bandwidth management for deep learning applications

In a data center, neural network evaluations can be included for services involving image or speech recognition by using a field programmable gate array (fpga) or other parallel processor. The memory bandwidth limitations of providing weighted data sets from an external memory to the fpga (or other parallel processor) can be managed by queuing up input data from the plurality of cores executing the services at the fpga (or other parallel processor) in batches of at least two feature vectors.
Microsoft Technology Licensing, Llc

Intuitive computing methods and systems

In one particular aspect, a portable computing device (e.g., a tablet or smartphone) senses audio and/or image content from a user's environment, and initiates one or more recognition agents (e.g., performing image watermark recognition, image recognition, object recognition, facial recognition, barcode recognition, optical character recognition, audio watermark recognition, speech recognition, speaker recognition, or music recognition). Resource allocation to a recognition agent can be varied based on (a) progress of the recognition agent to achieve its recognition goal, and (b) user interest data indicating user interest in the output of the recognition agent.
Digimarc Corporation

Speech recognition apparatus and speech recognition method

A speech recognition apparatus includes: a sound collection unit that collects a sound signal; a sound source localization unit that calculates a spatial spectrum from the sound signal that is collected by the sound collection unit and uses the calculated spatial spectrum to perform sound source localization; a speech zone determination unit that determines a zone in which a power of the spatial spectrum that is calculated by the sound source localization unit exceeds a predetermined threshold value based on a vehicle state; and a speech recognition unit that performs speech recognition with respect to a sound signal of the zone determined by the speech zone determination unit.. .
Honda Motor Co., Ltd.

Speech recognition with acoustic models

Methods, systems, and apparatus, including computer programs encoded on computer storage media for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a sequence of multiple frames of acoustic data at each of a plurality of time steps; stacking one or more frames of acoustic data to generate a sequence of modified frames of acoustic data; processing the sequence of modified frames of acoustic data through an acoustic modeling neural network comprising one or more recurrent neural network (rnn) layers and a final ctc output layer to generate a neural network output, wherein processing the sequence of modified frames of acoustic data comprises: sub sampling the modified frames of acoustic data; and processing each subsampled modified frame of acoustic data through the acoustic modeling neural network..
Google Inc.

Speech recognition on board of an aircraft

Provided are a method of performing speech recognition on board an aircraft, a computer program for executing the method, and a speech recognition unit for performing speech recognition on board an aircraft. The method comprises receiving a speech signal spoken by a user; performing speaker recognition on the speech signal to identify the user from the speech signal; selecting a speech recognition user profile which is associated with the identified user; and performing speech recognition on the speech signal using the selected user profile.
Airbus Operations Gmbh

Voice authentication and speech recognition system and method

A method for configuring a speech recognition system comprises obtaining a speech sample utilised by a voice authentication system in a voice authentication process. The speech sample is processed to generate acoustic models for units of speech associated with the speech sample.
Auraya Pty Ltd

Speech recognition method for operating a speech recognition system with a mobile unit and an external server

A voice recognition system having a mobile unit and an external server. The mobile unit includes a memory unit that stores voice model data having at least one expression set with expressions, a voice recognition unit, and a data interface that can set up a data-oriented connection to a data interface of the external server.
Volkswagen Ag

Adapting voice input processing based on voice input characteristics

One embodiment provides a method, including: receiving, at an audio receiver, user voice data; identifying, using a processor, at least one characteristic of the voice data; obtaining, using the processor, a speech recognition processing result of the voice data; and changing a standard response to the user voice data to an adapted response based on the at least one characteristic and the speech recognition processing result. Other aspects are described and claimed.
Lenovo (singapore) Pte. Ltd.

Negative n-gram biasing

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing dynamic, stroke-based alignment of touch displays. In one aspect, a method includes obtaining a candidate transcription that an automated speech recognizer generates for an utterance, determining a particular context associated with the utterance, determining that a particular n-gram that is included in the candidate transcription is included among a set of undesirable n-grams that is associated with the context, adjusting a speech recognition confidence score associated with the transcription based on determining that the particular n-gram that is included in the candidate transcription is included among the set of undesirable n-grams that is associated with the context, and determining whether to provide the candidate transcription for output based at least on the adjusted speech recognition confidence score..
Google Inc.
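
A minimal sketch of the score adjustment described above: if the candidate transcription contains an n-gram listed as undesirable for the current context, its confidence is lowered. The flat penalty and the context names are assumptions for illustration.

    def adjust_confidence(transcription, confidence, context, undesirable_ngrams, penalty=0.2):
        """Lower the recognition confidence when the candidate transcription contains an
        n-gram listed as undesirable for the current context (illustrative flat penalty)."""
        text = " " + transcription.lower() + " "
        for ngram in undesirable_ngrams.get(context, []):
            if " " + ngram.lower() + " " in text:
                confidence = max(0.0, confidence - penalty)
        return confidence

    undesirable = {"contact_dialing": ["play music", "turn off"]}
    print(round(adjust_confidence("play music please", 0.85, "contact_dialing", undesirable), 2))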

System and method for automated evaluation of transcription quality

Systems and methods automatically evaluate transcription quality. Audio data is obtained.
Verint Systems Ltd.

Multi-microphone speech recognition systems and related techniques

A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates.
Apple Inc.

Rapid speech recognition adaptation using acoustic input

A method includes the following steps. An acoustic input is obtained from a user, including issuing a verbal prompt to the user and receiving the acoustic input from the user in response to the verbal prompt.
International Business Machines Corporation

Speech recognition system, speech recognition request device, speech recognition method, speech recognition program, and recording medium

Provided is a speech recognition system, including: a first information processing device including a speech recognition processing unit for receiving data to be used for speech recognition transmitted via a network, carrying out speech recognition processing, and returning resultant data; and a second information processing device connected to the first information processing device via the network. The second information processing device performs conversion of the data into data having a format that disables a content thereof from being perceived and also enables the speech recognition processing unit to perform the speech recognition processing.
Nec Corporation

Robust speech recognition in the presence of echo and noise using multiple signals for discrimination

Systems and methods for speech recognition system having a speech processor that is trained to recognize speech by considering (1) a raw microphone signal that includes an echo signal and (2) different types of echo information signals from an echo cancellation system (and optionally different types of ambient noise suppression signals from a noise suppressor). The different types of echo information signals may include those used for echo cancelation and those having echo information.
Apple Inc.

Speech enhancement method, speech recognition method, clustering method, and device

The present invention discloses a speech enhancement method, a speech recognition method, a clustering method, and a device. The method includes: selecting the feature vector clustering center that best matches the feature vector of the first frame of a test speech; for each subsequent frame of the test speech, selecting a clustering center from among the clustering center that best matched the previous frame and the clustering centers adjacent to it; and reconstructing the feature vectors of the test speech from the per-frame feature vectors and the selected clustering centers.
Le Shi Zhi Xin Electronic Technology (Tianjin) Limited
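A schematic Python sketch of the frame-by-frame clustering-center selection described in the entry above. The Euclidean distance, the "adjacent centers" mapping, and the simple weighted-average reconstruction are assumptions made for illustration.

    import numpy as np

    def nearest(center_ids, centers, frame):
        # Return the candidate center index closest to the frame's feature vector.
        return min(center_ids, key=lambda c: np.linalg.norm(centers[c] - frame))

    def enhance(frames, centers, neighbors, alpha=0.5):
        """frames: (T, D) feature vectors of the test speech; centers: (K, D)
        clustering centers; neighbors: dict mapping a center index to the
        indices of its adjacent centers."""
        enhanced = np.empty_like(frames, dtype=float)
        prev = nearest(range(len(centers)), centers, frames[0])    # full search for frame 0
        enhanced[0] = alpha * frames[0] + (1 - alpha) * centers[prev]
        for t in range(1, len(frames)):
            candidates = [prev] + list(neighbors.get(prev, []))    # restricted search
            best = nearest(candidates, centers, frames[t])
            enhanced[t] = alpha * frames[t] + (1 - alpha) * centers[best]
            prev = best
        return enhanced                                            # reconstructed features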

Automated closed captioning using temporal data

One or more systems and/or techniques are provided for automatic closed captioning for media content. In an example, real-time content, occurring within a threshold timespan of a broadcast of media content (e.g., social network posts occurring during and an hour before a live broadcast of an interview), may be accessed.
Microsoft Technology Licensing, LLC

Methods and apparatus for reducing latency in speech recognition applications

Methods and apparatus for reducing latency in speech recognition applications. The method comprises receiving first audio comprising speech from a user of a computing device, detecting an end of speech in the first audio, generating an ASR result based, at least in part, on a portion of the first audio prior to the detected end of speech, determining whether a valid action can be performed by a speech-enabled application installed on the computing device using the ASR result, and processing second audio when it is determined that a valid action cannot be performed by the speech-enabled application using the ASR result.
Nuance Communications, Inc.
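A hedged Python sketch of the control flow implied by the latency-reduction entry above: decode the audio that precedes the detected end of speech, act immediately if that early result supports a valid action, and only otherwise wait for more audio. The recognizer, application, and audio-source objects are hypothetical stand-ins, not a real API.

    def handle_utterance(first_audio, recognizer, app, get_more_audio):
        end = recognizer.detect_end_of_speech(first_audio)    # assumed endpoint detector
        early_result = recognizer.decode(first_audio[:end])   # ASR on audio before the endpoint
        if app.can_perform(early_result):                     # valid action -> respond with low latency
            return app.perform(early_result)
        second_audio = get_more_audio()                       # otherwise keep listening
        return app.perform(recognizer.decode(first_audio + second_audio))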

Automated learning for speech-based applications

Systems and methods for modifying a computer-based speech recognition system. A speech utterance is processed with the computer-based speech recognition system using a set of internal representations, which may comprise parameters for recognizing speech in a speech utterance, such as parameters of an acoustic model and/or a language model.
Next IT Corporation

Dynamic language model

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One of the methods includes receiving a base language model for speech recognition including a first word sequence having a base probability value; receiving a voice search query associated with a query context; determining that a customized language model is to be used when the query context satisfies one or more criteria associated with the customized language model; obtaining the customized language model, the customized language model including the first word sequence having an adjusted probability value being the base probability value adjusted according to the query context; and converting the voice search query to a text search query based on one or more probabilities, each of the probabilities corresponding to a word sequence in a group of one or more word sequences, the group including the first word sequence having the adjusted probability value.
Google Inc.
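A minimal Python sketch of the model-selection step described in the dynamic language model entry above, assuming a language model can be represented as a dictionary from word sequences to probabilities and each customized model by a context criterion plus an adjustment rule; both representations are illustrative simplifications.

    def pick_language_model(base_lm, customized_lms, query_context):
        """base_lm: {word_sequence: probability}; customized_lms: list of
        (criteria_fn, adjust_fn) pairs tried in order."""
        for criteria_fn, adjust_fn in customized_lms:
            if criteria_fn(query_context):                     # context satisfies the model's criteria
                return {seq: adjust_fn(seq, p, query_context)  # adjusted probabilities
                        for seq, p in base_lm.items()}
        return base_lm                                         # fall back to the base model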

Visual voice search

A computer-implemented method and system for initiating an action uses text converted from a user's speech. A user's speech is converted into text using an automatic speech recognition (ASR) system of a device.
International Business Machines Corporation

Speech recognition using loosely coupled components

An automatic speech recognition system includes an audio capture component, a speech recognition processing component, and a result processing component which are distributed among two or more logical devices and/or two or more physical devices. In particular, the audio capture component may be located on a different logical device and/or physical device from the result processing component.
MModal IP LLC

Electronic devices with voice command and contextual data processing capabilities

An electronic device may capture a voice command from a user. The electronic device may store contextual information about the state of the electronic device when the voice command is received.
Apple Inc.

Cross-language speech recognition and translation

Technologies are described herein for cross-language speech recognition and translation. An example method of speech recognition and translation includes receiving an input utterance in a first language, the input utterance having at least one name of a named entity included therein and being pronounced in a second language, utilizing a customized language model to process at least a portion of the input utterance, and identifying the at least one name of the named entity from the input utterance utilizing a phonetic representation of the at least one name of the named entity.
Microsoft Technology Licensing, LLC

Speech recognition for keywords

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition are disclosed. In one aspect, a method includes receiving a candidate adword from an advertiser.
Google Inc.

Visual indication of a recognized voice-initiated action

A computing device is described that outputs, for display, an initial speech recognition graphical user interface (GUI) having at least one element. The computing device receives audio data and determines, based on the audio data, a voice-initiated action.
Google Inc.

Speech recognition device, system and method

According to a speech recognition device of the present invention, even in the case where there are many abutting sight-line detection areas or many overlapping portions between sight-line detection areas, as exemplified by the case where a plurality of icons (display objects) are crowded on a display screen, it is possible to narrow down the candidates and identify one icon (display object) efficiently using a sight line together with a speech-based operation, and further to reduce false recognition, so that the user's convenience can be enhanced.
Mitsubishi Electric Corporation

Method and apparatus for exploiting language skill information in automatic speech recognition

Typical speech recognition systems usually use speaker-specific speech data to apply speaker adaptation to models and parameters associated with the speech recognition system. Given that speaker-specific speech data may not be available to the speech recognition system, information indicative of language skills is employed in adapting configurations of a speech recognition system.
Nuance Communications, Inc.

System and method for optimizing speech recognition and natural language parameters with user feedback

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for assigning saliency weights to words of an ASR model. The saliency values assigned to words within an ASR model are based on human perception judgments of previous transcripts.
AT&T Intellectual Property I, L.P.

Method for an automated distress alert system with speech recognition

A software application for an automated alert system that utilizes speech recognition software to determine the emergency status of a person and respond accordingly. The software application includes monitoring ambient noise through a microphone for an utterance.

Method and apparatus for improving speech recognition processing performance

The feature maximum mutual information (fMMI) method requires multiplication of vectors with a very large matrix. The large matrix is subdivided into block sub-matrices.
Nuance Communications, Inc.
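A small Python/NumPy sketch of the block decomposition mentioned in the entry above: the large matrix-vector product is computed block by block, so each sub-matrix can be handled (cached, streamed, or parallelized) independently. The block sizes are arbitrary illustrative values.

    import numpy as np

    def blockwise_matvec(M, v, row_block=1024, col_block=1024):
        rows, cols = M.shape
        result = np.zeros(rows, dtype=np.result_type(M, v))
        for i in range(0, rows, row_block):
            for j in range(0, cols, col_block):
                block = M[i:i + row_block, j:j + col_block]            # one block sub-matrix
                result[i:i + row_block] += block @ v[j:j + col_block]  # partial product
        return result

    # Sanity check: np.allclose(blockwise_matvec(M, v), M @ v) for any conforming M, v.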

Processing multi-channel audio waveforms

Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.
Google Inc.
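A rough Python/NumPy sketch of the front end described in the multi-channel entry above: each filter is convolved in the time domain with every channel, the per-channel outputs are pooled for each filter, and the stacked result is what would be fed to the jointly trained network. The filters here are arbitrary placeholders rather than learned parameters, and summation is an assumed pooling rule.

    import numpy as np

    def multichannel_frontend(channels, filters):
        """channels: list of equal-length 1-D waveforms; filters: list of 1-D filter taps.
        Returns an array of shape (num_filters, num_samples) for the acoustic model."""
        pooled = []
        for taps in filters:
            per_channel = [np.convolve(ch, taps, mode="same") for ch in channels]
            pooled.append(np.sum(per_channel, axis=0))   # combine channels for this filter
        return np.stack(pooled)                          # input to the deep neural network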

Correcting voice recognition using selective re-speak

Implementations of the present disclosure include actions of providing first text for display on a computing device of a user, the first text being provided from a first speech recognition engine based on first speech received from the computing device, and being displayed as a search query, receiving a speech correction indication from the computing device, the speech correction indication indicating a portion of the first text that is to be corrected, receiving second speech from the computing device, receiving second text from a second speech recognition engine based on the second speech, the second speech recognition engine being different from the first speech recognition engine, replacing the portion of the first text with the second text to provide a combined text, and providing the combined text for display on the computing device as a revised search query.
Google Inc.
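An illustrative Python fragment for the replacement step in the selective re-speak entry above: the span of the first transcript flagged by the correction indication is replaced with the second engine's transcript to form the revised query. Representing the indication as a character range is an assumption.

    def apply_respeak_correction(first_text, span, second_text):
        start, end = span                                  # portion flagged for correction
        return first_text[:start] + second_text + first_text[end:]

    # e.g. apply_respeak_correction("coffee shops in austin", (16, 22), "boston")
    # -> "coffee shops in boston"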

Method and apparatus for automatic speech recognition

A method of automatic speech recognition, the method comprising the steps of: receiving a speech signal; dividing the speech signal into time windows; for each time window, determining acoustic parameters of the speech signal within that window and identifying speech features from the acoustic parameters, such that a sequence of speech features is generated for the speech signal; separating the sequence of speech features into a sequence of phonological segments; and comparing the sequential phonological segments to a stored lexicon to identify one or more words in the speech signal.
Isis Innovation Ltd.

Voice command triggered speech enhancement

Received data representing speech is stored, and a trigger detection block detects a presence of data representing a trigger phrase in the received data. In response, a first part of the stored data representing at least a part of the trigger phrase is supplied to an adaptive speech enhancement block, which is trained on the first part of the stored data to derive adapted parameters for the speech enhancement block.
Cirrus Logic International Semiconductor Ltd.

Methods and apparatuses for automatic speech recognition

Exemplary embodiments of methods and apparatuses for automatic speech recognition are described. First model parameters associated with a first representation of an input signal are generated.
Apple Inc.

System and method for wireless ordering using speech recognition

Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for placing an order for a user. The method includes receiving a search from a user, identifying a product category based on the search, presenting to the user a general ordering screen based on the identified product category, selecting and activating a speech recognition grammar tuned for the identified product category, recognizing a first received user utterance with the activated tuned grammar to identify a vendor who offers items in the identified product category, recognizing a second received user utterance with the activated tuned grammar to identify a specific item from the identified vendor, and placing an order for the specific item with the identified vendor for the user.
AT&T Intellectual Property I, L.P.

Methods and apparatus for speech recognition using visual information

Methods and apparatus for using visual information to facilitate a speech recognition process. The method comprises dividing received audio information into a plurality of audio frames; determining, for each of the plurality of audio frames, whether the audio information in the audio frame comprises speech from the foreground speaker, wherein the determining is based, at least in part, on received visual information; and transmitting the audio frame to an automatic speech recognition (ASR) engine for speech recognition when it is determined that the audio frame comprises speech from the foreground speaker.
Nuance Communications, Inc.
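A schematic Python sketch of the frame-gating idea in the entry above: only audio frames judged, partly from visual information, to contain the foreground speaker's speech are forwarded to the ASR engine. The classifier and engine objects are hypothetical stand-ins.

    def forward_foreground_frames(audio_frames, visual_frames, is_foreground, asr_engine):
        """is_foreground(audio_frame, visual_frame) -> bool is an assumed classifier."""
        for audio, visual in zip(audio_frames, visual_frames):
            if is_foreground(audio, visual):     # decision uses the visual information
                asr_engine.transmit(audio)       # only foreground-speaker frames are sent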

Speech recognition apparatus and computer program product for speech recognition

In a speech recognition apparatus, a speech driver fetches guidance speech-data as reference speech-data and outputs the reference speech-data to a recognition core unit. The guidance speech-data is converted into a guidance speech that is output through a speaker, so that a microphone receives the output guidance speech and converts it back into input guidance speech-data.
Denso Corporation

Method for building a language model, speech recognition method, and electronic apparatus

A method for building a language model, a speech recognition method and an electronic apparatus are provided. The speech recognition method includes the following steps.
VIA Technologies, Inc.

Interpretation apparatus and method

According to one embodiment, an interpretation apparatus includes a translator, a calculator and a generator. The translator performs machine translation on a speech recognition result corresponding to an input speech audio from a first language into a second language to generate a machine translation result.
Kabushiki Kaisha Toshiba

Speaker-dependent voice-activated camera system

A voice-activated camera system for a computing device. The voice-activated camera system includes a processor, a camera module, a speech recognition module and a microphone for accepting user voice input.

Speech recognition using a database and dynamic gate commands

A system and method of controlling an automatic speech recognition (ASR) system includes: receiving speech at the ASR system from a vehicle occupant that includes a command to control a vehicle function; identifying a gate command from the speech; associating the identified gate command with the command to control the vehicle function; storing the associated gate command and vehicle command in a database; receiving additional speech at the ASR system from the vehicle occupant; detecting the gate command in the additional speech; and accessing the stored gate command and vehicle command from the database.
GM Global Technology Operations LLC
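A toy Python sketch of the gate-command bookkeeping described in the entry above; keyword spotting and command parsing are reduced to simple substring checks purely for illustration.

    class GateCommandStore:
        def __init__(self):
            self.db = {}                                    # gate phrase -> vehicle command

        def learn(self, speech_text, gate_phrase, vehicle_command):
            if gate_phrase in speech_text:                  # gate command identified in the speech
                self.db[gate_phrase] = vehicle_command      # associate and store in the database

        def handle(self, speech_text):
            for gate_phrase, vehicle_command in self.db.items():
                if gate_phrase in speech_text:              # gate command detected in later speech
                    return vehicle_command                  # retrieve the stored vehicle command
            return None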

Controlling speech recognition systems based on radio station availability

A system and method of controlling an automatic speech recognition (ASR) system includes: determining the location of a vehicle; identifying terrestrial radio stations in a database that are within a range of the vehicle location based on geographic locations of the terrestrial radio stations stored in the database; and altering the content of a speech grammar used by the ASR system to process speech received in the vehicle that requests a terrestrial radio station.
GM Global Technology Operations LLC
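A hedged Python sketch of the location filter described in the entry above: stations whose stored coordinates fall within range of the vehicle are kept, and the station-name portion of the speech grammar is rebuilt from that set. The great-circle radius, the record layout, and the grammar shape are assumptions.

    from math import radians, sin, cos, asin, sqrt

    def haversine_km(lat1, lon1, lat2, lon2):
        # Great-circle distance between two points, in kilometres.
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371 * asin(sqrt(h))

    def stations_in_range(vehicle_pos, station_db, max_km=80.0):
        lat, lon = vehicle_pos
        return [s["name"] for s in station_db
                if haversine_km(lat, lon, s["lat"], s["lon"]) <= max_km]

    def build_station_grammar(vehicle_pos, station_db):
        # Alter the grammar content so only reachable stations can be requested.
        return {"station_names": stations_in_range(vehicle_pos, station_db)}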

System and method for providing scalable educational content

Presented are a system and method for providing a graphical user interface (GUI) based modular platform having educational content. The method includes providing an interactive GUI on a computing device accessible by a user, receiving a first indication of a language being studied, displaying a GUI layer presenting a selection of level, unit, activity, and/or lesson, receiving a lesson selection, and computing a rating or score of the user's performance for the lesson.
BrainPOP ESL LLC

System and advanced turn-taking interactive spoken dialog systems

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for advanced turn-taking in an interactive spoken dialog system. A system configured according to this disclosure can incrementally process speech prior to completion of the speech utterance, and can communicate partial speech recognition results upon finding particular conditions.
AT&T Intellectual Property I, L.P.

Hotword detection on multiple devices

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a computing device, audio data that corresponds to an utterance.
Google Inc.

Speech controlled sex toy

A speech-controlled sex toy receives verbal commands through an audio receiving device paired with the toy and converts the speech to control signals that adjust the operation or intensity of the motor within the toy. The toy has a speech recognition module to convert human voice into machine-understandable control signals, and a microprocessor control module to communicate control signals that adjust the motion of the motor.
Aipleasures, Inc.

Method and system of random access compression of transducer data for automatic speech recognition decoding

A system, article, and method of random access compression of transducer data for automatic speech recognition decoding.
Intel Corporation

Reminder device wearable by a user

A wearable reminder device is provided that uses two different types of gestures obtained from two different types of sensors. The device outputs audio to a user but does not have a display, a keypad, or speech recognition software, thereby significantly reducing size, storage requirements and power consumption.
Santa Clara University

Mahjong game system using touch panel

A mahjong game system using a touch panel includes a computer and four player consoles in communication with the computer by a wired or wireless communication method. The computer includes a touch panel at the top side for displaying game pictures, and has a mahjong game program installed therein that provides a speech recognition process for identifying multiple specific voice commands.
Egenpower Inc.

System and method for combining geographic metadata in automatic speech recognition language and acoustic models

Disclosed herein are systems, methods, and computer-readable storage media for a speech recognition application for directory assistance that is based on a user's spoken search query. The spoken search query is received by a portable device, and the portable device then determines its present location.
AT&T Intellectual Property I, L.P.

Automatic generation of a database for speech recognition from video captions

A system and method for automatic generation of a database for speech recognition, comprising: a source of text signals; a source of audio signals comprising an audio representation of said text signals; a text words separation module configured to separate said text into a string of text words; an audio words separation module configured to separate said audio signal into a string of audio words; and a matching module configured to receive said string of text words and said string of audio words and store each pair of matching text word and audio word in a database.
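A toy Python sketch of the matching module described above, under the strong simplifying assumption that the caption words and the separated audio words are already in one-to-one order; a real system would need alignment rather than a plain zip.

    def build_recognition_database(caption_text, audio_word_segments, db):
        """caption_text: caption string; audio_word_segments: list of audio snippets,
        one per spoken word, in order; db: dict mapping word -> list of audio examples."""
        text_words = caption_text.split()                          # text words separation
        for word, audio in zip(text_words, audio_word_segments):   # pair text word with audio word
            db.setdefault(word.lower(), []).append(audio)          # store the matched pair
        return db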

Customizable and individualized speech recognition settings interface for users with language accents

A computer-implemented method and system for customizing speech recognition for users with accents. A spoken language of a user is identified.
International Business Machines Corporation

Voice activity detection technologies, systems and methods employing the same

Voice activity detection technologies are disclosed. In some embodiments, the voice activity detection technologies determine whether the voice of a user of an electronic device is active based at least in part on biosignal data.
Intel Corporation

Method and system of environment sensitive automatic speech recognition

A system, article, and method of environment-sensitive automatic speech recognition.

Mixed speech recognition

The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize, from a mixed speech sample, the speech signal spoken by the speaker with the higher level of a speech characteristic.
Microsoft Technology Licensing, LLC

Speech data recognition method, apparatus, and server for distinguishing regional accent

A speech data recognition method, apparatus, and server for distinguishing regional accents are provided. The speech data recognition method includes: calculating a speech recognition confidence and/or a signal-to-noise ratio of the speech data, and screening regional speech data from the speech data based on the speech recognition confidence and/or the signal-to-noise ratio of the speech data; and determining a region to which the regional speech data belongs based on a regional attribute of the regional speech data.
Baidu Online Network Technology (Beijing) Co., Ltd.
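An illustrative Python filter for the screening step described in the entry above, under the assumption that accented (regional) utterances show low recognition confidence despite an adequate signal-to-noise ratio; the thresholds and record fields are invented for the example.

    def screen_regional_speech(utterances, conf_max=0.6, snr_min=10.0):
        """utterances: list of dicts with 'confidence', 'snr', and 'region' keys."""
        regional = [u for u in utterances
                    if u["confidence"] <= conf_max and u["snr"] >= snr_min]
        by_region = {}
        for u in regional:
            by_region.setdefault(u["region"], []).append(u)   # group by regional attribute
        return by_region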

Method and system for generating advanced feature discrimination vectors for use in speech recognition

A method of renormalizing high-resolution oscillator peaks, extracted from windowed samples of an audio signal, is disclosed. Feature vectors are generated for which variations in both fundamental frequency and time duration of speech are substantially mitigated.

Systems and methods for voice enabled traffic prioritization

A system and method capable of responding to an audible traffic alert by visually depicting the identified neighboring aircraft traffic on the onboard display is presented. The system and method employ speech recognition in order to minimize the visual and manual cognitive workload associated with responding to a traffic alert.
Honeywell International Inc.



Speech Recognition topics:
  • Speech Recognition
  • Communications
  • Computing Device
  • Heterogeneous
  • Conditional
  • Transcription
  • False Positive
  • Application Control
  • Natural Language
  • Embedded System
  • Electronic Device
  • Constraints
  • Central Processing Unit
  • Demultiplex
  • Interactive



    This listing is a sample of patent applications related to Speech Recognition and is only meant as a recent sample of applications filed, not a comprehensive history. There may be associated servicemarks and trademarks related to these patents. Please check with a patent attorney if you need further assistance or plan to use this information for business purposes. This patent data is also published to the public by the USPTO and available for free on their website. Note that there may be alternative spellings for Speech Recognition with additional patents listed. Browse our RSS directory or Search for other possible listings.

