

Speech Recognition patents

      

This page is updated frequently with new Speech Recognition-related patent applications.




Information processing device, information processing method, and program
[Object] To provide technology that can improve the accuracy of speech recognition for collected sound data. [Solution] Provided is an information processing device including: a collected sound data acquisition portion that acquires collected sound data; and an output controller that causes an output portion to output at least whether or not a state of the collected sound data is suitable for speech recognition.
Sony Corporation


Electronic devices having speech recognition functionality and operating methods of electronic devices
Disclosed are electronic devices having speech recognition functionality and operating methods of the electronic devices. Operating methods may include selectively activating or deactivating speech recognition functionality of one or more electronic devices based on comparing priorities associated with the electronic devices, respectively.
Samsung Electronics Co., Ltd.


System and method for rapid customization of speech recognition models
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating domain-specific speech recognition models for a domain of interest. When a speech recognizer does not have access to a speech recognition model for that domain of interest, and the available domain-specific data is below the minimum desired threshold for creating a new domain-specific model, existing speech recognition models are combined and tuned. A system configured to practice the method identifies a speech recognition domain and combines a set of speech recognition models, each speech recognition model of the set being from a respective speech recognition domain.
Nuance Communications, Inc.


Permutation invariant training for talker-independent multi-talker speech separation
The techniques described herein improve methods to equip a computing device to conduct automatic speech recognition ("ASR") in talker-independent multi-talker scenarios. In some examples, permutation invariant training of deep learning models can be used for talker-independent multi-talker scenarios.
Microsoft Technology Licensing, Llc
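The permutation-invariant idea above can be sketched in a few lines: score every assignment of the model's output streams to the reference speakers and train against the best one. This toy illustration uses a plain mean-squared-error loss; the function names and the choice of MSE are illustrative assumptions, not taken from the patent.

```python
from itertools import permutations

def mse(a, b):
    # Mean squared error between two equal-length sequences.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def pit_loss(estimates, references):
    # Permutation-invariant loss: try every assignment of output
    # streams to reference speakers and keep the lowest-loss one.
    best_perm, best_loss = None, float("inf")
    for perm in permutations(range(len(references))):
        loss = sum(mse(estimates[i], references[j])
                   for i, j in enumerate(perm)) / len(references)
        if loss < best_loss:
            best_perm, best_loss = perm, loss
    return best_loss, best_perm

# Two output streams that match the references in swapped order:
refs = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
ests = [[4.0, 5.0, 6.0], [1.0, 2.0, 3.0]]
loss, perm = pit_loss(ests, refs)
```

Because the loss is minimized over assignments, the network is never penalized merely for emitting the speakers in a different order than the labels.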


Information processing device, information processing method, and program
[Object] To provide technology capable of processing a string recognized from input speech more efficiently. [Solution] Provided is an information processing device including: a processing unit acquisition portion configured to acquire one or more processing units, on the basis of noise, from a first recognition string obtained by performing speech recognition on first input speech; and a processor configured to, when any one of the one or more processing units is selected as a processing target, process the processing target.
Sony Corporation


Restructuring deep neural network acoustic models
A deep neural network (DNN) model used in an automatic speech recognition (ASR) system is restructured. A restructured DNN model may include fewer parameters than the original DNN model.
Microsoft Technology Licensing, Llc


Information processing device, information processing method, and program
There is provided information processing device technology that enables the user to intuitively know the situation in which speech recognition processing is performed, the information processing device including: an information acquisition unit configured to acquire a parameter related to speech recognition processing on sound information based on sound collection; and an output unit configured to output display information used to display a speech recognition processing result for the sound information on the basis of a display mode specified depending on the parameter.
Sony Corporation


Method of identifying contacts for initiating a communication using speech recognition
A method and system on an electronic device which uses speech recognition to initiate a communication from a mobile device having access to contact information for a number of contacts. In one example, the method comprises receiving through an audio input interface a voice input for initiating a communication, extracting from the voice input a type of communication and at least part of a contact name, and outputting, to an output interface, a selectable list of all contacts from the contact information which have the part of the contact name and which have a contact address associated with the type of communication.
2236008 Ontario Inc.


Hands-free user authentication
A foreign device (FD) authenticates a user by communicating with a personal device (PD) using an audible signal. A system detects audible signals within time windows, and the signals can include codes.
Soundhound, Inc.


Distributed volume control for speech recognition
A system includes a first device having a microphone associated with a voice user interface (VUI) and a first network interface, a first processor connected to the first network interface and controlling the first device, a second device having a speaker and a second network interface, and a second processor connected to the second network interface and controlling the second device. Upon connection of the second network interface to a network to which the first network interface is connected, the second processor causes the second device to output an identifiable sound through the speaker.
Bose Corporation


Methods and systems for determining and using a confidence level in speech systems

Methods and systems are provided for processing speech inputs for controlling one or more vehicle systems of a vehicle. In one embodiment, a method includes: receiving speech input from an audio channel; performing, by a processor, speech recognition on the speech input to obtain recognized results; determining, by a processor, an accuracy level of the audio channel based on a comparison of the recognized results and predictive phraseology; determining, by a processor, an integrity level of the audio channel based on situational awareness information; communicating the recognized results, the accuracy level, and the integrity level to a vehicle system; and selectively using the recognized results by the vehicle system based on the accuracy level and the integrity level.
Honeywell International Inc.
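The accuracy/integrity gating described above can be illustrated with a toy sketch: accuracy is estimated by comparing the recognized result against expected phraseology, and the result is accepted only when both levels clear a floor. All function names, thresholds, and the word-overlap metric are illustrative assumptions, not from the patent.

```python
def word_overlap(recognized, expected):
    # Fraction of expected words found in the recognized result.
    rec = set(recognized.lower().split())
    exp = set(expected.lower().split())
    return len(rec & exp) / len(exp) if exp else 0.0

def accept_result(recognized, expected_phraseology, integrity,
                  accuracy_floor=0.7, integrity_floor=0.5):
    # Gate a recognition result on its accuracy and integrity levels:
    # accuracy comes from the best match against predictive phraseology,
    # integrity is supplied from situational awareness information.
    accuracy = max(word_overlap(recognized, p) for p in expected_phraseology)
    usable = accuracy >= accuracy_floor and integrity >= integrity_floor
    return usable, accuracy

ok, acc = accept_result("descend flight level three two zero",
                        ["descend flight level three two zero",
                         "climb flight level three one zero"],
                        integrity=0.9)
```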

Information processing system and information processing method

There is provided an information processing system enabling a user to easily provide an instruction on whether to continue speech recognition processing on sound information, the information processing system including: a recognition control portion configured to control a speech recognition portion so that the speech recognition portion performs speech recognition processing on sound information input from a sound collection portion. The recognition control portion controls whether to continue the speech recognition processing on the basis of a gesture of the user detected at a predetermined timing.
Sony Corporation

Speech recognition device and gaming machine

The utterance recognition device 5 comprises camera devices 511 and 512 that take dynamic images including the corners of the mouth of each of a plurality of persons, a microphone device 513 that acquires the voice of an utterance of the respective persons, and a main unit 101 that determines which of the plurality of persons performed an utterance according to the motion of the corners of the mouth of each person taken by the camera devices 511 and 512 when the microphone device 513 acquires a voice.
Aruze Gaming (hong Kong) Limited

Speaker identification device and method of registering features of registered speech for identifying speaker

[Solving means] The speech recognition unit 102 extracts the text data corresponding to the registration speech as the extraction text data. The registration speech is a speech input by a registration speaker reading aloud registration target text data, which is preliminarily set text data.

Speech recognition wake-up of a handheld portable electronic device

A system and method for parallel speech recognition processing of multiple audio signals produced by multiple microphones in a handheld portable electronic device. In one embodiment, a primary processor transitions to a power-saving mode while an auxiliary processor remains active.
Apple Inc.

Methods and systems for recipe management

Systems and methods for obtaining content over the internet, identifying text within the content (e.g., closed captioning or recipe text) or creating text from the content using technologies such as speech recognition, analyzing the text for actionable directions, and translating those actionable directions into instructions suitable for network-connected cooking appliances. Certain embodiments provide additional guidance to avoid or correct mistakes in the cooking process, and allow for the customization of recipes to address, e.g., dietary restrictions, culinary preferences, translation into a foreign language, etc.
Koninklijke Philips N.v.

System and method of automatic speech recognition using parallel processing for weighted finite state transducer-based speech decoding

A system, article, and method of automatic speech recognition using parallel processing for weighted finite state transducer-based speech decoding.
Intel Corporation

Name recognition system

A speech recognition system uses, in one embodiment, an extended phonetic dictionary that is obtained by processing words in a user's set of databases, such as a user's contacts database, with a set of pronunciation guessers. The speech recognition system can use a conventional phonetic dictionary and the extended phonetic dictionary to recognize speech inputs that are user requests to use the contacts database, for example, to make a phone call, etc.
Apple Inc.

Disambiguation of vehicle speech commands

A system and method of recognizing speech in a vehicle. The method includes receiving a voice command at the vehicle via a microphone in the vehicle, and obtaining a recognition result from speech recognition performed on the received voice command.
Gm Global Technology Operations Llc

Apparatus and methods using a pattern matching speech recognition engine to train a natural language speech recognition engine

The technology of the present application provides a speech recognition system with at least two different speech recognition engines or a single engine speech recognition engine with at least two different modes of operation. The first speech recognition being used to match audio to text, which text may be words or phrases.
Nvoq Incorporated

Estimating clean speech features using manifold modeling

The technology described in this document can be embodied in a computer-implemented method that includes receiving, at one or more processing devices, a portion of an input signal representing noisy speech, and extracting, from the portion of the input signal, one or more frequency domain features of the noisy speech. The method also includes generating a set of projected features by projecting each of the one or more frequency domain features on a manifold that represents a model of frequency domain features for clean speech.
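One simple way to realize the projection step described above is to approximate the clean-speech manifold by a set of exemplar feature vectors and map each noisy feature to its nearest exemplar. This nearest-neighbor stand-in is an illustrative simplification; the document does not specify the patent's actual manifold model.

```python
def euclidean(a, b):
    # Euclidean distance between two equal-length feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def project_onto_manifold(noisy_feature, clean_exemplars):
    # Approximate the clean-speech manifold by exemplar feature
    # vectors and "project" by choosing the nearest one.
    return min(clean_exemplars, key=lambda c: euclidean(noisy_feature, c))

# A noisy frequency-domain feature lands near the first exemplar:
clean = [[1.0, 1.0], [5.0, 5.0]]
nearest = project_onto_manifold([1.2, 0.9], clean)
```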

Speech recognition systems and methods using relative and absolute slot data

Methods and systems are provided for managing speech of a speech system. In one embodiment, a method includes: receiving, by a processor, relative information comprising graph data from at least one relative data datasource; processing, by a processor, the graph data of the relative information to determine at least one of an association and a relationship associated with an element defined in the speech system; and storing, by a processor, the at least one of association and relationship as relative slot data for use by at least one of a speech recognition method and a dialog management method.

Dynamic speech recognition data evaluation

Computing devices and methods for providing speech recognition data from one computing device to another device are disclosed. In one disclosed embodiment, audio input is received at a client device and processed to generate speech recognition data.

Automated speech recognition proxy system for natural language understanding

An interactive response system mixes HSR subsystems with ASR subsystems to facilitate the overall capability of voice user interfaces. The system permits imperfect ASR subsystems to nonetheless relieve the burden on HSR subsystems.
Interactions Llc

Re-recognizing speech with external data sources

Methods, including computer programs encoded on a computer storage medium, for improving speech recognition based on external data sources. In one aspect, a method includes obtaining an initial candidate transcription of an utterance using an automated speech recognizer and identifying, based on a language model that is not used by the automated speech recognizer in generating the initial candidate transcription, one or more terms that are phonetically similar to one or more terms that do occur in the initial candidate transcription.

Hybrid speech data processing in a vehicle

A computer-implemented method for hybrid speech data processing in a vehicle includes receiving a first speech input at an input device in the vehicle and digitizing the first speech input into packets. The method includes storing the packets at a memory for a predetermined amount of time and transmitting the packets using a wireless voice communication channel to a speech recognition server.

Speech recognition system

A speech recognition system includes a speech acquisition unit for acquiring speeches uttered by a user during a preset sound acquisition period, a speech recognition unit for recognizing the speeches acquired by the speech acquisition unit, a determination unit for determining whether the user performs a predetermined operation or action, and a display control unit for displaying on a display unit, when the determination unit determines that the user performs the predetermined operation or action, a function execution button for causing a navigation system to execute a function corresponding to a result of the recognition by the speech recognition unit.

Phonotactic-based speech recognition & re-synthesis

Various implementations disclosed herein include a phonotactic post-processor configured to rescore the n-best phoneme candidates output by a primary ensemble phoneme neural network using a priori phonotactic information. In various implementations, one of the scored set of the n-best phoneme candidates is selected as a preferred estimate for a one-phoneme output decision by the phonotactic post-processor.

Hierarchical speech recognition decoder

A speech interpretation module interprets the audio of user utterances as sequences of words. To do so, the speech interpretation module parameterizes a literal corpus of expressions by identifying portions of the expressions that correspond to known concepts, and generates a parameterized statistical model from the resulting parameterized corpus.

Method and system for enabling a vehicle occupant to report a hazard associated with the surroundings of the vehicle

The present disclosure relates to a method performed by a hazard reporting system for enabling a vehicle occupant to, in an un-distractive and dynamic manner, report a hazard associated with the surroundings of a vehicle. The hazard reporting system receives a verbal hazard report from the vehicle occupant, which verbal hazard report comprises information related to a hazard associated with the surroundings of the vehicle.

Generation of phoneme-experts for speech recognition

Various implementations disclosed herein include an expert-assisted phoneme recognition neural network system configured to recognize phonemes within continuous large vocabulary speech sequences without using language specific models (“left-context”), look-ahead (“right-context”) information, or multi-pass sequence processing, and while operating within the resource constraints of low-power and real-time devices. To these ends, in various implementations, an expert-assisted phoneme recognition neural network system as described herein utilizes a-priori phonetic knowledge.

Method and system for generating advanced feature discrimination vectors for use in speech recognition

A method of renormalizing high-resolution oscillator peaks, extracted from windowed samples of an audio signal, is disclosed. Feature vectors are generated for which variations in both fundamental frequency and time duration of speech are substantially mitigated.

Phoneme-expert assisted speech recognition & re-synthesis

Various implementations disclosed herein include an expert-assisted phoneme recognition neural network system configured to recognize phonemes within continuous large vocabulary speech sequences without using language specific models (“left-context”), look-ahead (“right-context”) information, or multi-pass sequence processing, and while operating within the resource constraints of low-power and real-time devices. To these ends, in various implementations, an expert-assisted phoneme recognition neural network system as described herein utilizes a-priori phonetic knowledge.

Method and system for patients data collection and analysis

A conversational and embodied virtual assistant (VA) with decision support (DS) capabilities that can simulate and improve upon information gathering sessions between clinicians, researchers, and patients. The system incorporates a conversational and embodied VA and a DS and deploys natural interaction enabled by natural language processing, automatic speech recognition, and an animation framework capable of rendering character animation performances through generated verbal and nonverbal behaviors, all supplemented by on-screen prompts.

Systems and methods for performing speech recognition

A system and method for performing speech recognition. A speech recognition engine includes a plurality of grammar paths each defining a recognized phrase.
Honeywell International Inc.

Method for scoring in an automatic speech recognition system

A system and method for speech recognition is provided. Embodiments may include receiving an audio signal at a first deep neural network ("DNN") associated with a computing device.
Nuance Communications, Inc.

Noise suppressing apparatus, speech recognition apparatus, and noise suppressing method

A noise suppressing apparatus calculates a phase difference on the basis of first and second sound signals obtained by a microphone array; calculates a first sound arrival rate on the basis of a first phase difference area and the phase difference, and a second sound arrival rate on the basis of a second phase difference area and the phase difference; calculates a dissimilarity that represents the level of difference between the first and second sound arrival rates; determines, on the basis of the dissimilarity, whether the pickup target sound is included in the first sound signal; and determines a suppression coefficient to be applied to the frequency spectrum of the first sound signal, on the basis of the result of that determination and of the phase difference.
Fujitsu Limited
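The arrival-rate and dissimilarity computation can be sketched as follows; the phase-difference areas, the threshold, and the gain values are illustrative assumptions, not taken from the patent.

```python
def arrival_rate(phase_diffs, area):
    # Fraction of frequency bins whose inter-microphone phase
    # difference falls inside a given phase-difference area (lo, hi).
    lo, hi = area
    inside = sum(1 for p in phase_diffs if lo <= p <= hi)
    return inside / len(phase_diffs)

def suppression_coefficient(phase_diffs, target_area, noise_area,
                            threshold=0.2, keep=1.0, suppress=0.1):
    # When the two arrival rates are dissimilar enough, the target
    # sound is judged present and the spectrum is kept; otherwise
    # a strong suppression gain is applied.
    r1 = arrival_rate(phase_diffs, target_area)
    r2 = arrival_rate(phase_diffs, noise_area)
    dissimilarity = abs(r1 - r2)
    return keep if dissimilarity >= threshold else suppress

# Three of four bins lie in the target direction, so the frame is kept:
gain = suppression_coefficient([0.05, 0.1, -0.02, 0.9],
                               target_area=(-0.2, 0.2),
                               noise_area=(0.5, 1.5))
```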

Vehicle aware speech recognition systems and methods

Methods and systems are provided for processing speech for an autonomous or semi-autonomous vehicle. In one embodiment, a method includes receiving, by a processor, context data generated by the vehicle; determining, by a processor, a dialog delivery method based on the context data; and selectively generating, by a processor, a dialog prompt to the user via at least one output device based on the dialog delivery method.
Gm Global Technology Operations Llc

Improving automatic speech recognition of multilingual named entities

Methods and systems are provided for improving speech recognition of multilingual named entities. In some embodiments, a list comprising a plurality of named entities may be accessed by a computing device.
Nuance Communications, Inc.

Speech recognition apparatus and speech recognition method

An apparatus includes a lip image recognition unit 103 to recognize a user state from image data, which is information other than speech; a non-speech section deciding unit 104 to decide from the recognized user state whether the user is talking; a speech section detection threshold learning unit 106 to set a first speech section detection threshold (SSDT) from speech data when the user is decided not to be talking, and a second SSDT from the speech data after conversion by a speech input unit when the user is decided to be talking; a speech section detecting unit 107 to detect a speech section indicating talking from the speech data using the thresholds set, wherein if it cannot detect the speech section using the second SSDT, it detects the speech section using the first SSDT; and a speech recognition unit 108 to recognize speech data in the detected speech section and output a recognition result.
Mitsubishi Electric Corporation

Acoustic model training

A method, executed by a computer, includes receiving a channel recording corresponding to a conversation, receiving a transcription for the conversation, generating a conversation-specific language model for the conversation using the transcription, and conducting speech recognition on the channel recording using the conversation-specific language model to provide time boundaries and written language corresponding to utterances within the channel recording. The method further includes determining sentence or phrase boundaries for the transcription, aligning written language within the one or more transcriptions with the written language corresponding to the utterances with the channel recording to provide sentence or phrase boundaries for the channel recording, and training a speech recognizer according to the sentence or phrase boundaries for the transcription and the sentence or phrase boundaries for the channel recording.
International Business Machines Corporation
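The alignment step described above, transferring time boundaries from the recognizer's output onto the transcription, can be sketched with a word-level sequence alignment. `difflib.SequenceMatcher` is an illustrative stand-in for a proper forced-alignment procedure, and the data layout is an assumption.

```python
from difflib import SequenceMatcher

def transfer_boundaries(transcript_words, decoded):
    # Align transcript words with decoded (word, start, end) tuples
    # and copy time boundaries onto the matching transcript words.
    decoded_words = [w for w, _, _ in decoded]
    matcher = SequenceMatcher(a=transcript_words, b=decoded_words)
    aligned = {}
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            aligned[a + k] = (decoded[b + k][1], decoded[b + k][2])
    return aligned

# The recognizer misheard "world" as "word", so only the two
# matching words receive time boundaries:
decoded = [("hello", 0.0, 0.4), ("word", 0.5, 0.9), ("friend", 1.0, 1.5)]
spans = transfer_boundaries(["hello", "world", "friend"], decoded)
```

Unmatched transcript words are left without boundaries, mirroring the patent's reliance on the surrounding sentence and phrase boundaries to bracket them.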

Speech recognition and text-to-speech learning system

An example text-to-speech learning system performs a method for generating a pronunciation sequence conversion model. The method includes generating a first pronunciation sequence from a speech input of a training pair and generating a second pronunciation sequence from a text input of the training pair.
Microsoft Technology Licensing, Llc

Secure nonscheduled video visitation system

Described are methods and systems in which the censorship and supervision tasks normally performed by secured facility personnel are augmented or automated entirely by a secure nonscheduled video visitation system. In embodiments, the secure nonscheduled video visitation system performs voice biometrics, speech recognition, non-verbal audio classification, fingerprint and other biometric authentication, image object classification, facial recognition, body joint location determination analysis, and/or optical character recognition on the video visitation data.
Global Tel *link Corporation

Characterizing, selecting and adapting audio and acoustic training data for automatic speech recognition systems

A system for and method of characterizing a target application acoustic domain analyzes one or more speech data samples from the target application acoustic domain to determine one or more target acoustic characteristics, including a codec type and bit-rate associated with the speech data samples. The determined target acoustic characteristics may also include other aspects of the target speech data samples such as sampling frequency, active bandwidth, noise level, reverberation level, clipping level, and speaking rate.
Nuance Communications, Inc.

Technologies for automatic speech recognition using articulatory parameters

Technologies for automatic speech recognition using articulatory parameters are disclosed. An automatic speech recognition device may capture speech data from a speaker and also capture an image of the speaker.
Intel Corporation

Adaptive audio enhancement for multichannel speech recognition

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance.
Google Inc.

Server-side ASR adaptation to speaker, device and noise condition via non-ASR audio transmission

A mobile device is adapted for automatic speech recognition (ASR). A user interface for interaction with a user includes an input microphone for obtaining speech inputs from the user for automatic speech recognition, and an output interface for system output to the user based on ASR results that correspond to the speech input.
Nuance Communications, Inc.

Testing words in a pronunciation lexicon

A method for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system is provided. The method includes obtaining test sentences which can be accepted by a language model used in the ASR system.
International Business Machines Corporation

Finding of a target document in a spoken language processing

Methods and systems are provided for finding a target document in spoken language processing. One of the methods includes calculating a score of each document in a document set, in response to receipt of the first n words of output of an automatic speech recognition (ASR) system, n being equal to or greater than zero.
International Business Machines Corporation

Use of human input recognition to prevent contamination

Embodiments of a system and method for processing and recognizing non-contact types of human input to prevent contamination are generally described herein. In example embodiments, human input is captured, recognized, and used to provide active input for control or data entry into a user interface. The human input may be provided in a variety of forms detectable by recognition techniques such as speech recognition, gesture recognition, identification recognition, and facial recognition.
Medivators Inc.

Digital video synthesis

A method which includes: detecting phrases in a transcript of an audiovisual file; applying a speech recognition algorithm to the audiovisual file and to a list of words of the phrase, to output a temporal location of each of the words that are uttered in the audio channel; compiling a list of sub-phrases of each of the phrases; creating a temporal sub-phrase map that comprises a temporal location of each of the sub-phrases; extracting the uttered sub-phrases from the audiovisual file, to create multiple sub-phrase audiovisual files; and constructing a database of the multiple sub-phrase audiovisual files and of the sub-phrase uttered in each of the files. The method may also include: receiving a phrase; querying the database for audiovisual files which comprise uttered sub-phrases of the phrase; and splicing at least some of the audiovisual files into a compilation audiovisual file in which the phrase is uttered.
Al Levy Technologies Ltd.

Speech recognition

A speech recognition system comprises: an input, for receiving an input signal from at least one microphone; a first buffer, for storing the input signal; a noise reduction block, for receiving the input signal and generating a noise reduced input signal; a speech recognition engine, for receiving either the input signal output from the first buffer or the noise reduced input signal from the noise reduction block; and a selection circuit for directing either the input signal output from the first buffer or the noise reduced input signal from the noise reduction block to the speech recognition engine.
Cirrus Logic International Semiconductor Ltd.

Anchored speech detection and speech recognition

A system configured to process speech commands may classify incoming audio as desired speech, undesired speech, or non-speech. Desired speech is speech that is from a same speaker as reference speech.
Amazon Technologies, Inc.

Negative n-gram biasing

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for biasing speech recognition against undesirable n-grams. In one aspect, a method includes obtaining a candidate transcription that an automated speech recognizer generates for an utterance, determining a particular context associated with the utterance, determining that a particular n-gram that is included in the candidate transcription is included among a set of undesirable n-grams that is associated with the context, adjusting a speech recognition confidence score associated with the transcription based on that determination, and determining whether to provide the candidate transcription for output based at least on the adjusted speech recognition confidence score.
Google Inc.
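The confidence adjustment described above can be sketched as a lookup of context-specific undesirable n-grams followed by a score penalty. The context names, penalty value, and data layout here are illustrative assumptions, not from the patent.

```python
def contains_ngram(transcript, ngram):
    # True when the n-gram appears as a contiguous word sequence.
    words = transcript.lower().split()
    target = ngram.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def adjust_confidence(transcript, confidence, context, undesirable,
                      penalty=0.3):
    # Lower the recognizer's confidence when the candidate transcript
    # contains an n-gram flagged as undesirable for the current context.
    for ngram in undesirable.get(context, []):
        if contains_ngram(transcript, ngram):
            confidence -= penalty
    return max(confidence, 0.0)

# In a navigation context, a media command phrase is penalized:
bad = {"navigation": ["play song"]}
score = adjust_confidence("play song about main street", 0.8,
                          "navigation", bad)
```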

Pre-training speech recognition

A pre-training apparatus and method for speech recognition, which initialize a deep neural network layer by layer to correct a node connection weight. The pre-training apparatus for speech recognition includes an input unit configured to receive speech data, a model generation unit configured to initialize a connection weight of a deep neural network based on the speech data, and an output unit configured to output information about the connection weight.
Electronics And Telecommunications Research Institute

Root cause analysis and recovery systems and methods

Methods and systems are provided for recovering from an error in a speech recognition system. In one embodiment, a method includes: receiving, by a processor, a first command recognized from a first speech utterance by a first language model; receiving, by the processor, a second command recognized from the first speech utterance by a second language model; determining, by the processor, at least one of similarities and dissimilarities between the first command and the second command; processing, by the processor, the first command and the second command with at least one rule of an error model based on the similarities and dissimilarities to determine a root cause; and selectively executing a recovery process based on the root cause.
Gm Global Technology Operations Llc

Apparatus, method, and computer program product for correcting speech recognition error

An apparatus for correcting a character string in a text of an embodiment includes a first converter, a first output unit, a second converter, an estimation unit, and a second output unit. The first converter recognizes a first speech of a first speaker, and converts the first speech to a first text.
Kabushiki Kaisha Toshiba

Multi-pass speech activity detection strategy to improve automatic speech recognition

An automatic speech recognition system and a method performed by an automatic speech recognition system are provided. The method includes performing at least two passes of speech activity detection on an acoustic utterance uttered by a speaker.
International Business Machines Corporation

System and method for performing automatic speech recognition using local private data

A method of providing hybrid speech recognition between a local embedded speech recognition system and a remote speech recognition system relates to receiving speech from a user at a device communicating with a remote speech recognition system. The system recognizes a first part of speech by performing a first recognition of the first part of the speech with the embedded speech recognition system that accesses private user data, wherein the private user data is not available to the remote speech recognition system.
Nuance Communications, Inc.

Apparatus and method for training a neural network acoustic model, and speech recognition apparatus and method

According to one embodiment, an apparatus for training a neural network acoustic model includes a calculating unit, a clustering unit, and a sharing unit. The calculating unit calculates, based on training data including a training speech and a labeled phoneme state, scores of phoneme states different from the labeled phoneme state.
Kabushiki Kaisha Toshiba

System and method for performing dual mode speech recognition

A system and method is presented for performing dual mode speech recognition, employing a local recognition module on a mobile device and a remote recognition engine on a server device. The system accepts a spoken query from a user, and both the local recognition module and the remote recognition engine perform speech recognition operations on the query, returning a transcription and confidence score, subject to a latency cutoff time.
Soundhound, Inc.
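A hedged sketch of the dual-mode selection this abstract describes: both a local recognizer and a remote engine return a transcription and a confidence score, and remote results arriving after a latency cutoff are ignored. The tuple shapes, timings, and cutoff value are illustrative assumptions.

```python
def choose_result(local, remote, remote_latency, cutoff=0.8):
    """local/remote are (transcription, confidence) tuples;
    remote_latency and cutoff are seconds (assumed units)."""
    if remote is not None and remote_latency <= cutoff:
        # both results arrived in time: prefer the more confident one
        return max(local, remote, key=lambda r: r[1])
    return local  # remote missed the cutoff: fall back to on-device result

best = choose_result(("play some music", 0.72),
                     ("play sam's music", 0.91),
                     remote_latency=0.4)
```

The cutoff keeps the user experience responsive: a slow network never delays the answer beyond the on-device result.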

System and method for speech-to-text conversion

This disclosure relates generally to speech recognition, and more particularly to system and method for speech-to-text conversion using audio as well as video input. In one embodiment, a method is provided for performing speech to text conversion.
Wipro Limited

Speech recognition

A computer system comprises an input configured to receive voice input from a user, the voice input having speech intervals separated by non-speech intervals; an ASR system configured to identify individual words in the voice input during speech intervals of the voice input, and store the identified words in memory; a speech overload detection module configured to detect at a time during a speech interval of the voice input a speech overload condition; and a notification module configured to output to the user, in response to said detection, a notification of the speech overload condition.
Microsoft Technology Licensing, Llc

Speech recognition

A computer system comprises an input configured to receive voice input from a user, the voice input having speech intervals separated by non-speech intervals; an ASR system configured to identify individual words in the voice input during speech intervals thereof, and store the identified words in memory; a response generation module configured to generate based on the words stored in the memory an audio response for outputting to the user; and a response delivery module configured to begin outputting the audio response to the user during a non-speech interval of the voice input, wherein the outputting of the audio response is terminated before it has completed in response to a subsequent speech interval of the voice input commencing whilst the audio response is still being outputted.
Microsoft Technology Licensing, Llc

Speech recognition

Voice input is received from a user. An ASR system generates in memory a set of words it has identified in the voice input, and updates the set each time it identifies a new word in the voice input to add the new word to the set, during at least one interval of speech activity.
Microsoft Technology Licensing, Llc

Phonetic distance measurement system and related methods

Phonetic distances are empirically measured as a function of speech recognition engine recognition error rates. The error rates are determined by comparing a recognized speech file with a reference file.
Adacel Systems, Inc.
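A minimal sketch of the comparison step this abstract describes: an error rate computed between a recognized file and a reference file. Word error rate via word-level edit distance is used here as the error metric, which is an assumption for illustration.

```python
def word_error_rate(reference, recognized):
    """Word error rate between a reference and a recognized string,
    computed with classic dynamic-programming edit distance over words."""
    ref, hyp = reference.split(), recognized.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

wer = word_error_rate("flight to boston", "flight to austin")
```

Aggregating such rates over phrase pairs would yield the empirical distance measurements the abstract refers to.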

Automatic interpretation system and method for generating synthetic sound having characteristics similar to those of original speaker's voice

Provided are an automatic interpretation system and method for generating a synthetic sound having characteristics similar to those of an original speaker's voice. The automatic interpretation system for generating a synthetic sound having characteristics similar to those of an original speaker's voice includes a speech recognition module configured to generate text data by performing speech recognition for an original speech signal of an original speaker and extract at least one piece of characteristic information among pitch information, vocal intensity information, speech speed information, and vocal tract characteristic information of the original speech, an automatic translation module configured to generate a synthesis-target translation by translating the text data, and a speech synthesis module configured to generate a synthetic sound of the synthesis-target translation.
Electronics And Telecommunications Research Institute

Method for acquiring at least two pieces of information to be acquired, comprising information content to be linked, using a speech dialogue device, speech dialogue device, and motor vehicle

A voice output is produced by a speech dialogue device between the acquisitions of two pieces of information. Each piece of information is acquired by acquiring natural verbal voice input data and extracting the respective piece of information from the voice input data using a speech recognition algorithm.
Audi Ag

System and method for personalization of acoustic models for automatic speech recognition

Disclosed herein are methods, systems, and computer-readable storage media for automatic speech recognition. The method includes selecting a speaker independent model, and selecting a quantity of speaker dependent models, the quantity of speaker dependent models being based on available computing resources, the selected models including the speaker independent model and the quantity of speaker dependent models.
Nuance Communications, Inc.

Speech recognition method, speech recognition apparatus, and non-transitory computer-readable recording medium storing a program

A speech recognition method acquires sound information via multiple microphones, detects a sound source interval including sound from the sound information, acquires an estimated direction of speech by conducting direction estimation on a speech interval from among the sound source interval, conducts an adaptation process of using the sound information to estimate filter coefficients, decides a buffer size of the sound information to hold in a buffer, based on sound source interval information, estimated direction information, and adaptation process convergence state information, holds the sound information in the buffer according to the buffer size, conducts a beamforming process using the sound information held in the buffer and the filter coefficients to acquire speech information, and conducts speech recognition on the speech information acquired by the beamforming process. The method decides the buffer size to be a size sufficient for convergence of the adaptation process immediately after sound information processing starts.
Panasonic Corporation

System and method for estimating the reliability of alternate speech recognition hypotheses in real time

Disclosed herein are systems, methods, and computer-readable storage media for estimating reliability of alternate speech recognition hypotheses. A system configured to practice the method receives an n-best list of speech recognition hypotheses and features describing the n-best list, determines a first probability of correctness for each hypothesis in the n-best list based on the received features, determines a second probability that the n-best list does not contain a correct hypothesis, and uses the first probability and the second probability in a spoken dialog.
Nuance Communications, Inc.
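A hedged sketch of combining the two quantities this abstract describes: a per-hypothesis correctness score for each n-best entry, plus a probability that the list contains no correct hypothesis. The normalization scheme below (remaining mass split proportionally) is an illustrative assumption.

```python
def score_nbest(hypothesis_scores, p_none_correct):
    """hypothesis_scores: raw correctness scores for each n-best entry.
    Returns per-hypothesis probabilities that, together with
    p_none_correct, sum to 1 (assumed normalization)."""
    total = sum(hypothesis_scores)
    mass = 1.0 - p_none_correct  # probability mass left for the hypotheses
    return [mass * s / total for s in hypothesis_scores]

probs = score_nbest([0.6, 0.3, 0.1], p_none_correct=0.2)
```

A spoken dialog system could then confirm, accept, or reject based on these calibrated values, e.g. re-prompting when `p_none_correct` dominates.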

Apparatus and method for forming search engine queries based on spoken utterances

A combination and a method are provided. Automatic speech recognition is performed on a received utterance.
Nuance Communications, Inc.

Speech recognition method, electronic device and speech recognition system

A speech recognition method, an electronic device and a speech recognition system are provided. When it is determined that a local device is not connected to the internet, a voiceprint comparison between the received voice data and the history voice data stored in the voice database is executed to obtain the corresponding history voice data, and an associated history text data is found from a result database of the local device according to the obtained history voice data.
Asustek Computer Inc.

System and method for analyzing audio data samples associated with speech recognition

A particular apparatus includes a first buffer that is configured to store multiple audio data samples and a second buffer that is configured to store the multiple audio data samples. The first buffer is coupled to a first processor that is configured to analyze audio data samples to detect a keyword.
Qualcomm Incorporated

Hearing assistance with automated speech transcription

The assistive hearing device implementations described herein assist hearing impaired users of the device by using automated speech transcription to generate text representing speech received in audio signals which can then be read in a synthesized voice tailored to overcome a user's hearing deficiencies. A speech recognition engine recognizes speech in received audio and converts the speech of the received audio to text.
Microsoft Technology Licensing, Llc

Speech recognition system

A speech recognition system, which continuously recognizes speech uttered by at least one user and controls a navigation system on the basis of a recognition result, includes: a speech-operation-intention determination unit for determining whether or not the user has made a recognized speech with the intention of operating the navigation system through speech; and a control mode altering unit for changing, when the speech-operation-intention determination unit determines that the user has no operation intention, the control mode of the navigation system in such a manner that the user is less aware of or pays less attention to the control mode than the case in which the speech-operation-intention determination unit determines that the user has an intention of operating the navigation system.
Mitsubishi Electric Corporation

Audio processing using an intelligent microphone

The present disclosure relates generally to improving audio processing using an intelligent microphone and, more particularly, to techniques for processing audio received at a microphone with integrated analog-to-digital conversion, digital signal processing, acoustic source separation, and for further processing by a speech recognition system. Embodiments of the present disclosure include intelligent microphone systems designed to collect and process high-quality audio input efficiently.
Analog Devices, Inc.

Apparatus and method for translating a meeting speech

According to one embodiment, a speech translation apparatus includes a speech recognition unit, a machine translation unit, an extracting unit, and a receiving unit. The extracting unit extracts words used for a meeting from a word set, based on information related to the meeting, and sends the extracted words to the speech recognition unit and the machine translation unit.
Kabushiki Kaisha Toshiba

Segmented character data entry system

The present invention provides a safety control system for a vehicle with controls located on the vehicle steering wheel. The controls may be arranged in a cluster on one or both sides of the upper half of the steering wheel.
Act-ip

Control apparatus, control method, program, and information storage medium

Provided are a control apparatus, a control method, a program, and an information storage medium that are configured, when execution of processing based on speech recognition is disabled, to make a user recognize that the execution of processing based on speech recognition is disabled even if recognition of an accepted speech is successful. A speech acceptance block accepts a speech.
Sony Interactive Entertainment Inc.

Configurable phone with interactive voice response engine

A land-based or mobile phone and methods are provided for receiving inbound communications as either voice or text, and then based on the user's configuration settings, the inbound communication is provided to the user as it was received or is automatically converted into a format that is desired by the user. The phone also takes voice or text that is input by the user of the phone and converts the user's input to either voice or text based on the configuration settings stored in the user's contact list or otherwise.

Speech recognition using electronic device and server

An electronic device is provided. The electronic device includes a processor configured to perform automatic speech recognition (ASR) on a speech input by using a speech recognition model that is stored in a memory and a communication module configured to provide the speech input to a server and receive a speech instruction, which corresponds to the speech input, from the server.
Samsung Electronics Co., Ltd.

System and method for multi-user GPU-accelerated speech recognition engine for client-server architectures

Disclosed herein is a GPU-accelerated speech recognition engine optimized for faster than real time speech recognition on a scalable server-client heterogeneous CPU-GPU architecture, which is specifically optimized to simultaneously decode multiple users in real-time. In order to efficiently support real-time speech recognition for multiple users, a “producer/consumer” design pattern is applied to decouple speech processes that run at different rates in order to handle multiple processes at the same time.
Carnegie Mellon University, A Pennsylvania Non-profit Corporation

System and method for audio-visual speech recognition

Disclosed herein is a method of performing speech recognition using audio and visual information, where the visual information provides data related to a person's face. Image preprocessing identifies regions of interest, which are then combined with the audio data before being processed by a speech recognition engine.
Carnegie Mellon University, A Pennsylvania Non-profit Corporation

Automatic speech recognition for disfluent speech

A system and method of processing disfluent speech at an automatic speech recognition (ASR) system includes: receiving speech from a speaker via a microphone; determining the received speech includes disfluent speech; accessing a disfluent speech grammar or acoustic model in response to the determination; and processing the received speech using the disfluent speech grammar.
Gm Global Technology Operations Llc

Electronic device and voice command processing therefor

Provided are an electronic device and method of voice command processing therefor. The electronic device may include: a housing having a surface; a display disposed in the housing and exposed through the surface; an audio input interface comprising audio input circuitry disposed in the housing; an audio output interface comprising audio output circuitry disposed in the housing; at least one wireless communication circuit disposed in the housing and configured to select one of plural communication protocols for call setup; a processor disposed in the housing and electrically connected with the display, the audio input interface, the audio output interface, the at least one wireless communication circuit, and a codec; and a memory electrically connected with the processor.
Samsung Electronics Co., Ltd.

Re-recognizing speech with external data sources

Methods, including computer programs encoded on a computer storage medium, for improving speech recognition based on external data sources. In one aspect, a method includes obtaining an initial candidate transcription of an utterance using an automated speech recognizer and identifying, based on a language model that is not used by the automated speech recognizer in generating the initial candidate transcription, one or more terms that are phonetically similar to one or more terms that do occur in the initial candidate transcription.
Google Inc.

Hybridized client-server speech recognition

A recipient computing device can receive a speech utterance to be processed by speech recognition and segment the speech utterance into two or more speech utterance segments, each of which can be routed to one of a plurality of available speech recognizers. A first one of the plurality of available speech recognizers can be implemented on a separate computing device accessible via a data network.
Speak With Me, Inc.

Information processing device, information processing method, and program

There is provided an information processing device that enables an improvement in precision of sound recognition processing based on collected sound information, the information processing device including: a recognition controller that causes a speech recognition processing portion to execute sound recognition processing based on collected sound information obtained by a sound collecting portion; and an output controller that generates an output signal to output a recognition result obtained through the sound recognition processing. The output controller causes an output portion to output an evaluation result regarding a type of sound based on the collected sound information prior to the recognition result.
Sony Corporation

Motor vehicle operating device with a correction strategy for voice recognition

The invention relates to a method for operating a motor vehicle, wherein a first speech input of a user is received, at least one recognition result (a-d) is determined by means of a speech recognition system, at least one recognition result (a-d) is output to an output device of the motor vehicle as a result list, and a second speech input of the user is received. The objective of the invention is to avoid a double input of false recognition results.
Audi Ag

Method of searching for multimedia image

The present invention provides a method of searching for multimedia image. When a camera and a microphone are used for recording an image file, speech recognition software is used to convert the speech recorded by the microphone into a text file, and then the image file and the text file are combined to form a folder for storage in a database.
National Taipei University Of Technology

Speech recognition for automated driving

Methods and systems are provided for processing speech for a vehicle having at least one autonomous vehicle system. In one embodiment, a method includes: receiving, by a processor, context data generated by an autonomous vehicle system; receiving, by a processor, a speech utterance from a user interacting with the vehicle; processing, by a processor, the speech utterance based on the context data; and selectively communicating, by a processor, at least one of a dialog prompt to the user and a control action to the autonomous vehicle system based on the context data.
Gm Global Technology Operations Llc

System and method for improving speech recognition using context

A system and method are provided for improving speech recognition accuracy. Contextual information about user speech may be received, and then speech recognition analysis can be performed on the user speech using the contextual information.
Paypal, Inc.

Method and system for training language models to reduce recognition errors

A method for training a language model to reduce recognition errors, wherein the language model is a recurrent neural network language model (RNNLM), by first acquiring training samples. An automatic speech recognition (ASR) system is applied to the training samples to produce recognized words and probabilities of the recognized words, and an n-best list is selected from the recognized words based on the probabilities.
Mitsubishi Electric Research Laboratories, Inc.
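An illustrative sketch of the n-best selection step this abstract describes: keep the n most probable recognized word sequences returned by the recognizer. The data shapes are assumptions, and the RNNLM training itself is omitted here.

```python
def select_nbest(recognized, n):
    """recognized: list of (word_sequence, probability) pairs.
    Returns the n entries with the highest probabilities."""
    return sorted(recognized, key=lambda x: x[1], reverse=True)[:n]

nbest = select_nbest([("turn on the light", 0.5),
                      ("turn on the lite", 0.3),
                      ("turn of the light", 0.2)], n=2)
```

The selected list, including the recognizer's competing errors, is what gives the language model discriminative training signal.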

Improved fixed point integer implementations for neural networks

Techniques related to implementing neural networks for speech recognition systems are discussed. Such techniques may include processing a node of the neural network by determining a score for the node as a product of weights and inputs such that the weights are fixed point integer values, applying a correction to the score based on a correction value associated with at least one of the weights, and generating an output from the node based on the corrected score.
Intel Corporation
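A hedged sketch of fixed-point node scoring in the spirit of this abstract: weights and inputs are quantized to small integers, the dot product is accumulated in integer arithmetic, and the result is dequantized back to a float. The scale factor is an assumption, and the patent's per-weight correction step is not reproduced here.

```python
def quantize(xs, scale=127.0):
    """Map floats in roughly [-1, 1] to small integers (assumed scheme)."""
    return [round(x * scale) for x in xs]

def node_score(weights, inputs, scale=127.0):
    qw, qx = quantize(weights, scale), quantize(inputs, scale)
    acc = sum(w * x for w, x in zip(qw, qx))  # integer-only accumulation
    return acc / (scale * scale)              # dequantize back to float

approx = node_score([0.5, -0.25], [0.8, 0.4])
exact = 0.5 * 0.8 + (-0.25) * 0.4  # float reference value
```

The small gap between `approx` and `exact` is the quantization error that correction values are meant to compensate for.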

Method for launching web search on handheld computer

A method for remotely launching web search on a smartphone is disclosed. The method includes the steps of: a) wirelessly connecting a remote control with a microphone to the smartphone; b) opening a searching text input box; c) enabling the searching text input box with a voice IME (input method editor); d) speaking a word to be searched to the remote control; e) sending the voiced word to the handheld computer; f) transmitting the voiced word to a search engine with a speech recognition function through internet; and g) the search engine transmitting a search result to the smartphone.
I/o Interconnect, Ltd.

Vehicle and method of controlling the vehicle

A vehicle connected to a terminal of a user to perform dialing includes a speech input unit that receives a speech of the user, a speech recognition unit that recognizes a command included in the received speech, a control unit that determines a preparation state for performing the dialing, and a display unit that displays information about the preparation state when preparation for performing the dialing is not completed.
Hyundai Motor Company

Electronic device and speech recognition method thereof

An electronic device and a speech recognition method that is capable of adjusting an end-of-utterance detection period dynamically are disclosed. The electronic device includes a microphone, a display, an input device formed as a part of the display or connected to the electronic device as a separate device, a processor electrically connected to the microphone, the display, and the input device, and a memory electrically connected to the processor.
Samsung Electronics Co., Ltd.

Methods and apparatus for speech segmentation using multiple metadata

Methods and apparatus to process microphone signals by a speech enhancement module to generate an audio stream signal including first and second metadata for use by a speech recognition module. In an embodiment, speech recognition is performed using endpointing information including transitioning from a silence state to a maybe speech state, in which data is buffered, based on the first metadata and transitioning to a speech state, in which speech recognition is performed, based upon the second metadata.
Nuance Communications, Inc.

Acoustic and domain based speech recognition for vehicles

A processor of a vehicle speech recognition system recognizes speech via domain-specific language and acoustic models. The processor further, in response to the acoustic model having a confidence score for recognized speech falling within a predetermined range defined relative to a confidence score for the domain-specific language model, recognizes speech via the acoustic model only.
Ford Global Technologies, Llc

Dynamic acoustic model switching to improve noisy speech recognition

An automatic speech recognition system for a vehicle includes a controller configured to select an acoustic model from a library of acoustic models based on ambient noise in a cabin of the vehicle and operating parameters of the vehicle. The controller is further configured to apply the selected acoustic model to noisy speech to improve recognition of the speech.
Ford Global Technologies, Llc
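A minimal sketch of the selection logic this abstract describes: an acoustic model is picked from a library keyed by cabin noise level and a vehicle operating parameter (speed is used here). The thresholds and model names are illustrative assumptions.

```python
def select_acoustic_model(noise_db, speed_kmh):
    """Pick an acoustic model name from an assumed library based on
    cabin noise (dB) and vehicle speed (km/h)."""
    if noise_db < 55 and speed_kmh < 30:
        return "quiet_city"
    if noise_db < 70:
        return "moderate_highway"
    return "loud_highway"  # high ambient noise, e.g. open windows

model = select_acoustic_model(noise_db=62, speed_kmh=100)
```

The selected model would then be applied to the incoming noisy speech before decoding.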

System and method for personalization in speech recognition

Systems, methods, and computer-readable storage devices are for identifying a user profile for speech recognition. The user profile is selected from one of several user profiles which are all associated with a speaker, and can be selected based on the identity of the speaker, the location of the speaker, the device the speaker is using, or other relevant parameters.
At&t Intellectual Property I, L.p.

Systems and methods for engaging an audience in a conversational advertisement

A system and method are described for engaging an audience in a conversational advertisement. A conversational advertising system converses with an audience using spoken words.
Nuance Communications, Inc.

Encoding and adaptive, scalable accessing of distributed models

Systems, methods, and apparatus for accessing distributed models in automated machine processing, including using large language models in machine translation, speech recognition and other applications.
Google Inc.

Semiautomated relay method and apparatus

A captioning system comprising a processor and a memory having stored thereon software such that, when the software is executed by the one or more processors, the system generates text captions from speech data, including at least the following, receiving, from a hearing user's (HU's) device, an HU's speech data, generating, at the one or more hardware processors, first text captions from the speech data using a speech recognition algorithm, automatically determining, at the one or more processors, whether the generated first text captions meet a first accuracy threshold and when the first text captions meet the first accuracy threshold, sending the first text captions to an assisted user's (AU's) device for display, when the first text captions do not meet the first accuracy threshold, generating, at the one or more processors, second text captions from the speech data based on user input to the speech recognition algorithm from a call assistant and sending the second text captions to the AU's device for display.
Ultratec, Inc.
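An illustrative sketch of the fallback this abstract describes: automatic captions are sent when they meet an accuracy threshold; otherwise, captions corrected with call-assistant input are sent instead. The threshold value and function names are assumptions.

```python
def captions_to_send(auto_captions, auto_accuracy, assistant_captions,
                     threshold=0.9):
    """Choose between automatic and human-assisted captions based on an
    assumed accuracy threshold."""
    if auto_accuracy >= threshold:
        return auto_captions   # ASR output is good enough on its own
    return assistant_captions  # fall back to human-assisted captions

sent = captions_to_send("hello word", 0.82, "hello world")
```

This keeps latency low on clear audio while routing hard audio through the call assistant.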

Speech recognition method and apparatus using device information

A speech recognition method includes: storing at least one acoustic model (AM); obtaining, from a device located outside the ASR server, a device ID for identifying the device; obtaining speech data from the device; selecting an AM based on the device ID; performing speech recognition on the speech data by using the selected AM; and outputting a result of the speech recognition.
Samsung Electronics Co., Ltd.

User configurable speech commands

A speech recognition method and system enables user-configurable speech commands. For a given speech command, the speech recognition engine provides a mechanism for the end-user to select speech command terms to use in substitution for the given speech command.
Kopin Corporation

Systems and methods for assisting automatic speech recognition

Systems and methods for assisting automatic speech recognition (asr) are provided. An example method includes generating, by a mobile device, a plurality of instantiations of a speech component in a captured audio signal, each instantiation of the plurality of instantiations being in support of a particular hypothesis regarding the speech component.
Knowles Electronics, Llc

Apparatus and method for recognizing speech

A speech recognition apparatus based on a deep neural network (DNN) sound model includes a memory and a processor. As the processor executes a program stored in the memory, the processor generates sound-model state sets corresponding to a plurality of pieces of set training speech data included in multi-set training speech data, generates a multi-set state cluster from the sound-model state sets, and sets the multi-set training speech data as an input node and the multi-set state cluster as output nodes so as to learn a DNN-structured parameter.
Electronics And Telecommunications Research Institute

Speaker-adaptive speech recognition

(b) providing the test-speaker-specific adaptive system comprising the input network component, the trained test-speaker-specific adaptive model component, and the speaker-adaptive output network.

Material selection for language model customization in speech recognition for speech analytics

A method for extracting, from non-speech text, training data for a language model for speech recognition includes: receiving, by a processor, non-speech text; selecting, by the processor, text from the non-speech text; converting, by the processor, the selected text to generate converted text comprising a plurality of phrases consistent with speech transcription text; training, by the processor, a language model using the converted text; and outputting, by the processor, the language model.
Genesys Telecommunications Laboratories, Inc.

Language model customization in speech recognition for speech analytics

A method for generating a language model for an organization includes: receiving, by a processor, organization-specific training data; receiving, by the processor, generic training data; computing, by the processor, a plurality of similarities between the generic training data and the organization-specific training data; assigning, by the processor, a plurality of weights to the generic training data in accordance with the computed similarities; combining, by the processor, the generic training data with the organization-specific training data in accordance with the weights to generate customized training data; training, by the processor, a customized language model using the customized training data; and outputting, by the processor, the customized language model, the customized language model being configured to compute the likelihood of phrases in a medium.
Genesys Telecommunications Laboratories, Inc.
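A hedged sketch of the weighting step this abstract describes: each generic training sentence receives a weight from its similarity to the organization-specific data. Plain word overlap is used as the similarity measure here, which is an illustrative assumption; the actual measure in the patent may differ.

```python
def weight_generic(generic, org_sentences):
    """Assign each generic sentence a weight in [0, 1] equal to the
    fraction of its words that appear in the organization's data."""
    org_vocab = set(w for s in org_sentences for w in s.split())
    weighted = []
    for sentence in generic:
        words = sentence.split()
        overlap = sum(1 for w in words if w in org_vocab) / len(words)
        weighted.append((sentence, overlap))
    return weighted

weighted = weight_generic(
    ["please hold the line", "the weather is nice"],
    ["please hold while i transfer your call"])
```

Sentences with higher weights would contribute more to the customized training data than off-topic generic text.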

Predicting recognition quality of a phrase in automatic speech recognition systems

A method for predicting a speech recognition quality of a phrase comprising at least one word includes: receiving, on a computer system including a processor and memory storing instructions, the phrase; computing, on the computer system, a set of features comprising one or more features corresponding to the phrase; providing the phrase to a prediction model on the computer system and receiving a predicted recognition quality value based on the set of features; and returning the predicted recognition quality value.
Genesys Telecommunications Laboratories, Inc.

Safety system and method

A system and method are described. The system utilizes data entry devices commonly found in some workplaces, such as warehouses, to generate an emergency signal.
Hand Held Products, Inc.

Apparatus and method for verifying an utterance in a speech recognition system

An apparatus and method for verifying an utterance based on multi-event detection information in a natural language speech recognition system. The apparatus includes a noise processor configured to process noise of an input speech signal, a feature extractor configured to extract features of speech data obtained through the noise processing, an event detector configured to detect events of the plurality of speech features occurring in the speech data using the noise-processed data and data of the extracted features, a decoder configured to perform speech recognition using a plurality of preset speech recognition models for the extracted feature data, and an utterance verifier configured to calculate confidence measurement values in units of words and sentences using information on the plurality of events detected by the event detector and a preset utterance verification model and perform utterance verification according to the calculated confidence measurement values.
Electronics And Telecommunications Research Institute

System and method for providing generated speech via a network

A system and method of operating an automatic speech recognition (ASR) application over an Internet protocol network is disclosed. The ASR application communicates over a packet network such as an Internet protocol network or a wireless network.
Nuance Communications, Inc.

Data augmentation method based on stochastic feature mapping for automatic speech recognition

A method of augmenting training data includes converting a feature sequence of a source speaker determined from a plurality of utterances within a transcript to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set using the mapping function to multiple selected target speakers in the training set.
International Business Machines Corporation
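The mapping function estimated between source-speaker features and the target speaker could, in a heavily simplified form, be a per-dimension affine transform fitted by least squares, as sketched below. This stand-in ignores the speaker-dependent acoustic model entirely and only illustrates the feature-mapping idea; all names are hypothetical.

```python
def estimate_affine_map(source, target):
    """Fit, per feature dimension, an affine map y ~ a*x + b by
    ordinary least squares between paired source and target frames."""
    dims = len(source[0])
    params = []
    for d in range(dims):
        xs = [f[d] for f in source]
        ys = [f[d] for f in target]
        n = len(xs)
        mx = sum(xs) / n
        my = sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        a = cov / var if var else 0.0
        b = my - a * mx
        params.append((a, b))
    return params

def apply_map(params, frame):
    """Map one source-speaker frame toward the target speaker."""
    return [a * x + b for (a, b), x in zip(params, frame)]
```

Applying the fitted map to every utterance of the source speaker yields additional "target-speaker-like" training data, which is the augmentation effect the abstract describes.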

Method and apparatus for annotating video content with metadata generated using speech recognition technology

A method and apparatus is provided for annotating video content with metadata generated using speech recognition technology. The method begins by rendering video content on a display device.
Google Technology Holdings LLC

Wireless security system

A wireless doorbell has a housing with a rear portion and a front portion; the rear portion is configured to be secured to a support, and the front portion is configured to be secured to the rear portion. The wireless doorbell has a sensor configured to detect an object in the vicinity of the wireless doorbell, a camera configured to be activated and obtain at least one image, and a microphone configured to obtain audio signals.
Advanced Wireless Innovations LLC

Microphone circuit assembly and system with speech recognition

The present invention relates in one aspect to a microphone circuit assembly for an external application processor such as a programmable digital signal processor. The microphone circuit assembly comprises a microphone preamplifier and an analog-to-digital converter that generates microphone signal samples at a first predetermined rate.
Analog Devices Global

Speech recognition device and speech recognition method

A speech recognition device: transmits an input voice to a server; receives a first speech recognition result that is a result from speech recognition by the server on the transmitted input voice; performs speech recognition on the input voice to obtain a second speech recognition result; refers to speech rules each representing a formation of speech elements for the input voice, to determine the speech rule matched to the second speech recognition result; determines from the correspondence relationships among presence/absence of the first speech recognition result, presence/absence of the second speech recognition result and presence/absence of the speech element that forms the speech rule, a speech recognition state indicating the speech element whose speech recognition result is not obtained; generates according to the determined speech recognition state, a response text for inquiring about the speech element whose speech recognition result is not obtained; and outputs that text.
Mitsubishi Electric Corporation

Recognizing accented speech

Techniques (300, 400, 500) and apparatuses (100, 200, 700) for recognizing accented speech are described. In some embodiments, an accent module recognizes accented speech using an accent library based on device data, uses different speech recognition correction levels based on an application field into which recognized words are set to be provided, or updates an accent library based on corrections made to incorrectly recognized speech.
Google Technology Holdings Llc

System and neural network based feature extraction for acoustic model development

A system and method are presented for neural network based feature extraction for acoustic model development. A neural network may be used to extract acoustic features from raw MFCCs or the spectrum, which are then used for training acoustic models for speech recognition systems.
Interactive Intelligence Group, Inc.

Speech recognition

This disclosure relates to voice technology and discloses a voice recognition method and electronic device. In some embodiments, soft clustering is performed in advance on n Gaussians obtained by model training to obtain m soft-clustered Gaussians. When voice recognition is performed, the voice is converted into an eigenvector, and the top l soft-clustered Gaussians with the highest scores (where l is less than m) are selected according to the eigenvector; the member Gaussians of those l soft-clustered Gaussians are the only Gaussians that participate in the acoustic-model likelihood calculation during recognition.
Le Shi Zhi Xin Electronic Technology (Tianjin) Limited
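The top-l soft-cluster selection can be illustrated with one-dimensional Gaussians: score each cluster by its centroid Gaussian, keep the l best-scoring clusters, and evaluate only their member Gaussians. This is a minimal sketch under assumed data structures, not the disclosed method.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def topl_likelihood(x, clusters, l):
    """Score each cluster by its centroid Gaussian, keep the top-l
    clusters, and evaluate only their member Gaussians. Each cluster
    is a dict with a 'centroid' (mean, var) pair and a 'members' list
    of (mean, var) pairs."""
    scored = sorted(clusters,
                    key=lambda c: log_gauss(x, *c["centroid"]),
                    reverse=True)
    best = float("-inf")
    for cluster in scored[:l]:
        for mean, var in cluster["members"]:
            best = max(best, log_gauss(x, mean, var))
    return best
```

The payoff is that only the members of l clusters are evaluated per frame instead of all n Gaussians, which is the speed-up the abstract is after.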

Automated equalization

Techniques for improving speech recognition are described. An example of an electronic device includes an extracting unit to extract a reference spectral profile from a reference signal and a device spectral profile from a device signal.
Intel Corporation

Speech recognition with selective use of dynamic language models

This document describes, among other things, a computer-implemented method for transcribing an utterance. The method can include receiving, at a computing system, speech data that characterizes an utterance of a user.
Google Inc.

Method for detecting driving noise and improving speech recognition in a vehicle

The disclosure concerns a method for recognizing driving noise in a sound signal that is acquired by a microphone disposed in a vehicle. The sound signal originates from the surface structure of the road.
Ford Global Technologies, LLC

Fast out-of-vocabulary search in automatic speech recognition systems

A method including: receiving, on a computer system, a text search query, the query including one or more query words; generating, on the computer system, for each query word in the query, one or more anchor segments within a plurality of speech recognition processed audio files, the one or more anchor segments identifying possible locations containing the query word; post-processing, on the computer system, the one or more anchor segments, the post-processing including: expanding the one or more anchor segments; sorting the one or more anchor segments; and merging overlapping ones of the one or more anchor segments; and searching, on the computer system, the post-processed one or more anchor segments for instances of at least one of the one or more query words using a constrained grammar.
Genesys Telecommunications Laboratories, Inc.
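The post-processing stage described above (expand, sort, and merge overlapping anchor segments) is essentially interval merging; a minimal sketch, assuming anchors are (start, end) pairs in seconds and a fixed padding value:

```python
def postprocess_anchors(anchors, pad):
    """Expand each (start, end) anchor by `pad` seconds on both sides,
    sort by start time, and merge overlapping segments."""
    expanded = sorted((max(0.0, s - pad), e + pad) for s, e in anchors)
    merged = []
    for start, end in expanded:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous segment: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

The constrained-grammar search then only has to scan the merged segments rather than entire audio files.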

Efficient empirical determination, computation, and use of acoustic confusability measures

Efficient empirical determination, computation, and use of an acoustic confusability measure comprises: (1) an empirically derived acoustic confusability measure, comprising a means for determining the acoustic confusability between any two textual phrases in a given language, where the measure of acoustic confusability is empirically derived from examples of the application of a specific speech recognition technology, where the procedure does not require access to the internal computational models of the speech recognition technology and does not depend upon any particular internal structure or modeling technique, and where the procedure is based upon iterative improvement from an initial estimate; (2) techniques for efficient computation of the empirically derived acoustic confusability measure, comprising means for efficient application of an acoustic confusability score, allowing practical application to very large-scale problems; and (3) a method for using acoustic confusability measures to make principled choices about which specific phrases to make recognizable by a speech recognition application.
Promptu Systems Corporation
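A common baseline for acoustic confusability between two phrases is a weighted edit distance over their phoneme sequences. The patent's measure is empirically derived rather than hand-weighted, so the sketch below, with an illustrative substitution-cost table standing in for empirically estimated confusion costs, only conveys the general shape of such a score.

```python
def phoneme_edit_distance(a, b, sub_cost=None):
    """Levenshtein distance between two phoneme sequences, with an
    optional per-pair substitution cost table standing in for
    empirically estimated confusion costs (default cost 1.0)."""
    sub_cost = sub_cost or {}
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = float(i)
    for j in range(n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                cost = 0.0
            else:
                cost = sub_cost.get((a[i - 1], b[j - 1]), 1.0)
            d[i][j] = min(d[i - 1][j] + 1.0,       # deletion
                          d[i][j - 1] + 1.0,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[m][n]
```

A low distance between two candidate phrases flags them as confusable, which is the kind of signal item (3) above would use when choosing which phrases to make recognizable.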

System and method for dynamic ASR based on social media

System and method to adjust an automatic speech recognition (ASR) engine, the method including: receiving social network information from a social network; data mining the social network information to extract one or more characteristics; inferring a trend from the extracted one or more characteristics; and adjusting the ASR engine based upon the inferred trend. Embodiments of the method may further include: receiving a speech signal from a user; and recognizing the speech signal by use of the adjusted ASR engine.
Avaya Inc.

Semantic word affinity automatic speech recognition

System and techniques for semantic word affinity automatic speech recognition are described herein. A ranked list of ASR hypotheses may be obtained.

Technologies for end-of-sentence detection using syntactic coherence

Technologies for detecting an end of a sentence in automatic speech recognition are disclosed. An automatic speech recognition device may acquire speech data, and identify phonemes and words of the speech data.

System and method for user-specified pronunciation of words for speech synthesis and recognition

The method is performed at an electronic device with one or more processors and memory storing one or more programs for execution by the one or more processors. A first speech input including at least one word is received.
Apple Inc.

Method for interaction with a terminal and electronic apparatus for the same

The present application discloses a method for interaction with a terminal and an electronic apparatus for the same. The method includes: when a gesture is detected while an interface is displayed, determining whether the downward acceleration of the gesture is greater than a default threshold value, wherein the displayed interface comprises a replying-information and recognition-result interface, a replying-information full-screen interface, or a replying-information full-screen extension interface after the recording of speech in a speech recognition interface is detected to have finished; determining the operation type corresponding to the gesture according to whether the downward acceleration of the gesture is greater than the default threshold value; and executing an interaction corresponding to the operation type.
Le Shi Zhi Xin Electronic Technology (Tianjin) Limited

Natural human-computer interaction for virtual personal assistant systems

Technologies for natural language interactions with virtual personal assistant systems include a computing device configured to capture audio input, distort the audio input to produce a number of distorted audio variants, and perform speech recognition on the audio input and the distorted audio variants. The computing device selects a result from a large number of potential speech recognition results based on contextual information.
Intel Corporation

Multimodal speech recognition for real-time video audio-based display indicia application

Aspects relate to computer implemented methods, systems, and processes to automatically generate audio-based display indicia of media content including receiving, by a processor, a plurality of media content categories including at least one feature, receiving a plurality of categorized speech recognition algorithms, each speech recognition algorithm being associated with a respective one or more of the plurality of media content categories, determining a media content category of a current media content based on at least one feature of the current media content, selecting one speech recognition algorithm from the plurality of categorized speech recognition algorithms based on the determination of the media content category of the current media content, and applying the selected speech recognition algorithm to the current media content.
International Business Machines Corporation

Motor vehicle device operation with operating correction

A method for operating a motor vehicle operating device to carry out two operating steps by voice control. A first vocabulary, which is provided for the first operating step, is set in a speech recognition device.
Audi AG

System and methods for adapting neural network acoustic models

Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.
Nuance Communications, Inc.

Text rule based multi-accent speech recognition with single acoustic model and automatic accent detection

Embodiments are disclosed for recognizing speech in a computing system. An example speech recognition method includes receiving metadata at a generation unit that includes a database of accented substrings, generating, via the generation unit, accent-corrected phonetic data for words included in the metadata, the accent-corrected phonetic data representing different pronunciations of the words included in the metadata based on the accented substrings stored in the database, receiving, at a voice recognition engine, extracted speech data derived from utterances input by a user to the speech recognition system, and receiving, at the voice recognition engine, the accent-corrected phonetic data.
Harman International Industries, Incorporated
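Generating accent-corrected phonetic variants from a database of accented substrings can be sketched as repeated substring substitution over a canonical phonetic string. The hyphen-separated phone notation and the rule format below are assumptions for illustration:

```python
def accent_variants(phonetic, accent_rules):
    """Generate accent-corrected pronunciation variants by applying
    each (canonical, accented) substring rule to a canonical phonetic
    string, keeping every intermediate variant."""
    variants = {phonetic}
    for canonical, accented in accent_rules:
        new = {v.replace(canonical, accented) for v in variants}
        variants |= new
    return sorted(variants)
```

Each variant would then be added to the recognition lexicon so a single acoustic model can match differently accented pronunciations of the same word.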

Discriminative training of automatic speech recognition models with natural language processing dictionary for spoken language processing

Methods and systems for language processing includes training one or more automatic speech recognition models using an automatic speech recognition dictionary. A set of n automatic speech recognition hypotheses for an input is determined, based on the one or more automatic speech recognition models, using a processor.
International Business Machines Corporation

Systems and methods for a multi-core optimized recurrent neural network

Systems and methods for a multi-core optimized recurrent neural network (RNN) architecture are disclosed. The various architectures affect communication and synchronization operations according to the multi-bulk-synchronous-parallel (MBSP) model for a given processor.
Baidu USA LLC

Incorporating an exogenous large-vocabulary model into rule-based speech recognition

Incorporation of an exogenous large-vocabulary model into rule-based speech recognition is provided. An audio stream is received by a local small-vocabulary rule-based speech recognition system (SVSRS), and is streamed to a large-vocabulary statistically-modeled speech recognition system (LVSRS).
Microsoft Technology Licensing, LLC

Applying neural network language models to weighted finite state transducers for automatic speech recognition

Systems and processes for converting speech-to-text are provided. In one example process, speech input can be received.
Apple Inc.

Method of and system for providing adaptive respondent training in a speech recognition application

A system for conducting a telephonic speech recognition application includes an automated telephone device for making telephonic contact with a respondent and a speech recognition device which, upon the telephonic contact being made, presents the respondent with at least one introductory prompt for the respondent to reply to; receives a spoken response from the respondent; and performs a speech recognition analysis on the spoken response to determine a capability of the respondent to complete the application. If the speech recognition device, based on the spoken response to the introductory prompt, determines that the respondent is capable of completing the application, the speech recognition device presents at least one application prompt to the respondent.
Eliza Corporation

Semi-supervised system for multichannel source enhancement through configurable adaptive transformations and deep neural network

Various techniques are provided to perform enhanced automatic speech recognition. For example, a subband analysis may be performed that transforms time-domain signals of multiple audio channels into subband signals.
Conexant Systems, Inc.

Prioritized content loading for vehicle automatic speech recognition systems

A method of loading content items for accessibility by a vehicle automatic speech recognition (asr) system. The method tracks content items requested by one or more users and prioritizes the loading of requested content items and/or selectively loads requested content items at least partially based on the interaction history of one or more users.
GM Global Technology Operations LLC

Method and system for joint training of hybrid neural networks for acoustic modeling in automatic speech recognition

Systems and methods for training networks are provided. A method for training networks comprises receiving an input from each of a plurality of neural networks differing from each other in at least one of architecture, input modality, and feature type, connecting the plurality of neural networks through a common output layer, or through one or more common hidden layers and a common output layer to result in a joint network, and training the joint network..
International Business Machines Corporation

Expansion of a question and answer database

A system and method for expanding a question and answer (Q&A) database. The method includes preparing a set of Q&A documents and speech recognition results of an agent's utterances in conversations between an agent and a customer, each Q&A document in the set having an identifier, and each speech recognition result having an identifier common with the identifier of a relevant Q&A document, and adding one or more repetition parts extracted from the speech recognition results of the agent's utterances to the corresponding Q&A document in the set.
International Business Machines Corporation
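The final step above, adding repetition parts from the agent's ASR results to the Q&A document with the matching identifier, can be sketched as follows; the dictionary layout ('id', 'answers', 'repetitions') is an assumed representation, not the patent's.

```python
def expand_qa(qa_docs, asr_results):
    """Append repetition parts extracted from agent-utterance ASR
    results to the Q&A document that shares the same identifier,
    skipping parts that are already present."""
    by_id = {doc["id"]: doc for doc in qa_docs}
    for result in asr_results:
        doc = by_id.get(result["id"])
        if doc is None:
            continue
        for part in result["repetitions"]:
            if part not in doc["answers"]:
                doc["answers"].append(part)
    return qa_docs
```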

Electronic device, computer-implemented method and computer program

An electronic device comprising a processor which is configured to perform speech recognition on an audio signal, linguistically analyze the output of the speech recognition for named-entities, perform an internet or database search for the recognized named-entities to obtain query results, and display, on a display of the electronic device, information obtained from the query results on a timeline.
Sony Corporation

Method and system for role dependent context sensitive spoken and textual language understanding with neural networks

A method and system processes utterances that are acquired either from an automatic speech recognition (ASR) system or text. The utterances have associated identities of each party, such as role A utterances and role B utterances.
Mitsubishi Electric Research Laboratories, Inc.

Method for using a human-machine interface device for an aircraft comprising a speech recognition unit

The general field of the invention is that of methods for using a human-machine interface device for an aircraft comprising at least one speech recognition unit, one display device with a touch interface, one graphical interface computer and one electronic computing unit, the set being designed to graphically present a plurality of commands, each command being classed in at least a first category, referred to as the critical category, and a second category, referred to as the non-critical category, each non-critical command having a plurality of options, each option having a name, said names being assembled in a database called a “lexicon”. The method according to the invention comprises steps of recognizing displayed commands, activating the speech recognition unit, comparing the touch and voice information, and a validation step.
Thales

Pronunciation learning through correction logs

A new pronunciation learning system for dynamically learning new pronunciations, assisted by user correction logs. The user correction logs provide a record of speech recognition events and subsequent user behavior that implicitly confirms or rejects the recognition result and/or shows the user's intended words via subsequent input.
Microsoft Technology Licensing, LLC

Accent correction in speech recognition systems

A method comprising receiving an audio input signal comprising speech, determining an accent class corresponding to the speech, identifying an accented phone pattern within the speech, replacing the accented phone pattern with an unaccented phone pattern, and generating an unaccented output signal from the unaccented phone pattern.
International Business Machines Corporation

Speech recognition apparatus and method

A speech recognition apparatus includes a predictor configured to predict a word class of a word following a word sequence that has been previously searched for based on the word sequence that has been previously searched for; and a decoder configured to search for a candidate word corresponding to a speech signal, extend the word sequence that has been previously searched for using the candidate word that has been searched for, and adjust a probability value of the extended word sequence based on the predicted word class.
Samsung Electronics Co., Ltd.
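Adjusting the probability of an extended word sequence using the predicted word class might, in a simple interpolation form, look like the sketch below; the interpolation weight alpha and the fallback class are illustrative assumptions, not the disclosed formulation.

```python
def adjust_sequence_prob(seq_prob, candidate_word, word_class, class_probs, alpha=0.5):
    """Interpolate an extended word sequence's probability with the
    predicted probability of the candidate word's class, so sequences
    whose next word fits the predicted class are favored.

    word_class maps words to class labels; class_probs holds the
    predictor's probability for each class."""
    cls = word_class.get(candidate_word, "OTHER")
    class_prob = class_probs.get(cls, 0.0)
    return seq_prob * ((1.0 - alpha) + alpha * class_prob)
```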

Call context metadata

A computer detects a connected voice or video call between participants and records a brief media sample. Speech recognition is utilized to determine when the call is connected as well as to transcribe the content of the audio portion of the media sample.

Terminal device and method for communication of speech signals

A reception unit receives a speech signal from another terminal device. A reproduction unit reproduces the speech signal received in the reception unit.

Generating call context metadata from speech, contacts, and common names in a geographic area

A computer detects a connected voice or video call between participants and records a brief media sample. Speech recognition is utilized to determine when the call is connected as well as to transcribe the content of the audio portion of the media sample.

Speech recognition method and speech recognition apparatus to improve performance or response of speech recognition

In a speech recognition method, a criterion value is determined for judging the length of a silent section included in a processing section, and the processing mode to use is determined in accordance with the criterion value. The criterion value is used to obtain audio information of the processing section.

Speech processing system and terminal

[Solution] Upon receiving a speech utterance, the speech processing system performs speech recognition and displays a text 158 of the recognition result. Further, the speech processing system translates the recognition result, in accordance with settings, into a text 176 of another language, and displays and synthesizes speech of the translated result.

Deployed end-to-end speech recognition

Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech, including noisy environments, accents, and different languages.

System and method for supporting automatic speech recognition of regional accents based on statistical information and user corrections

Disclosed herein is a system for compensating for dialects and accents, comprising: an automatic speech recognition system comprising an automatic speech recognition device that is operative to receive an utterance in an acoustic format from a user with a user interface; a speech-to-text conversion engine that is operative to receive the utterance from the automatic speech recognition device and to prepare a textual statement of the utterance; and a correction database that is operative to store textual statements of all utterances, where the correction database secures a corrected transcript of the textual statement of the utterance from the speech-to-text conversion engine and adds it to the correction database if the corrected transcript is not already available.

End-to-end speech recognition

Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech, including noisy environments, accents, and different languages.

Systems and methods for speech-based searching of content repositories

According to some aspects, a method of searching for content in response to a user voice query is provided. The method may comprise receiving the user voice query, performing speech recognition to generate n best speech recognition results comprising a first speech recognition result, performing a supervised search of at least one content repository to identify one or more supervised search results using one or more classifiers that classify the first speech recognition result into at least one class that identifies previously classified content in the at least one content repository, performing an unsupervised search of the at least one content repository to identify one or more unsupervised search results, wherein performing the unsupervised search comprises performing a word search of the at least one content repository, and generating combined results from among the one or more supervised search results and the one or more unsupervised search results.

Content analysis to enhance voice search

Methods and apparatus for improving speech recognition accuracy in media content searches are described. An advertisement for a media content item is analyzed to identify keywords that may describe the media content item.

Methods and systems for interfacing a speech dialog with new applications

Methods and systems are provided for interfacing a speech system with a new application. In one embodiment a method includes: maintaining a registration data datastore that stores registration data from the new application and one or more other applications; receiving, at a router module associated with the speech system, a result from a speech recognition module; processing, by the router module, the result and the registration data to determine a possible new application; and providing the possible new application to the speech system.

Method and apparatus for tuning speech recognition systems to accommodate ambient noise

A system includes a head and torso simulation (HATS) system configured to play back pre-recorded audio commands while simulating a driver head location as an output location. The system also includes a vehicle speaker system and a processor configured to engage a vehicle heating, ventilation and air-conditioning (HVAC) system.
Ford Global Technologies, LLC

Automatic speaker identification using speech recognition features

Features are disclosed for automatically identifying a speaker. Artifacts of automatic speech recognition (“ASR”) and/or other automatically determined information may be processed against individual user profiles or models.
Amazon Technologies, Inc.

Confidence features for automated speech recognition arbitration

The described technology provides arbitration between speech recognition results generated by different automatic speech recognition (ASR) engines, such as ASR engines trained according to different language or acoustic models. The system includes an arbitrator that selects between a first speech recognition result representing an acoustic utterance as transcribed by a first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by a second ASR engine.
Microsoft Technology Licensing, LLC
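An arbitrator of this kind could, in a toy form, combine a few confidence features into a score per engine and pick the higher-scoring hypothesis. The features and weights below are invented for illustration and are not the patent's confidence features:

```python
def arbitrate(result_a, result_b):
    """Pick between two ASR engines' hypotheses using a weighted sum
    of simple confidence features. Each result is a dict with 'text',
    'confidence' (0-1), and 'snr' (dB); the 0.8/0.2 weights and the
    30 dB SNR cap are illustrative."""
    def score(r):
        return 0.8 * r["confidence"] + 0.2 * min(r["snr"] / 30.0, 1.0)
    return result_a["text"] if score(result_a) >= score(result_b) else result_b["text"]
```

A production arbitrator would typically learn such weights from labeled arbitration data rather than fix them by hand.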

Multiple speech locale-specific hotword classifiers for selection of a speech locale

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for recognizing speech in an utterance. The methods, systems, and apparatus include actions of receiving an utterance and obtaining acoustic features from the utterance.
Google Inc.

Method and device for speech recognition

A method of speech recognition includes the following steps: receiving a first speech input, and converting the first speech input into a first digital signal; transmitting the first digital signal to a cloud server; receiving a first post-processing result generated according to the first digital signal; receiving a second speech input, and converting the second speech input into a second digital signal; performing a first speech recognition to the second digital signal to obtain a recognition result by using a first speech recognition model; and comparing the first post-processing result with the recognition result to determine a speech recognition result.
Shenzhen Raisound Technology Co. Ltd.

Method and device for speech recognition

An embodiment of the present disclosure discloses a method and a system for speech recognition. The method comprises steps of intercepting a first speech segment from a monitored speech signal, analyzing the first speech segment to determine an energy spectrum; extracting characteristics of the first speech segment according to the energy spectrum, determining speech characteristics; analyzing the energy spectrum of the first speech segment according to the speech characteristics, intercepting a second speech segment; recognizing the speech of the second speech segment, and obtaining a speech recognition result.
Le Shi Zhi Xin Electronic Technology (Tianjin) Limited

Method and system for reading fluency training

A non-transitory processor-readable medium stores code representing instructions to be executed by a processor. The code causes the processor to receive a request from a user of a client device to initiate a speech recognition engine for a web page displayed at the client device.
Rosetta Stone Ltd.

Computer speech recognition and semantic understanding from activity patterns

A user activity pattern may be ascertained using signal data from a set of computing devices. The activity pattern may be used to infer user intent with regard to a user interaction with a computing device or to predict a likely future action by the user.
Microsoft Technology Licensing, LLC

Method and apparatus for keyword speech recognition

Phoneme images are created for keywords and audio files. The keyword images and audio file images are used to identify keywords within the audio file when the phoneme images match.
Apptek, Inc.
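
The matching idea above can be illustrated with phoneme sequences in place of phoneme images. This is a simplified sketch under that assumption, not the patented algorithm: a keyword is "detected" wherever its phoneme sequence occurs inside the audio's phoneme sequence.

```python
# Illustrative sketch: keywords and audio are both reduced to phoneme
# sequences, and a keyword match is reported wherever the keyword's
# phonemes appear contiguously inside the audio's phonemes.
def find_keyword(keyword_phonemes, audio_phonemes):
    """Return start indices where the keyword's phonemes match the audio's."""
    k, hits = len(keyword_phonemes), []
    for i in range(len(audio_phonemes) - k + 1):
        if audio_phonemes[i:i + k] == keyword_phonemes:
            hits.append(i)
    return hits

audio = ["HH", "AH", "L", "OW", "W", "ER", "L", "D"]
print(find_keyword(["W", "ER", "L", "D"], audio))  # [4]
```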

Systems, methods and devices for intelligent speech recognition and processing

Systems, methods, and devices for intelligent speech recognition and processing are disclosed. According to one embodiment, a method for improving intelligibility of a speech signal may include (1) at least one processor receiving an incoming speech signal comprising a plurality of sound elements; (2) the at least one processor recognizing a sound element in the incoming speech signal to improve the intelligibility thereof; (3) the at least one processor processing the sound element by at least one of modifying and replacing the sound element; and (4) the at least one processor outputting the processed speech signal comprising the processed sound element.
Audimax Llc

Speech recognition candidate selection based on non-acoustic input

A method includes the following steps. A speech input is received.
International Business Machines Corporation

Method and apparatus for context-augmented speech recognition

A system includes a processor configured to receive speech-input. The processor is further configured to receive at least one location-identification.

Systems and methods for adaptive proper name entity recognition and understanding

Various embodiments contemplate systems and methods for performing automatic speech recognition (asr) and natural language understanding (nlu) that enable high accuracy recognition and understanding of freely spoken utterances which may contain proper names and similar entities. The proper name entities may contain or be comprised wholly of words that are not present in the vocabularies of these systems as normally constituted.
Promptu Systems Corporation

Neural network training apparatus and method, and speech recognition apparatus and method

A neural network training apparatus includes a primary trainer configured to perform a primary training of a neural network model based on clean training data and target data corresponding to the clean training data; and a secondary trainer configured to perform a secondary training of the neural network model on which the primary training has been performed based on noisy training data and an output probability distribution of an output class for the clean training data calculated during the primary training of the neural network model.
Samsung Electronics Co., Ltd.

System and method for broadcasting audio tweets

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for broadcasting audio tweets. A system broadcasting audio tweets receives tweets via telephone devices, wherein each listener hears a telephone call of a broadcast on the telephone devices.
Audionow Ip Holdings, Llc

Transfer function to generate lombard speech from neutral speech

A controller may be programmed to create a speech utterance set for speech recognition training by, in response to receiving data representing a neutral utterance and parameter values defining signal noise, generating data representing a lombard effect version of the neutral utterance using a transfer function associated with the parameter values and defining distortion between neutral and lombard effect versions of a same utterance due to the signal noise.
Ford Global Technologies, Llc

Electronic device and method for recognizing speech

An electronic device and a method for recognizing a speech are provided. The method for recognizing a speech by an electronic device includes: receiving sounds generated from a sound source through a plurality of microphones; calculating power values from a plurality of audio signals generated by performing signal processing on each sound input through the plurality of microphones and calculating direction information on the sound source based on the calculated power values and storing the calculated direction information; and performing the speech recognition on a speech section included in the audio signal based on the direction information on the sound source.
Samsung Electronics Co., Ltd.

Sound envelope deconstruction to identify words and speakers in continuous speech

A speech recognition capability in which speakers of spoken text are identified based on the contour of sound waves representing the spoken text. Variations in the contour of the sound waves are identified, features are assigned to those variations, and parameters of those features are grouped into predefined characteristics.
International Business Machines Corporation

Methods and apparatus for joint stochastic and deterministic dictation formatting

Methods and apparatus for speech recognition on user dictated words to generate a dictation and using a discriminative statistical model derived from a deterministic formatting grammar module and user formatted documents to extract features and estimate scores from the formatting graph. The processed dictation can be output as formatted text based on a formatting selection to provide an integrated stochastic and deterministic formatting of the dictation.
Nuance Communications, Inc.

Techniques for updating an automatic speech recognition system using finite-state transducers

Techniques are described for updating an automatic speech recognition (asr) system that, prior to the update, is configured to perform asr using a first finite-state transducer (fst) comprising a first set of paths representing recognizable speech sequences. A second fst may be accessed, comprising a second set of paths representing speech sequences to be recognized by the updated asr system.
Nuance Communications, Inc.

Using word confidence score, insertion and substitution thresholds for selected words in speech recognition

A method and system for improving the accuracy of a speech recognition system using word confidence score (wcs) processing is introduced. Parameters in a decoder are selected to minimize a weighted total error rate, such that deletion errors are weighted more heavily than substitution and insertion errors.
Adacel, Inc.
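
The weighted total error rate described above can be sketched as follows. The specific weight values here are invented for illustration; the abstract only says deletions are weighted more heavily than substitutions and insertions.

```python
# A minimal sketch of a weighted total error rate in which deletion
# errors are penalized more heavily than substitutions and insertions.
def weighted_error_rate(deletions, substitutions, insertions, num_words,
                        w_del=2.0, w_sub=1.0, w_ins=1.0):
    """Weighted total error rate over num_words reference words."""
    return (w_del * deletions + w_sub * substitutions
            + w_ins * insertions) / num_words

# 1 deletion, 1 substitution, 0 insertions over 100 reference words
print(weighted_error_rate(1, 1, 0, 100))  # 0.03
```

With these weights, one deletion costs as much as two substitutions, steering the decoder's parameter selection away from dropping words.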

Mobile phone

A mobile phone including a first touch screen and a second touch screen respectively coupled to a processor; a transmitter and a receiver respectively coupled to the processor; a memorizer coupled to the processor, which stores a speech recognition database and a contacts database; a mobile communication unit and a video communication unit both coupled to the processor, the mobile communication unit associated with the first touch screen, including a customer recognition module, a baseband processing chip and an rf module; the video communication unit associated with the second touch screen, including a front camera, an image processing chip and a wireless communication module. The mobile phone of the disclosure provides two touch screens dedicated to calls, which is convenient for elderly users because very few interaction levels are required.
Lecloud Computing Co., Ltd.

Speech recognition and transcription among users having heterogeneous protocols

A system is disclosed for facilitating free form dictation, including directed dictation and constrained recognition and/or structured transcription among users having heterogeneous native (legacy) protocols for generating, transcribing, and exchanging recognized and transcribed speech. The system includes at least one system transaction manager having a “system protocol,” to receive a verified, streamed speech information request from at least one authorized user employing a first legacy user protocol.
Advanced Voice Recognition Systems, Inc.

Source-based automatic speech recognition

Recognizing a user's speech is a computationally demanding task. If a user calls a destination server, little may be known about the user or the user's speech profile.
Avaya Inc.

Visual confirmation for a recognized voice-initiated action

Techniques described herein provide a computing device configured to provide an indication that the computing device has recognized a voice-initiated action. In one example, a method is provided for outputting, by a computing device and for display, a speech recognition graphical user interface (gui) having at least one element in a first visual format.
Google Inc.

Electronic device and method for executing function using speech recognition thereof

An electronic device and a method for using speech recognition are provided. The electronic device includes an input device, a touch screen display, a processor, and a memory.
Samsung Electronics Co., Ltd.

Architecture for multi-domain natural language processing

Features are disclosed for processing a user utterance with respect to multiple subject matters or domains, and for selecting a likely result from a particular domain with which to respond to the utterance or otherwise take action. A user utterance may be transcribed by an automatic speech recognition (“asr”) module, and the results may be provided to a multi-domain natural language understanding (“nlu”) engine.
Amazon Technologies, Inc.

Method and system for adjusting user speech in a communication session

A system that incorporates the subject disclosure may, for example, receive user speech captured at a second end user device during a communication session between the second end user device and a first end user device, apply speech recognition to the user speech, identify an unclear word in the user speech based on the speech recognition, adjust the user speech to generate adjusted user speech by replacing all or a portion of the unclear word with replacement audio content, and provide the adjusted user speech to the first end user device during the communication session. Other embodiments are disclosed.
At&t Intellectual Property I, L.p.

Apparatuses and methods for enhanced speech recognition in variable environments

Systems, apparatuses, and methods are described to increase a signal-to-noise ratio difference between a main channel and reference channel. The increased signal-to-noise ratio difference is accomplished with an adaptive threshold for a desired voice activity detector (dvad) and shaping filters.
Kopin Corporation

Speech recognition circuit using parallel processors

A speech recognition circuit comprises an input buffer for receiving processed speech parameters. A lexical memory contains lexical data for word recognition.
Zentian Limited

Mixed speech recognition

The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample.
Microsoft Technology Licensing, Llc

Apparatus and method for normalizing input data of acoustic model, and speech recognition apparatus

An apparatus for normalizing input data of an acoustic model includes a window extractor configured to extract windows of frame data to be input to an acoustic model from frame data of a speech to be recognized, and a normalizer configured to normalize the frame data to be input to the acoustic model in units of the extracted windows.
Samsung Electronics Co., Ltd.
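
Window-unit normalization as described above can be sketched as follows. Zero-mean, unit-variance scaling per window is an assumption for the example; the abstract does not specify the normalization formula.

```python
# A rough sketch of per-window normalization: frames are grouped into
# fixed-size windows and each window is scaled to zero mean and unit
# variance before being fed to the acoustic model.
import statistics

def normalize_windows(frames, window_size):
    """Normalize frame values in units of extracted windows."""
    out = []
    for start in range(0, len(frames), window_size):
        window = frames[start:start + window_size]
        mean = statistics.mean(window)
        stdev = statistics.pstdev(window) or 1.0  # avoid divide-by-zero
        out.extend((x - mean) / stdev for x in window)
    return out

print(normalize_windows([1.0, 3.0, 5.0, 7.0], 2))  # [-1.0, 1.0, -1.0, 1.0]
```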

Speech interaction apparatus and method

According to one embodiment, a speech interaction apparatus for performing an interaction with a user based on a scenario includes a speech recognition unit, a determination unit, a selection unit and an execution unit. The speech recognition unit recognizes a speech of the user and generates a recognition result text.
Kabushiki Kaisha Toshiba

Information processing system, and vehicle-mounted device

This invention can enhance the convenience of a user. An information processing system 1 includes: a vehicle-mounted device 3 which has a sound pickup unit 36 that picks up a speech sound, and a transmitting unit that transmits speech data that is generated based on the speech sound that is picked up to a control server 8; and the control server 8 which has a server storage unit 82 that stores a pictogram correspondence table 82a in which recognition keywords and pictogram ids indicating a plurality of pictograms that correspond to the recognition keywords are associated, and a server control unit 81 which executes pictogram processing that selects a recognition keyword that corresponds to text representing a speech sound that is generated by speech recognition based on speech data from among the recognition keywords included in the pictogram correspondence table 82a, and in accordance with a predetermined condition, selects a single pictogram id from among a plurality of pictogram ids that are associated with the selected recognition keyword.
Clarion Co., Ltd.

Flexible schema for language model customization

The customization of language modeling components for speech recognition is provided. A list of language modeling components may be made available by a computing device.
Microsoft Technology Licensing, Llc

Dynamically adding or removing functionality to speech recognition systems

A system and method of changing features of an existing automatic speech recognition (asr) system includes: monitoring speech received from a vehicle occupant for one or more keywords identifying a feature to remove from or add to the asr system; detecting the keywords in the monitored speech; and adding the identified feature to or removing the identified feature from the asr system.
Gm Global Technology Operations Llc

Techniques to provide a standard interface to a speech recognition platform

Techniques and systems to provide speech recognition services over a network using a standard interface are described. In an embodiment, a technique includes accepting a speech recognition request that includes at least audio input, via an application program interface (api).
Microsoft Technology Licensing, Llc

Speech recognition apparatus and method with acoustic modelling

Provided is a speech recognition apparatus. The apparatus includes a preprocessor configured to extract select frames from all frames of a first speech of a user, and a score calculator configured to calculate an acoustic score of a second speech, made up of the extracted select frames, by using a deep neural network (dnn)-based acoustic model, and to calculate an acoustic score of frames, of the first speech, other than the select frames based on the calculated acoustic score of the second speech.
Samsung Electronics Co., Ltd.

Voice language communication device and system

A voice language communication device and system that includes: a speaker; a microphone; a display panel; a control panel; a power button; a record button; software stored on a hard drive; a language database, where software accesses the language database during operation; a plurality of languages stored on the language database; speech recognition functions related to the software, where the speech recognition functions recognize a user's language as an input language; and an output language, where the output language is a translation of the input language and the output language is instantaneously emitted to the speaker.

Streamlined navigational speech recognition

A system and method of performing automatic speech recognition (asr) includes: receiving speech at a vehicle microphone; communicating the received speech to an asr system; measuring an amount of time that elapses while speech is received; selecting a point-of-interest (poi) context or an address context based on the measured amount of time; and processing the received speech using a poi context-based grammar when a poi context is selected or an address-based grammar when an address context is selected.
Gm Global Technology Operations Llc

Speech recognition system and gain setting system

When an instruction to start voice input is received from the user, a gain controller acquires, from a gain table which defines a correspondence between vehicle speed ranges and gains, a gain corresponding to a vehicle speed range including the vehicle speed of a vehicle detected by a vehicle speed detector, and sets the acquired gain as the gain of an input amplifier that amplifies an input audio signal output by a microphone. As a gain corresponding to each vehicle speed range, the gain table records a gain of the input amplifier corresponding, in an experimentally determined frequency distribution of peak values in the vehicle speed range, to a maximum frequency in the range of magnitude of voice output as an input audio signal by the microphone and to be input to a speech recognition engine as voice having a magnitude within the input range of the speech recognition engine.
Alpine Electronics, Inc.
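
The gain-table lookup described above can be sketched as follows. The speed ranges and gain values below are invented for the example; the patent derives its gains experimentally from peak-value frequency distributions.

```python
# Hypothetical illustration of a speed-range -> gain lookup: the gain
# controller maps the detected vehicle speed to a speed range and sets
# the input amplifier to the gain recorded for that range.
import bisect

# upper bounds of vehicle speed ranges (km/h) and the gain for each range;
# GAINS_DB has one more entry than the bounds list, for the top open range
SPEED_UPPER_BOUNDS = [20, 60, 100]
GAINS_DB = [0.0, 3.0, 6.0, 9.0]

def gain_for_speed(speed_kmh: float) -> float:
    """Look up the input-amplifier gain for the detected vehicle speed."""
    return GAINS_DB[bisect.bisect_right(SPEED_UPPER_BOUNDS, speed_kmh)]

print(gain_for_speed(45))  # falls in the 20-60 km/h range -> 3.0
```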

Incremental utterance decoder combination for efficient and accurate decoding

An incremental speech recognition system. The incremental speech recognition system incrementally decodes a spoken utterance using an additional utterance decoder only when the additional utterance decoder is likely to add significant benefit to the combined result.
Microsoft Technology Licensing, Llc

System and method for determining recipient of spoken command in a control system

Disclosed is an apparatus and method for determining which controllable device an audible command is directed towards, the method comprising: receiving at each of two or more controlling devices the audible command signal, the audible command being directed to control at least one of two or more controllable devices controlled by a respective one of the two or more controlling devices; digitizing each of the received audible command signals; attaching a unique identifier to each digitized audible command so as to uniquely correlate it to a respective controlling device; determining a magnitude of each of the digitized audible commands; determining a digitized audible command with the greatest magnitude, and further determining to which controlling device the audible command is directed on the basis of the unique identifier associated with the digitized audible command with the greatest magnitude; performing speech recognition on the digitized audible command with the greatest magnitude; and forwarding a command to the controlling device corresponding to the digitized audible command with the greatest magnitude, the command corresponding to the audible command that can be implemented on the controllable device controlled by the controlling device.
Crestron Electronics, Inc.
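
The greatest-magnitude selection step above can be sketched as follows. Using RMS as the magnitude measure is an assumption for the example; the abstract does not specify how magnitude is computed.

```python
# A simplified sketch of the selection step: each controlling device
# reports the digitized command it heard under a unique identifier, and
# the command is routed to the device that heard it loudest.
import math

def rms(samples):
    """Root-mean-square magnitude of a digitized audible command."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def select_device(captures):
    """captures: dict mapping a device's unique identifier -> samples.
    Returns the identifier of the device with the greatest magnitude."""
    return max(captures, key=lambda dev_id: rms(captures[dev_id]))

captures = {"tv": [0.1, -0.1, 0.2], "thermostat": [0.5, -0.4, 0.6]}
print(select_device(captures))  # "thermostat" heard the command loudest
```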

Semiconductor device, system, electronic device, and speech recognition method

A semiconductor device is provided with a data storage unit configured to store speech reproduction data that includes transition destination information or speech recognition option data that includes transition destination information, and a processor configured to perform processing for generating an output speech signal using speech reproduction data read out from the data storage unit or perform speech recognition processing on an input speech signal using speech recognition option data read out from the data storage unit, and to read out, based on the transition destination information included in speech reproduction data or speech recognition option data used in the processing, speech recognition option data or speech reproduction data to be used in the next processing from the data storage unit.
Seiko Epson Corporation

Methods for speech enhancement and speech recognition using neural networks

The present invention relates to implementing a system and method to improve speech recognition and speech enhancement of noisy speech. The present invention discloses a way to improve the noise robustness of a speech recognition system by providing additional input to a neural network speech classifier.

Dynamic adaptation of language models and semantic tracking for automatic speech recognition

Generally, this disclosure provides systems, devices, methods and computer readable media for adaptation of language models and semantic tracking to improve automatic speech recognition (asr). A system for recognizing phrases of speech from a conversation may include an asr circuit configured to transcribe a user's speech to a first estimated text sequence, based on a generalized language model.
Intel Corporation

Multichannel raw-waveform neural networks

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using neural networks. One of the methods includes receiving, by a neural network in a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal, the first raw audio signal and the second raw audio signal for the same period of time, generating, by a spatial filtering convolutional layer in the neural network, a spatial filtered output using the first data and the second data, generating, by a spectral filtering convolutional layer in the neural network, a spectral filtered output using the spatial filtered output, and processing, by one or more additional layers in the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
Google Inc.

Hotword detection on multiple devices

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance.
Google Inc.

Apparatus and method for speech recognition, and for training transformation parameter

Provided are a method and an apparatus for speech recognition, and a method and an apparatus for training a transformation parameter. A speech recognition apparatus includes an acoustic score calculator configured to use an acoustic model to calculate an acoustic score of a speech input, an acoustic score transformer configured to transform the calculated acoustic score into an acoustic score corresponding to standard pronunciation by using a transformation parameter, and a decoder configured to decode the transformed acoustic score to output a recognition result of the speech input.
Samsung Electronics Co., Ltd.

Automatic speech recognition confidence classifier

The described technology provides normalization of speech recognition confidence classifier (cc) scores that maintains the accuracy of acceptance metrics. A speech recognition cc scores quantitatively represents the correctness of decoded utterances in a defined range (e.g., [0,1]).
Microsoft Technology Licensing, Llc
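
One simple way to map raw confidence classifier scores into the defined [0, 1] range is min-max scaling with clipping, sketched below. This is an illustration only; the normalization described above is more involved, since it must preserve acceptance metrics across models.

```python
# Illustrative sketch: map a raw confidence classifier (cc) score into
# the defined [0, 1] range by min-max scaling with clipping.
def normalize_score(raw, lo, hi):
    """Map a raw CC score into [0, 1], clipping out-of-range values."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    return min(1.0, max(0.0, (raw - lo) / (hi - lo)))

print(normalize_score(2.5, 0.0, 5.0))  # 0.5
```

An acceptance threshold tuned on normalized scores then remains meaningful even if the underlying classifier's raw score range changes.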

Automatic speech recognition with detection of at least one contextual element, and application to aircraft flying and maintenance

An automatic speech recognition with detection of at least one contextual element, and application to aircraft flying and maintenance are provided. The automatic speech recognition device comprises a unit for acquiring an audio signal, a device for detecting the state of at least one contextual element, and a language decoder for determining an oral instruction corresponding to the audio signal.
Dassault Aviation

Apparatus and method for generating acoustic model, and speech recognition

Described are an apparatus and method for generating an acoustic model. The apparatus and method include a processor configured to calculate a noise representation that represents noise data by using a noise model, and generate the acoustic model through training using noisy training speech data, which comprises speech data and the noise data, a string of phonemes corresponding to the speech data, and the noise representation.
Samsung Electronics Co., Ltd.

Methods and apparatus for speech recognition using a garbage model

Methods and apparatus for performing speech recognition using a garbage model. The method comprises receiving audio comprising speech and processing at least some of the speech using a garbage model to produce a garbage speech recognition result.
Nuance Communications, Inc.

Microphone placement for sound source direction estimation

Architectures of numbers of microphones and their positioning in a device for sound source direction estimation and source separation are presented. The directions of sources are front, back, left, right, top, and bottom of the device, and can be determined by amplitude and phase differences of microphone signals with proper microphone positioning.
Microsoft Technology Licensing, Llc

Method and device for speech recognition

Embodiments of the present disclosure provide a method and device for speech recognition. The solution comprises: receiving a first speech signal issued by a user; performing analog to digital conversion on the first speech signal to generate a first digital signal after the analog to digital conversion; extracting a first speech parameter from the first digital signal, the first speech parameter describing a speech feature of the first speech signal; if the first speech parameter coincides with a first prestored speech parameter in a sample library, executing control signalling instructed by the first digital signal, the sample library storing prestored speech parameters of n users, n≧1.
Beijing Boe Multimedia Technology Co., Ltd.

Speech recognition apparatus and method

An apparatus includes a language model group identifier configured to identify a language model group based on determined characteristic data of a user, and a language model generator configured to generate a user-based language model by interpolating a general language model for speech recognition based on the identified language model group.
Samsung Electronics Co., Ltd.

Method and system for remotely training and commanding the speech recognition system on a cockpit via a carry-on-device in a connected aircraft

A method for implementing a speaker-independent speech recognition system with reduced latency is provided. The method includes capturing voice data at a carry-on-device from a user during a pre-flight check-in performed by the user for an upcoming flight; extracting features associated with the user from the captured voice data at the carry-on-device; uplinking the extracted features to the speaker-independent speech recognition system onboard the aircraft; and adapting the extracted features with an acoustic feature model of the speaker-independent speech recognition system..
Honeywell International Inc.

Adapting a speech system to user pronunciation

A system and method of adapting a speech system includes the steps of: receiving confirmation of a phonetic transcription of one or more names, receiving confirmation of a selected stored text result, and storing the phonetic transcription with the selected stored text result using an automatic speech recognition (asr) system, a text-to-speech (tts) system, or both.
Gm Global Technology Operations Llc

Enhanced speech endpointing

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data including an utterance, obtaining context data that indicates one or more expected speech recognition results, determining an expected speech recognition result based on the context data, receiving an intermediate speech recognition result generated by a speech recognition engine, comparing the intermediate speech recognition result to the expected speech recognition result for the audio data based on the context data, determining whether the intermediate speech recognition result corresponds to the expected speech recognition result for the audio data based on the context data, and setting an end of speech condition and providing a final speech recognition result in response to determining the intermediate speech recognition result matches the expected speech recognition result, the final speech recognition result including the one or more expected speech recognition results indicated by the context data.
Google Inc.
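
The endpointing flow above can be condensed into a small sketch: intermediate recognition results are compared against context-expected results, and the end-of-speech condition is set as soon as one matches. The function below is an illustration of that flow, not the patented implementation.

```python
# A condensed sketch of context-driven endpointing: stream intermediate
# recognition results and set the end-of-speech condition on a match
# with an expected result derived from context data.
def endpoint(intermediate_results, expected_results):
    """Return (final_result, end_of_speech_set) for a stream of
    intermediate speech recognition results."""
    for result in intermediate_results:
        if result in expected_results:
            return result, True  # match: set the end-of-speech condition
    # no match: fall back to the last intermediate result
    return intermediate_results[-1], False

stream = ["call", "call mom", "call mom now"]
print(endpoint(stream, {"call mom"}))  # ('call mom', True)
```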

Systems, computer-implemented methods, and tangible computer-readable storage media for transcription alignment

Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for captioning a media presentation. The method includes receiving automatic speech recognition (asr) output from a media presentation and a transcription of the media presentation.
At&t Intellectual Property I, L.p.

Audio-visual speech recognition with scattering operators

Aspects described herein are directed towards methods, computing devices, systems, and computer-readable media that apply scattering operations to extracted visual features of audiovisual input to generate predictions regarding the speech status of a subject. Visual scattering coefficients generated according to one or more aspects described herein may be used as input to a neural network operative to generate the predictions regarding the speech status of the subject.
Nuance Communications, Inc.

Building of n-gram language model for automatic speech recognition (asr)

A method, a system, and a computer program product for building an n-gram language model for an automatic speech recognition. The method includes reading training text data and additional text data both for the n-gram language model from a storage, and building the n-gram language model by a smoothing algorithm having discount parameters for n-gram counts.
International Business Machines Corporation
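
A toy illustration of discounted n-gram estimation in the spirit of the smoothing described above is sketched below, using absolute discounting over bigram counts. The discount value and the tiny corpus are assumptions for the example.

```python
# Toy absolute-discounting bigram estimate: subtract a fixed discount
# from each observed bigram count and redistribute the freed mass to a
# unigram back-off distribution.
from collections import Counter

def bigram_prob(bigram_counts, unigram_counts, w1, w2, discount=0.5):
    """P(w2 | w1) with an absolute discount, backed off to the unigram."""
    c12, c1 = bigram_counts[(w1, w2)], unigram_counts[w1]
    n_followers = sum(1 for (a, _) in bigram_counts if a == w1)
    backoff_mass = discount * n_followers / c1  # mass freed by discounting
    p_uni = unigram_counts[w2] / sum(unigram_counts.values())
    return max(c12 - discount, 0) / c1 + backoff_mass * p_uni

bigrams = Counter({("speech", "recognition"): 3, ("speech", "input"): 1})
unigrams = Counter({"speech": 4, "recognition": 3, "input": 1})
print(round(bigram_prob(bigrams, unigrams, "speech", "recognition"), 3))
```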

Method and apparatus for improving a neural network language model, and speech recognition method and apparatus

According to one embodiment, an apparatus for improving a neural network language model of a speech recognition system includes a word classifying unit, a language model training unit and a vector incorporating unit. The word classifying unit classifies words in a lexicon of the speech recognition system.
Kabushiki Kaisha Toshiba

Method and apparatus for improving a language model, and speech recognition method and apparatus

According to one embodiment, an apparatus for improving a language model of a speech recognition system includes an extracting unit, a classifying unit, and a setting unit. The extracting unit extracts user words from a user document provided by a user.
Kabushiki Kaisha Toshiba

Topic shift detector

Aspects detect or recognize shifts in topics in computer implemented speech recognition processes as a function of mapping keywords to non-verbal cues. An initial topic is mapped to one or more keywords extracted from a first spoken query within a user keyword ontology mapping.
International Business Machines Corporation

Speech recognition apparatus and method

A speech recognition apparatus and method. The speech recognition apparatus includes a first recognizer configured to generate a first recognition result of an audio signal, in a first linguistic recognition unit, by using an acoustic model, a second recognizer configured to generate a second recognition result of the audio signal, in a second linguistic recognition unit, by using a language model, and a combiner configured to combine the first recognition result and the second recognition result to generate a final recognition result in the second linguistic recognition unit and to reflect the final recognition result in the language model.
Samsung Electronics Co., Ltd.

Speech recognition apparatus, vehicle having the speech recognition apparatus, and method for controlling the vehicle

Disclosed herein are speech recognition apparatuses, vehicles having the speech recognition apparatuses, and methods for controlling vehicles. According to an aspect, a speech recognition apparatus includes a speech input unit configured to receive a speech command from a user, a communication unit configured to receive the result of processing for speech recognition acquired by at least one user terminal located near the user, and a controller configured to compare the result of processing for speech recognition acquired from the speech command received by the speech input unit to the result acquired by the at least one user terminal, and to process the speech command according to the result of the comparison.
Hyundai Motor Company

Speech recognition system with abbreviated training

A method of adapting a speech recognition system to its user includes gathering information about a user of a speech recognition system, selecting at least a part of a speech model reflecting estimated speech attributes of the user based on the information about the user, running, in the speech recognition system, a speech model including the selected at least a part of a speech model, and training, in the speech recognition system, other parts of the speech model to reflect identified speech attributes of the user.
Toyota Motor Engineering & Manufacturing North America, Inc.

Order statistic techniques for neural networks

According to some aspects, a method of classifying speech recognition results is provided, using a neural network comprising a plurality of interconnected network units, each network unit having one or more weight values. The method comprises, using at least one computer, performing acts of: providing a first vector as input to a first network layer comprising one or more network units of the neural network; transforming, by a first network unit of the one or more network units, the input vector to produce a plurality of values, the transformation being based at least in part on a plurality of weight values of the first network unit; sorting the plurality of values to produce a sorted plurality of values; and providing the sorted plurality of values as input to a second network layer of the neural network.
Nuance Communications, Inc.
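
The core idea of the abstract, a layer that sorts its activations before passing them on, can be sketched in a few lines. This is a toy stand-in for the patented architecture; the shapes and values are invented for illustration.

```python
import numpy as np

def order_statistic_layer(x, w, b):
    """Affine transform followed by sorting the resulting activations,
    so the next layer sees order statistics rather than raw values
    (a sketch of the idea, not the patented architecture)."""
    z = w @ x + b          # plain fully connected transform
    return np.sort(z)      # ascending order statistics

rng = np.random.default_rng(0)
x = rng.normal(size=4)
w = rng.normal(size=(5, 4))
out = order_statistic_layer(x, w, np.zeros(5))
assert np.all(np.diff(out) >= 0)  # output is sorted
```

Sorting is a permutation, so gradients still flow in training frameworks; only the routing of values to output positions changes per example.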

Adaptation of speech recognition

A method, computer program product, and system for adapting speech recognition of a user's speech is provided. The method includes receiving a first utterance from a user having a duration below a predetermined threshold, identifying at least one further utterance from the user that provides additional information, generating a concatenated utterance by concatenating the first utterance with the at least one further utterance, transmitting the concatenated utterance to a speech recognition server, receiving a transcription of the concatenated utterance from the speech recognition server that includes a transcription of the first utterance, and extracting the transcription of the first utterance from the transcription of the concatenated utterance.
International Business Machines Corporation

Computer-implemented system and method for performing distributed speech recognition

A computer-implemented system and method for performing distributed speech recognition is provided. Audio data is collected.
Intellisist, Inc.

Information processing apparatus, control method, and program

There is provided an information processing apparatus, control method, and program capable of notifying a user of a candidate for a response, from the middle of a speech, through a voice UI. The information processing apparatus includes: a semantic analysis unit configured to perform semantic analysis on speech text recognized by a speech recognition unit in the middle of a speech; a score calculation unit configured to calculate a score for a response candidate on the basis of a result of the analysis performed by the semantic analysis unit; and a notification control unit configured to perform control to notify of the response candidate, in the middle of the speech, according to the score calculated by the score calculation unit.
Sony Corporation

Speech recognition using an operating system hooking component for context-aware recognition models

Inputs provided into user interface elements of an application are observed. Records are made of the inputs and the state(s) the application was in while the inputs were provided.
Mmodal Ip Llc

Data augmentation method based on stochastic feature mapping for automatic speech recognition

A method of augmenting training data includes converting a feature sequence of a source speaker determined from a plurality of utterances within a transcript to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set using the mapping function to multiple selected target speakers in the training set.
International Business Machines Corporation
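
The mapping-function estimation step can be illustrated with the simplest possible case: a linear map between source and target feature frames, fit by least squares. The patent's stochastic feature mapping is more elaborate; this sketch only shows the shape of the problem, and all data here is synthetic.

```python
import numpy as np

def estimate_mapping(src, tgt):
    """Least-squares estimate of a linear map W such that tgt ~= src @ W,
    a simple stand-in for the mapping between source-speaker features
    and the target speaker described in the abstract."""
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W

rng = np.random.default_rng(1)
src = rng.normal(size=(100, 3))          # source-speaker feature frames
true_W = rng.normal(size=(3, 3))
tgt = src @ true_W                       # synthetic "target speaker" features
W = estimate_mapping(src, tgt)
assert np.allclose(W, true_W, atol=1e-8)
```

Once estimated, the same map would be applied to every source utterance to synthesize additional training frames in the target speaker's feature space.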

Speech recognition support for remote applications and desktops

An application may be hosted for utilization by a remote computing platform. User interface (UI) elements of a UI generated by the hosted application may be identified.
Citrix Systems, Inc.

Frequency warping in a speech recognition system

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for receiving a sequence representing an utterance, the sequence comprising a plurality of audio frames; determining one or more warping factors for each audio frame in the sequence using a warping neural network; applying, for each audio frame, the one or more warping factors for the audio frame to the audio frame to generate a respective modified audio frame, wherein the applying comprises using at least one of the warping factors to scale a respective frequency of the audio frame to a new respective frequency in the respective modified audio frame; and decoding the modified audio frames using a decoding neural network, wherein the decoding neural network is configured to output a word sequence that is a transcription of the utterance.
Google Inc.
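
The frequency-scaling step can be sketched on a single magnitude-spectrum frame: each output bin is resampled from a position scaled by a warping factor. The factor here is a fixed toy value, standing in for the network-predicted per-frame factors the abstract describes.

```python
import numpy as np

def warp_spectrum(frame, alpha):
    """Scale the frequency axis of one magnitude-spectrum frame by a
    warping factor alpha: output bin k is read from position k/alpha
    on the original axis, via linear interpolation (illustrative only)."""
    n = len(frame)
    src = np.arange(n) / alpha            # source positions on the old axis
    return np.interp(src, np.arange(n), frame, right=frame[-1])

frame = np.linspace(1.0, 0.0, 8)          # fake spectrum, monotonically decaying
warped = warp_spectrum(frame, alpha=1.2)
assert warped.shape == frame.shape
```

With alpha greater than 1 the spectrum is stretched toward higher bins (as in vocal tract length normalization); alpha less than 1 compresses it.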

Computer-implemented system and method for efficient voice transcription

A computer-implemented system and method for efficient voice transcription is provided. A verbal message is processed by splitting the verbal message into segments and generating text for each of the segments via automated speech recognition.
Intellisist, Inc.

Insertion of characters in speech recognition

One embodiment provides a method, including: receiving, from an audio capture device, speech input; converting, using a processor, the speech input to machine text; receiving, from an alternate input source, an input comprising at least one character; identifying, using a processor, a location associated with the machine text to insert the at least one character; and inserting, using a processor, the at least one character at the location identified. Other aspects are described and claimed.
Lenovo (singapore) Pte. Ltd.

System and method for learning alternate pronunciations for speech recognition

A system and method for learning alternate pronunciations for speech recognition is disclosed. Through pronunciation learning, alternative name pronunciations that have not been previously covered in a general pronunciation dictionary may be learned.
Interactive Intelligence Group, Inc.

Method and device for updating language model and performing speech recognition based on language model

A method of updating a grammar model used during speech recognition includes obtaining a corpus including at least one word, obtaining the at least one word from the corpus, splitting the at least one obtained word into at least one segment, generating a hint for recombining the at least one segment into the at least one word, and updating the grammar model by using at least one segment comprising the hint.. .
Samsung Electronics Co., Ltd.
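
The split-and-hint idea in the abstract can be sketched with a greedy segmentation and an invented marker convention: non-final segments carry a "+" hint so the recognizer's output can be glued back into whole words. Both the marker and the segmentation strategy are illustrative, not taken from the patent.

```python
def split_with_hints(word, segments):
    """Split a word into known segments (greedy longest match, falling back
    to single characters) and mark non-final pieces with a '+' hint so the
    pieces can be recombined (the hint scheme is invented for illustration)."""
    out, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in segments or j == i + 1:
                out.append(word[i:j])
                i = j
                break
    return [p + "+" if k < len(out) - 1 else p for k, p in enumerate(out)]

def recombine(pieces):
    """Undo the hints: strip the '+' markers and join the pieces."""
    return "".join(p.rstrip("+") for p in pieces)

segs = {"speech", "recog", "nition"}
pieces = split_with_hints("speechrecognition", segs)
assert recombine(pieces) == "speechrecognition"
```

Splitting rare words into frequent segments keeps the grammar model compact, while the hints preserve enough information to reconstruct the original word in the recognition output.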

Communication method for a smart phone with a text recognition module

A portable device can transmit information through one of a mobile phone network and an internet, wherein the portable device includes a text-based communication module to allow a user to synchronously transmit or receive data through a local area network, wherein the data is text, audio, video, or a combination thereof. The text-based communication module of the portable device includes a text-to-speech recognition module used to convert text data for vocal output, and a read determination module for determining read target terminals and unread target terminals when a user of the portable phone device activates the read determination module.

Business listing search

A method of operating a voice-enabled business directory search system includes receiving category-business pairs, each category-business pair including a business category and a specific business, and establishing a data structure having nodes based on the category-business pairs. Each node of the data structure is associated with one or more business categories and a speech recognition language model for recognizing specific businesses associated with the one or more business categories.
Google Inc.

Speech recognition method and mobile terminal

A speech recognition method and a mobile terminal relate to the field of electronic and information technologies, and can flexibly perform speech collection and improve a speech recognition rate. The method includes acquiring, by a mobile terminal, an orientation/motion status of the mobile terminal, and determining, according to the orientation/motion status, a voice collection apparatus for voice collection; acquiring, by the mobile terminal, a speech signal from the voice collection apparatus; and recognizing, by the mobile terminal, the speech signal.
Huawei Technologies Co., Ltd.

Apparatus and method for acoustic score calculation and speech recognition

An apparatus for calculating an acoustic score, a method of calculating an acoustic score, an apparatus for speech recognition, a method of speech recognition, and an electronic device including the same are provided. An apparatus for calculating an acoustic score includes a preprocessor configured to sequentially extract audio frames into windows and a score calculator configured to calculate an acoustic score of a window by using a deep neural network (DNN)-based acoustic model.
Samsung Electronics Co., Ltd.

Unsupervised training method, training apparatus, and training program for an n-gram language model based upon recognition reliability

A computer-based, unsupervised training method for an n-gram language model includes reading, by a computer, recognition results obtained as a result of speech recognition of speech data; acquiring, by the computer, a reliability for each of the read recognition results; referring, by the computer, to the recognition results and the acquired reliabilities to select n-gram entries; and training, by the computer, the n-gram language model on one or more of the selected n-gram entries using all recognition results.
International Business Machines Corporation
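
The reliability-based selection step can be sketched as a confidence filter over recognition hypotheses before n-gram counting. The threshold and data here are illustrative; the patent's selection criterion is not specified in the abstract.

```python
from collections import Counter

def select_ngrams(results, n=2, threshold=0.8):
    """Count n-grams only from recognition results whose confidence meets
    a threshold; low-confidence hypotheses are skipped (the threshold
    value is illustrative)."""
    counts = Counter()
    for text, confidence in results:
        if confidence < threshold:
            continue
        toks = text.split()
        counts.update(zip(*(toks[i:] for i in range(n))))
    return counts

results = [
    ("turn the radio on", 0.95),
    ("turn the radial on", 0.42),   # low confidence: likely a misrecognition
]
counts = select_ngrams(results)
assert counts[("turn", "the")] == 1
assert ("the", "radial") not in counts
```

Filtering this way keeps probable misrecognitions out of the counts, which is the usual motivation for reliability-weighted unsupervised training.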

Speech recognition apparatus and method

A speech recognition apparatus includes a processor configured to recognize a user's speech using any one or combination of two or more of an acoustic model, a pronunciation dictionary including primitive words, and a language model including primitive words; and correct word spacing in a result of speech recognition based on a word-spacing model.
Samsung Electronics Co., Ltd.



Speech Recognition topics:
  • Speech Recognition
  • Communications
  • Computing Device
  • Heterogeneous
  • Conditional
  • Transcription
  • False Positive
  • Application Control
  • Natural Language
  • Embedded System
  • Electronic Device
  • Constraints
  • Central Processing Unit
  • Demultiplex
  • Interactive



    This listing is a sample of patent applications related to Speech Recognition; it is only meant as a recent sample of applications filed, not a comprehensive history. There may be associated servicemarks and trademarks related to these patents. Please check with a patent attorney if you need further assistance or plan to use them for business purposes. This patent data is also published to the public by the USPTO and available for free on their website. Note that there may be alternative spellings for Speech Recognition with additional patents listed. Browse our RSS directory or search for other possible listings.

