Unboxstop

What is Speech Recognition?

by Soumyarup
November 1, 2020
6 min read

There is a long answer and a short answer to “What is Speech Recognition?” The short answer is that speech recognition is a field of Computer Science that enables a computer to listen to spoken words and convert them into text. If you would like more than a one-line answer, read on.

Speech recognition is a subfield of Computational Linguistics that develops methods allowing a computer to automatically recognize spoken language and transcribe it into text. It is also known as Automatic Speech Recognition (ASR) or Speech-to-Text. Rudimentary Speech-to-Text systems might handle only a single language, accent or dialect, whereas more advanced systems can handle several. Although computer scientists have dedicated decades of research and development to the field throughout the 20th and 21st centuries, it is still far from complete or perfect. That is due to the incredibly complex nature of human language, which continues to evolve with time.

Important Note:

It is essential to understand the difference between Speech Recognition and Voice Recognition, as the two terms are often mistakenly used interchangeably. Speech Recognition refers to the subfield of Computational Linguistics discussed above, whereas in Voice Recognition a computer uses the vocal properties of the speaker to identify them, similar to a fingerprint.

Voice Recognition is, however, often used alongside Speech Recognition, so that the computer can not only transcribe speech into text but also know who is speaking. For instance, if a computer knows which accent a particular speaker uses, it can transcribe that person more accurately. A technology as ground-breaking as Speech Recognition has a fascinating history of development. Let’s take a few moments to discuss it briefly before we move on.

History of Speech Recognition 

'Audrey', a single-speaker digit recognizer created by researchers at Bell Labs (1)

Research and development on Speech Recognition technology started as early as the 1950s. In 1952, three researchers from Bell Labs, Stephen Balashek, R. Biddulph and K.H. Davis, built a system called “Audrey” for single-speaker digit recognition.

The Shoebox. Source: https://www.ibm.com/ibm/history/ibm100/us/en/icons/speechreco/transform/

In the early 1960s, IBM demonstrated its 16-word “Shoebox” machine. Later in the same decade, Shuzo Saito, together with Fumitada Itakura, proposed a speech coding method called Linear Predictive Coding.

In the 1980s, IBM created a voice-activated typewriter called Tangora, which could handle a 20,000-word vocabulary. The 1980s also saw the introduction of the n-gram language model, an essential step towards practical Speech Recognition. In the early 2000s, DARPA sponsored two speech recognition programs: Effective Affordable Reusable Speech-to-Text (EARS) and Global Autonomous Language Exploitation (GALE).

Google’s first foray into Speech Recognition came in 2007, after it hired a few researchers from Nuance, with GOOG-411, a telephone-based directory service. The data collected through that service was instrumental in developing Google’s Speech Recognition system further. Today, Google’s Voice Search feature is supported in over 30 languages.

Hidden Markov Models (more on them later), combined with feedforward Artificial Neural Networks, were long the dominant technique for Speech Recognition systems. With the rise of Deep Learning, many parts of that pipeline have been replaced or augmented by Long Short-Term Memory (LSTM), a recurrent neural network technique introduced in 1997. A prevalent implementation of this technique can be found in the Google Speech Recognition software on many smartphones.

From the 1980s through the 2000s, multiple challenges plagued the performance of neural-network-based techniques, so for a long time they could not outperform Hidden Markov Model-based techniques. The use of non-recurrent (feedforward) networks for acoustic modelling, introduced in 2009, contributed significantly to improving the accuracy of Speech Recognition systems, and the gap was finally closed during the 2010s.

How does a Modern Speech Recognition model work?

Speech Recognition uses acoustic and language modelling to convert the sentences you speak into statistical information. For simplicity, you can think of acoustic and language modelling as two different methods of generating that information. In modern speech recognition, both models are used together to transcribe speech into text. Let’s look at them one at a time, starting with Acoustic Modelling and then briefly discussing Language Modelling.

So what is Acoustic Modelling? In a language, we use specific sounds to distinguish one word from another. For example, in English, the sounds ‘d’ and ‘t’ distinguish the word ‘bad’ from ‘bat’. Such sounds are called phonemes. In Acoustic Modelling, the computer builds a statistical representation of the relationship between the audio signal and the phonemes, or other linguistic units, that make up speech.

In Language Modelling, the computer builds a probability distribution over sequences of words. That gives it the context to better distinguish between words or phrases that sound the same but mean different things. The n-gram model (mentioned earlier) is a simple example: it estimates the probability of each word from the n − 1 words that precede it.
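
As a rough illustration of the n-gram idea, the sketch below builds a bigram model (n = 2) from a tiny made-up corpus and uses it to score two similar-sounding sentences. The corpus, the function names and the `<s>`/`</s>` boundary markers are all assumptions made for this example; real language models are trained on far more text and use smoothing so that unseen word pairs do not get a probability of zero.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each preceding word."""
    bigram_counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> are hypothetical start/end markers used only in this sketch.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for previous, current in zip(tokens, tokens[1:]):
            bigram_counts[previous][current] += 1
    return bigram_counts


def word_probability(model, previous, current):
    """Estimate P(current | previous) from raw counts (no smoothing)."""
    total = sum(model[previous].values())
    return model[previous][current] / total if total else 0.0


def sentence_probability(model, sentence):
    """Multiply the bigram probabilities of every adjacent word pair."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    probability = 1.0
    for previous, current in zip(tokens, tokens[1:]):
        probability *= word_probability(model, previous, current)
    return probability


# Toy corpus, invented for illustration only.
corpus = [
    "it is easy to recognize speech",
    "it is hard to recognize speech",
    "it is easy to wreck a nice beach",
]
model = train_bigram_model(corpus)

likely = sentence_probability(model, "it is easy to recognize speech")
less_likely = sentence_probability(model, "it is easy to wreck a nice beach")
print(likely > less_likely)
```

Because “recognize” follows “to” more often than “wreck” does in this toy corpus, the first sentence gets the higher score. That is the kind of context a language model contributes when the acoustic evidence alone is ambiguous.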

Moving on, let’s take a brief look at some of the other models and techniques used in speech recognition:

  • Hidden Markov Models (HMMs):

HMMs were among the earliest models used for speech recognition, before Deep Learning models became increasingly popular. An HMM assumes that the system being modelled is a Markov Process. What is a Markov Process? Consider the following example. Suppose a Genie is in a room that the observer cannot see into. There are three urns in the room and three labelled balls in each of the urns. The Genie selects an urn, draws a ball from it at random and shows the ball to the observer, then repeats the procedure. The choice of each urn depends only on the urn chosen immediately before it, and not on any urn chosen earlier. A process with this property is a Markov Process.

Thus, the observer knows which ball was drawn but has no idea which urn it came from. In other words, the sequence of urns is ‘hidden’ from the observer, hence the name ‘Hidden Markov Model’. The system of the Genie in the room has unobservable states, while the sequence of balls the observer receives outside the room depends on that system. In HMMs, we try to learn about the unobservable Markov system by observing the dependent system. In speech recognition, the hidden states typically correspond to phonemes or parts of phonemes, and the observations are features of the audio signal.
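
The urn example can be turned into a small sketch. The code below uses the Viterbi algorithm, a standard way of finding the most likely sequence of hidden states in an HMM, to guess which urns the Genie used from the colours of the balls the observer sees. The urn names, ball colours and every probability here are made up for illustration, and the sketch assumes that the Genie tends to stay with the urn used last.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely sequence of hidden states for the observations."""
    # best[t][s] = (probability of the best path ending in state s at step t,
    #               the state that came before s on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)

    # Start from the most probable final state and follow the back-pointers.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Hypothetical model: three urns (hidden states) and three ball colours.
states = ["A", "B", "C"]
start_p = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
# Markov property: the next urn depends only on the current urn.
trans_p = {
    "A": {"A": 0.5, "B": 0.25, "C": 0.25},
    "B": {"A": 0.25, "B": 0.5, "C": 0.25},
    "C": {"A": 0.25, "B": 0.25, "C": 0.5},
}
# Each urn holds mostly one colour, so the ball is a noisy hint about the urn.
emit_p = {
    "A": {"red": 0.8, "green": 0.1, "blue": 0.1},
    "B": {"red": 0.1, "green": 0.8, "blue": 0.1},
    "C": {"red": 0.1, "green": 0.1, "blue": 0.8},
}

print(viterbi(["red", "green", "blue"], states, start_p, trans_p, emit_p))
```

The observer never sees the urns, only the colours, yet the model can still recover a plausible urn sequence. A speech recognizer built on HMMs does the same thing with audio features in place of ball colours.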

  • Natural Language Processing (NLP):

Natural Language Processing is not a particular model but a subfield of Artificial Intelligence that focuses on human-machine interaction through language, in both text and speech. Most ‘assistant’ software, Siri for instance, makes use of Natural Language Processing algorithms to understand verbal commands.

  • Neural Networks:

Neural Networks, as the name suggests, are modelled after the way information is transferred between neurons in the human brain, and are the primary tool of Deep Learning. In practice, a neural network is made up of nodes arranged in layers. The more layers there are, the deeper the network. Whenever the output of a node exceeds a certain threshold, the node ‘fires’, passing information on to the next layer of the network and thus mimicking the brain in its functionality. Neural Networks typically learn the mapping from inputs to outputs through supervised learning, minimizing a loss function with methods like gradient descent. While generally more accurate, they also tend to be slower than more traditional models.
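
To make the idea of learning by gradient descent a little more concrete, here is a minimal sketch that trains a single artificial neuron, not a full multi-layer network, to reproduce the logical AND function. The function names, learning rate, number of epochs and firing threshold are assumptions chosen for this example. Real speech recognition networks have many layers and millions of parameters, but they are trained on the same principle.

```python
import math


def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, learning_rate=0.5, epochs=5000):
    """Fit one sigmoid neuron with batch gradient descent on cross-entropy loss."""
    weights = [0.0, 0.0]
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for inputs, target in samples:
            output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            # For a sigmoid with cross-entropy loss, the gradient with respect
            # to the weighted sum is simply (output - target).
            error = output - target
            grad_w = [g + error * x for g, x in zip(grad_w, inputs)]
            grad_b += error
        # Step against the average gradient to reduce the loss.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def fires(weights, bias, inputs, threshold=0.5):
    """The neuron 'fires' when its activation exceeds the threshold."""
    activation = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
    return activation > threshold


# Supervised training data: input pairs and the output we want (logical AND).
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_neuron(and_gate)
predictions = [fires(weights, bias, inputs) for inputs, _ in and_gate]
print(predictions)
```

After training, the neuron fires only when both inputs are 1. A deep network stacks many such units in layers and adjusts all of their weights in the same way.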

Conclusion

With so much technology dominating the world we inhabit and surrounding us at all times, it is easy to forget the details of how it works. It is just as easy to ignore the millions of lines of code that keep a plane from falling out of the sky or a space station in orbit, along with all the genius that went into creating the incredible techniques that make it all possible. It is the culmination of decades of research that allows the device you hold to transcribe your voice into text, understand it and fetch the most relevant results out of the trillions of answers that any query might generate.

Tags: speech recognition
© 2020 UNBOXSTOP
