Last updated May 2, 2021

Being Tony Stark: How To Build A Voice Assistant Of Your Own?

In this article, we will build a basic, easy-to-follow voice assistant of our own. It is customizable, so you can tweak it to suit your own needs and requirements.


Published on October 19, 2020

by Bhavishya Pandit

I have always fantasized about flying in the Iron Man suit and have spent whole days wondering how cool that would be. Tony's personal voice assistant is super cool too. So today, my job is to make you feel a bit more like a tech maniac, just like Tony!

This article is dedicated to Tony! We love you 3000 <3.

We will be building an assistant with the following capabilities:

Speech Recognition
Google Text-to-Speech
Window automation
Web browser automation

Let the name be Jarvis, for the time being.

Features of our Voice assistant:

Should tell me the current time
Should tell me the date
Should be able to Greet me when I run the program
Should be able to redirect me to other pages
Should maximize and minimize the current window
Should play songs on YouTube
Should run a search query on Google

NOTE:

Do not change the way the words are spelt in the if-else conditions.

The entire code can be found in the GitHub repo linked at the end of the article.

A slow internet connection will lead to delayed outputs.

Timing matters when you speak. I have added a delay of 1-2 seconds before each prompt so that you have time to speak, but it is still something to keep in mind.

Dependencies

We need speech_recognition to recognise what we say, gTTS (Google Text-to-Speech) to convert text to speech, and pyautogui to automate basic window and web browser actions.

An active internet connection is also required.

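pip install SpeechRecognition
# PyAudio is required by speech_recognition's Microphone class
pip install PyAudio
pip install gTTS
pip install pyautogui
# webbrowser ships with Python's standard library, so it needs no install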
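
Before running the assistant, it can help to confirm that the packages are actually importable. The snippet below is a minimal sketch using only the standard library. The helper name missing_modules is hypothetical, and the module list assumes the import names speech_recognition, gtts, pyautogui and pyaudio (PyAudio is what speech_recognition's Microphone class relies on).

```python
from importlib.util import find_spec

# Import names assumed for this article's dependencies.
# webbrowser is part of the standard library and should always be found.
REQUIRED_MODULES = ['speech_recognition', 'gtts', 'pyautogui', 'pyaudio', 'webbrowser']


def missing_modules(names):
    """Return the names from `names` that Python cannot locate for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == '__main__':
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print('Missing modules:', ', '.join(missing))
    else:
        print('All dependencies were found.')
```

If anything is reported as missing, install the corresponding package with pip and run the check again.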

Importing packages

speech_recognition handles speech recognition, webbrowser opens the browser, gTTS (Google Text-to-Speech) converts text to speech, and pyautogui is used for window automation.

datetime gives access to the current date and time.

from speech_recognition import Microphone, Recognizer 
import webbrowser as wb
from gtts import gTTS #text to speech
import pyautogui #for window automation
import datetime
import os
import time
import subprocess

Printing the time

now = str(datetime.datetime.now())
print(now)
# Output: 2020-10-16 16:19:45.872990
now = now.split()
print(now)
# Output: ['2020-10-16', '16:19:45.872990']
t = now[1].split(':') #split the time '16:19:45.872990' into a list
hour = int(t[0]) # hour: 16
min = int(t[1]) # minutes: 19 (note: this name shadows Python's built-in min())
#print(hour)
#time is on a 24-hour clock
#greeting the user
#since the hour here is 16, i.e. 4 pm, the assistant will say good afternoon
if hour < 12:
   greetings = 'Good Morning!' #before 12 noon: Good Morning
   lang = 'en'
   obj = gTTS(text=greetings,lang=lang,slow=True)
   obj.save('greetings.mp3') #saves the mp3 file
   os.system('greetings.mp3') #opens the mp3 file in the default player (Windows)
   time.sleep(2) #delay of 2 seconds
   pyautogui.moveTo(1910,10) #assumes a 1920px-wide screen: top-right corner, over the player's close button
   pyautogui.click()
elif 12 <= hour <= 16:    #from 12 noon up to and including 4 pm: Good Afternoon
   greetings = 'Good Afternoon!'
   lang = 'en'
   obj = gTTS(text=greetings, lang=lang)
   obj.save('greetings.mp3')
   os.system('greetings.mp3')
   time.sleep(1.6) #delay of 1.6 seconds
   pyautogui.moveTo(1910, 10)
   pyautogui.click()
elif 16 < hour < 21: #after 4 pm and before 9 pm: Good Evening
   greetings = 'Good Evening!'
   lang = 'en' #en stands for the English language
   obj = gTTS(text=greetings, lang=lang, slow=True)
   obj.save('greetings.mp3')
   os.system('greetings.mp3')
   time.sleep(1.6)
   pyautogui.moveTo(1910, 10)
   pyautogui.click()
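
The greeting logic above can also be written as a small pure function, which makes the hour boundaries easy to check without playing any audio. This is a sketch rather than the article's original code: the function name greeting_for_hour is hypothetical, and the fallback 'Hello!' for 9 pm onward is an assumption, since the branches above do not cover those hours. It also reads the hour straight from the datetime object instead of splitting a string.

```python
import datetime


def greeting_for_hour(hour):
    """Map an hour on the 24-hour clock (0-23) to a greeting string."""
    if not 0 <= hour <= 23:
        raise ValueError('hour must be between 0 and 23')
    if hour < 12:
        return 'Good Morning!'
    if hour <= 16:
        return 'Good Afternoon!'
    if hour < 21:
        return 'Good Evening!'
    # Assumption: the article defines no greeting from 9 pm onward.
    return 'Hello!'


now = datetime.datetime.now()
# now.hour and now.minute are already integers, so no string splitting is needed.
print(greeting_for_hour(now.hour))
```

The returned string can then be passed to gTTS in a single place instead of repeating the save-and-play steps in every branch.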

The program is menu-driven and lists its basic features. Saying "show operations" prints the available operations. Saying "start operations" kicks off the process: the assistant prints the list of operations and then waits for a spoken command. Saying "thanks Jarvis" ends the program.

Speech recognition

Speech is recognised using recognize_google(), which calls an online API, so an active internet connection is needed for speech recognition to work.

print('-----------------------------------')
print()
print('1. SHOW OPERATIONS')
print('2. START OPERATIONS')
print('3. THANKS JARVIS') #to end the process
time.sleep(2)
r = Recognizer() #object instantiation
mic = Microphone() #object instantiation
while True: #infinite until the loop breaks
   try:
       print()
       print('-----------------------------------')
       print('Speak now')
       print('-----------------------------------')
       print()
       with mic as source:
           audio = r.listen(source)
           voice_text = r.recognize_google(audio)
           try:
               if voice_text == 'start operations': #if the user said 'start operations', run the following code
                   print(voice_text)
                   r1 = Recognizer()
                   mic1 = Microphone()
                   while True:
                       print()
                       print('-----------------------------------')
                       print('What operation?')
                       print()
                       print('1.Tell Date and Time')
                       print('2.Open Social Media Handles')
                       print('3.Play Songs on Youtube')
                       print('4.Maximize the current window')
                       print('5.Minimize the current window')
                       print('6.Search query on Google')
                       print('-----------------------------------')
                       print()
                       time.sleep(1) # 1 sec of delay
                       with mic1 as source_again:
                           audio1 = r1.listen(source_again) #listen with the second recognizer
                           command = r1.recognize_google(audio1)
                           if command == 'maximize the window':
                               pyautogui.getActiveWindow().maximize()
                           elif command == 'minimise the window':
                               pyautogui.getActiveWindow().minimize()

To play songs on YouTube, the assistant redirects you to a YouTube results page, where you can pick the song of your choice.

                           elif command == 'play songs on YouTube':
                               url = 'https://www.youtube.com/results?search_query=play+songs'
                               wb.get().open_new(url) #opens a new window on browser
                               time.sleep(1)
                               pyautogui.click(220,220)
                               break

Replace "dogs" in the command with any word of your choice; "dogs" is used here only to keep the example simple.

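                           elif command.startswith('search ') and command.endswith(' on Google'):
                               #keep only the words between 'search' and 'on Google'
                               query = command[len('search '):-len(' on Google')]
                               url = 'https://www.google.com/search?q=' + query.replace(' ', '+')
                               wb.get().open_new(url)
                           elif command == 'Open File explorer':
                               #Popen needs a program to run, so ask Windows Explorer to open the folder
                               subprocess.Popen(['explorer', r'C:\Users\91884\Desktop'])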
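
Turning a spoken search command into a Google URL can be isolated in a helper that also escapes spaces and special characters. The following is a minimal sketch under stated assumptions: the function name build_search_url is hypothetical, the command is assumed to follow the pattern "search <words> on Google", and the URL uses Google's usual q query parameter.

```python
from urllib.parse import quote_plus


def build_search_url(command):
    """Return a Google search URL for 'search <words> on Google', else None."""
    prefix, suffix = 'search ', ' on google'
    lowered = command.lower()
    if not (lowered.startswith(prefix) and lowered.endswith(suffix)):
        return None
    # Slice the original text so the query keeps the speaker's capitalisation.
    query = command[len(prefix):len(command) - len(suffix)].strip()
    if not query:
        return None
    # quote_plus turns spaces into '+' and escapes other special characters.
    return 'https://www.google.com/search?q=' + quote_plus(query)


print(build_search_url('search dogs on Google'))
```

The result can be passed to wb.get().open_new() whenever it is not None.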

To tell the time, the assistant speaks it aloud using text to speech.

                           elif command == 'What is the time?':
                               telling_the_time = f'{hour} hours and {min} minutes'
                               lang = 'en'
                               obj = gTTS(text=telling_the_time, lang=lang, slow=True)
                               obj.save('time.mp3')
                               os.system('time.mp3')
                               time.sleep(4)
                               pyautogui.moveTo(1910, 10)
                               pyautogui.click()
                           else:
                               print(command)
                               break
               elif voice_text == 'show operations':# list of operations 
                   print('1.Tell Date and Time')
                   print('2.Open Social Media Handles')
                   print('3.Play Songs on Youtube')
                   print('4.Maximize the current window')
                   print('5.Minimize the current window')
                   print('6.Search query on Google')
               elif voice_text == 'thanks Jarvis':
                   print('Have a great day!')
                   break
               else:
                   print(voice_text)
           except Exception:
               break
   except Exception:
       break
print('Hope you liked the fun project!')
print('Follow Bhavishya Pandit on Linkedin!')
print('Subscribe to AIM for more such articles!')
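
The if-else chain above compares each transcript against exact strings, which is why the spelling in the conditions matters. As an alternative sketch (not the article's original approach), commands can be normalised and looked up in a dictionary of handlers, so small differences in capitalisation or punctuation no longer break the match. The names normalize and make_dispatcher, and the demo handlers, are hypothetical.

```python
import string


def normalize(text):
    """Lowercase the text, drop ASCII punctuation and collapse whitespace."""
    cleaned = text.lower().translate(str.maketrans('', '', string.punctuation))
    return ' '.join(cleaned.split())


def make_dispatcher(handlers):
    """Build a dispatch(transcript) function from {phrase: zero-argument callable}."""
    table = {normalize(phrase): handler for phrase, handler in handlers.items()}

    def dispatch(transcript):
        handler = table.get(normalize(transcript))
        if handler is None:
            return False  # unknown command; the caller decides what to do
        handler()
        return True

    return dispatch


# Demo handlers only record which command ran; real ones would call
# pyautogui, webbrowser or gTTS as in the code above.
calls = []
dispatch = make_dispatcher({
    'What is the time?': lambda: calls.append('time'),
    'maximize the window': lambda: calls.append('maximize'),
})
dispatch('what is the time')
dispatch('Maximize the window')
print(calls)
```

Adding a new voice command then means adding one dictionary entry rather than another elif branch.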

Now It’s Time to See the Results

Output for opening songs on YouTube:

A new tab will open in your browser (Chrome here) with a list of recommended songs for you.

Output for searching a query on Google:

A new tab will open with the following results on Google.

Now for the most interesting part of this article: the recorded video given below shows how the above program works.

Conclusion

The main aim of this article is to give a better understanding of recent advancements in technology and of how easily we can make things possible. The technologies used here are speech recognition, text to speech, and web and window automation.

From this, we can surely conclude that what was once fiction is becoming reality today!

The future isn’t near, the future is here!

Subscribe to AIM for more such articles. You can follow me on LinkedIn to stay updated.
The complete code can be found in AIM’s GitHub repository. Please visit this link to find the code.


Bhavishya Pandit

Understanding and building fathomable approaches to problem statements is what I like the most. I love conversations whose main plot is machine learning, computer vision, deep learning, data analysis and visualization. Apart from these, my interests also include listening to business podcasts, exploring use cases and reading self-help books.