
Being Tony Stark: How To Build A Voice Assistant Of Your Own?

In this article, we will build a basic, easy-to-follow voice assistant of our own. It is customizable, so you can tweak it to suit your own needs and requirements.

I have always fantasized about flying in the Iron Man suit and have spent whole days wondering how cool that would be. The personal voice assistant Tony has is super cool, too. So today, my job is to make you feel a bit more like a tech maniac like Tony!

This article is dedicated to Tony! We love you 3000 <3.



We will build it from the following pieces:

  1. Speech recognition
  2. Google Text-to-Speech
  3. Window automation
  4. Web browser automation

Let the name be Jarvis, for the time being.

Features of our voice assistant:

  1. Tell me the current time
  2. Tell me the date
  3. Greet me when I run the program
  4. Redirect me to other pages
  5. Maximize and minimize the current window
  6. Play songs on YouTube
  7. Run a search query on Google


Do not change the spelling of the phrases in the if-else conditions, because each command is compared with the recognised text exactly. The entire code can be found in the GitHub repo linked at the end of the article.

A slow internet connection will lead to delayed outputs.

Timing matters when you speak. I have added a delay of 1-2 seconds around the points where the program listens, so that you have time to speak, but it is still something to keep in mind.


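We need speech_recognition to recognise whatever we say and gTTS (Google Text-to-Speech) to convert text to speech. PyAutoGUI handles basic window automation, and webbrowser does the same for the web browser.

You also need an active internet connection.

pip install SpeechRecognition
pip install PyAudio
pip install gTTS
pip install pyautogui

The speech_recognition module is installed under the package name SpeechRecognition, and it needs PyAudio for microphone input. The webbrowser module ships with Python, so it does not need to be installed.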
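Before running the assistant, it can help to confirm that every module it imports is actually available. The following is a minimal sketch of such a check. The helper name missing_modules and the REQUIRED_MODULES list are illustrative and not part of the original code; pyaudio is listed because speech_recognition relies on it for microphone access.

```python
import importlib.util

# Names as they are imported in Python, which can differ from the pip package names.
REQUIRED_MODULES = ["speech_recognition", "pyaudio", "gtts", "pyautogui"]


def missing_modules(names):
    """Return the module names from `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Install these before continuing:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

find_spec only locates a module without importing it, so the check does not need a microphone or a display.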

Importing packages

We use speech_recognition for speech recognition, webbrowser to open the browser, gTTS (Google Text-to-Speech) to convert text to speech, pyautogui for window automation, and datetime to read the current date and time.

from speech_recognition import Microphone, Recognizer 
import webbrowser as wb
from gtts import gTTS #text to speech
import pyautogui #for window automation
import datetime
import os
import time
import subprocess

Printing the time

now = str(datetime.datetime.now())
Output: 2020-10-16 16:19:45.872990
now = now.split()
Output: ['2020-10-16', '16:19:45.872990']
t = now[1].split(':') #for time:  '16:19:45.872990' into list
hour = int(t[0]) # hour : 16
min = int(t[1]) # minutes: 19
#time is in 24-hour clock
#greeting the user
Since the hour is 16, i.e. 4 pm, it will say good afternoon.
if hour < 12:
   greetings = 'Good Morning!' #if hour is before 12 noon, Good Morning
   lang = 'en'
   obj = gTTS(text=greetings, lang=lang, slow=True)
   obj.save('greetings.mp3') #saves the mp3 file
   os.system('greetings.mp3') #to play the mp3 file (opens it in the default player on Windows)
   time.sleep(2) #delay of 2 seconds
elif hour >= 12 and hour <= 16: #from 12 noon up to and including 4pm, Good Afternoon
   greetings = 'Good Afternoon!'
   lang = 'en'
   obj = gTTS(text=greetings, lang=lang)
   obj.save('greetings.mp3')
   os.system('greetings.mp3')
   time.sleep(1.6) #delay of 1.6 seconds
   pyautogui.moveTo(1910, 10)
elif hour > 16 and hour < 21: #after 4pm and before 9pm, Good Evening
   greetings = 'Good Evening!'
   lang = 'en' #en stands for english language.
   obj = gTTS(text=greetings, lang=lang, slow=True)
   obj.save('greetings.mp3')
   os.system('greetings.mp3')
   pyautogui.moveTo(1910, 10)
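The greeting logic is easier to check when it is separated from the text-to-speech calls. Here is a small sketch of that idea. The function name greeting_for_hour is illustrative, noon to 4 pm is treated as afternoon, and the 'Hello!' fallback for 9 pm onwards is an assumption, since the listing above has no branch for those hours.

```python
import datetime


def greeting_for_hour(hour):
    """Pick a greeting for an hour on the 24-hour clock (0-23)."""
    if hour < 12:
        return 'Good Morning!'
    if hour <= 16:
        return 'Good Afternoon!'
    if hour < 21:
        return 'Good Evening!'
    return 'Hello!'  # assumption: fallback for 9 pm onwards


# datetime already exposes the hour, so no string splitting is needed.
current_hour = datetime.datetime.now().hour
print(greeting_for_hour(current_hour))
```

The returned string can then be passed to gTTS exactly as in the listing above.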


This is a menu-driven program. Saying 'show operations' lists the available operations. Saying 'start operations' kicks off the process: the program prints the list of operations and then waits for you to say which one to perform.
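One way to picture a menu-driven flow like this is a lookup table that maps each spoken phrase to the function that handles it. The sketch below is an illustration and not the article's implementation, which uses an if-else chain. The names normalise, handle and HANDLERS are hypothetical, and the sketch compares phrases case-insensitively, which the original code does not do.

```python
def show_operations():
    """Return the menu lines so they can be printed or spoken."""
    return [
        '1.Tell Date and Time',
        '2.Open Social Media Handles',
        '3.Play Songs on Youtube',
        '4.Maximize the current window',
        '5.Minimize the current window',
        '6.Search query on Google',
    ]


def normalise(text):
    """Lower-case the phrase and collapse repeated spaces."""
    return ' '.join(text.lower().split())


def handle(voice_text, handlers):
    """Run the handler registered for the phrase; return None if there is none."""
    action = handlers.get(normalise(voice_text))
    return action() if action else None


HANDLERS = {
    'show operations': show_operations,
    'thanks jarvis': lambda: 'Have a great day!',
}

print(handle('show operations', HANDLERS))
```

Adding a new voice command then means adding one entry to the table rather than another elif branch.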

Speech recognition

The speech is recognised with recognize_google(), which sends the recorded audio to Google's speech recognition API. You therefore need an active internet connection to use the speech recognition features.
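Because recognition depends on the microphone, on what was said and on the network, each of those can fail separately. The sketch below shows one way to listen for a single phrase and report what went wrong. The function name listen_once and the 5-second limits are assumptions chosen for illustration; the exception classes and the listen() and adjust_for_ambient_noise() parameters come from the speech_recognition library.

```python
def listen_once(timeout=5, phrase_time_limit=5):
    """Listen for one phrase and return the recognised text, or None on failure."""
    # Imported inside the function so that defining it needs no audio driver.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    try:
        with sr.Microphone() as source:
            # Calibrate the energy threshold against the background noise.
            recognizer.adjust_for_ambient_noise(source, duration=1)
            audio = recognizer.listen(source, timeout=timeout,
                                      phrase_time_limit=phrase_time_limit)
        # The microphone is released before the network call is made.
        return recognizer.recognize_google(audio)
    except sr.WaitTimeoutError:
        print('No speech was detected before the timeout.')
    except sr.UnknownValueError:
        print('Speech was heard but could not be understood.')
    except sr.RequestError as error:
        print(f'Could not reach the recognition service: {error}')
    return None
```

Calling listen_once() returns a string such as the spoken command, or None when nothing usable was heard, so the caller can simply ask again.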

print('3. THANKS JARVIS') #to end the process
r = Recognizer() #object instantiation
mic = Microphone() #object instantiation
while True: #infinite until the loop breaks
   try:
       print('Speak now')
       with mic as source:
           try:
               audio = r.listen(source)
               voice_text = r.recognize_google(audio)
               if voice_text == 'start operations': #if the spoken words are 'start operations', run the following code
                   r1 = Recognizer()
                   mic1 = Microphone()
                   while True:
                       print('What operation?')
                       print('1.Tell Date and Time')
                       print('2.Open Social Media Handles')
                       print('3.Play Songs on Youtube')
                       print('4.Maximize the current window')
                       print('5.Minimize the current window')
                       print('6.Search query on Google')
                       time.sleep(1) # 1 sec of delay
                       with mic1 as source_again:
                           audio1 = r1.listen(source_again)
                           command = r1.recognize_google(audio1)
                           if command == 'maximize the window':
                               pyautogui.hotkey('win', 'up') #assumption: Win+Up, the Windows shortcut to maximize
                           elif command == 'minimise the window':
                               pyautogui.hotkey('win', 'down') #assumption: Win+Down, the Windows shortcut to shrink the window

To play songs on YouTube, the assistant redirects you to YouTube's page, where you can listen to a song of your own choice.

                           elif command == 'play songs on YouTube':
                               url = '' #address of the YouTube page to open
                               wb.get().open_new(url) #opens a new window on browser

You can replace 'dogs' with any word of your choice; I have used dogs here to keep it simple.

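                           elif command == 'search dogs on Google':
                               command = command.split(' ')
                               query = command[1]
                               url = f'https://www.google.com/search?q={query}' #assumed Google search address
                               wb.get().open_new(url) #opens a new window on browser
                           elif command == 'Open File explorer':
                               subprocess.Popen('explorer') #assumption: launches File Explorer on Windows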
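The search command above matches one fixed phrase and takes a single word from it. As a hedged sketch of how the same idea could accept any search words, the helper below pulls everything between 'search' and 'on Google' and builds a URL from it. The function name search_url is hypothetical, and the Google search address is an assumption rather than something taken from the original code.

```python
from urllib.parse import quote_plus


def search_url(command):
    """Turn 'search <words> on Google' into a search URL, or None if it does not fit."""
    words = command.split()
    fits = (
        len(words) >= 4
        and words[0].lower() == 'search'
        and [word.lower() for word in words[-2:]] == ['on', 'google']
    )
    if not fits:
        return None
    query = ' '.join(words[1:-2])  # everything between 'search' and 'on Google'
    return 'https://www.google.com/search?q=' + quote_plus(query)


print(search_url('search golden retriever puppies on Google'))
```

The result can be passed to webbrowser.open_new(), which opens the address in a new browser window when possible.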

The time is spoken aloud using text to speech.

                           elif command == 'What is the time?':
                               telling_the_time = f'{hour} hours and {min} minutes'
                               lang = 'en'
                               obj = gTTS(text=telling_the_time, lang=lang, slow=True)
                               obj.save('time.mp3') #saves the mp3 file; the file name is an assumption
                               os.system('time.mp3') #to play the mp3 file
                               pyautogui.moveTo(1910, 10)
               elif voice_text == 'show operations':# list of operations 
                   print('1.Tell Date and Time')
                   print('2.Open Social Media Handles')
                   print('3.Play Songs on Youtube')
                   print('4.Maximize the current window')
                   print('5.Minimize the current window')
                   print('6.Search query on Google')
               elif voice_text == 'thanks Jarvis':
                   print('Have a great day!')
                   break #ends the loop
           except Exception:
               pass #skip this attempt and listen again
   except Exception:
       pass #same for errors raised while opening the microphone
print('Hope you liked the fun project!')
print('Follow Bhavishya Pandit on Linkedin!')
print('Subscribe to AIM for more such articles!')
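Note that the hour and minute in the listing are read once, when the program starts, so an assistant that runs for a while would announce a stale time. A minimal sketch of reading the clock at the moment of the request is shown below. The helper names spoken_time and spoken_date are illustrative, and the date format is an assumption, since the article does not show how the date is announced.

```python
import datetime


def spoken_time(moment):
    """Build the sentence to speak for the time part of a datetime."""
    return f'{moment.hour} hours and {moment.minute} minutes'


def spoken_date(moment):
    """Build the sentence to speak for the date part, e.g. day, month name, year."""
    return moment.strftime('%d %B %Y')


# Read the clock when the command is handled, not when the program starts.
right_now = datetime.datetime.now()
print(spoken_time(right_now))
print(spoken_date(right_now))
```

Either string can be handed to gTTS in place of telling_the_time.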

Now It’s Time to See the Results

Output: for opening songs on youtube

A new tab will open in Chrome with a list of recommended songs for you.

Output: for searching query on google.

A new tab will open with the search results on Google.

Now, the most interesting part of this article, the recorded video depicting how the above program works is given below.


The main aim of this article is to give a better understanding of the advancements in technology and of how easily we can make things possible. The technologies used here are speech recognition, text to speech, and web and window automation.

From this, we can surely conclude that what was once fiction is becoming reality today!

The future isn't near, the future is here!

Subscribe to AIM for more such articles. You can follow me on LinkedIn to stay updated.
The complete code can be found in AIM's GitHub repository. Please visit this link to find the code.


Bhavishya Pandit
Understanding and building fathomable approaches to problem statements is what I like the most. I love talking about conversations whose main plot is machine learning, computer vision, deep learning, data analysis and visualization. Apart from them, my interest also lies in listening to business podcasts, use cases and reading self help books.
