I have always fantasized about flying in the Iron Man suit and have spent whole days wondering how cool that would be. Tony's personal voice assistant is super cool too. So today, my job is to make you feel a bit more like a tech maniac like Tony!
This article is dedicated to Tony! We love you 3000 <3.
In this article, we will focus on forging a basic, easy voice assistant of our own. It will be a customizable voice assistant that you can tweak to suit your own desires and requirements.
We will build it using the following:
- Speech Recognition
- Google Text-to-Speech
- Window automation
- Web browser automation
Let the name be Jarvis, for the time being.
Features of our Voice assistant:
- Should tell me the current time
- Should tell me the date
- Should be able to Greet me when I run the program
- Should be able to redirect me to other pages
- Current window operations such as maximizing and minimizing the window
- Play songs on YouTube
- A search query on Google
NOTE:
Do not change the way the words are spelt in the if-else conditions, because the recognised text is compared with those strings exactly. The entire code can be found in the GitHub repo linked at the end of the article.
A slow internet connection will lead to delayed outputs.
Be careful about when you speak: I have added a delay of at least 1-2 seconds everywhere so that you have time to speak, but it is still something to keep in mind.
Dependencies
We need speech_recognition to recognise whatever we say, and gTTS (Google Text-to-Speech) to convert text to speech. PyAutoGUI handles basic window automation, and webbrowser does the same for web browsers.
An active internet connection is also required.
pip install SpeechRecognition
pip install PyAudio
pip install gtts
pip install pyautogui
# webbrowser ships with Python, so it needs no install; PyAudio is needed for microphone input
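Before running the assistant, it can help to confirm that the modules are actually importable. The following is a minimal sketch that uses only the standard library; the helper name missing_modules and the REQUIRED_MODULES list are my own illustrative choices, not part of the original project.

```python
import importlib.util

# Import names (not pip package names) used in this article.
REQUIRED_MODULES = ["speech_recognition", "gtts", "pyautogui", "webbrowser"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All dependencies found.")
```

If anything is reported as missing, install it with pip before moving on.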
Importing packages
We use speech_recognition for speech recognition and webbrowser for opening the browser. gTTS (Google Text-to-Speech) converts text to speech, PyAutoGUI is used for window automation, and datetime gives us access to the date and time.
from speech_recognition import Microphone, Recognizer
import webbrowser as wb
from gtts import gTTS  # text to speech
import pyautogui  # for window automation
import datetime
import os
import time
import subprocess
Printing the time
now = str(datetime.datetime.now())
print(now)
# Output: 2020-10-16 16:19:45.872990

now = now.split()
print(now)
# Output: ['2020-10-16', '16:19:45.872990']

t = now[1].split(':')  # for time: '16:19:45.872990' into a list
hour = int(t[0])       # hour: 16
min = int(t[1])        # minutes: 19 (note: this name shadows the built-in min())
# print(hour)          # time is on a 24-hour clock

# Greeting the user
# Since the hour is 16, i.e. 4 pm, it will say good afternoon.
if hour < 12:
    greetings = 'Good Morning!'  # before 12 pm, Good Morning
    lang = 'en'
    obj = gTTS(text=greetings, lang=lang, slow=True)
    obj.save('greetings.mp3')    # saves the mp3 file
    os.system('greetings.mp3')   # plays the mp3 file with the default player (Windows)
    time.sleep(2)                # delay of 2 seconds
    pyautogui.moveTo(1910, 10)   # top-right corner on a 1920-pixel-wide screen
    pyautogui.click()
elif hour >= 12 and hour <= 16:  # from 12 pm up to 4 pm, Good Afternoon
    greetings = 'Good Afternoon!'
    lang = 'en'
    obj = gTTS(text=greetings, lang=lang)
    obj.save('greetings.mp3')
    os.system('greetings.mp3')
    time.sleep(1.6)              # delay of 1.6 seconds
    pyautogui.moveTo(1910, 10)
    pyautogui.click()
elif hour > 16 and hour < 21:    # after 4 pm and before 9 pm, Good Evening
    greetings = 'Good Evening!'
    lang = 'en'                  # en stands for the English language
    obj = gTTS(text=greetings, lang=lang, slow=True)
    obj.save('greetings.mp3')
    os.system('greetings.mp3')
    time.sleep(1.6)
    pyautogui.moveTo(1910, 10)
    pyautogui.click()
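The greeting choice above can also be written as one small function that reads the hour straight from the datetime object instead of splitting its string form. This is only a sketch: the function name greeting_for_hour and the 'Hello!' fallback for 9 pm onwards are my assumptions, and noon and 4 pm are both treated as afternoon here.

```python
import datetime


def greeting_for_hour(hour):
    """Map an hour on the 24-hour clock (0-23) to a greeting."""
    if hour < 12:
        return 'Good Morning!'
    if hour <= 16:
        return 'Good Afternoon!'
    if hour < 21:
        return 'Good Evening!'
    return 'Hello!'  # assumption: fallback for 21:00 onwards, not in the article


now = datetime.datetime.now()
print(greeting_for_hour(now.hour))
```

The returned string can then be passed to gTTS once, rather than repeating the save-and-play steps in every branch.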
Menu
The program is menu-driven and shows its basic features. Saying "show operations" lists the available operations. Saying "start operations" kicks off the process: the assistant shows the different operations and waits for you to name one.
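Since the same list of operations is printed in more than one place in the program, it could be kept in a single list and formatted on demand. A minimal sketch, where the names OPERATIONS, format_menu and show_operations are hypothetical helpers of mine:

```python
# The six operations shown in the assistant's menu.
OPERATIONS = [
    'Tell Date and Time',
    'Open Social Media Handles',
    'Play Songs on Youtube',
    'Maximize the current window',
    'Minimize the current window',
    'Search query on Google',
]


def format_menu(operations):
    """Return the numbered menu lines, for example '1.Tell Date and Time'."""
    return [f'{number}.{label}' for number, label in enumerate(operations, start=1)]


def show_operations():
    """Print the whole menu, one operation per line."""
    print('\n'.join(format_menu(OPERATIONS)))


show_operations()
```

Adding a new operation then means editing one list instead of several print statements.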
Speech recognition
The speech is recognised using recognize_google(), which calls a web API, so an active internet connection is needed to access the speech recognition features.
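Because recognition depends on a network call and on clear audio, a single attempt can fail. One way to cope is to retry a few times before giving up. The sketch below is deliberately library-free so it can run anywhere: recognise_with_retries and its recognise_once argument are hypothetical names, with recognise_once standing in for "listen on the microphone, then call recognize_google()".

```python
def recognise_with_retries(recognise_once, max_attempts=3):
    """Call recognise_once() until it returns text or attempts run out.

    recognise_once is any zero-argument callable that returns the
    recognised text or raises an exception when recognition fails.
    Returns None if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return recognise_once()
        except Exception as error:  # broad on purpose in this sketch
            print(f'Attempt {attempt} failed: {error}')
    return None


# Stand-in for the microphone: fails once, then "hears" a command.
responses = iter([ValueError('could not understand audio'), 'start operations'])


def fake_recognise():
    item = next(responses)
    if isinstance(item, Exception):
        raise item
    return item


print(recognise_with_retries(fake_recognise))
```

With the real library, the narrower exceptions to catch would be speech_recognition's UnknownValueError (speech was unintelligible) and RequestError (the API could not be reached).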
print('-----------------------------------')
print()
print('1. SHOW OPERATIONS')
print('2. START OPERATIONS')
print('3. THANKS JARVIS')  # to end the process
time.sleep(2)

r = Recognizer()    # object instantiation
mic = Microphone()  # object instantiation

while True:  # infinite until the loop breaks
    try:
        print()
        print('-----------------------------------')
        print('Speak now')
        print('-----------------------------------')
        print()
        with mic as source:
            audio = r.listen(source)
        voice_text = r.recognize_google(audio)
        try:
            if voice_text == 'start operations':  # if the words said are 'start operations', run the following code
                print(voice_text)
                r1 = Recognizer()
                mic1 = Microphone()
                while True:
                    print()
                    print('-----------------------------------')
                    print('What operation?')
                    print()
                    print('1.Tell Date and Time')
                    print('2.Open Social Media Handles')
                    print('3.Play Songs on Youtube')
                    print('4.Maximize the current window')
                    print('5.Minimize the current window')
                    print('6.Search query on Google')
                    print('-----------------------------------')
                    print()
                    time.sleep(1)  # 1 sec of delay
                    with mic1 as source_again:
                        audio1 = r.listen(source_again)
                    command = r1.recognize_google(audio1)
                    if command == 'maximize the window':
                        pyautogui.getActiveWindow().maximize()
                    elif command == 'minimise the window':
                        pyautogui.getActiveWindow().minimize()
To play songs on YouTube, the assistant redirects you to a YouTube results page, where you can pick the song of your choice.
                    elif command == 'play songs on YouTube':
                        url = 'https://www.youtube.com/results?search_query=play+songs'
                        wb.get().open_new(url)  # opens a new window in the browser
                        time.sleep(1)
                        pyautogui.click(220, 220)
                        break
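The URL above is hard-coded for the search text "play songs". To search for something else, the query string can be built with the standard library so that spaces and special characters are encoded correctly. A minimal sketch; the function name youtube_search_url is my own, while the results URL and the search_query parameter are taken from the code above.

```python
from urllib.parse import urlencode


def youtube_search_url(query):
    """Build a YouTube results URL for the given search text."""
    return 'https://www.youtube.com/results?' + urlencode({'search_query': query})


print(youtube_search_url('play songs'))
# prints: https://www.youtube.com/results?search_query=play+songs
```

The resulting string can be passed to wb.get().open_new() in place of the fixed URL.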
Instead of dogs, you can use any word of your choice; to keep it simple, I have used dogs here.
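                    elif command == 'search dogs on Google':
                        command = command.split(' ')
                        query = command[1]  # the second word, here 'dogs'
                        url = f'https://www.google.com/search?q={query}'
                        wb.get().open_new(url)
                    elif command == 'Open File explorer':
                        # open the folder in Windows File Explorer
                        subprocess.Popen(['explorer', r'C:\Users\91884\Desktop'])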
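The search branch above matches one exact phrase and picks out a single word. A more flexible variant could accept any phrase of the form "search ... on Google" and pass the whole query along. The sketch below assumes that phrasing; SEARCH_PATTERN and google_search_url are hypothetical names, and the URL uses Google's standard q search parameter.

```python
import re
from urllib.parse import quote_plus

# Matches 'search <anything> on google', ignoring letter case.
SEARCH_PATTERN = re.compile(r'^search (.+) on google$', re.IGNORECASE)


def google_search_url(command):
    """Return a Google search URL for 'search <query> on Google', else None."""
    match = SEARCH_PATTERN.match(command.strip())
    if match is None:
        return None
    return 'https://www.google.com/search?q=' + quote_plus(match.group(1))


print(google_search_url('search dogs on Google'))
# prints: https://www.google.com/search?q=dogs
```

A return value of None signals that the spoken command was not a search request, so the caller can move on to the other branches.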
To tell the time, the system speaks it aloud via text to speech.
                    elif command == 'What is the time?':
                        telling_the_time = f'{hour} hours and {min} minutes'
                        lang = 'en'
                        obj = gTTS(text=telling_the_time, lang=lang, slow=True)
                        obj.save('time.mp3')
                        os.system('time.mp3')
                        time.sleep(4)
                        pyautogui.moveTo(1910, 10)
                        pyautogui.click()
                    else:
                        print(command)
                        break
            elif voice_text == 'show operations':  # list of operations
                print('1.Tell Date and Time')
                print('2.Open Social Media Handles')
                print('3.Play Songs on Youtube')
                print('4.Maximize the current window')
                print('5.Minimize the current window')
                print('6.Search query on Google')
            elif voice_text == 'thanks Jarvis':
                print('Have a great day!')
                break
            else:
                print(voice_text)
        except Exception:
            break
    except Exception:
        break

print('Hope you liked the fun project!')
print('Follow Bhavishya Pandit on Linkedin!')
print('Subscribe to AIM for more such articles!')
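The time branch above reuses the hour and min values that were computed once when the program started, so the spoken time goes stale if the assistant keeps running. One option is to read the clock at the moment of the request. A minimal sketch, with spoken_time as a hypothetical helper name and the phrase format taken from the code above:

```python
import datetime


def spoken_time(moment):
    """Format a datetime as the phrase the assistant speaks."""
    return f'{moment.hour} hours and {moment.minute} minutes'


# Read the clock when the question is asked, not at startup.
print(spoken_time(datetime.datetime.now()))
```

The returned phrase can be handed to gTTS exactly as telling_the_time is in the program.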
Now It’s Time to See the Results
Output for opening songs on YouTube:
A new tab will open in Chrome with a list of recommended songs for you.
Output for searching a query on Google:
A new tab will open with the search results on Google.
Now for the most interesting part of this article: the recorded video below shows how the above program works.
Conclusion
The main aim of this article is to give a better understanding of advances in technology and of how easily we can make things possible. The technologies used here are speech recognition, text to speech, and web and window automation.
From this, we can surely conclude that what was once fiction is becoming reality today!
The future isn’t near, the future is here!
Subscribe to AIM for more such articles. You can follow me on LinkedIn to stay updated.
The complete code can be found in AIM’s GitHub repository. Please visit this link to find the code.