What if one were to tell you that Big Brother can be a ‘good’ brother? Privacy concerns and debates around surveillance notwithstanding, metropolitan cities today are covered with CCTV cameras. Yet, most of the time, no one is watching the footage to catch crimes as they happen. This holds true in India, which sees a high number of sexual abuse cases, many of them in public places in broad daylight. Souvik Ghosh, a third-year engineering student, has created ‘Sathi’, an automated defence against harassment in public places. In the prototype, a woman shows a fist or her index finger to a street surveillance camera, and the camera, on its own, alerts passers-by with a loud siren and notifies the police. In an interview with Analytics India Magazine, Souvik spoke about Sathi and his life as a student data scientist.
AIM: What was the idea behind the Sathi model?
My interests lie in data science, computer vision and deep learning. Along with being a tech enthusiast, I am also a social worker. I work in a red-light area of Kolkata, teaching the students there, which has let me see the abuse first-hand. Later, I found that one in every three women in India has experienced sexual harassment. My female friends also spoke about such experiences, and I realised that harassment usually happens in places like busy roads, crowded buses, local trains and educational institutions. But many women hesitate to report it or shout for help in public, and often fail to react. While searching for solutions, I realised that metropolitan cities like Mumbai, Delhi and Kolkata already have surveillance cameras throughout, covering transport, streets and even colleges. In addition, most people always carry a smartphone. But no one is actually monitoring the camera footage to spot such crimes. I realised we needed the cameras to be automated, and that led me to create Sathi.
AIM: Can you walk us through Sathi?
Suppose someone runs into a situation where they feel threatened. All they need to do is show a fist, as a sign of protest, towards a camera, be it a CCTV camera or their own smartphone camera. Upon registering the fist, the device starts playing a loud alarm or siren to alert the people nearby. Alternatively, if the person shows their index finger, signalling the number one, the system informs the police directly. It also captures images of the spot and sends them to the police along with a timestamp. The system crops every face in the scene into a passport-sized image to be used later as evidence. With the app, the person can show a fist to the phone camera, and the phone will call the police directly.
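The interview does not say how Sathi tells a fist from a raised index finger. One common approach, sketched here under stated assumptions and not necessarily Souvik's method, is to run a hand-landmark model such as MediaPipe Hands, which outputs 21 points per hand, and check which fingers are extended. Only the landmark indices follow MediaPipe's hand model; the distance-from-wrist heuristic, the function names and the action strings are hypothetical.

```python
from enum import Enum
from math import dist
from typing import Sequence, Tuple

Point = Tuple[float, float]

# Landmark indices follow MediaPipe Hands' 21-point layout (0 = wrist).
WRIST = 0
# (tip, PIP joint) index pairs for the four non-thumb fingers; the thumb is ignored.
FINGERS = {"index": (8, 6), "middle": (12, 10), "ring": (16, 14), "pinky": (20, 18)}


class Gesture(Enum):
    NONE = "none"
    FIST = "fist"
    INDEX = "index"


def is_extended(landmarks: Sequence[Point], tip: int, pip: int) -> bool:
    # Hypothetical heuristic: a finger counts as extended when its tip lies
    # farther from the wrist than its PIP joint, which tolerates hand rotation.
    wrist = landmarks[WRIST]
    return dist(landmarks[tip], wrist) > dist(landmarks[pip], wrist)


def classify(landmarks: Sequence[Point]) -> Gesture:
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    extended = {name: is_extended(landmarks, tip, pip)
                for name, (tip, pip) in FINGERS.items()}
    if not any(extended.values()):
        return Gesture.FIST  # every finger curled
    others = extended["middle"] or extended["ring"] or extended["pinky"]
    if extended["index"] and not others:
        return Gesture.INDEX  # only the index finger raised
    return Gesture.NONE


def action_for(gesture: Gesture) -> str:
    # Hypothetical mapping matching the behaviour described in the interview.
    return {Gesture.FIST: "sound_siren", Gesture.INDEX: "notify_police"}.get(gesture, "ignore")
```

In a real pipeline the 21 (x, y) points would come from a hand tracker run on each frame; this sketch takes them as plain tuples so the decision logic can be read on its own.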
AIM: Tell us about your tech stack. Additionally, as a student data scientist, how did you access the needed resources?
Thanks to the internet, there are great free courses on YouTube and Coursera, so educational resources are easy to come by. As for hardware, we don't need access to high-end machines: Kaggle and Google Colab provide large amounts of RAM. Even if you live in the remotest parts of the country with the cheapest laptop, you can build such systems.
My tech stack was mainly deep learning, computer vision and data science, with Python as my base language. For computer vision, I used TensorFlow along with MediaPipe's face detection, so that the system could detect faces even in low-light areas.
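Souvik mentions cropping every face into a passport-sized image as evidence. MediaPipe's face detector reports each face as a bounding box relative to the image size (xmin, ymin, width and height in the 0 to 1 range). Below is a minimal sketch, assuming that box format, of turning such a box into a padded pixel crop that stays inside the frame; the `RelBox` type, the `to_pixel_crop` helper and the default 20% margin are illustrative assumptions, not part of Sathi.

```python
from typing import NamedTuple, Tuple


class RelBox(NamedTuple):
    """Hypothetical container for a relative bounding box (values in 0..1)."""
    xmin: float
    ymin: float
    width: float
    height: float


def to_pixel_crop(box: RelBox, img_w: int, img_h: int,
                  margin: float = 0.2) -> Tuple[int, int, int, int]:
    """Return (x0, y0, x1, y1) pixel bounds for a padded face crop."""
    # Pad each side by `margin` times the box size so hair and chin are kept.
    pad_w = box.width * margin
    pad_h = box.height * margin
    # Scale to pixels and clamp so the crop never leaves the image.
    x0 = max(0, int((box.xmin - pad_w) * img_w))
    y0 = max(0, int((box.ymin - pad_h) * img_h))
    x1 = min(img_w, int((box.xmin + box.width + pad_w) * img_w))
    y1 = min(img_h, int((box.ymin + box.height + pad_h) * img_h))
    return x0, y0, x1, y1
```

With a frame held as a NumPy array of shape (height, width, 3), the face itself would then be `frame[y0:y1, x0:x1]`, which could be resized to a fixed passport-style size before being stored.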
AIM: What is the scope of the model?
The system is still a prototype. The plan is to embed it in CCTV cameras and mobile phone cameras. There are a few questions I am still working on. Firstly, the person will not always be ten to fifteen metres from the camera; sometimes it will be fifty metres. So, how well will the camera detect faces from a long distance? Secondly, how well will it perform with multiple faces? For instance, how will it pick out the face in question from among hundreds of people? Lastly, what if people use it for fun? If I were a child, I would enjoy showing a fist to the camera and watching the crowd descend into chaos. I plan to connect with industry experts and professors to brainstorm ways to make it foolproof.
AIM: How do you plan to take the model further?
My two approaches as of now involve MEMS and embedded computers. We can put the whole system in embedded computers and CCTV cameras. We can also put the code in mobile apps for rural areas that have no CCTV cameras. The app runs in the background and identifies a fist in the phone's camera feed. A challenge here is connecting the phone to light posts in the area, since a siren from a mobile phone would be of little use. Upon seeing a fist, the mobile app will notify the nearby light post, which will then play the siren. We still have to decide where the final version will run: on a CCTV camera, in an app or both. Using both would give us urban and rural coverage.
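The phone-to-light-post link is still being designed, so any message format is speculative. As a purely hypothetical sketch, the app could send a small JSON alert carrying a device ID, the requested action, the location and a UTC timestamp, and the light post would decode it and decide whether to sound the siren. Every field name and function here is an assumption for illustration; the transport (Wi-Fi, radio, cellular) is left out entirely.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Hypothetical: only a fist asks the light post for a siren; per the interview,
# the phone handles calling the police itself.
ACTIONS = {"fist": "sound_siren"}


def build_alert(device_id: str, gesture: str, lat: float, lon: float,
                when: Optional[datetime] = None) -> str:
    """Phone side: serialise a hypothetical alert for a nearby light post."""
    if gesture not in ACTIONS:
        raise ValueError(f"no light-post action for gesture {gesture!r}")
    when = when or datetime.now(timezone.utc)
    payload = {
        "device_id": device_id,
        "action": ACTIONS[gesture],
        "location": {"lat": lat, "lon": lon},
        "timestamp": when.isoformat(),
    }
    return json.dumps(payload, sort_keys=True)


def handle_alert(message: str) -> str:
    """Light-post side: decode the alert and decide what to do."""
    payload = json.loads(message)
    return "siren_on" if payload.get("action") == "sound_siren" else "ignored"
```

Keeping the message self-describing (an explicit action plus a timestamp) would let a light post log each alert for the police, which fits the evidence-gathering role described earlier in the interview.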
AIM: What are your tips for students entering data science?
Anyone entering data science should start with Python and its libraries. There are steps to climb; people shouldn't jump directly into deep learning. Data science is the foundation, followed by machine learning and then deep learning. This is the path I followed. I have also interned with different organisations in roles such as research intern, and internships push me to study and learn more. Finally, taking free courses on platforms like YouTube will help you study better.