I Built an Overlay AI: My Latest Coding Adventure

Okay, so hear me out…

As someone deep into AI and personal coding projects, I’ve been messing around with something new: an “Overlay AI.” It’s basically an AI that lives on top of your other applications, kind of like a smart assistant that can see what you’re doing and interact with it. Think of it as a digital helper that doesn’t need you to switch windows or tell it everything from scratch.

My background in computer engineering, especially in machine learning, got me thinking: how could I make AI more integrated into everyday computer use without needing a whole new interface? This project is my answer.

What is an Overlay AI?

At its core, an Overlay AI is an application that runs in the background and monitors your screen or specific active applications. It uses techniques like computer vision and natural language processing (NLP) to understand what’s happening. For example, it could read text on your screen, recognize images, or understand the context of a document you’re working on.

Then, based on that understanding, it can perform actions: summarizing a webpage, filling out a form automatically, finding relevant information based on what you’re looking at, or even offering helpful tips related to your current task.
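
To make that observe, understand, act loop a bit more concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not the actual implementation: ScreenContext, suggest_action, and overlay_step are hypothetical names, the action labels are made up, and the simple keyword and word-count rules only stand in for real computer vision and NLP models.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ScreenContext:
    """What the overlay believes the user is looking at (hypothetical structure)."""
    app_name: str
    visible_text: str


def suggest_action(ctx: ScreenContext, long_text_words: int = 200) -> Optional[str]:
    """Toy stand-in for the 'understanding' step.

    A real overlay would run vision/NLP models here; this sketch
    uses plain string checks so it stays self-contained.
    """
    text = ctx.visible_text.lower()
    # Text that looks like a form: offer to fill it in.
    if "<form" in text or "first name" in text:
        return "offer_form_fill"
    # A lot of text on screen: offer a summary.
    if len(ctx.visible_text.split()) >= long_text_words:
        return "offer_summary"
    # Nothing useful to offer, so stay out of the way.
    return None


def overlay_step(capture: Callable[[], ScreenContext]) -> Optional[str]:
    """One pass of the loop: capture the screen, then pick an action (or none)."""
    ctx = capture()
    return suggest_action(ctx)


if __name__ == "__main__":
    # Fake capture function; a real one would grab and analyze the screen.
    fake_capture = lambda: ScreenContext("browser", "First name: ____  Last name: ____")
    print(overlay_step(fake_capture))
```

Passing the capture step in as a function keeps the decision logic testable without touching the real screen, so the screen-grabbing part can be swapped in later.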

Why Build This?

Honestly, I’m always looking for ways to make our digital lives smoother. We juggle so many apps and tasks, and switching contexts constantly eats up time and mental energy. I wanted to see if I could create an AI that seamlessly fits into that workflow, acting as a proactive assistant rather than just a tool you have to explicitly command.

My PhD work is in AI, and I’ve tinkered with machine learning models before, but this project felt different. It’s about practical application and pushing the boundaries of how we interact with our computers. Plus, it’s the kind of project that really lets me dive deep into the coding and see the results firsthand.

Behind the Scenes: The Tech Stack

Building this involved a few key components:

  • Screen Capture/Analysis: I needed a way to grab parts of the screen or specific application windows. Libraries for screen recording and image processing were crucial here.
  • Computer Vision: To