
Nam Hoang
Every project starts with a spark, and for us, it was seeing how students struggle to navigate the wealth of information in our college catalog. We noticed that many students felt overwhelmed when trying to choose courses or work out degree requirements. We wanted to build a tool that would make academic planning easier, faster, and more accessible, and so the idea of a Virtual College Consultant chatbot was born. Our goal was simple: to create an intelligent assistant that guides students through the maze of academic options.
The Troy Virtual College Consultant chatbot is designed to help Troy's students by providing:
We implemented the chatbot using a technique called Retrieval-Augmented Generation (RAG), which combines information retrieval with natural language generation to provide accurate and context-aware responses. The development process involved multiple stages, from data extraction to AI model integration. Here's a breakdown of the key components:
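To make the retrieve-then-generate idea concrete, here is a minimal sketch of the retrieval half of a RAG pipeline. It is not our production code: the bag-of-words cosine similarity stands in for a real embedding model, the catalog chunks are made-up sample text, and the names `retrieve` and `build_prompt` are hypothetical. The resulting prompt would then be passed to whichever language model generates the answer.

```python
import math
import re
from collections import Counter

# Made-up sample chunks for illustration only; not taken from the real catalog.
CATALOG_CHUNKS = [
    "CS 101 Intro to Programming: variables, loops, and functions.",
    "The Computer Science major requires 120 credit hours including a capstone.",
    "ENG 101 Composition: fundamentals of academic writing.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(q_vec, Counter(tokenize(chunk))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Assemble a prompt that grounds the model in the retrieved chunks."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(question, chunks, k))
    return (
        "Answer using only the catalog excerpts below.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt(
        "How many credit hours does the computer science major require?",
        CATALOG_CHUNKS,
        k=1,
    ))
```

The point of the design is that the model only sees the excerpts that were retrieved for the question, which is what keeps the answers tied to the catalog rather than to whatever the model happens to remember.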
One of the major challenges we faced came in the data aggregation phase. Initially, we tried to crawl data directly from Troy's website and Canvas, because these platforms contain more useful information. However, after sending around 100 requests to the website, we hit a rate limit and were blocked from further crawling. This forced us to reconsider our approach 🤔.
To overcome this, we decided to extract the data from the college's catalog PDF instead. Working with PDFs posed its own challenges, since their content is unstructured and difficult to parse. We used Python libraries such as PyMuPDF and pdfplumber to extract key data points like course descriptions and program details, then cleaned and organized the data into a usable format.
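The extract-then-clean step can be sketched roughly as follows. This is an illustration under assumptions, not our exact script: the course header layout ("ABC 1000 Title (3)") is a hypothetical format chosen for the example, the sample text is invented, and `extract_pdf_text` and `parse_courses` are names made up here. Only `fitz.open` and `page.get_text()` are actual PyMuPDF calls.

```python
import re

# Hypothetical header layout: "<SUBJECT> <NUMBER> <Title> (<credits>)".
# A real catalog may format course entries differently.
COURSE_HEADER = re.compile(
    r"^(?P<code>[A-Z]{2,4} \d{3,4})\s+(?P<title>.+?)\s+\((?P<credits>\d+)\)\s*$"
)


def extract_pdf_text(pdf_path):
    """Return the text of every page in the PDF, using PyMuPDF."""
    import fitz  # PyMuPDF (pip install pymupdf); imported lazily

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)


def parse_courses(text):
    """Group raw catalog text into one record per course."""
    courses, current = [], None
    for raw_line in text.splitlines():
        line = " ".join(raw_line.split())  # collapse stray whitespace
        match = COURSE_HEADER.match(line)
        if match:
            # A header line starts a new course record.
            current = {
                "code": match["code"],
                "title": match["title"],
                "credits": int(match["credits"]),
                "description": "",
            }
            courses.append(current)
        elif current and line:
            # Any other non-empty line continues the current description.
            current["description"] = f"{current['description']} {line}".strip()
    return courses


# Invented sample text standing in for the output of extract_pdf_text().
SAMPLE_TEXT = """
ABC 1000  Sample Course One (3)
First line of the description
  continues on a second line.

ABC 2000 Sample Course Two (4)
Another description.
"""

if __name__ == "__main__":
    for course in parse_courses(SAMPLE_TEXT):
        print(course)
```

With a real file, the call would look like `parse_courses(extract_pdf_text("catalog.pdf"))`, where the path is a placeholder. pdfplumber's `page.extract_text()` could be swapped in for the extraction step, leaving the parsing unchanged.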
We have a few exciting ideas for the future: