How to Build an AI Chatbot for Your Website to Answer Questions on Content

Let’s Build!

  • Our Goal: In this post, we will build an AI chatbot that answers questions about the content of your website.
    We will create the bot for a sweet website, ZUBDATA.com.

  • Technologies used:
    • OpenAI
    • ChromaDB
    • LangChain

Open your notebook and start coding 👨‍💻

Install Required Libraries

Install the necessary libraries for the project using the following command:

!pip install --quiet unstructured chromadb langchain langchain_community langchain_openai

Set Up OpenAI Key

import os
# Replace the placeholder with your own OpenAI API key
os.environ["OPENAI_API_KEY"] = "api_key"
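
Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch (the helper name `ensure_openai_key` is an assumption, not part of the original setup), the key can be read from the environment and asked for only when it is missing:

```python
import os
from getpass import getpass


def ensure_openai_key(env=os.environ, prompt_fn=getpass):
    """Return the OpenAI key from env, prompting for it only if it is missing."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        # getpass hides the typed value, so the key never shows up in notebook output
        key = prompt_fn("Enter your OpenAI API key: ").strip()
        env["OPENAI_API_KEY"] = key
    return key
```

Calling `ensure_openai_key()` once, before creating the embeddings or the LLM, should be enough for the rest of the notebook.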

Import Required Modules

import requests
import xml.etree.ElementTree as ET
# Integrations live in langchain_community / langchain_openai;
# the older langchain.* import paths are deprecated
from langchain_community.document_loaders import UnstructuredURLLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import PromptTemplate
from langchain_text_splitters import RecursiveCharacterTextSplitter

Scrape and Parse the Sitemap of the Site

First, write a function that takes the sitemap URL as input and returns all the URLs listed in the sitemap.

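def scrape_sitemap(url):
    """Fetch a sitemap and return every URL found in its <loc> tags."""
    try:
        # A timeout keeps the notebook from hanging on an unresponsive server
        response = requests.get(url, timeout=30)
        response.raise_for_status()

        # Show the final URL after any redirects
        print(response.url)

        # Parse the XML body of the sitemap
        root = ET.fromstring(response.content)

        # Tags look like "{namespace}loc", so match on the tag suffix
        urls = [elem.text.strip() for elem in root.iter()
                if elem.tag.endswith('loc') and elem.text]

        return urls
    except requests.exceptions.RequestException as e:
        print(f"Error fetching sitemap: {e}")
        return []
    except ET.ParseError as e:
        print(f"Error parsing sitemap: {e}")
        return []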
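
To see what the parsing step does without making a network request, here is a small self-contained sketch that runs on an inline sample. The sample XML, the `example.com` URLs, and the helper name `extract_page_urls` are assumptions for illustration. It uses the namespace defined by the sitemaps.org protocol to select only `<url><loc>` entries, which is a stricter alternative to matching on the tag name:

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample, shaped like a typical page sitemap
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc> https://example.com/about/ </loc></url>
</urlset>"""


def extract_page_urls(xml_text):
    """Return the text of every <url><loc> entry, ignoring all other tags."""
    root = ET.fromstring(xml_text)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    # Strip stray whitespace and skip empty <loc> elements
    return [loc.text.strip() for loc in locs if loc.text]


page_urls = extract_page_urls(SAMPLE_SITEMAP)
print(page_urls)
```

Selecting by namespace means that `<loc>` tags from extensions (such as image sitemaps) are not picked up by accident.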


Now, we can scrape the given sitemap and store all the URLs in urls.

sitemap_urls = ["https://zubdata.com/page-sitemap.xml"]

urls = []
for sitemap in sitemap_urls:
    urls.extend(scrape_sitemap(sitemap))

print(urls)


Load Content from URLs

Once we have all the URLs, we can load content from those pages using UnstructuredURLLoader.

loader = UnstructuredURLLoader(urls=urls)
docs = loader.load()

Iterate through the loaded documents and print the metadata of each one, which includes its source URL.

for doc in docs:
  print(doc.metadata)
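
Some pages may yield little or no text (for example, pages rendered mostly by JavaScript), and embedding empty documents adds nothing useful to the index. As a hedged sketch, a small filter can drop them before splitting. The helper name `drop_empty_docs` is an assumption, and `SimpleNamespace` objects stand in for LangChain documents here so the example runs on its own:

```python
from types import SimpleNamespace


def drop_empty_docs(docs, min_chars=1):
    """Keep only documents whose stripped text has at least min_chars characters."""
    return [d for d in docs if len((d.page_content or "").strip()) >= min_chars]


# Stand-ins for loaded documents (they expose the same two attributes)
sample_docs = [
    SimpleNamespace(page_content="Some page text", metadata={"source": "https://example.com/contact"}),
    SimpleNamespace(page_content="   ", metadata={"source": "https://example.com/empty"}),
]
kept = drop_empty_docs(sample_docs)
print([d.metadata["source"] for d in kept])
```

In the notebook, the same function could be applied to the loaded documents, since it only relies on the `page_content` attribute.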

Split Content into Chunks

Once the content is loaded, we split it into chunks so that each piece is small enough to embed and retrieve on its own. You can review the concept in the LangChain guide.

If you just want to run the code, go ahead. You can read up on the concepts later, once the chatbot is built.

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
splitted_text = splitter.split_documents(docs)
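
To build intuition for `chunk_size` and `chunk_overlap`, here is a deliberately simplified sliding-window splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one prefers to break on separators such as paragraphs and sentences); the function name `split_with_overlap` is an assumption for illustration:

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk begins with the last three characters of the previous one. That shared region is what keeps a sentence that straddles a boundary from being lost to both chunks.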

Create and Store Embeddings

Now that our documents are split, it’s time to generate embeddings using OpenAI embeddings and store them in a vector database.

embeddings = OpenAIEmbeddings()
# Specify the directory where the Chroma database will be stored
persist_directory = "vectors"

# Create the Chroma vector store from documents
vector_store = Chroma.from_documents(
    documents=splitted_text,
    embedding=embeddings,
    persist_directory=persist_directory
)

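# Persist the vector store to disk. With chromadb 0.4 and later, data is saved
# automatically when persist_directory is set, so this call only matters on older versions.
vector_store.persist()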

Let’s test it by performing a search!

vector_store.similarity_search("What tools do you have?", k=2)
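
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below shows that idea with cosine similarity and tiny made-up vectors. The texts, the vectors, and the function names are all hypothetical, and the distance metric Chroma actually applies depends on how the collection is configured:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(indexed, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical 3-dimensional "embeddings"; real ones have far more dimensions
indexed_chunks = [
    ("Our tools include a scraper", [0.9, 0.1, 0.0]),
    ("Contact us by email", [0.0, 0.2, 0.9]),
    ("Pricing for each tool", [0.7, 0.6, 0.1]),
]
results = top_k([1.0, 0.2, 0.0], indexed_chunks, k=2)
print(results)
```

The two tool-related chunks rank above the contact chunk because their vectors point in nearly the same direction as the query vector.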

Prepare the LLM

It’s time to prepare the brain of our chatbot—the LLM.

llm = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

Now write a prompt template for the LLM.

prompt = PromptTemplate.from_template("You are a helpful assistant for the website zubdata.com that answers the user's question using the given content. Do not tell the user anything about the given content itself.\n User's Question: {question}\n Content: ```{content}```")

Chain the prompt to the LLM, so that the filled-in prompt is passed straight to the model.

llm_runnable = prompt | llm
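
When the chain is invoked with a dictionary, the prompt step fills in the `{question}` and `{content}` placeholders before anything reaches the model. The sketch below imitates only that substitution step with plain `str.format`, using a shortened stand-in for the template; `TEMPLATE`, `render_prompt`, and the sample values are assumptions for illustration:

```python
# Shortened stand-in for the tutorial's template, with the same two placeholders
TEMPLATE = (
    "You are a helpful assistant for a website. "
    "User's Question: {question}\n"
    "Content: {content}"
)


def render_prompt(question, content):
    """Substitute both placeholders, as the prompt step does before the model is called."""
    return TEMPLATE.format(question=question, content=content)


rendered = render_prompt("What is your contact email?", "Email: hello@example.com")
print(rendered)
```

Printing the rendered prompt like this is a cheap way to check what the model will actually see before spending API calls.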

Main Function

Let’s now write the main function of our chatbot. It takes the user's query, retrieves the most relevant chunks, and returns the model's response.

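def bot_main(query):
    # Retrieve the three chunks most similar to the query
    matches = vector_store.similarity_search(query, k=3)
    # Pass only the chunk text to the prompt, not the Document objects with their metadata
    content = "\n\n".join(doc.page_content for doc in matches)
    return llm_runnable.invoke({"content": content, "question": query})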

Testing

It’s time to test our chatbot!

bot_main("what is your contact email?")
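
A chat model called through LangChain returns a message object rather than a plain string, with the reply text in its `content` attribute. As a small sketch, a helper can pull out just the text. The name `answer_text` is an assumption, and a `SimpleNamespace` with a made-up reply stands in for the real response so the example runs without an API call:

```python
from types import SimpleNamespace


def answer_text(response):
    """Return response.content when present, otherwise the response converted to a string."""
    return str(getattr(response, "content", response))


# Stand-in for the message object a chat model returns (hypothetical reply text)
fake_response = SimpleNamespace(content="You can reach us through the contact page.")
print(answer_text(fake_response))
```

In the notebook, wrapping the call as `answer_text(bot_main("what is your contact email?"))` would print only the answer instead of the full message object.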

Congratulations

Congratulations! You’ve successfully built your chatbot! 🎉🚀
