Let’s Build!
- Our Goal: In this post, we will explore how to build an AI chatbot for your website that answers questions about its content.
We will create a bot for a sweet website, ZUBDATA.com.
- Technologies used:
- OpenAI
- ChromaDb
- Langchain
Open your notebook and start coding 👨‍💻
Overview of Our Chatbot

Install Required Libraries
Install the necessary libraries for the project using the following command:
!pip install --quiet unstructured chromadb langchain langchain_community langchain_openai
Set Up OpenAI Key
import os

os.environ["OPENAI_API_KEY"] = "api_key"
Import Required Modules
import requests
import xml.etree.ElementTree as ET
from langchain_community.document_loaders import UnstructuredURLLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
Scrape and Parse the Sitemap of the Site
First, write a function that takes the sitemap URL as input and returns all the URLs listed in the sitemap.
def scrape_sitemap(url):
    try:
        response = requests.get(url)
        response.raise_for_status()
        print(response.url)
        # Parse XML
        root = ET.fromstring(response.content)
        # Extract URLs
        urls = [elem.text for elem in root.iter() if 'loc' in elem.tag]
        return urls
    except requests.exceptions.RequestException as e:
        print(f"Error fetching sitemap: {e}")
        return []
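To sanity-check the XML parsing without a network call, you can feed ET.fromstring a tiny inline sitemap (the URLs below are made-up examples, not real pages):

```python
import xml.etree.ElementTree as ET

# A tiny made-up sitemap, in the same shape scrape_sitemap expects.
# We use a bytes literal because response.content is bytes too.
sample_xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>"""

root = ET.fromstring(sample_xml)
# Namespaced tags look like '{http://...}loc', so we match on the suffix.
urls = [elem.text for elem in root.iter() if 'loc' in elem.tag]
print(urls)  # ['https://example.com/', 'https://example.com/about/']
```

This is why the function checks `'loc' in elem.tag` rather than `elem.tag == 'loc'`: sitemap elements carry an XML namespace prefix.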
Now, we can scrape the given sitemap and store all the URLs in urls.
sitemap_url = ["https://zubdata.com/page-sitemap.xml"]
urls = []
for sitemap in sitemap_url:
    urls.extend(scrape_sitemap(sitemap))

print(urls)
Load Content from URLs
Once we have all the URLs, we can load content from those pages using UnstructuredURLLoader.
loader = UnstructuredURLLoader(urls=urls)
docs = loader.load()
Iterate through the loaded documents and print each document's metadata.
for doc in docs:
    print(doc.metadata)
Split Content into Chunks
Once the content is loaded, we will split it into chunks. You can review the concept from the LangChain guide.
If you want to run the code, go ahead without worry—you can check the concepts later once the chatbot is built.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
splitted_text = splitter.split_documents(docs)
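Under the hood, chunking with overlap just slides a window over the text, stepping back a little each time so neighboring chunks share some context. Here is a rough pure-Python sketch of the idea — this is not LangChain's actual splitter, which also tries to break on natural separators like paragraphs and sentences:

```python
def naive_split(text, chunk_size=1000, chunk_overlap=50):
    """Slide a fixed-size window over the text, stepping back by the overlap."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# 2500 characters with chunk_size=1000 and overlap=50 yields 3 chunks.
chunks = naive_split("a" * 2500, chunk_size=1000, chunk_overlap=50)
print(len(chunks))     # 3
print(len(chunks[0]))  # 1000
```

The overlap means the last 50 characters of one chunk reappear at the start of the next, so a sentence cut at a boundary is still fully present in at least one chunk.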
Create and Store Embeddings
Now that our documents are split, it’s time to generate embeddings using OpenAI embeddings and store them in a vector database.
embeddings = OpenAIEmbeddings()
# Specify the directory where the Chroma database will be stored
persist_directory = "vectors"

# Create the Chroma vector store from documents
vector_store = Chroma.from_documents(
    documents=splitted_text,
    embedding=embeddings,
    persist_directory=persist_directory
)

# Persist the vector store to disk
vector_store.persist()
Let’s test it by performing a search!
vector_store.similarity_search("what tools you have?", k=2)
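similarity_search works by embedding the query and comparing it against the stored chunk embeddings, typically with cosine similarity. A toy illustration of that comparison, using made-up 3-dimensional vectors in place of real OpenAI embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Made-up embeddings: in practice these come from OpenAIEmbeddings
# and have ~1500 dimensions, not 3.
store = {
    "pricing page": [0.9, 0.1, 0.0],
    "tools page":   [0.1, 0.9, 0.2],
}
query_vec = [0.2, 0.8, 0.1]  # pretend embedding of "what tools you have?"

best = max(store, key=lambda page: cosine(query_vec, store[page]))
print(best)  # tools page
```

Chroma does this comparison efficiently over all stored chunks and returns the k closest documents.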
Prepare the LLM
It’s time to prepare the brain of our chatbot—the LLM.
llm = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)
Now write a prompt template for the LLM.
prompt = PromptTemplate.from_template(
    "You are a helpful assistant of the website zubdata.com that answers users' questions from the given content. "
    "Don't tell the user anything about the given content. "
    "User's Question: {question}.\n Content: ```{content}```"
)
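PromptTemplate simply fills the {question} and {content} placeholders before the text is sent to the model, much like Python's own str.format. A quick stand-in illustration (the email address below is a made-up retrieved chunk, not real site content):

```python
# A stand-in for PromptTemplate's variable substitution (not the real class).
template = (
    "You are a helpful assistant of the website zubdata.com. "
    "Answer the user's question from the given content only.\n"
    "User's Question: {question}\nContent: {content}"
)

filled = template.format(
    question="What is your contact email?",
    content="Contact us at hello@example.com",  # made-up retrieved chunk
)
print(filled)
```

The real PromptTemplate adds input validation on top of this, but the substitution step is the same idea.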
Bind the prompt to the LLM.
llm_runnable = prompt | llm
Main Function
Let’s now write the main function of our chatbot, which we will call to use it.
def bot_main(query):
    content = vector_store.similarity_search(query, k=3)
    response = llm_runnable.invoke({"content": content, "question": query})
    return response.content  # return just the text of the model's reply
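The flow here is retrieve-then-generate: fetch the top-k chunks, then hand them to the LLM inside the prompt. With the vector store and LLM replaced by stubs, the shape of the function looks like this (fake_search and fake_llm are stand-ins, not real LangChain calls):

```python
def fake_search(query, k=3):
    # Stand-in for vector_store.similarity_search.
    return ["chunk one", "chunk two", "chunk three"][:k]

def fake_llm(prompt_vars):
    # Stand-in for llm_runnable.invoke; just echoes what it was given.
    return f"Answering {prompt_vars['question']!r} using {len(prompt_vars['content'])} chunks"

def bot_main_sketch(query):
    content = fake_search(query, k=3)
    return fake_llm({"content": content, "question": query})

print(bot_main_sketch("what is your contact email?"))
```

Swapping the stubs for the real vector_store and llm_runnable gives exactly the bot_main above.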
Testing
It’s time to test our chatbot!
bot_main("what is your contact email?")
Congratulations
Congratulations! You’ve successfully built your chatbot! 🎉🚀