
🤖 AutoStream AI Agent

Conversational AI + Lead Generation System

An intelligent support agent that understands user intent, retrieves answers from a local knowledge base, detects high-intent leads, and captures user details — all through a clean, real-time chat interface.

Demo video: https://www.youtube.com/watch?v=z_nOnDO30Gs

📌 Overview

AutoStream AI Agent is a full-stack conversational AI system built for ServiceHive's Inflx platform. It simulates a real-world SaaS support assistant that goes beyond answering questions — it identifies when a user is ready to convert, collects their details, and triggers a lead capture workflow automatically.

This project demonstrates a production-grade approach to building agentic AI systems using LangChain, Gemini 1.5 Flash, and a RAG pipeline backed by a local JSON knowledge base.


✨ Features

  • Intent Detection — Classifies every user message as a greeting, product inquiry, or high-intent lead
  • RAG-Powered Responses — Retrieves accurate answers from a local knowledge base (no hallucinations)
  • Lead Capture Workflow — Collects name, email, and creator platform when high intent is detected
  • Tool Execution — Calls mock_lead_capture() only after all required fields are collected
  • Session Memory — Retains full conversation context across turns within a session
  • FastAPI Backend — Clean REST API with a /api/chat endpoint
  • Responsive Chat UI — Professional light/dark-mode frontend built with HTML, CSS, and JS

🧠 Tech Stack

| Layer | Technology |
| --- | --- |
| Language | Python 3.10+ |
| Backend Framework | FastAPI |
| LLM | Gemini 1.5 Flash (via LangChain) |
| Agent Framework | LangChain / Custom Agent Logic |
| Knowledge Base | JSON file (local RAG) |
| Memory | LangChain ConversationBufferMemory |
| Frontend | HTML5, CSS3, Vanilla JS |
| Environment | python-dotenv |

📂 Project Structure

autostream-ai-agent/
│
├── backend/
│   ├── agent.py              # Main agent orchestration logic
│   ├── app.py                # FastAPI server + /api/chat endpoint
│   ├── intent.py             # Intent classification (greeting / inquiry / lead)
│   ├── rag.py                # RAG pipeline — knowledge base retrieval
│   ├── memory.py             # Conversation session memory management
│   ├── tools.py              # Tool definitions + mock_lead_capture()
│   └── knowledge_base.json   # Local knowledge base (pricing, features, policies)
│
├── frontend/
│   ├── index.html            # Chat UI (single-page)
│   └── script.js             # Frontend logic — send, receive, render messages
│
├── .env                      # API keys (not committed)
├── .env.example              # Example environment variables
├── requirements.txt          # Python dependencies
└── README.md

⚙️ How to Run Locally

1. Clone the Repository

git clone https://github.com/your-username/autostream-ai-agent.git
cd autostream-ai-agent

2. Create a Virtual Environment

python -m venv venv
source venv/bin/activate        # macOS / Linux
venv\Scripts\activate           # Windows

3. Install Dependencies

pip install -r requirements.txt

4. Configure Environment Variables

cp .env.example .env

Open .env and add your credentials:

GEMINI_API_KEY=your_gemini_api_key_here

5. Start the Backend Server

uvicorn backend.app:app --reload --port 8000

6. Open the Frontend

Open frontend/index.html directly in your browser, or serve it with:

python -m http.server 5500 --directory frontend

Then visit: http://localhost:5500


🔌 API Endpoints

POST /api/chat

Send a user message and receive an AI-generated response.

Request Body

{
  "system": "You are AutoStream AI...",
  "messages": [
    { "role": "user", "content": "What is the Pro plan pricing?" }
  ],
  "mode": "chat"
}

Response

{
  "content": "The Pro plan is priced at $79/month and includes unlimited videos, 4K resolution, and AI captions.",
  "intent": "product_inquiry"
}

Status Codes

| Code | Meaning |
| --- | --- |
| 200 | Success |
| 400 | Bad request — missing fields |
| 500 | Internal server error |
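For quick manual testing without the frontend, the endpoint can be exercised from Python's standard library. This sketch builds the request body shown above and posts it to the local server; the URL/port follow the `uvicorn` command from the setup steps, and the helper names (`build_chat_request`, `send_chat`) are illustrative, not part of the backend code.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/chat"  # assumes the uvicorn command above

def build_chat_request(user_message: str) -> dict:
    """Assemble a request body matching the /api/chat schema above."""
    return {
        "system": "You are AutoStream AI...",
        "messages": [{"role": "user", "content": user_message}],
        "mode": "chat",
    }

def send_chat(user_message: str) -> dict:
    """POST the message to the backend and return the parsed JSON reply."""
    body = json.dumps(build_chat_request(user_message)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# With the server running:
# print(send_chat("What is the Pro plan pricing?")["content"])
```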

🏗️ Architecture Explanation

Why LangChain?

LangChain was chosen for its modular architecture — it separates concerns cleanly between the LLM, memory, tools, and retrieval layers. This makes it straightforward to swap components (e.g., replace Gemini with Claude or GPT) without rewriting the agent logic.

How State is Managed

Conversation state is managed using LangChain's ConversationBufferMemory, which stores the full message history in-memory per session. Each API call includes the complete history so the LLM retains context across turns. For multi-user scenarios, sessions are keyed by a unique chat_id generated at the start of each conversation.
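The per-session keying described above can be sketched in plain Python (no LangChain dependency). `SessionStore` and its method names are illustrative, not the actual `memory.py` API:

```python
import uuid

class SessionStore:
    """Illustrative in-memory session store keyed by chat_id."""

    def __init__(self):
        self._sessions: dict[str, list[dict]] = {}

    def new_session(self) -> str:
        # A unique chat_id is minted at the start of each conversation.
        chat_id = str(uuid.uuid4())
        self._sessions[chat_id] = []
        return chat_id

    def append(self, chat_id: str, role: str, content: str) -> None:
        self._sessions[chat_id].append({"role": role, "content": content})

    def history(self, chat_id: str) -> list[dict]:
        # The full history is returned so every LLM call sees prior turns.
        return list(self._sessions[chat_id])

store = SessionStore()
cid = store.new_session()
store.append(cid, "user", "Hi there")
store.append(cid, "assistant", "Hello! How can I help?")
```

ConversationBufferMemory plays the same role inside LangChain; the point here is only that each session's buffer is isolated under its own `chat_id`.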

RAG Pipeline

The knowledge base (knowledge_base.json) stores AutoStream's pricing, features, and policies as structured documents. On each user query, rag.py performs a similarity search against this knowledge base and injects the most relevant context into the LLM prompt — ensuring responses are grounded in factual product data rather than model hallucinations.
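A minimal version of this retrieve-then-prompt step can be sketched with a simple word-overlap score (the real `rag.py` may rank differently, e.g. with embeddings). The documents below are made-up placeholders apart from the pricing facts quoted elsewhere in this README:

```python
import json

# Illustrative documents; the real knowledge_base.json schema may differ.
KNOWLEDGE_BASE = json.loads("""[
  {"topic": "pricing",
   "text": "Basic is $29/month for 10 videos at 720p. Pro is $79/month with unlimited videos, 4K, and AI captions."},
  {"topic": "features",
   "text": "AutoStream generates AI captions and supports 4K exports on the Pro plan."}
]""")

def retrieve(query: str, docs=KNOWLEDGE_BASE, top_k: int = 1) -> list[dict]:
    """Rank documents by word overlap with the query; return the best matches."""
    q_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d["text"].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

# Retrieved context is injected into the prompt so the LLM answers from it.
context = retrieve("What is the Pro plan pricing?")[0]["text"]
prompt = f"Answer using only this context:\n{context}\n\nQuestion: What is the Pro plan pricing?"
```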

Tool Execution (Lead Capture)

tools.py defines mock_lead_capture() as a LangChain tool. The agent is instructed to call this function only after collecting the user's name, email, and creator platform — preventing premature triggering. The agent collects each field conversationally, one at a time, before executing the tool.
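The gating logic can be illustrated in a few lines of plain Python. `LeadState` and this `mock_lead_capture` body are a sketch of the idea, not the actual `tools.py` implementation:

```python
from dataclasses import dataclass, field

REQUIRED_FIELDS = ("name", "email", "platform")

def mock_lead_capture(name: str, email: str, platform: str) -> dict:
    """Stand-in tool: simply echo the captured lead."""
    return {"status": "captured", "name": name, "email": email, "platform": platform}

@dataclass
class LeadState:
    fields: dict = field(default_factory=dict)

    def missing(self) -> list[str]:
        return [f for f in REQUIRED_FIELDS if f not in self.fields]

    def record(self, key: str, value: str):
        """Store one conversationally collected field; fire the tool only when complete."""
        self.fields[key] = value
        if not self.missing():
            return mock_lead_capture(**self.fields)
        return None  # still collecting — no premature tool call

state = LeadState()
state.record("name", "Alex")            # tool not called yet
state.record("email", "alex@gmail.com")  # tool not called yet
result = state.record("platform", "YouTube")  # all fields present → tool fires
```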


🎬 Demo Conversation Flow

User    → "Hi there"
Agent   → "Hello! I'm AutoStream AI. How can I assist you today?"

User    → "What are your pricing plans?"
Agent   → [RAG retrieval] "We offer two plans:
            Basic — $29/month (10 videos, 720p)
            Pro   — $79/month (Unlimited, 4K, AI captions)"

User    → "The Pro plan sounds great. I run a YouTube channel."
Agent   → [High intent detected] "That's great to hear! May I get your name?"

User    → "Alex"
Agent   → "Thank you, Alex. Could you share your email address?"

User    → "alex@gmail.com"
Agent   → "Got it. And which platform are you primarily creating for?"

User    → "YouTube"
Agent   → [Calls mock_lead_capture("Alex", "alex@gmail.com", "YouTube")]
          "Thank you, Alex. Your details have been captured.
           Welcome to AutoStream — our team will be in touch shortly."

📱 WhatsApp Integration (Deployment Theory)

To deploy this agent on WhatsApp, the recommended approach is using the WhatsApp Business API via Meta's Cloud API combined with a webhook-based architecture:

  1. Register a webhook on Meta Developer Portal pointing to a public endpoint (e.g., https://yourdomain.com/webhook)
  2. When a user sends a WhatsApp message, Meta sends a POST request to your webhook with the message payload
  3. Your FastAPI server receives the payload, extracts the message text and sender ID, and routes it through the existing agent pipeline
  4. The agent's response is sent back to the user via a POST request to the WhatsApp Send Message API using the user's phone number as the session identifier
  5. Session memory is maintained per phone number using a dictionary or Redis store

This approach requires no change to the core agent logic — only a new webhook handler and a WhatsApp API client (e.g., pywa, whatsapp-python, or direct HTTP calls).
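Step 3 above hinges on extracting the sender ID and text from Meta's webhook payload. The nesting in this sketch follows the Cloud API's documented format for incoming text messages, but real payloads can also carry statuses or media events, which it ignores:

```python
def extract_whatsapp_message(payload: dict):
    """Pull (sender_phone, text_body) from a Cloud API-style webhook payload.

    The sender's phone number doubles as the session identifier for memory.
    Returns None for events that are not plain text messages.
    """
    try:
        value = payload["entry"][0]["changes"][0]["value"]
        msg = value["messages"][0]
        return msg["from"], msg["text"]["body"]
    except (KeyError, IndexError):
        return None

# Shape based on Meta's documented webhook format (abridged).
sample = {
    "entry": [{"changes": [{"value": {"messages": [
        {"from": "15551234567", "text": {"body": "What are your pricing plans?"}}
    ]}}]}]
}
```

A webhook handler in `app.py` would call this, route the text through the existing agent pipeline keyed by the phone number, and POST the reply to the Send Message API.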


🚀 Future Improvements

  • Vector database (Pinecone / ChromaDB) for scalable RAG beyond JSON
  • Streaming responses for real-time token-by-token output
  • User authentication and persistent session storage (PostgreSQL / Redis)
  • WhatsApp & Telegram deployment via webhook integrations
  • Analytics dashboard — track intent distribution and lead conversion rate
  • Multi-language support for global content creator audiences
  • Human handoff — escalate to a live agent when confidence is low

👤 Author

Built for ServiceHive — Inflx Platform Machine Learning Intern Assignment — Social-to-Lead Agentic Workflow

| | |
| --- | --- |
| Project | AutoStream AI Agent |
| Stack | Python · FastAPI · LangChain · Gemini · HTML/JS |
| Assignment | ServiceHive / Inflx — ML Intern Take-Home |

AutoStream AI Agent — Built with precision. Designed for conversion.