ultralytics 8.3.127 New Visual Similarity Search Solution (#20397)

Signed-off-by: Muhammad Rizwan Munawar <muhammadrizwanmunawar123@gmail.com>
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
Co-authored-by: Ultralytics Assistant <135830346+UltralyticsAssistant@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
This commit is contained in:
Muhammad Rizwan Munawar 2025-05-04 20:26:43 +05:00 committed by GitHub
parent 0c5f4aa382
commit 0db5d327d0
12 changed files with 572 additions and 45 deletions


@ -0,0 +1,149 @@
---
comments: true
description: Build a semantic image search web app using OpenAI CLIP, Meta FAISS, and Flask. Learn how to embed images and retrieve them using natural language.
keywords: CLIP, FAISS, Flask, semantic search, image retrieval, OpenAI, Ultralytics, tutorial, computer vision, web app
---
# Semantic Image Search with OpenAI CLIP and Meta FAISS
## Introduction
This guide walks you through building a **semantic image search** engine using [OpenAI CLIP](https://openai.com/blog/clip), [Meta FAISS](https://github.com/facebookresearch/faiss), and [Flask](https://flask.palletsprojects.com/). By combining CLIP's powerful visual-language embeddings with FAISS's efficient nearest-neighbor search, you can create a fully functional web interface where you can retrieve relevant images using natural language queries.
## Semantic Image Search Visual Preview
![Flask webpage with semantic search results overview](https://github.com/ultralytics/docs/releases/download/0/flask-ui.avif)
## How It Works
- **CLIP** uses a vision encoder (e.g., ResNet or ViT) for images and a text encoder (Transformer-based) for language to project both into the same multimodal embedding space. This allows for direct comparison between text and images using [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity).
- **FAISS** (Facebook AI Similarity Search) builds an index of the image embeddings and enables fast, scalable retrieval of the closest vectors to a given query.
- **Flask** provides a simple web interface to submit natural language queries and display semantically matched images from the index.
This architecture supports zero-shot search, meaning you don't need labels or categories, just image data and a good prompt.
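The flow above can be sketched in plain Python: treat the text query and each image as vectors in the same space, score every pair with cosine similarity, and rank. The vectors below are toy stand-ins (real CLIP ViT-B/32 embeddings have 512 dimensions), so only the ranking logic is real:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Toy stand-ins for CLIP embeddings and filenames, purely illustrative
text_embedding = [0.9, 0.1, 0.3]  # pretend encoding of "a dog sitting on a bench"
image_embeddings = {
    "dog_on_bench.jpg": [0.8, 0.2, 0.4],
    "city_skyline.jpg": [0.1, 0.9, 0.2],
}

# Rank images by similarity to the query, highest first
ranked = sorted(
    ((name, cosine_similarity(text_embedding, vec)) for name, vec in image_embeddings.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    print(f"- {name} | Similarity: {score:.4f}")
```

In the full solution, CLIP supplies the embeddings and FAISS replaces this brute-force loop at scale.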
!!! example "Semantic Image Search using Ultralytics Python package"
??? note "Image Path Warning"
If you're using your own images, make sure to provide an absolute path to the image directory. Otherwise, the images may not appear on the webpage due to Flask's file serving limitations.
=== "Python"
```python
from ultralytics import solutions
app = solutions.SearchApp(
# data = "path/to/img/directory" # Optional, build search engine with your own images
device="cpu" # configure the device for processing i.e "cpu" or "cuda"
)
app.run(debug=False) # You can also use `debug=True` argument for testing
```
## `VisualAISearch` Class
This class performs all the backend operations:
- Loads or builds a FAISS index from local images.
- Extracts image and text [embeddings](https://platform.openai.com/docs/guides/embeddings) using CLIP.
- Performs similarity search using cosine similarity.
!!! Example "Similar Images Search"
??? note "Image Path Warning"
If you're using your own images, make sure to provide an absolute path to the image directory. Otherwise, the images may not appear on the webpage due to Flask's file serving limitations.
=== "Python"
```python
from ultralytics import solutions
searcher = solutions.VisualAISearch(
# data = "path/to/img/directory" # Optional, build search engine with your own images
device="cuda" # configure the device for processing i.e "cpu" or "cuda"
)
results = searcher("a dog sitting on a bench")
# Ranked Results:
# - 000000546829.jpg | Similarity: 0.3269
# - 000000549220.jpg | Similarity: 0.2899
# - 000000517069.jpg | Similarity: 0.2761
# - 000000029393.jpg | Similarity: 0.2742
# - 000000534270.jpg | Similarity: 0.2680
```
## Advantages of Semantic Image Search with CLIP and FAISS
Building your own semantic image search system with CLIP and FAISS provides several compelling advantages:
1. **Zero-Shot Capabilities**: You don't need to train the model on your specific dataset. CLIP's zero-shot learning lets you perform search queries on any image dataset using free-form natural language, saving both time and resources.
2. **Human-Like Understanding**: Unlike keyword-based search engines, CLIP understands semantic context. It can retrieve images based on abstract, emotional, or relational queries like "a happy child in nature" or "a futuristic city skyline at night".
![OpenAI Clip image retrieval workflow](https://github.com/ultralytics/docs/releases/download/0/clip-image-retrieval.avif)
3. **No Need for Labels or Metadata**: Traditional image search systems require carefully labeled data. This approach only needs raw images. CLIP generates embeddings without needing any manual annotation.
4. **Flexible and Scalable Search**: FAISS enables fast nearest-neighbor search even with large-scale datasets. It's optimized for speed and memory, allowing real-time response even with thousands (or millions) of embeddings.
![Meta FAISS embedding vectors building workflow](https://github.com/ultralytics/docs/releases/download/0/faiss-indexing-workflow.avif)
5. **Cross-Domain Applications**: Whether you're building a personal photo archive, a creative inspiration tool, a product search engine, or even an art recommendation system, this stack adapts to diverse domains with minimal tweaking.
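A detail worth knowing behind advantage 4: the pipeline gets cosine similarity cheaply by L2-normalizing every vector once, after which a plain inner product (what a FAISS `IndexFlatIP` computes) equals cosine similarity. A quick stdlib check of that identity, using arbitrary example vectors:

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


a, b = [3.0, 4.0], [4.0, 3.0]
cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
inner_product = dot(l2_normalize(a), l2_normalize(b))
print(cosine, inner_product)  # both ≈ 0.96
```

This is why the solution normalizes embeddings before adding them to an inner-product index: one normalization pass buys cosine ranking at inner-product cost.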
## FAQ
### How does CLIP understand both images and text?
[CLIP](https://github.com/openai/CLIP) (Contrastive Language-Image Pre-training) is a model developed by [OpenAI](https://openai.com/) that learns to connect visual and linguistic information. It's trained on a massive dataset of images paired with natural language captions. This training allows it to map both images and text into a shared embedding space, so you can compare them directly using vector similarity.
### Why is CLIP considered so powerful for AI tasks?
What makes CLIP stand out is its ability to generalize. Instead of being trained just for specific labels or tasks, it learns from natural language itself. This allows it to handle flexible queries like “a man riding a jet ski” or “a surreal dreamscape,” making it useful for everything from classification to creative semantic search, without retraining.
### What exactly does FAISS do in this project (Semantic Search)?
[FAISS](https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/) (Facebook AI Similarity Search) is a toolkit that helps you search through high-dimensional vectors very efficiently. Once CLIP turns your images into embeddings, FAISS makes it fast and easy to find the closest matches to a text query, perfect for real-time image retrieval.
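Conceptually, a flat FAISS index is an industrial-strength version of the loop below: score every indexed vector against the query and keep the top-k. This stdlib sketch (toy, pre-normalized vectors; illustrative names) mirrors what an inner-product search returns, minus the SIMD, batching, and memory optimizations that make FAISS fast at scale:

```python
import heapq


def search_top_k(index_vectors, query, k=2):
    """Brute-force inner-product search over a list of vectors."""
    scored = (
        (sum(q * v for q, v in zip(query, vec)), i)
        for i, vec in enumerate(index_vectors)
    )
    return heapq.nlargest(k, scored)  # top-k (score, index) pairs, best first


index = [[0.1, 0.9], [0.8, 0.2], [0.7, 0.7]]  # toy "image embeddings"
query = [1.0, 0.0]  # toy "text embedding"
print(search_top_k(index, query))  # [(0.8, 1), (0.7, 2)]
```

FAISS earns its keep when the index holds thousands or millions of high-dimensional vectors, where this naive scan becomes the bottleneck.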
### Why use the [Ultralytics](https://ultralytics.com/) [Python package](https://github.com/ultralytics/ultralytics/) if CLIP and FAISS are from OpenAI and Meta?
While CLIP and FAISS are developed by OpenAI and Meta respectively, the [Ultralytics Python package](https://pypi.org/project/ultralytics/) simplifies their integration into a complete semantic image search pipeline with a two-line workflow that just works:
!!! Example "Similar Images Search"
=== "Python"
```python
from ultralytics import solutions
searcher = solutions.VisualAISearch(
# data = "path/to/img/directory" # Optional, build search engine with your own images
device="cuda" # configure the device for processing i.e "cpu" or "cuda"
)
results = searcher("a dog sitting on a bench")
# Ranked Results:
# - 000000546829.jpg | Similarity: 0.3269
# - 000000549220.jpg | Similarity: 0.2899
# - 000000517069.jpg | Similarity: 0.2761
# - 000000029393.jpg | Similarity: 0.2742
# - 000000534270.jpg | Similarity: 0.2680
```
This high-level implementation handles:
- CLIP-based image and text embedding generation.
- FAISS index creation and management.
- Efficient semantic search with cosine similarity.
- Directory-based image loading and [visualization](https://www.ultralytics.com/glossary/data-visualization).
### Can I customize the frontend of this app?
Yes, you absolutely can. The current setup uses Flask with a basic HTML frontend, but you're free to swap in your own HTML or even build something more dynamic with React, Vue, or another frontend framework. Flask can easily serve as the backend API for your custom interface.
### Is it possible to search through videos instead of static images?
Not directly—but there's a simple workaround. You can extract individual frames from your videos (e.g., one every second), treat them as standalone images, and feed those into the system. This way, the search engine can semantically index visual moments from your videos.
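The sampling arithmetic for that workaround can be sketched as below: pick frame indices at a fixed interval, then export those frames (e.g., with OpenCV's `VideoCapture`, not shown here) into the directory you pass as `data`. The helper is stdlib-only and illustrative:

```python
def frame_indices(fps, duration_s, every_s=1.0):
    """Frame indices to extract: one frame every `every_s` seconds."""
    step = max(1, int(round(fps * every_s)))  # frames between samples
    total_frames = int(fps * duration_s)
    return list(range(0, total_frames, step))


# A 5-second clip at 30 FPS, sampled once per second
print(frame_indices(fps=30, duration_s=5))  # [0, 30, 60, 90, 120]
```

Indexing the exported frames then works exactly like indexing any other image directory.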


@ -0,0 +1,20 @@
---
description: Explore the Ultralytics semantic image search solution. Learn how to retrieve images using natural language with CLIP, FAISS, and a simple web app powered by Flask.
keywords: Ultralytics, semantic search, CLIP, FAISS, image retrieval, natural language, Flask, computer vision, YOLO, AI
---
# Reference for `ultralytics/solutions/similarity_search.py`
!!! note
This file is available at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/solutions/similarity_search.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/solutions/similarity_search.py). If you spot a problem please help fix it by [contributing](https://docs.ultralytics.com/help/contributing/) a [Pull Request](https://github.com/ultralytics/ultralytics/edit/main/ultralytics/solutions/similarity_search.py) 🛠️. Thank you 🙏!
<br>
## ::: ultralytics.solutions.similarity_search.VisualAISearch
<br><br><hr><br>
## ::: ultralytics.solutions.similarity_search.SearchApp
<br><br>


@ -40,7 +40,8 @@ Here's our curated list of Ultralytics solutions that can be used to create awes
- [Parking Management](../guides/parking-management.md): Organize and direct vehicle flow in parking areas with YOLO11, optimizing space utilization and user experience.
- [Analytics](../guides/analytics.md): Conduct comprehensive data analysis to discover patterns and make informed decisions, leveraging YOLO11 for descriptive, predictive, and prescriptive analytics.
- [Live Inference with Streamlit](../guides/streamlit-live-inference.md): Leverage the power of YOLO11 for real-time [object detection](https://www.ultralytics.com/glossary/object-detection) directly through your web browser with a user-friendly Streamlit interface.
- [Track Objects in Zone](../guides/trackzone.md) 🚀 NEW: Learn how to track objects within specific zones of video frames using YOLO11 for precise and efficient monitoring.
- [Track Objects in Zone](../guides/trackzone.md): Learn how to track objects within specific zones of video frames using YOLO11 for precise and efficient monitoring.
- [Similarity search](../guides/similarity-search.md) 🚀 NEW: Enable intelligent image retrieval by combining [OpenAI CLIP](https://cookbook.openai.com/examples/custom_image_embedding_search) embeddings with [Meta FAISS](https://ai.meta.com/tools/faiss/), allowing natural language queries like "person holding a bag" or "vehicles in motion."
### Solutions Arguments


@ -342,7 +342,9 @@ nav:
- Parking Management: guides/parking-management.md
- Analytics: guides/analytics.md
- Live Inference: guides/streamlit-live-inference.md
- Track Objects in Zone 🚀 NEW: guides/trackzone.md
- Track Objects in Zone: guides/trackzone.md
- Similarity Search 🚀 NEW: guides/similarity-search.md
- Guides:
- guides/index.md
- YOLO Common Issues: guides/yolo-common-issues.md
@ -599,6 +601,7 @@ nav:
- queue_management: reference/solutions/queue_management.md
- region_counter: reference/solutions/region_counter.md
- security_alarm: reference/solutions/security_alarm.md
- similarity_search: reference/solutions/similarity_search.md
- solutions: reference/solutions/solutions.md
- speed_estimation: reference/solutions/speed_estimation.md
- streamlit_inference: reference/solutions/streamlit_inference.md


@ -105,6 +105,7 @@ export = [
solutions = [
"shapely>=2.0.0", # shapely for point and polygon data matching
"streamlit>=1.29.0", # for live inference on web browser, i.e `yolo streamlit-predict`
"flask", # for similarity search solution
]
logging = [
"wandb", # https://docs.ultralytics.com/integrations/weights-biases/


@ -174,3 +174,14 @@ def test_solution(name, solution_class, needs_frame_count, video, kwargs):
video_path=str(TMP / video),
needs_frame_count=needs_frame_count,
)
@pytest.mark.slow
@pytest.mark.skipif(checks.IS_PYTHON_3_8, reason="Disabled due to unsupported CLIP dependencies.")
@pytest.mark.skipif(IS_RASPBERRYPI, reason="Disabled due to slow performance on Raspberry Pi.")
def test_similarity_search():
"""Test similarity search solution."""
from ultralytics import solutions
searcher = solutions.VisualAISearch()
_ = searcher("a dog sitting on a bench") # Returns the results in format "- img name | similarity score"


@ -1,6 +1,6 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
__version__ = "8.3.126"
__version__ = "8.3.127"
import os


@ -12,6 +12,7 @@ from .parking_management import ParkingManagement, ParkingPtsSelection
from .queue_management import QueueManager
from .region_counter import RegionCounter
from .security_alarm import SecurityAlarm
from .similarity_search import SearchApp, VisualAISearch
from .speed_estimation import SpeedEstimator
from .streamlit_inference import Inference
from .trackzone import TrackZone
@ -35,4 +36,6 @@ __all__ = (
"Analytics",
"Inference",
"TrackZone",
"SearchApp",
"VisualAISearch",
)


@ -48,6 +48,7 @@ class SolutionConfig:
half (bool): Whether to use FP16 precision (requires a supported CUDA device).
tracker (str): Path to tracking configuration YAML file (e.g., 'botsort.yaml').
verbose (bool): Enable verbose logging output for debugging or diagnostics.
data (str): Path to image directory used for similarity search.
Methods:
update: Update the configuration with user-defined keyword arguments and raise error on invalid keys.
@ -91,6 +92,7 @@ class SolutionConfig:
half: bool = False
tracker: str = "botsort.yaml"
verbose: bool = True
data: str = "images"
def update(self, **kwargs):
"""Update configuration parameters with new values provided as keyword arguments."""


@ -0,0 +1,176 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
import os
from pathlib import Path
import numpy as np
import torch
from PIL import Image
from ultralytics.data.utils import IMG_FORMATS
from ultralytics.solutions.solutions import BaseSolution
from ultralytics.utils.checks import check_requirements
from ultralytics.utils.torch_utils import select_device
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE" # Avoid OpenMP conflict on some systems
class VisualAISearch(BaseSolution):
"""
VisualAISearch leverages OpenCLIP to generate high-quality image and text embeddings, aligning them in a shared
semantic space. It then uses FAISS to perform fast and scalable similarity-based retrieval, allowing users to search
large collections of images using natural language queries with high accuracy and speed.
Attributes:
data (str): Directory containing images.
device (str): Computation device, e.g., 'cpu' or 'cuda'.
"""
def __init__(self, **kwargs):
"""Initializes the VisualAISearch class with the FAISS index file and CLIP model."""
super().__init__(**kwargs)
check_requirements(["open-clip-torch", "faiss-cpu"])
import faiss
import open_clip
self.faiss = faiss
self.open_clip = open_clip
self.faiss_index = "faiss.index"
self.data_path_npy = "paths.npy"
self.model_name = "ViT-B-32-quickgelu"
self.data_dir = Path(self.CFG["data"])
self.device = select_device(self.CFG["device"])
if not self.data_dir.exists():
from ultralytics.utils import ASSETS_URL
self.LOGGER.warning(f"{self.data_dir} not found. Downloading images.zip from {ASSETS_URL}/images.zip")
from ultralytics.utils.downloads import safe_download
safe_download(url=f"{ASSETS_URL}/images.zip", unzip=True, retry=3)
self.data_dir = Path("images")
self.clip_model, _, self.preprocess = self.open_clip.create_model_and_transforms(
self.model_name, pretrained="openai"
)
self.clip_model = self.clip_model.to(self.device).eval()
self.tokenizer = self.open_clip.get_tokenizer(self.model_name)
self.index = None
self.image_paths = []
self.load_or_build_index()
def extract_image_feature(self, path):
"""Extract CLIP image embedding."""
image = Image.open(path)
tensor = self.preprocess(image).unsqueeze(0).to(self.device)
with torch.no_grad():
return self.clip_model.encode_image(tensor).cpu().numpy()
def extract_text_feature(self, text):
"""Extract CLIP text embedding."""
tokens = self.tokenizer([text]).to(self.device)
with torch.no_grad():
return self.clip_model.encode_text(tokens).cpu().numpy()
def load_or_build_index(self):
"""Loads FAISS index or builds a new one from image features."""
# Check if the FAISS index and corresponding image paths already exist
if Path(self.faiss_index).exists() and Path(self.data_path_npy).exists():
self.LOGGER.info("Loading existing FAISS index...")
self.index = self.faiss.read_index(self.faiss_index) # Load the FAISS index from disk
self.image_paths = np.load(self.data_path_npy) # Load the saved image path list
return # Exit the function as the index is successfully loaded
# If the index doesn't exist, start building it from scratch
self.LOGGER.info("Building FAISS index from images...")
vectors = [] # List to store feature vectors of images
# Iterate over all image files in the data directory
for file in self.data_dir.iterdir():
# Skip files that are not valid image formats
if file.suffix.lower().lstrip(".") not in IMG_FORMATS:
continue
try:
# Extract feature vector for the image and add to the list
vectors.append(self.extract_image_feature(file))
self.image_paths.append(file.name) # Store the corresponding image name
except Exception as e:
self.LOGGER.warning(f"Skipping {file.name}: {e}")
# If no vectors were successfully created, raise an error
if not vectors:
raise RuntimeError("No image embeddings could be generated.")
vectors = np.vstack(vectors).astype("float32") # Stack all vectors into a NumPy array and convert to float32
self.faiss.normalize_L2(vectors) # Normalize vectors to unit length for cosine similarity
self.index = self.faiss.IndexFlatIP(vectors.shape[1]) # Create a new FAISS index using inner product
self.index.add(vectors) # Add the normalized vectors to the FAISS index
self.faiss.write_index(self.index, self.faiss_index) # Save the newly built FAISS index to disk
np.save(self.data_path_npy, np.array(self.image_paths)) # Save the list of image paths to disk
self.LOGGER.info(f"Indexed {len(self.image_paths)} images.")
def search(self, query, k=30, similarity_thresh=0.1):
"""Returns top-k semantically similar images to the given query."""
text_feat = self.extract_text_feature(query).astype("float32")
self.faiss.normalize_L2(text_feat)
D, index = self.index.search(text_feat, k)
results = [
(self.image_paths[i], float(D[0][idx])) for idx, i in enumerate(index[0]) if D[0][idx] >= similarity_thresh
]
results.sort(key=lambda x: x[1], reverse=True)
self.LOGGER.info("\nRanked Results:")
for name, score in results:
self.LOGGER.info(f" - {name} | Similarity: {score:.4f}")
return [r[0] for r in results]
def __call__(self, query):
"""Direct call for search function."""
return self.search(query)
class SearchApp:
"""
A Flask-based web interface powers the semantic image search experience, enabling users to input natural language
queries and instantly view the most relevant images retrieved from the indexed database, all through a clean,
responsive, and easily customizable frontend.
Args:
data (str): Path to images to index and search.
device (str): Device to run inference on (e.g. 'cpu', 'cuda').
"""
def __init__(self, data="images", device=None):
"""Initialization of the VisualAISearch class for performing semantic image search."""
check_requirements("flask")
from flask import Flask, render_template, request
self.render_template = render_template
self.request = request
self.searcher = VisualAISearch(data=data, device=device)
self.app = Flask(
__name__,
template_folder="templates",
static_folder=Path(data).resolve(), # Absolute path to serve images
static_url_path="/images", # URL prefix for images
)
self.app.add_url_rule("/", view_func=self.index, methods=["GET", "POST"])
def index(self):
"""Function to process the user query and display output."""
results = []
if self.request.method == "POST":
query = self.request.form.get("query", "").strip()
results = self.searcher(query)
return self.render_template("similarity-search.html", results=results)
def run(self, debug=False):
"""Runs the Flask web app."""
self.app.run(debug=debug)


@ -54,55 +54,56 @@ class BaseSolution:
is_cli (bool): Enables CLI mode if set to True.
**kwargs (Any): Additional configuration parameters that override defaults.
"""
check_requirements("shapely>=2.0.0")
from shapely.geometry import LineString, Point, Polygon
from shapely.prepared import prep
self.LineString = LineString
self.Polygon = Polygon
self.Point = Point
self.prep = prep
self.annotator = None # Initialize annotator
self.tracks = None
self.track_data = None
self.boxes = []
self.clss = []
self.track_ids = []
self.track_line = None
self.masks = None
self.r_s = None
self.LOGGER = LOGGER # Store logger object to be used in multiple solution classes
self.CFG = vars(SolutionConfig().update(**kwargs))
self.LOGGER.info(f"Ultralytics Solutions: ✅ {self.CFG}")
self.LOGGER = LOGGER # Store logger object to be used in multiple solution classes
self.region = self.CFG["region"] # Store region data for other classes usage
self.line_width = self.CFG["line_width"]
if self.__class__.__name__ != "VisualAISearch":
check_requirements("shapely>=2.0.0")
from shapely.geometry import LineString, Point, Polygon
from shapely.prepared import prep
# Load Model and store additional information (classes, show_conf, show_label)
if self.CFG["model"] is None:
self.CFG["model"] = "yolo11n.pt"
self.model = YOLO(self.CFG["model"])
self.names = self.model.names
self.classes = self.CFG["classes"]
self.show_conf = self.CFG["show_conf"]
self.show_labels = self.CFG["show_labels"]
self.LineString = LineString
self.Polygon = Polygon
self.Point = Point
self.prep = prep
self.annotator = None # Initialize annotator
self.tracks = None
self.track_data = None
self.boxes = []
self.clss = []
self.track_ids = []
self.track_line = None
self.masks = None
self.r_s = None
self.track_add_args = { # Tracker additional arguments for advance configuration
k: self.CFG[k] for k in ["iou", "conf", "max_det", "half", "tracker", "device", "verbose"]
} # verbose must be passed to track method; setting it False in YOLO still logs the track information.
self.LOGGER.info(f"Ultralytics Solutions: ✅ {self.CFG}")
self.region = self.CFG["region"] # Store region data for other classes usage
self.line_width = self.CFG["line_width"]
if is_cli and self.CFG["source"] is None:
d_s = "solutions_ci_demo.mp4" if "-pose" not in self.CFG["model"] else "solution_ci_pose_demo.mp4"
self.LOGGER.warning(f"source not provided. using default source {ASSETS_URL}/{d_s}")
from ultralytics.utils.downloads import safe_download
# Load Model and store additional information (classes, show_conf, show_label)
if self.CFG["model"] is None:
self.CFG["model"] = "yolo11n.pt"
self.model = YOLO(self.CFG["model"])
self.names = self.model.names
self.classes = self.CFG["classes"]
self.show_conf = self.CFG["show_conf"]
self.show_labels = self.CFG["show_labels"]
safe_download(f"{ASSETS_URL}/{d_s}") # download source from ultralytics assets
self.CFG["source"] = d_s # set default source
self.track_add_args = { # Tracker additional arguments for advance configuration
k: self.CFG[k] for k in ["iou", "conf", "device", "max_det", "half", "tracker", "device", "verbose"]
} # verbose must be passed to track method; setting it False in YOLO still logs the track information.
# Initialize environment and region setup
self.env_check = check_imshow(warn=True)
self.track_history = defaultdict(list)
if is_cli and self.CFG["source"] is None:
d_s = "solutions_ci_demo.mp4" if "-pose" not in self.CFG["model"] else "solution_ci_pose_demo.mp4"
self.LOGGER.warning(f"source not provided. using default source {ASSETS_URL}/{d_s}")
from ultralytics.utils.downloads import safe_download
safe_download(f"{ASSETS_URL}/{d_s}") # download source from ultralytics assets
self.CFG["source"] = d_s # set default source
# Initialize environment and region setup
self.env_check = check_imshow(warn=True)
self.track_history = defaultdict(list)
def adjust_box_label(self, cls, conf, track_id=None):
"""


@ -0,0 +1,160 @@
<!-- Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license -->
<!--Similarity search webpage-->
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Semantic Image Search</title>
<link
href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600&display=swap"
rel="stylesheet"
/>
<style>
body {
background: linear-gradient(135deg, #f0f4ff, #f9fbff);
font-family: "Inter", sans-serif;
color: #111e68;
padding: 2rem;
margin: 0;
min-height: 100vh;
}
h1 {
text-align: center;
margin-bottom: 2rem;
font-size: 2.5rem;
font-weight: 600;
}
form {
display: flex;
flex-wrap: wrap;
justify-content: center;
align-items: center;
gap: 1rem;
margin-bottom: 3rem;
animation: fadeIn 1s ease-in-out;
}
input[type="text"] {
width: 300px;
padding: 0.75rem 1rem;
font-size: 1rem;
border-radius: 10px;
border: 1px solid #ccc;
box-shadow: 0 2px 6px rgba(0, 0, 0, 0.05);
transition: box-shadow 0.3s ease;
}
input[type="text"]:focus {
outline: none;
box-shadow: 0 0 0 3px rgba(17, 30, 104, 0.2);
}
button {
background-color: #111e68;
color: white;
font-weight: 600;
font-size: 1rem;
padding: 0.75rem 1.5rem;
border-radius: 10px;
border: none;
cursor: pointer;
transition:
background-color 0.3s ease,
transform 0.2s ease;
}
button:hover {
background-color: #1f2e9f;
transform: translateY(-2px);
}
.grid {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(260px, 1fr));
gap: 1.5rem;
max-width: 1600px;
margin: auto;
animation: fadeInUp 1s ease-in-out;
}
.card {
background: white;
border-radius: 16px;
overflow: hidden;
box-shadow: 0 6px 14px rgba(0, 0, 0, 0.08);
transition:
transform 0.3s ease,
box-shadow 0.3s ease;
}
.card:hover {
transform: translateY(-6px);
box-shadow: 0 10px 20px rgba(0, 0, 0, 0.1);
}
.card img {
width: 100%;
height: 100%;
object-fit: cover;
display: block;
}
@keyframes fadeIn {
0% {
opacity: 0;
transform: scale(0.95);
}
100% {
opacity: 1;
transform: scale(1);
}
}
@keyframes fadeInUp {
0% {
opacity: 0;
transform: translateY(20px);
}
100% {
opacity: 1;
transform: translateY(0);
}
}
</style>
</head>
<body>
<div style="text-align: center; margin-bottom: 1rem">
<img
src="https://raw.githubusercontent.com/ultralytics/assets/main/logo/favicon.png"
alt="Ultralytics Logo"
style="height: 40px"
/>
</div>
<h1>Semantic Image Search with AI</h1>
<!-- Search box -->
<form method="POST">
<input
type="text"
name="query"
placeholder="Describe the scene (e.g., man walking)"
value="{{ request.form['query'] }}"
required
/>
<button type="submit">Search</button>
</form>
<!-- Search results grid -->
<div class="grid">
{% for img in results %}
<div class="card">
<img src="{{ url_for('static', filename=img) }}" alt="Result Image" />
</div>
{% endfor %}
</div>
</body>
</html>