mirror of
https://github.com/ultralytics/ultralytics
synced 2026-04-21 14:07:18 +00:00
ultralytics 8.3.127 New Visual Similarity Search Solution (#20397)
Signed-off-by: Muhammad Rizwan Munawar <muhammadrizwanmunawar123@gmail.com> Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com> Co-authored-by: UltralyticsAssistant <web@ultralytics.com> Co-authored-by: Ultralytics Assistant <135830346+UltralyticsAssistant@users.noreply.github.com> Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
This commit is contained in:
parent
0c5f4aa382
commit
0db5d327d0
12 changed files with 572 additions and 45 deletions
149 docs/en/guides/similarity-search.md Normal file
@@ -0,0 +1,149 @@
---
comments: true
description: Build a semantic image search web app using OpenAI CLIP, Meta FAISS, and Flask. Learn how to embed images and retrieve them using natural language.
keywords: CLIP, FAISS, Flask, semantic search, image retrieval, OpenAI, Ultralytics, tutorial, computer vision, web app
---

# Semantic Image Search with OpenAI CLIP and Meta FAISS

## Introduction

This guide walks you through building a **semantic image search** engine with [OpenAI CLIP](https://openai.com/blog/clip), [Meta FAISS](https://github.com/facebookresearch/faiss), and [Flask](https://flask.palletsprojects.com/). By combining CLIP's powerful vision-language embeddings with FAISS's efficient nearest-neighbor search, you can create a fully functional web interface that retrieves relevant images from natural language queries.

## Semantic Image Search Visual Preview

## How It Works

- **CLIP** uses a vision encoder (e.g., ResNet or ViT) for images and a text encoder (Transformer-based) for language to project both into the same multimodal embedding space, which allows direct comparison between text and images using [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity).
- **FAISS** (Facebook AI Similarity Search) builds an index of the image embeddings and enables fast, scalable retrieval of the vectors closest to a given query.
- **FLask** aside, **Flask** provides a simple web interface for submitting natural language queries and displaying semantically matched images from the index.

This architecture supports zero-shot search: you don't need labels or categories, just image data and a good prompt.
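Conceptually, the retrieval step is just cosine similarity between L2-normalized embeddings. A minimal NumPy sketch (toy 4-dimensional vectors stand in for real 512-dimensional CLIP embeddings) of what the FAISS inner-product index computes:

```python
import numpy as np

# Toy stand-ins for CLIP embeddings (real CLIP vectors are 512-dimensional).
image_embeddings = np.array(
    [[0.9, 0.1, 0.0, 0.1], [0.1, 0.8, 0.2, 0.0], [0.2, 0.1, 0.9, 0.1]], dtype=np.float32
)
text_embedding = np.array([[0.85, 0.15, 0.05, 0.1]], dtype=np.float32)

# L2-normalize so that the inner product equals cosine similarity.
image_embeddings /= np.linalg.norm(image_embeddings, axis=1, keepdims=True)
text_embedding /= np.linalg.norm(text_embedding, axis=1, keepdims=True)

scores = (image_embeddings @ text_embedding.T).ravel()  # one similarity score per image
ranked = np.argsort(-scores)  # best match first

print(ranked[0])  # 0 (the first toy image is closest to the query vector)
```

This is exactly why the solution normalizes vectors before adding them to an inner-product index: after normalization, nearest-neighbor-by-inner-product is nearest-neighbor-by-cosine.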
!!! example "Semantic Image Search using Ultralytics Python package"

    ??? note "Image Path Warning"

        If you're using your own images, make sure to provide an absolute path to the image directory. Otherwise, the images may not appear on the webpage due to Flask's file-serving limitations.

    === "Python"

        ```python
        from ultralytics import solutions

        app = solutions.SearchApp(
            # data="path/to/img/directory",  # optional: build the search engine with your own images
            device="cpu"  # processing device, i.e. "cpu" or "cuda"
        )

        app.run(debug=False)  # you can also pass debug=True for testing
        ```

## `VisualAISearch` class

This class performs all the backend operations:

- Loads or builds a FAISS index from local images.
- Extracts image and text [embeddings](https://platform.openai.com/docs/guides/embeddings) using CLIP.
- Performs similarity search using cosine similarity.
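The index-building step skips anything that isn't a recognized image format before embedding it. A small self-contained sketch of that filtering logic (the `IMG_FORMATS` set here is a local stand-in for the one imported from `ultralytics.data.utils`):

```python
import tempfile
from pathlib import Path

# Local stand-in for ultralytics' IMG_FORMATS set used when building the index.
IMG_FORMATS = {"jpg", "jpeg", "png", "bmp", "webp"}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("cat.jpg", "dog.PNG", "notes.txt"):
        (root / name).touch()  # create empty placeholder files

    # The same suffix check VisualAISearch applies before embedding each file.
    images = sorted(f.name for f in root.iterdir() if f.suffix.lower().lstrip(".") in IMG_FORMATS)

print(images)  # ['cat.jpg', 'dog.PNG'] -- notes.txt is skipped
```

Lower-casing the suffix means mixed-case extensions like `.PNG` are still indexed.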
!!! example "Similar Images Search"

    ??? note "Image Path Warning"

        If you're using your own images, make sure to provide an absolute path to the image directory. Otherwise, the images may not appear on the webpage due to Flask's file-serving limitations.

    === "Python"

        ```python
        from ultralytics import solutions

        searcher = solutions.VisualAISearch(
            # data="path/to/img/directory",  # optional: build the search engine with your own images
            device="cuda"  # processing device, i.e. "cpu" or "cuda"
        )

        results = searcher("a dog sitting on a bench")

        # Ranked Results:
        #     - 000000546829.jpg | Similarity: 0.3269
        #     - 000000549220.jpg | Similarity: 0.2899
        #     - 000000517069.jpg | Similarity: 0.2761
        #     - 000000029393.jpg | Similarity: 0.2742
        #     - 000000534270.jpg | Similarity: 0.2680
        ```

## Advantages of Semantic Image Search with CLIP and FAISS

Building your own semantic image search system with CLIP and FAISS provides several compelling advantages:

1. **Zero-Shot Capabilities**: You don't need to train the model on your specific dataset. CLIP's zero-shot learning lets you run search queries over any image dataset using free-form natural language, saving both time and resources.

2. **Human-Like Understanding**: Unlike keyword-based search engines, CLIP understands semantic context. It can retrieve images based on abstract, emotional, or relational queries like "a happy child in nature" or "a futuristic city skyline at night".

3. **No Need for Labels or Metadata**: Traditional image search systems require carefully labeled data. This approach only needs raw images; CLIP generates embeddings without any manual annotation.

4. **Flexible and Scalable Search**: FAISS enables fast nearest-neighbor search even on large-scale datasets. It's optimized for speed and memory, allowing real-time responses even with thousands (or millions) of embeddings.

5. **Cross-Domain Applications**: Whether you're building a personal photo archive, a creative inspiration tool, a product search engine, or even an art recommendation system, this stack adapts to diverse domains with minimal tweaking.

## FAQ

### How does CLIP understand both images and text?

[CLIP](https://github.com/openai/CLIP) (Contrastive Language-Image Pretraining) is a model developed by [OpenAI](https://openai.com/) that learns to connect visual and linguistic information. It's trained on a massive dataset of images paired with natural language captions. This training allows it to map both images and text into a shared embedding space, so you can compare them directly using vector similarity.

### Why is CLIP considered so powerful for AI tasks?

What makes CLIP stand out is its ability to generalize. Instead of being trained only for specific labels or tasks, it learns from natural language itself. This allows it to handle flexible queries like "a man riding a jet ski" or "a surreal dreamscape", making it useful for everything from classification to creative semantic search, without retraining.

### What exactly does FAISS do in this project (Semantic Search)?

[FAISS](https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/) (Facebook AI Similarity Search) is a toolkit for searching high-dimensional vectors very efficiently. Once CLIP turns your images into embeddings, FAISS makes it fast and easy to find the closest matches to a text query, perfect for real-time image retrieval.

### Why use the [Ultralytics](https://ultralytics.com/) [Python package](https://github.com/ultralytics/ultralytics/) if CLIP and FAISS are from OpenAI and Meta?

While CLIP and FAISS are developed by OpenAI and Meta respectively, the [Ultralytics Python package](https://pypi.org/project/ultralytics/) simplifies their integration into a complete semantic image search pipeline in a two-line workflow that just works:

!!! example "Similar Images Search"

    === "Python"

        ```python
        from ultralytics import solutions

        searcher = solutions.VisualAISearch(
            # data="path/to/img/directory",  # optional: build the search engine with your own images
            device="cuda"  # processing device, i.e. "cpu" or "cuda"
        )

        results = searcher("a dog sitting on a bench")

        # Ranked Results:
        #     - 000000546829.jpg | Similarity: 0.3269
        #     - 000000549220.jpg | Similarity: 0.2899
        #     - 000000517069.jpg | Similarity: 0.2761
        #     - 000000029393.jpg | Similarity: 0.2742
        #     - 000000534270.jpg | Similarity: 0.2680
        ```

This high-level implementation handles:

- CLIP-based image and text embedding generation.
- FAISS index creation and management.
- Efficient semantic search with cosine similarity.
- Directory-based image loading and [visualization](https://www.ultralytics.com/glossary/data-visualization).

### Can I customize the frontend of this app?

Yes, you absolutely can. The current setup uses Flask with a basic HTML frontend, but you're free to swap in your own HTML or build something more dynamic with React, Vue, or another frontend framework. Flask can easily serve as the backend API for your custom interface.
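For example, a custom frontend could talk to a small JSON endpoint instead of the server-rendered template. A minimal sketch, where the `/api/search` route name is illustrative and the `searcher` function is a stand-in for `solutions.VisualAISearch` (neither is part of the Ultralytics API):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def searcher(query):
    """Stand-in for solutions.VisualAISearch; returns a canned result for demonstration."""
    return ["000000546829.jpg"] if query else []


@app.route("/api/search", methods=["POST"])
def api_search():
    # Read the query from a JSON body, e.g. {"query": "a dog"}
    query = (request.get_json(silent=True) or {}).get("query", "").strip()
    return jsonify(results=searcher(query))


# Exercise the endpoint without starting a server.
client = app.test_client()
resp = client.post("/api/search", json={"query": "a dog"})
print(resp.get_json())  # {'results': ['000000546829.jpg']}
```

A React or Vue frontend would then `fetch("/api/search", ...)` and render the returned filenames against the same `/images` static route the app already exposes.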
### Is it possible to search through videos instead of static images?

Not directly, but there's a simple workaround. You can extract individual frames from your videos (e.g., one every second), treat them as standalone images, and feed those into the system. This way, the search engine can semantically index visual moments from your videos.
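The frame-sampling arithmetic is straightforward; a small sketch (actually decoding the frames would typically use OpenCV's `cv2.VideoCapture`, not shown here):

```python
def sample_frame_indices(total_frames: int, fps: float, every_s: float = 1.0) -> list:
    """Return the frame indices to extract, one every `every_s` seconds."""
    step = max(1, round(fps * every_s))  # frames between samples, at least 1
    return list(range(0, total_frames, step))


# A hypothetical 10-second clip at 30 fps, sampled once per second:
indices = sample_frame_indices(total_frames=300, fps=30.0)
print(indices[:3], len(indices))  # [0, 30, 60] 10
```

Each sampled frame is then saved as an image and indexed exactly like any other file in the data directory.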
20 docs/en/reference/solutions/similarity_search.md Normal file
@@ -0,0 +1,20 @@
---
description: Explore the Ultralytics semantic image search solution. Learn how to retrieve images using natural language with CLIP, FAISS, and a simple web app powered by Flask.
keywords: Ultralytics, semantic search, CLIP, FAISS, image retrieval, natural language, Flask, computer vision, YOLO, AI
---

# Reference for `ultralytics/solutions/similarity_search.py`

!!! note

    This file is available at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/solutions/similarity_search.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/solutions/similarity_search.py). If you spot a problem please help fix it by [contributing](https://docs.ultralytics.com/help/contributing/) a [Pull Request](https://github.com/ultralytics/ultralytics/edit/main/ultralytics/solutions/similarity_search.py) 🛠️. Thank you 🙏!

<br>

## ::: ultralytics.solutions.similarity_search.VisualAISearch

<br><br><hr><br>

## ::: ultralytics.solutions.similarity_search.SearchApp

<br><br>
@@ -40,7 +40,8 @@ Here's our curated list of Ultralytics solutions that can be used to create awes
- [Parking Management](../guides/parking-management.md): Organize and direct vehicle flow in parking areas with YOLO11, optimizing space utilization and user experience.
- [Analytics](../guides/analytics.md): Conduct comprehensive data analysis to discover patterns and make informed decisions, leveraging YOLO11 for descriptive, predictive, and prescriptive analytics.
- [Live Inference with Streamlit](../guides/streamlit-live-inference.md): Leverage the power of YOLO11 for real-time [object detection](https://www.ultralytics.com/glossary/object-detection) directly through your web browser with a user-friendly Streamlit interface.
- [Track Objects in Zone](../guides/trackzone.md) 🚀 NEW: Learn how to track objects within specific zones of video frames using YOLO11 for precise and efficient monitoring.
- [Track Objects in Zone](../guides/trackzone.md): Learn how to track objects within specific zones of video frames using YOLO11 for precise and efficient monitoring.
- [Similarity search](../guides/similarity-search.md) 🚀 NEW: Enable intelligent image retrieval by combining [OpenAI CLIP](https://cookbook.openai.com/examples/custom_image_embedding_search) embeddings with [Meta FAISS](https://ai.meta.com/tools/faiss/), allowing natural language queries like "person holding a bag" or "vehicles in motion."

### Solutions Arguments
@@ -342,7 +342,9 @@ nav:
      - Parking Management: guides/parking-management.md
      - Analytics: guides/analytics.md
      - Live Inference: guides/streamlit-live-inference.md
      - Track Objects in Zone 🚀 NEW: guides/trackzone.md
      - Track Objects in Zone: guides/trackzone.md
      - Similarity Search 🚀 NEW: guides/similarity-search.md

  - Guides:
      - guides/index.md
      - YOLO Common Issues: guides/yolo-common-issues.md

@@ -599,6 +601,7 @@ nav:
      - queue_management: reference/solutions/queue_management.md
      - region_counter: reference/solutions/region_counter.md
      - security_alarm: reference/solutions/security_alarm.md
      - similarity_search: reference/solutions/similarity_search.md
      - solutions: reference/solutions/solutions.md
      - speed_estimation: reference/solutions/speed_estimation.md
      - streamlit_inference: reference/solutions/streamlit_inference.md

@@ -105,6 +105,7 @@ export = [
solutions = [
    "shapely>=2.0.0", # shapely for point and polygon data matching
    "streamlit>=1.29.0", # for live inference on web browser, i.e `yolo streamlit-predict`
    "flask", # for similarity search solution
]
logging = [
    "wandb", # https://docs.ultralytics.com/integrations/weights-biases/
@@ -174,3 +174,14 @@ def test_solution(name, solution_class, needs_frame_count, video, kwargs):
        video_path=str(TMP / video),
        needs_frame_count=needs_frame_count,
    )


@pytest.mark.slow
@pytest.mark.skipif(checks.IS_PYTHON_3_8, reason="Disabled due to unsupported CLIP dependencies.")
@pytest.mark.skipif(IS_RASPBERRYPI, reason="Disabled due to slow performance on Raspberry Pi.")
def test_similarity_search():
    """Test similarity search solution."""
    from ultralytics import solutions

    searcher = solutions.VisualAISearch()
    _ = searcher("a dog sitting on a bench")  # Returns results in the format "- img name | similarity score"
@@ -1,6 +1,6 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

__version__ = "8.3.126"
__version__ = "8.3.127"

import os
@@ -12,6 +12,7 @@ from .parking_management import ParkingManagement, ParkingPtsSelection
from .queue_management import QueueManager
from .region_counter import RegionCounter
from .security_alarm import SecurityAlarm
from .similarity_search import SearchApp, VisualAISearch
from .speed_estimation import SpeedEstimator
from .streamlit_inference import Inference
from .trackzone import TrackZone

@@ -35,4 +36,6 @@ __all__ = (
    "Analytics",
    "Inference",
    "TrackZone",
    "SearchApp",
    "VisualAISearch",
)
@@ -48,6 +48,7 @@ class SolutionConfig:
        half (bool): Whether to use FP16 precision (requires a supported CUDA device).
        tracker (str): Path to tracking configuration YAML file (e.g., 'botsort.yaml').
        verbose (bool): Enable verbose logging output for debugging or diagnostics.
        data (str): Path to image directory used for similarity search.

    Methods:
        update: Update the configuration with user-defined keyword arguments and raise error on invalid keys.

@@ -91,6 +92,7 @@ class SolutionConfig:
    half: bool = False
    tracker: str = "botsort.yaml"
    verbose: bool = True
    data: str = "images"

    def update(self, **kwargs):
        """Update configuration parameters with new values provided as keyword arguments."""
176 ultralytics/solutions/similarity_search.py Normal file
@@ -0,0 +1,176 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

import os
from pathlib import Path

import numpy as np
import torch
from PIL import Image

from ultralytics.data.utils import IMG_FORMATS
from ultralytics.solutions.solutions import BaseSolution
from ultralytics.utils.checks import check_requirements
from ultralytics.utils.torch_utils import select_device

os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"  # Avoid OpenMP conflict on some systems


class VisualAISearch(BaseSolution):
    """
    VisualAISearch leverages OpenCLIP to generate high-quality image and text embeddings, aligning them in a shared
    semantic space. It then uses FAISS to perform fast and scalable similarity-based retrieval, allowing users to search
    large collections of images using natural language queries with high accuracy and speed.

    Attributes:
        data (str): Directory containing images.
        device (str): Computation device, e.g., 'cpu' or 'cuda'.
    """

    def __init__(self, **kwargs):
        """Initializes the VisualAISearch class with the FAISS index file and CLIP model."""
        super().__init__(**kwargs)
        check_requirements(["open-clip-torch", "faiss-cpu"])
        import faiss
        import open_clip

        self.faiss = faiss
        self.open_clip = open_clip

        self.faiss_index = "faiss.index"
        self.data_path_npy = "paths.npy"
        self.model_name = "ViT-B-32-quickgelu"
        self.data_dir = Path(self.CFG["data"])
        self.device = select_device(self.CFG["device"])

        if not self.data_dir.exists():
            from ultralytics.utils import ASSETS_URL

            self.LOGGER.warning(f"{self.data_dir} not found. Downloading images.zip from {ASSETS_URL}/images.zip")
            from ultralytics.utils.downloads import safe_download

            safe_download(url=f"{ASSETS_URL}/images.zip", unzip=True, retry=3)
            self.data_dir = Path("images")

        self.clip_model, _, self.preprocess = self.open_clip.create_model_and_transforms(
            self.model_name, pretrained="openai"
        )
        self.clip_model = self.clip_model.to(self.device).eval()
        self.tokenizer = self.open_clip.get_tokenizer(self.model_name)

        self.index = None
        self.image_paths = []

        self.load_or_build_index()

    def extract_image_feature(self, path):
        """Extract CLIP image embedding."""
        image = Image.open(path)
        tensor = self.preprocess(image).unsqueeze(0).to(self.device)
        with torch.no_grad():
            return self.clip_model.encode_image(tensor).cpu().numpy()

    def extract_text_feature(self, text):
        """Extract CLIP text embedding."""
        tokens = self.tokenizer([text]).to(self.device)
        with torch.no_grad():
            return self.clip_model.encode_text(tokens).cpu().numpy()

    def load_or_build_index(self):
        """Loads the FAISS index or builds a new one from image features."""
        # Check if the FAISS index and corresponding image paths already exist
        if Path(self.faiss_index).exists() and Path(self.data_path_npy).exists():
            self.LOGGER.info("Loading existing FAISS index...")
            self.index = self.faiss.read_index(self.faiss_index)  # Load the FAISS index from disk
            self.image_paths = np.load(self.data_path_npy)  # Load the saved image path list
            return  # Exit the function as the index is successfully loaded

        # If the index doesn't exist, start building it from scratch
        self.LOGGER.info("Building FAISS index from images...")
        vectors = []  # List to store feature vectors of images

        # Iterate over all image files in the data directory
        for file in self.data_dir.iterdir():
            # Skip files that are not valid image formats
            if file.suffix.lower().lstrip(".") not in IMG_FORMATS:
                continue
            try:
                # Extract feature vector for the image and add to the list
                vectors.append(self.extract_image_feature(file))
                self.image_paths.append(file.name)  # Store the corresponding image name
            except Exception as e:
                self.LOGGER.warning(f"Skipping {file.name}: {e}")

        # If no vectors were successfully created, raise an error
        if not vectors:
            raise RuntimeError("No image embeddings could be generated.")

        vectors = np.vstack(vectors).astype("float32")  # Stack all vectors into a NumPy array and convert to float32
        self.faiss.normalize_L2(vectors)  # Normalize vectors to unit length for cosine similarity

        self.index = self.faiss.IndexFlatIP(vectors.shape[1])  # Create a new FAISS index using inner product
        self.index.add(vectors)  # Add the normalized vectors to the FAISS index
        self.faiss.write_index(self.index, self.faiss_index)  # Save the newly built FAISS index to disk
        np.save(self.data_path_npy, np.array(self.image_paths))  # Save the list of image paths to disk

        self.LOGGER.info(f"Indexed {len(self.image_paths)} images.")

    def search(self, query, k=30, similarity_thresh=0.1):
        """Returns the top-k semantically similar images to the given query."""
        text_feat = self.extract_text_feature(query).astype("float32")
        self.faiss.normalize_L2(text_feat)

        D, index = self.index.search(text_feat, k)
        results = [
            (self.image_paths[i], float(D[0][idx])) for idx, i in enumerate(index[0]) if D[0][idx] >= similarity_thresh
        ]
        results.sort(key=lambda x: x[1], reverse=True)

        self.LOGGER.info("\nRanked Results:")
        for name, score in results:
            self.LOGGER.info(f"  - {name} | Similarity: {score:.4f}")

        return [r[0] for r in results]

    def __call__(self, query):
        """Direct call for the search function."""
        return self.search(query)


class SearchApp:
    """
    A Flask-based web interface powers the semantic image search experience, enabling users to input natural language
    queries and instantly view the most relevant images retrieved from the indexed database, all through a clean,
    responsive, and easily customizable frontend.

    Args:
        data (str): Path to images to index and search.
        device (str): Device to run inference on (e.g. 'cpu', 'cuda').
    """

    def __init__(self, data="images", device=None):
        """Initialization of the VisualAISearch class for performing semantic image search."""
        check_requirements("flask")
        from flask import Flask, render_template, request

        self.render_template = render_template
        self.request = request
        self.searcher = VisualAISearch(data=data, device=device)
        self.app = Flask(
            __name__,
            template_folder="templates",
            static_folder=Path(data).resolve(),  # Absolute path to serve images
            static_url_path="/images",  # URL prefix for images
        )
        self.app.add_url_rule("/", view_func=self.index, methods=["GET", "POST"])

    def index(self):
        """Process the user query and display the output."""
        results = []
        if self.request.method == "POST":
            query = self.request.form.get("query", "").strip()
            results = self.searcher(query)
        return self.render_template("similarity-search.html", results=results)

    def run(self, debug=False):
        """Runs the Flask web app."""
        self.app.run(debug=debug)
@@ -54,55 +54,56 @@ class BaseSolution:
            is_cli (bool): Enables CLI mode if set to True.
            **kwargs (Any): Additional configuration parameters that override defaults.
        """
        check_requirements("shapely>=2.0.0")
        from shapely.geometry import LineString, Point, Polygon
        from shapely.prepared import prep

        self.LineString = LineString
        self.Polygon = Polygon
        self.Point = Point
        self.prep = prep
        self.annotator = None  # Initialize annotator
        self.tracks = None
        self.track_data = None
        self.boxes = []
        self.clss = []
        self.track_ids = []
        self.track_line = None
        self.masks = None
        self.r_s = None

        self.LOGGER = LOGGER  # Store logger object to be used in multiple solution classes
        self.CFG = vars(SolutionConfig().update(**kwargs))
        self.LOGGER.info(f"Ultralytics Solutions: ✅ {self.CFG}")
        self.LOGGER = LOGGER  # Store logger object to be used in multiple solution classes

        self.region = self.CFG["region"]  # Store region data for other classes usage
        self.line_width = self.CFG["line_width"]
        if self.__class__.__name__ != "VisualAISearch":
            check_requirements("shapely>=2.0.0")
            from shapely.geometry import LineString, Point, Polygon
            from shapely.prepared import prep

            # Load Model and store additional information (classes, show_conf, show_label)
            if self.CFG["model"] is None:
                self.CFG["model"] = "yolo11n.pt"
            self.model = YOLO(self.CFG["model"])
            self.names = self.model.names
            self.classes = self.CFG["classes"]
            self.show_conf = self.CFG["show_conf"]
            self.show_labels = self.CFG["show_labels"]
            self.LineString = LineString
            self.Polygon = Polygon
            self.Point = Point
            self.prep = prep
            self.annotator = None  # Initialize annotator
            self.tracks = None
            self.track_data = None
            self.boxes = []
            self.clss = []
            self.track_ids = []
            self.track_line = None
            self.masks = None
            self.r_s = None

            self.track_add_args = {  # Tracker additional arguments for advanced configuration
                k: self.CFG[k] for k in ["iou", "conf", "max_det", "half", "tracker", "device", "verbose"]
            }  # verbose must be passed to track method; setting it False in YOLO still logs the track information.
            self.LOGGER.info(f"Ultralytics Solutions: ✅ {self.CFG}")
            self.region = self.CFG["region"]  # Store region data for other classes usage
            self.line_width = self.CFG["line_width"]

            if is_cli and self.CFG["source"] is None:
                d_s = "solutions_ci_demo.mp4" if "-pose" not in self.CFG["model"] else "solution_ci_pose_demo.mp4"
                self.LOGGER.warning(f"source not provided. using default source {ASSETS_URL}/{d_s}")
                from ultralytics.utils.downloads import safe_download
            # Load Model and store additional information (classes, show_conf, show_label)
            if self.CFG["model"] is None:
                self.CFG["model"] = "yolo11n.pt"
            self.model = YOLO(self.CFG["model"])
            self.names = self.model.names
            self.classes = self.CFG["classes"]
            self.show_conf = self.CFG["show_conf"]
            self.show_labels = self.CFG["show_labels"]

                safe_download(f"{ASSETS_URL}/{d_s}")  # download source from ultralytics assets
                self.CFG["source"] = d_s  # set default source
            self.track_add_args = {  # Tracker additional arguments for advanced configuration
                k: self.CFG[k] for k in ["iou", "conf", "device", "max_det", "half", "tracker", "device", "verbose"]
            }  # verbose must be passed to track method; setting it False in YOLO still logs the track information.

            # Initialize environment and region setup
            self.env_check = check_imshow(warn=True)
            self.track_history = defaultdict(list)
            if is_cli and self.CFG["source"] is None:
                d_s = "solutions_ci_demo.mp4" if "-pose" not in self.CFG["model"] else "solution_ci_pose_demo.mp4"
                self.LOGGER.warning(f"source not provided. using default source {ASSETS_URL}/{d_s}")
                from ultralytics.utils.downloads import safe_download

                safe_download(f"{ASSETS_URL}/{d_s}")  # download source from ultralytics assets
                self.CFG["source"] = d_s  # set default source

            # Initialize environment and region setup
            self.env_check = check_imshow(warn=True)
            self.track_history = defaultdict(list)

    def adjust_box_label(self, cls, conf, track_id=None):
        """
160 ultralytics/solutions/templates/similarity-search.html Normal file
@@ -0,0 +1,160 @@
<!-- Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license -->

<!-- Similarity search webpage -->
<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Semantic Image Search</title>
    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600&display=swap" rel="stylesheet" />
    <style>
      body {
        background: linear-gradient(135deg, #f0f4ff, #f9fbff);
        font-family: "Inter", sans-serif;
        color: #111e68;
        padding: 2rem;
        margin: 0;
        min-height: 100vh;
      }

      h1 {
        text-align: center;
        margin-bottom: 2rem;
        font-size: 2.5rem;
        font-weight: 600;
      }

      form {
        display: flex;
        flex-wrap: wrap;
        justify-content: center;
        align-items: center;
        gap: 1rem;
        margin-bottom: 3rem;
        animation: fadeIn 1s ease-in-out;
      }

      input[type="text"] {
        width: 300px;
        padding: 0.75rem 1rem;
        font-size: 1rem;
        border-radius: 10px;
        border: 1px solid #ccc;
        box-shadow: 0 2px 6px rgba(0, 0, 0, 0.05);
        transition: box-shadow 0.3s ease;
      }

      input[type="text"]:focus {
        outline: none;
        box-shadow: 0 0 0 3px rgba(17, 30, 104, 0.2);
      }

      button {
        background-color: #111e68;
        color: white;
        font-weight: 600;
        font-size: 1rem;
        padding: 0.75rem 1.5rem;
        border-radius: 10px;
        border: none;
        cursor: pointer;
        transition:
          background-color 0.3s ease,
          transform 0.2s ease;
      }

      button:hover {
        background-color: #1f2e9f;
        transform: translateY(-2px);
      }

      .grid {
        display: grid;
        grid-template-columns: repeat(auto-fill, minmax(260px, 1fr));
        gap: 1.5rem;
        max-width: 1600px;
        margin: auto;
        animation: fadeInUp 1s ease-in-out;
      }

      .card {
        background: white;
        border-radius: 16px;
        overflow: hidden;
        box-shadow: 0 6px 14px rgba(0, 0, 0, 0.08);
        transition:
          transform 0.3s ease,
          box-shadow 0.3s ease;
      }

      .card:hover {
        transform: translateY(-6px);
        box-shadow: 0 10px 20px rgba(0, 0, 0, 0.1);
      }

      .card img {
        width: 100%;
        height: 100%;
        object-fit: cover;
        display: block;
      }

      @keyframes fadeIn {
        0% {
          opacity: 0;
          transform: scale(0.95);
        }
        100% {
          opacity: 1;
          transform: scale(1);
        }
      }

      @keyframes fadeInUp {
        0% {
          opacity: 0;
          transform: translateY(20px);
        }
        100% {
          opacity: 1;
          transform: translateY(0);
        }
      }
    </style>
  </head>
  <body>
    <div style="text-align: center; margin-bottom: 1rem">
      <img src="https://raw.githubusercontent.com/ultralytics/assets/main/logo/favicon.png" alt="Ultralytics Logo" style="height: 40px" />
    </div>
    <h1>Semantic Image Search with AI</h1>

    <!-- Search box -->
    <form method="POST">
      <input type="text" name="query" placeholder="Describe the scene (e.g., man walking)" value="{{ request.form['query'] }}" required />
      <button type="submit">Search</button>
    </form>

    <!-- Search results grid -->
    <div class="grid">
      {% for img in results %}
      <div class="card">
        <img src="{{ url_for('static', filename=img) }}" alt="Result Image" />
      </div>
      {% endfor %}
    </div>
  </body>
</html>