Computer Vision
Definition
A field of AI that enables computers to interpret and understand visual information from images and video, so that software can automate analysis and act on what it sees.
Use Cases
- Amazon: Checkout-free retail using cameras and sensors to detect items customers take and automatically charge them — Computer vision models analyze video feeds to track customer interactions with shelves and products, combined with sensor fusion and backend systems to maintain a virtual cart (Reduced checkout friction and enabled faster store throughput by removing traditional cashier lines)
- Google: Real-time translation of text in the camera view (e.g., signs, menus) — OCR detects text regions, recognition converts images to text, and translation models translate the recognized text; the app overlays translated text back onto the live camera feed (Improved usability for travelers by enabling quick understanding of foreign-language text without manual typing)
- Tesla: Driver-assistance features using cameras to perceive lanes, vehicles, pedestrians, and traffic signals — On-vehicle camera systems feed computer vision models that perform detection and scene understanding; outputs are used by planning/control software to assist driving (Enabled advanced driver-assistance capabilities that can reduce driver workload in supported scenarios)
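The camera-translation flow in the Google example above (detect text regions, recognize the text, translate it, overlay the result) can be sketched as a small pipeline. This is an illustrative sketch only, not Google's implementation: the names `translate_frame`, `Overlay`, `PHRASEBOOK`, and the `toy_*` functions are hypothetical, and the "frame" is a plain dictionary standing in for real image data so the example runs without any OCR or translation model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (x, y, width, height) of a text region, in pixels.
Box = Tuple[int, int, int, int]
# Toy stand-in for a camera frame: maps each text region to the text printed there.
Frame = Dict[Box, str]


@dataclass
class Overlay:
    """One translated label to draw back onto the frame."""
    box: Box
    original: str
    translated: str


def translate_frame(
    frame: Frame,
    detect: Callable[[Frame], List[Box]],
    recognize: Callable[[Frame, Box], str],
    translate: Callable[[str], str],
) -> List[Overlay]:
    """Run detect -> recognize -> translate and return the overlays to render."""
    overlays = []
    for box in detect(frame):
        text = recognize(frame, box).strip()
        if not text:
            continue  # skip regions where recognition produced nothing
        overlays.append(Overlay(box, text, translate(text)))
    return overlays


# Hypothetical stand-ins; a real app would call OCR and translation models here.
PHRASEBOOK = {"SALIDA": "EXIT", "ENTRADA": "ENTRANCE"}


def toy_detect(frame: Frame) -> List[Box]:
    return list(frame)


def toy_recognize(frame: Frame, box: Box) -> str:
    return frame[box]


def toy_translate(text: str) -> str:
    return PHRASEBOOK.get(text, text)


frame = {(10, 20, 120, 40): "SALIDA", (10, 80, 90, 30): "  "}
overlays = translate_frame(frame, toy_detect, toy_recognize, toy_translate)
for o in overlays:
    print(o.box, o.original, "->", o.translated)
```

Passing the three stages in as functions keeps each one replaceable, which mirrors how the use case describes detection, recognition, and translation as separate steps.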
Provider Equivalents
- AWS: Amazon Rekognition
- Azure: Azure AI Vision
- GCP: Vertex AI Vision (and Cloud Vision API)
- OCI: OCI Vision
Frequently Asked Questions
- What's the difference between Computer Vision and image processing?
- Image processing focuses on changing or enhancing images (for example, resizing, denoising, sharpening, or adjusting contrast). Computer vision focuses on understanding what’s in an image or video (for example, detecting objects, reading text with OCR, recognizing defects, or tracking motion) so software can make decisions or trigger actions.
- When should I use Computer Vision?
- Use computer vision when you need to extract meaning from images or video at scale, such as automating visual inspection in manufacturing, reading documents with OCR, monitoring safety compliance (hard hats/vests), counting inventory on shelves, detecting damage for insurance claims, or analyzing medical images. It’s a good fit when manual review is slow, expensive, or inconsistent, or when it cannot keep up with the volume.
- How much does Computer Vision cost?
- Costs depend on (1) whether you use a managed API or build/train your own model, (2) how many images/videos you analyze, (3) the types of features used (OCR, face analysis, custom training, video analysis), and (4) compute and storage needs. Managed services typically charge per image, per page (for OCR), or per minute of video, plus any data storage/egress. Custom models add training costs (GPU/TPU time), ongoing inference costs, and MLOps costs (monitoring, retraining, labeling).
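As a rough illustration of the managed-service pricing model described in the cost answer (per image, per OCR page, per minute of video, plus storage and egress), the arithmetic can be sketched as below. All rates are made-up placeholders, not any provider's actual prices, and the names `Workload`, `Rates`, and `estimate_monthly_cost` are hypothetical. Custom-model costs (training, inference, MLOps) are not covered by this sketch.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Monthly usage volumes."""
    images: int = 0
    ocr_pages: int = 0
    video_minutes: float = 0.0
    storage_gb: float = 0.0
    egress_gb: float = 0.0


@dataclass
class Rates:
    """Unit prices in USD; substitute the figures from your provider's price list."""
    per_image: float
    per_ocr_page: float
    per_video_minute: float
    per_storage_gb: float
    per_egress_gb: float


def estimate_monthly_cost(w: Workload, r: Rates) -> dict:
    """Return a per-item cost breakdown plus a total."""
    lines = {
        "images": w.images * r.per_image,
        "ocr": w.ocr_pages * r.per_ocr_page,
        "video": w.video_minutes * r.per_video_minute,
        "storage": w.storage_gb * r.per_storage_gb,
        "egress": w.egress_gb * r.per_egress_gb,
    }
    lines["total"] = sum(lines.values())
    return lines


# PLACEHOLDER rates for illustration only; these are assumptions, not real prices.
PLACEHOLDER_RATES = Rates(
    per_image=0.001,
    per_ocr_page=0.0015,
    per_video_minute=0.10,
    per_storage_gb=0.02,
    per_egress_gb=0.09,
)

workload = Workload(images=100_000, ocr_pages=20_000, video_minutes=500,
                    storage_gb=50, egress_gb=10)
estimate = estimate_monthly_cost(workload, PLACEHOLDER_RATES)
for item, cost in estimate.items():
    print(f"{item:>8}: ${cost:,.2f}")
```

Keeping the rates separate from the workload makes it easy to compare providers or pricing tiers by swapping in a different `Rates` instance.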
Category: ai-ml
Difficulty: intermediate
See Also