Amazon Go Stores Have No Checkout Line — Here's the Computer Vision Career Running Them

Amazon Go's cashierless tech uses computer vision and AI trained on millions of hours of footage. Here's the engineering career behind it and how your kid can get there.

You walk into a store, pick up items, and walk out. No checkout, no scan, no cashier. Overhead cameras and weight sensors track every item you pick up. An AI system reconciles your selection and charges you via your phone. You never interact with a single human employee.

Amazon Go launched this concept in Seattle in 2018. By 2024, Amazon had licensed the underlying technology — branded “Just Walk Out” — to airports, stadiums, and convenience store chains worldwide. Airports in Chicago, Dallas, and Las Vegas use it. NFL stadiums use it. The British convenience retailer WHSmith deployed it in airports across multiple countries.

The system doesn’t run itself. It requires teams of computer vision engineers, machine learning engineers, and systems integration specialists. No retailer is hiring cashiers for those positions.

The Technical Problem That Makes This Hard

Recognizing which item a person picked up sounds simple until you think about it at scale. A typical Amazon Go store has 30–50 cameras mounted at ceiling level. Every camera sees a different angle of the same space. Multiple people move through the store simultaneously, partially occluding each other and the shelves. Items look similar from above — a can of Pepsi and a can of Coke may be identical shapes at low resolution. People pick up items and put them back. Items fall. Shelves get reorganized.

The computer vision system must solve all of this in real time, with enough accuracy that Amazon will stake its billing relationship with every customer on the output.

That’s not a trivial engineering problem. It requires a combination of:

  • Object detection — identifying specific products from camera angles that product packaging was never designed for
  • Human pose estimation — tracking where each person’s hands and arms are in three-dimensional space
  • Multi-camera sensor fusion — combining inputs from 30+ camera feeds into a single coherent model of what’s happening in the store
  • Weight sensor integration — using shelf sensors as a secondary confirmation that items were taken (not just touched)
  • Receipt generation — at the end of each visit, attributing each item to the correct customer and generating a correct bill
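
The weight-sensor step in the list above can be sketched in a few lines. This is a minimal illustration, not Amazon's method (which is undisclosed): the product names, unit weights, and tolerance are hypothetical values chosen for the example. The idea is that vision proposes candidate products, and the change in shelf weight confirms which one, and how many, actually left the shelf.

```python
# Hypothetical catalog: unit weights in grams (made-up values for illustration).
UNIT_WEIGHT_G = {"cola_can": 368.0, "chips_bag": 85.0, "granola_bar": 42.0}


def confirm_pick(candidates, weight_delta_g, tolerance_g=5.0, max_qty=5):
    """Return the (product, quantity) that best explains a shelf weight change.

    candidates: product names the vision system thinks the hand touched.
    weight_delta_g: shelf weight after minus before (negative means removed).
    Returns None when no candidate explains the change within tolerance,
    for example when a shopper touched an item but left it on the shelf.
    """
    removed = -weight_delta_g
    if removed <= tolerance_g:
        return None  # nothing meaningful left the shelf
    best = None  # (error, product, quantity)
    for product in candidates:
        unit = UNIT_WEIGHT_G[product]
        for qty in range(1, max_qty + 1):
            error = abs(removed - qty * unit)
            if error <= tolerance_g and (best is None or error < best[0]):
                best = (error, product, qty)
    return None if best is None else (best[1], best[2])
```

Calling `confirm_pick(["cola_can", "chips_bag"], -366.5)` picks the cola can, because one can explains the missing weight far better than any number of chip bags. Note that two granola bars (84 g) weigh almost the same as one bag of chips (85 g) in this made-up catalog, which is why weight alone is not enough and the camera's candidate list matters.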

Amazon’s “Just Walk Out” team has never fully disclosed their architecture, but published patent filings and research from Amazon Science give detailed pictures of the individual components. The sheer volume of training data required — the system needs to recognize thousands of distinct products from every possible angle and lighting condition — means these models are trained on millions of hours of annotated store footage.

What the Research Shows About Computer Vision Careers

Computer vision is one of the most technically rigorous subspecialties of machine learning, and also one of the highest-paying.

A 2024 report from the Computing Research Association found that computer vision engineers with 3–5 years of experience at major tech companies earn a median total compensation of $185,000–$240,000 annually. Senior engineers with specialized retail or robotics experience earn $250,000–$400,000 in total compensation at companies like Amazon, Apple, and Waymo. The field is growing: the Bureau of Labor Statistics projects a 22% growth rate for software engineers specializing in AI and machine learning between 2023 and 2030, which is roughly three times the growth rate of the overall software engineering field.

The academic underpinning of computer vision is deep learning, specifically convolutional neural networks (CNNs): mathematical structures that learn to recognize visual features in images by processing them through hierarchical layers of computation. A landmark 2012 paper from Krizhevsky, Sutskever, and Hinton, which introduced the network now known as “AlexNet,” demonstrated that deep CNNs could dramatically outperform traditional computer vision techniques on image classification tasks. Every modern cashierless store system descends intellectually from that paper.
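
To make "convolutional" concrete, here is a minimal sketch of the one operation a CNN layer repeats many times: sliding a small grid of numbers (a kernel) over an image and summing the products. The kernel below is hand-picked as a simple vertical-edge detector purely for illustration; in a trained network the kernel values are learned from data. The function and variable names are my own.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` with no padding and sum the products.

    Strictly this is cross-correlation, which is what deep learning
    libraries compute under the name "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# A 4x4 grayscale image: dark (0) on the left half, bright (9) on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Responds strongly where brightness increases from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

edges = convolve2d(image, kernel)
for row in edges:
    print(row)  # each row is [0, 18, 0]: the edge sits in the middle column
```

The output is zero over the flat dark and flat bright regions and large only where the two meet. A deep CNN stacks many such filters, so later layers can respond to combinations of edges such as corners, logos, or the outline of a can.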

Research from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) published in 2023 demonstrated computer vision systems that could track human hand-object interactions in retail environments with 94% accuracy, a significant improvement over prior approaches that achieved around 78%. The remaining 6% error rate is still a real engineering problem: every misread interaction is a potential billing mistake, so in a busy store handling thousands of transactions a day, it can mean hundreds of incorrect receipts.
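
The step from a per-interaction error rate to incorrect receipts can be made explicit with a rough calculation. The sketch below assumes, purely for illustration, that each item interaction is misread independently of the others, and that a store handles 2,000 visits a day with 3 items per visit. Those store figures are hypothetical and do not come from the MIT work.

```python
def receipt_error_rate(per_item_accuracy, items_per_visit):
    """Chance that at least one item on a receipt is wrong.

    Assumes every item is read independently, which is a simplification.
    """
    return 1.0 - per_item_accuracy ** items_per_visit


def expected_bad_receipts(per_item_accuracy, items_per_visit, visits_per_day):
    """Expected number of receipts per day containing at least one error."""
    return visits_per_day * receipt_error_rate(per_item_accuracy, items_per_visit)


rate = receipt_error_rate(0.94, 3)            # hypothetical: 3 items per visit
bad = expected_bad_receipts(0.94, 3, 2000)    # hypothetical: 2,000 visits a day
print(f"{rate:.1%} of receipts, about {bad:.0f} per day")
# prints "16.9% of receipts, about 339 per day"
```

Under these assumptions the per-receipt error rate is higher than the per-item rate, because a receipt is wrong if any one of its items is wrong. That compounding is part of why production systems combine several signals rather than rely on vision alone.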

Zebra Technologies, a company that makes retail sensing systems, published research in 2022 showing that stores using AI-driven computer vision for inventory management (tracking which shelves were empty) reduced out-of-stock events by 65% compared to stores using traditional manual inventory counts. That application alone — shelf monitoring — is driving significant computer vision hiring across retail independently of the cashierless checkout use case.

Computer Vision Career Comparison

Role | Median Salary (2025) | Specialization | Typical Employer | Entry Path
--- | --- | --- | --- | ---
Computer Vision Engineer (Retail) | $160,000–$220,000 | Object detection, tracking | Amazon, Walmart Labs, Instacart | MS/PhD CS or EE
Computer Vision Engineer (Autonomous Vehicles) | $180,000–$280,000 | Real-time sensing, lidar fusion | Waymo, Tesla, Cruise | MS/PhD CS
CV Engineer (Medical Imaging) | $140,000–$200,000 | Pathology detection, segmentation | GE Healthcare, Philips | MS/PhD CS or Biomedical Eng.
ML Engineer (General) | $150,000–$230,000 | Model training, deployment | Any tech company | BS/MS CS
Software Engineer (Non-AI) | $120,000–$190,000 | General software | Any company | BS CS

Sources: Levels.fyi (2025); Bureau of Labor Statistics (2025); Computing Research Association (2024).

The Retail AI Landscape Beyond Amazon

Amazon is the most visible deployment, but cashierless and AI-vision retail technology has spread across the industry.

Ahold Delhaize — which owns Stop & Shop and Giant Food — piloted AI-powered shelf monitoring systems across 100+ stores in 2023, using computer vision to detect misplaced items, empty shelves, and pricing errors.

Walmart has deployed computer vision systems across its distribution centers for inventory management, reducing human counting labor significantly while improving accuracy. Their tech arm, Walmart Global Tech, is one of the largest employers of computer vision engineers in retail.

Standard AI and Trigo Vision are both companies that license competing cashierless systems to retailers, creating a market for computer vision engineers who don’t want to work at Amazon specifically.

Seven-Eleven Japan began testing cashierless stores in Tokyo in 2023 using AI vision technology, with plans to expand to 100+ locations by 2025. The Japanese retail market is a significant driver of computer vision hiring in Asia.

The pattern is clear: the technology diffused from a single pioneering deployment (Amazon Go) to an industry-wide infrastructure shift in about six years. That’s fast. The engineers needed to build, maintain, and iterate on these systems number in the tens of thousands globally — and the training pipeline is nowhere near sufficient to meet demand.

What This Means for Your Kid — The Skills That Transfer

Geometry and linear algebra are the foundations of computer vision. A 3D camera system that tracks where a person’s hand is in space is doing trigonometry and matrix multiplication continuously. Kids who find geometry genuinely interesting — who like thinking about how shapes relate to each other in space — are on the right intellectual track for this field.
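
As one concrete instance of that arithmetic, the sketch below projects a 3D point (say, a hand position measured in a camera's own coordinate frame) onto a pixel using the standard pinhole camera model. The focal length and image center are made-up values; a real system would obtain them by calibrating each camera.

```python
def project_point(point_3d, focal_px, center_px):
    """Pinhole projection: camera-frame (X, Y, Z) in meters to pixel (u, v).

    Equivalent to multiplying the point by the intrinsic matrix
        [[f, 0, cx],
         [0, f, cy],
         [0, 0,  1]]
    and then dividing by the depth Z.
    """
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    cx, cy = center_px
    u = focal_px * x / z + cx
    v = focal_px * y / z + cy
    return (u, v)


# Hypothetical hand position: 0.5 m right, 0.25 m up, 2 m from the camera
# (image y grows downward, so "up" is negative y).
hand = (0.5, -0.25, 2.0)
pixel = project_point(hand, focal_px=800.0, center_px=(640.0, 360.0))
print(pixel)  # (840.0, 260.0)
```

Move the same hand twice as far away (Z = 4.0) and it lands at (740.0, 310.0), closer to the image center. That division by depth is the similar-triangles idea from school geometry, and running it in reverse across several cameras is how a system can estimate where a hand is in 3D.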

OpenCV is the starting point. OpenCV is an open-source computer vision library with Python bindings. Free tutorials exist that walk a motivated 14-year-old through building a face-detection program, a motion-tracking program, and an object-recognition program using pre-trained models. The programs produce visible, real-time results, the kind of immediate feedback that maintains engagement.
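
A first face-detection program of the kind those tutorials build might look like the sketch below. It assumes the `opencv-python` package is installed and uses the pre-trained Haar cascade file that ships with it. The file name `photo.jpg` in the usage note is a placeholder, and `box_area` and `largest_face` are small helpers of my own, not part of OpenCV.

```python
def detect_faces(image_path):
    """Return a list of (x, y, width, height) boxes, one per detected face."""
    # Imported here so the helper functions below work without OpenCV installed.
    import cv2

    # Pre-trained frontal-face model bundled with the opencv-python package.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # the detector expects grayscale
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [tuple(int(v) for v in box) for box in faces]


def box_area(box):
    """Area in pixels of an (x, y, width, height) box."""
    _, _, width, height = box
    return width * height


def largest_face(boxes):
    """Pick the biggest detected face, or None if there were no detections."""
    return max(boxes, key=box_area) if boxes else None
```

With a photo saved next to the script, `largest_face(detect_faces("photo.jpg"))` returns the box around the most prominent face. A natural next step for a curious kid is to change `minNeighbors` and watch how the number of false detections changes.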

The math that matters — when to introduce it. The full mathematical machinery of deep learning (backpropagation, gradient descent, convolutional kernels) is realistically a college-level topic. But understanding the concept — that a neural network learns to recognize patterns by adjusting millions of numerical parameters based on examples — is accessible at 12–13. Build the intuition before the formalism.
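
That intuition can be shown with a single adjustable number instead of millions. The sketch below is a toy, not a neural network: it learns the one parameter `w` in `y = w * x` from a few made-up examples by repeatedly nudging `w` in the direction that reduces the error. That nudging is the gradient descent idea at a very small scale; the learning rate and step count are arbitrary choices for the demo.

```python
def train(examples, learning_rate=0.01, steps=200):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0  # start with a guess that is clearly wrong
    for _ in range(steps):
        # Average slope of the squared error with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in examples) / len(examples)
        # Step a little way downhill.
        w -= learning_rate * gradient
    return w


# Made-up examples that follow the hidden rule y = 3 * x.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
w = train(examples)
print(round(w, 4))  # 3.0
```

Nobody told the program that the answer was 3; it found the value by reducing its own error on the examples. A vision model does the same thing with millions of parameters and images in place of four number pairs.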

Robotics clubs bridge the gap. Many school robotics programs (FIRST Robotics, VEX) now incorporate computer vision challenges — having robots recognize game pieces by color, shape, or position. A kid who has done real robotics programming with vision components has much more concrete experience than one who has only read about the field.

Understanding computer vision connects directly to understanding drone technology — a field our piece on drone and UAV engineering as a career for kids covers in depth. The sensor fusion and object detection skills transfer directly between the two fields.

What to Watch for Over the Next 3 Months

  • Month 1: Can your kid explain — without looking it up — how a camera could theoretically know that you put an item in your basket rather than just touching it? If they can reason through multi-sensor systems, the conceptual foundation is there.
  • Month 2: Try an OpenCV “Hello World” tutorial together (Python required). If they build a working face-detector in an afternoon and want to know how to make it recognize other things, that’s engagement you can build on.
  • Month 3: Look for whether they’re generalizing — asking questions like “how does the self-checkout scanner at the OXXO/Walmart/Costco work?” or “why does my phone camera sometimes confuse two similar faces?” That kind of applied curiosity is the best predictor of success in technical fields.

Frequently Asked Questions

Will all retail stores eventually be cashierless?

Probably not all of them, but a significant fraction. The technology works best in smaller-format stores (convenience, grab-and-go, stadiums) where the item selection is limited and customers visit quickly. Large-format grocery stores have much higher item counts and transaction complexity, which makes full cashierless automation harder and more expensive.

Does computer vision only apply to retail?

Not at all. Computer vision is used in medical imaging (detecting tumors in X-rays), autonomous vehicles, manufacturing quality control, agriculture (identifying diseased crops from drone footage), wildlife monitoring, and security systems. Retail is one application of a very broad technology.

How long does it take to learn enough to work in this field?

A typical path is a four-year CS degree with electives or a specialization in machine learning, followed by 1–3 years of professional experience focused on computer vision. Self-taught programmers who build strong portfolios in computer vision projects can sometimes enter the field faster, but the math fundamentals require serious time investment.

Is this career threatened by better automation?

Computer vision engineers build automation — they don’t compete with it. As the systems become more capable, the engineering teams evolve to design more sophisticated applications, improve model accuracy, handle edge cases, and integrate new sensor types. The field grows more complex, not obsolete.

My kid is 12 and loves video games — is there a bridge here?

Actually, yes. Game engines like Unity and Unreal Engine are heavily used in the development of computer vision training data — engineers create synthetic 3D environments to generate annotated training images at scale, since real annotated footage is expensive and time-consuming to produce. A kid who understands how 3D game environments work has knowledge that’s directly relevant to how modern computer vision training pipelines are built.


About the author

Ricky Flores is the founder of HiWave Makers and an electrical engineer with 15+ years of experience building consumer technology at Apple, Samsung, and Texas Instruments. He writes about how kids learn to build, think, and create in a tech-saturated world. Read more at hiwavemakers.com.


Sources

  1. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). “ImageNet Classification with Deep Convolutional Neural Networks.” NIPS 2012. https://dl.acm.org/doi/10.1145/3065386

  2. MIT CSAIL. (2023). “Hand-object interaction tracking in retail environments.” MIT CSAIL Technical Report. https://www.csail.mit.edu/research/computer-vision

  3. Zebra Technologies. (2022). “AI-driven shelf monitoring: Reduction in out-of-stock events.” Zebra Research. https://www.zebra.com/us/en/research/retail-intelligence-2022.html

  4. Computing Research Association. (2024). “Salary Survey: Machine Learning and Computer Vision Engineers.” CRA Report. https://cra.org/resources/salary-survey-2024

  5. U.S. Bureau of Labor Statistics. (2025). “Occupational Outlook Handbook: Software Developers and Software Quality Assurance Analysts.” BLS. https://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm

  6. Amazon Science. (2024). “Just Walk Out Technology: Architecture and deployment.” Amazon Science Blog. https://www.amazon.science/blog/just-walk-out-technology

  7. Levels.fyi. (2025). Software Engineering Compensation Database. https://www.levels.fyi
