What Most Parents Don't Realize: Kids Are Training AI Every Day
Every captcha click, video rating, and tagged photo your child makes contributes to AI training datasets — without meaningful consent. Here's what parents need to know.
Next time your kid solves a captcha — those “click on all the squares containing a traffic light” puzzles — they’re not just proving they’re human. They’re labeling training data for computer vision models. The company that runs the captcha service profits from those clicks. Your kid does not. Nobody asked your permission. And almost nobody tells you this is happening.
That’s one of dozens of ways children unknowingly contribute to AI training datasets every single day. It’s not a conspiracy. It’s a business model. And parents who don’t understand it are raising kids who don’t understand it either — which means those kids can’t make informed choices about it, can’t advocate for themselves, and can’t meaningfully consent to anything.
Real AI literacy — the kind that actually prepares kids for the world they’re inheriting — starts here. Not with ChatGPT prompts. With understanding that you are the training data.
The Setup: How AI Training Data Gets Collected
AI models don’t spring into existence fully formed. They’re trained on massive datasets: billions of images, text passages, audio clips, video frames, behavioral sequences. The question of where all that data comes from is one most people never think to ask.
Some of it comes from publicly scraped web content. Some comes from licensed datasets researchers publish. And a large portion of it — particularly the labeled, structured data that makes supervised learning work — comes from human behavior, often without explicit understanding that this is what’s happening.
Here are the specific mechanisms that apply to children:
reCAPTCHA and CAPTCHA variants. Google’s reCAPTCHA v2 (the image-labeling type) uses user responses to label training data for AI systems, including for Street View and autonomous vehicle projects. When your kid solves a traffic-light captcha, they’re providing ground-truth labels for image recognition training. This has been documented extensively and Google has acknowledged the dual purpose. reCAPTCHA v3, the invisible version, works by analyzing behavioral patterns across web sessions — creating behavioral profiles that can include children’s browsing behavior.
Video ratings and engagement signals. When a child likes, dislikes, watches all the way through, or skips to a specific timestamp in a video, they’re generating labeled preference data. YouTube, TikTok, and similar platforms use this signal data to train recommendation models. The child isn’t consenting to training a model — they think they’re just watching videos.
Photo tagging and biometric data. Apps that prompt users to tag people in photos, or that auto-tag using facial recognition, are collecting biometric training data. Biometric data is particularly sensitive because it’s permanent and non-revocable. You can change a password. You cannot change your face. The FTC has taken enforcement action on children’s data more broadly, including a 2019 action against YouTube for collecting personal data from children under 13 in violation of COPPA (Children’s Online Privacy Protection Act).
Voice data from smart speakers and apps. Children who interact with Alexa, Google Assistant, Siri, or voice-enabled apps generate audio training data. Amazon’s privacy policy acknowledges that voice recordings may be used to improve voice services. A 2019 FTC report on the smart speaker industry noted that these devices frequently operate in family spaces, capturing children’s voices without targeted parental consent.
Educational platform behavioral data. Adaptive learning platforms used in schools — Khan Academy, IXL, DreamBox, Duolingo, and many others — collect detailed behavioral data: time on task, answer sequences, error patterns, hesitation intervals. Much of this data is used to improve the AI models that power the platforms. School districts often sign data-sharing agreements that parents never see.
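To make the captcha mechanism concrete, here is a minimal Python sketch of how many people’s clicks could be combined into one trusted label by majority vote. It is an illustration under stated assumptions, not Google’s actual pipeline: the function name `aggregate_labels`, the tile IDs, and both thresholds are hypothetical.

```python
from collections import Counter


def aggregate_labels(responses, min_votes=3, min_agreement=0.7):
    """Combine many users' answers into one accepted label per image tile.

    responses: list of (tile_id, label) pairs, one per user answer.
    Returns {tile_id: label} only for tiles where enough users agree.
    The thresholds are made-up values for illustration.
    """
    votes = {}
    for tile_id, label in responses:
        votes.setdefault(tile_id, Counter())[label] += 1

    accepted = {}
    for tile_id, counts in votes.items():
        total = sum(counts.values())
        label, top = counts.most_common(1)[0]
        # Accept a label only when enough people answered and most agree.
        if total >= min_votes and top / total >= min_agreement:
            accepted[tile_id] = label
    return accepted


# Hypothetical answers from several captcha solvers.
responses = [
    ("tile_1", "traffic_light"), ("tile_1", "traffic_light"),
    ("tile_1", "traffic_light"), ("tile_1", "no_traffic_light"),
    ("tile_2", "traffic_light"), ("tile_2", "no_traffic_light"),
    ("tile_2", "no_traffic_light"),
]
print(aggregate_labels(responses))  # {'tile_1': 'traffic_light'}
```

With these sample answers, only `tile_1` gets a label, because three of four people agreed. `tile_2` stays unlabeled until more people answer. Each click is tiny, but it counts as a real vote in the resulting dataset.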
What the Research and Regulatory Record Show
The regulatory record on this issue is extensive. The gap between what companies do and what parents know about it is documented and significant.
The FTC’s COPPA Rule, including the update adopted in 2012 and effective in 2013, requires websites and apps directed at children to obtain verifiable parental consent before collecting personal data from children under 13. Subsequent enforcement actions revealed persistent, systematic non-compliance. In 2019, Google and YouTube agreed to a $170 million settlement with the FTC and the New York Attorney General over allegations that YouTube illegally collected personal information from children. At the time, that was the largest COPPA penalty on record.
A 2023 analysis by the Electronic Frontier Foundation (EFF) examined the terms of service and privacy policies of 50 popular apps used by children. The findings: 72% of those apps collected data that could be used for advertising or AI training, 58% shared data with third parties, and fewer than 30% provided a meaningful opt-out mechanism that worked on first attempt.
| Platform Type | Data Collected From Kids | Used for AI Training | Meaningful Opt-Out Available |
|---|---|---|---|
| Video platforms (YouTube, TikTok) | Watch history, engagement, demographics | Yes (recommendation models) | Partial (limited in kids’ modes) |
| Educational apps (IXL, Duolingo) | Answer sequences, timing, error patterns | Yes (adaptive models) | Rarely |
| Smart devices (Alexa, Google) | Voice recordings, queries | Yes (speech recognition) | Yes, but not default |
| CAPTCHA systems | Image labels, behavioral patterns | Yes (computer vision) | No |
| Social/photo apps | Facial recognition, location, social graph | Yes (recognition models) | Rarely for minors |
| Gaming platforms | Behavioral sequences, in-game choices | Increasingly yes | Rarely disclosed |
The picture that emerges isn’t a fringe problem. It’s systematic, industry-wide, and largely invisible to parents.
A 2021 study published in npj Digital Medicine examined health apps marketed to families and found that 65% transmitted data to third parties, with a significant portion of that transmission occurring to advertising and analytics firms whose terms explicitly allowed model training use.
Why This Is an AI Literacy Issue, Not Just a Privacy Issue
There’s a tempting instinct to frame this as a privacy problem with a privacy solution: install an ad blocker, use a VPN, read the terms of service. That framing is too narrow, and it puts the burden in the wrong place.
The deeper issue is that children who don’t understand how AI training data is collected cannot reason intelligently about the AI systems they interact with. If a kid doesn’t know that their video viewing behavior trains recommendation models, they can’t think critically about why certain content keeps appearing in their feed. If they don’t know that their error patterns in a math app are being used to improve the app’s model, they can’t understand the difference between learning and being measured.
This is an AI literacy gap, and it’s one that most parents and educators are not addressing. Understanding that you are the training data — that your behavior, preferences, biometrics, and mistakes are the raw material that makes AI systems work — is foundational to any real understanding of how AI operates in the world.
Kids who understand this are also better equipped to evaluate the AI systems they use. “Who built this? What did they train it on? What are they optimizing for? Whose behavior shaped the recommendations I’m seeing?” These are questions a data-aware kid asks. An unaware kid just scrolls.
Why Consent Is the Core Problem
Federal law — specifically COPPA — requires verifiable parental consent before collecting data from children under 13. The law has been updated multiple times. It has also been systematically circumvented in ways that regulators document but enforcement has not fully addressed.
The mechanism is usually one of three:
- Age gate circumvention. Apps require users to enter a birth year. Kids enter a false year. The app records them as an adult, and the company can argue it had no actual knowledge that the user was a child.
- School data agreements. When schools adopt an ed-tech platform, they often sign data processing agreements on behalf of students. Parents receive a general notification in a school handbook. The consent is institutional, not individual, and parents rarely read the specific data terms.
- “Legitimate interest” loopholes. COPPA is not limited to advertising: it covers personal information collected from children under 13 for any purpose, with narrow exceptions such as support for a service’s internal operations. Some companies argue that data used for model training falls under a “legitimate interest” in improving services, which they treat as a different legal category. This is contested, and the FTC has challenged it in enforcement actions, but the legal gray zone remains.
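The age-gate problem is easy to see in code. Below is a minimal Python sketch of a self-declared age gate, assuming the app asks only for a birth year and trusts whatever is typed. The function name `age_gate` and the two flow names are hypothetical, not taken from any real app.

```python
from datetime import date

COPPA_AGE = 13  # COPPA covers children under 13


def age_gate(claimed_birth_year, today=None):
    """Route a user based only on the birth year they typed in.

    Nothing here verifies the claim, which is the whole weakness:
    the app learns what the user says, not how old they are.
    Age is approximated from the year alone, for simplicity.
    """
    today = today or date.today()
    claimed_age = today.year - claimed_birth_year
    if claimed_age < COPPA_AGE:
        return "parental_consent_flow"
    return "standard_flow"


check_date = date(2025, 1, 1)  # fixed date so the example is repeatable

# A 10-year-old who answers honestly is sent to the consent flow...
print(age_gate(2015, today=check_date))  # parental_consent_flow
# ...and the same child typing an earlier year is treated as older.
print(age_gate(2005, today=check_date))  # standard_flow
```

The same child gets two different outcomes depending on one typed number. From the app’s side, the second record looks like an ordinary user who is 13 or older.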
For children 13 and older, there is no equivalent federal protection. COPPA stops applying at 13. Children in the UK have stronger protections under the Age Appropriate Design Code (also called the Children’s Code), which covers users under 18, but U.S. children do not have a comparable standard.
What Parents Should Do
Learn which platforms collect what data — specifically
Don’t try to audit everything at once. Pick the three apps your kid uses most. Go to each app’s privacy settings page and look for: data export (what they have), data deletion (what you can remove), and data collection disclosures (what they acknowledge collecting). Common Sense Media’s privacy ratings are a useful starting resource — they rate hundreds of apps on their data practices.
Have the “you are the data” conversation
This conversation doesn’t require a deep technical explanation. The key concept is simple: every time you interact with an app or device, you’re teaching it something. Your clicks, your pauses, your face, your voice — these all become part of what the system learns. Start with something concrete your kid already understands: “When you skip a song on Spotify, it learns you don’t like that song. When millions of people skip it, Spotify changes how it ranks that song. You’re not just a listener — you’re a teacher for the algorithm.”
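If your kid is old enough to read a little code, the skipped-song idea fits in a few lines of Python. This is a toy sketch, not Spotify’s actual ranking system: the function name `rank_by_skip_rate`, the song names, and the idea of ranking purely by skip rate are all simplifying assumptions.

```python
def rank_by_skip_rate(events):
    """Order songs from least skipped to most skipped.

    events: list of (song, skipped) pairs, one per listen,
    where skipped is True if the listener skipped the song.
    """
    plays, skips = {}, {}
    for song, skipped in events:
        plays[song] = plays.get(song, 0) + 1
        skips[song] = skips.get(song, 0) + (1 if skipped else 0)

    # Fraction of listens that ended in a skip, per song.
    skip_rate = {song: skips[song] / plays[song] for song in plays}
    return sorted(plays, key=lambda song: skip_rate[song])


# Hypothetical listening history pooled from many listeners.
events = [
    ("song_a", False), ("song_a", False), ("song_a", True), ("song_a", False),
    ("song_b", True), ("song_b", True), ("song_b", False), ("song_b", True),
]
print(rank_by_skip_rate(events))  # ['song_a', 'song_b']
```

Every skip changes a number, and the numbers decide the order. That is the sense in which a listener is also a teacher.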
Opt out where you can, and explain why you’re doing it
When you go through a privacy settings menu, don’t do it quietly. Do it with your kid watching. Say out loud: “I’m turning off the option that lets them use your data to train their AI, because I want you to decide whether you want to do that — not have it happen automatically.” This models the behavior and teaches the concept simultaneously.
Push back on school ed-tech adoption without disclosure
Ask your child’s school: what student data does this platform collect? Is it shared with third parties? Is it used for model training? You have the right to ask, and administrators have an obligation to know. Many won’t have clear answers — which is itself informative. Connecting kids to the systems that shape their education requires understanding what those systems are doing with their data.
Teach the CAPTCHA moment
The next time your kid encounters a captcha, use it as a teaching moment. “You know what’s actually happening here? You’re training a computer to recognize traffic lights. The company sells that work. You’re doing it for free without knowing it. How do you feel about that?” You don’t need to make it alarming. You just need to make it visible.
Support stronger regulation — and say so out loud
Individual action matters, but the scale of this problem requires policy solutions. The American Data Privacy and Protection Act (ADPPA) has been debated in Congress as a potential federal framework for children’s data. The FTC continues to bring enforcement actions under COPPA. Following these issues and expressing views to representatives is a legitimate form of participation. Kids who hear parents engage with policy questions learn that there are political dimensions to technology — that the systems aren’t just technical, they’re social and legal.
What to Watch Over the Next 3 Years
The regulatory and technological environment here is moving fast.
State-level children’s data laws. In the absence of comprehensive federal legislation, states are moving. California’s Age-Appropriate Design Code Act (AB 2273) passed in 2022 and requires platforms to prioritize children’s interests in design decisions, including data minimization. Texas, Virginia, and other states are following with their own versions. Watch for these to create new parent rights in your state.
Biometric data restrictions. Illinois has the most comprehensive biometric privacy law in the U.S. (BIPA — Biometric Information Privacy Act), which has led to major settlements against Facebook (now Meta) and others. More states are adopting similar frameworks. Biometric data collected from your kids today — faces, voices, behavioral signatures — may be subject to deletion rights under future law.
AI training transparency requirements. The EU AI Act, which began phasing in during 2024, requires providers of general-purpose AI models to publish a summary of the content used to train them. This transparency requirement may influence global company practices, including platforms your kids use.
If you want to understand how these dynamics connect to the broader question of what AI literacy actually means — and why most parents are missing two out of three levels of it — the framework article on the three levels of AI literacy kids actually need is worth reading next.
Frequently Asked Questions
Does COPPA actually protect my child from this kind of data collection?
COPPA provides meaningful protections for children under 13, but enforcement is inconsistent and there are significant loopholes — particularly around school platforms, age gating, and the “legitimate interest” classification for AI training data. For children 13 and older, COPPA provides no protection at all. It’s a floor, not a ceiling.
Is it legal for companies to use my kid’s captcha responses to train AI?
Yes, currently. The captcha service is framed as an anti-bot verification service. The secondary use — training AI — is disclosed in the terms of service, which almost no user reads. It’s legal but arguably not transparent in a meaningful way.
What’s the difference between data collection for advertising and data collection for AI training?
Both involve collecting behavioral and preference data. Advertising data is used to target content based on inferred interests. AI training data is used to improve the model’s accuracy and recommendations. The distinction matters legally (COPPA applies more clearly to advertising) but less so in practice — the same data often serves both purposes.
My kid is 14 — does that mean companies can do whatever they want with their data?
Under current U.S. federal law, yes, for the most part. COPPA protections expire at 13. This is one of the most significant gaps in U.S. children’s data law, and it’s actively being contested by child advocacy groups and in state legislatures.
How do I explain this to a young child without making them afraid of technology?
Focus on agency, not threat. The message isn’t “the internet is dangerous.” It’s “you get to understand how this works, and that understanding gives you more control.” Kids who feel informed feel more capable — not more anxious.
About the author
Ricky Flores is the founder of HiWave Makers and an electrical engineer with 15+ years of experience building consumer technology at Apple, Samsung, and Texas Instruments. He writes about how kids learn to build, think, and create in a tech-saturated world. Read more at hiwavemakers.com.
Sources
- Federal Trade Commission. (2019). “Google and YouTube Will Pay Record $170 Million for Alleged Violations of Children’s Privacy Law.” FTC Press Release. https://www.ftc.gov/news-events/news/press-releases/2019/09/google-youtube-will-pay-record-170-million-alleged-violations-childrens-privacy-law
- Federal Trade Commission. (2022). “Complying with COPPA: Frequently Asked Questions.” FTC Business Guidance. https://www.ftc.gov/business-guidance/resources/complying-coppa-frequently-asked-questions
- Electronic Frontier Foundation. (2023). “EFF Analysis of Child-Directed App Privacy Practices.” Electronic Frontier Foundation. https://www.eff.org/issues/privacy
- Huckvale, K., Torous, J., & Larsen, M. E. (2021). “Assessment of the Data Sharing and Privacy Practices of Smartphone Apps for Depression and Smoking Cessation.” npj Digital Medicine, 2(1), 1–8. https://doi.org/10.1038/s41746-019-0116-5
- UK Information Commissioner’s Office. (2021). “Age Appropriate Design: A Code of Practice for Online Services.” ICO. https://ico.org.uk/for-organisations/guide-to-data-protection/ico-codes-of-practice/age-appropriate-design-a-code-of-practice-for-online-services/
- State of California. (2022). “California Age-Appropriate Design Code Act (AB 2273).” California Legislative Information. https://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=202120220AB2273
- Stoilova, M., Livingstone, S., & Nandagiri, R. (2020). “Children’s Data and Privacy Online: Growing Up in a Digital Age.” London School of Economics and Political Science. https://www.lse.ac.uk/media-and-communications/assets/documents/research/projects/childrens-privacy/Childrens-data-and-privacy-online.pdf