ARXIV
AI BUCKET

Primary category mix inside the AI surge · cs.LG, CV, CL & allies

Published March 27, 2026

6 categories · 2015–25 window · primary-category rule

The headline story is well known: AI-related submissions on arXiv grew fast. The more interesting question for practitioners is which lanes inside “AI” carried the mass — machine learning (`cs.LG`), vision (`cs.CV`), language (`cs.CL`), narrow AI (`cs.AI`), neural computing (`cs.NE`), and statistics-side ML (`stat.ML`). Below: mix within a defined AI bucket (each paper counted once by primary category), plus how that bucket sits against all CS primaries. All-arXiv yearly totals use the official monthly submissions CSV from arxiv.org/stats/get_monthly_submissions, summed by calendar year.

Counts use each paper once by primary category. AI bucket = cs.LG + cs.CV + cs.CL + cs.AI + cs.NE + stat.ML. · All-arXiv yearly totals are summed from arXiv’s official monthly submissions CSV at arxiv.org/stats/get_monthly_submissions. · Data cut: 2026-03-27
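The counting rule above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline behind the charts: it assumes each record carries a space-separated `categories` string with the primary category listed first, and the record layout, function names, and sample records are all hypothetical.

```python
from collections import Counter

# The six primary categories that define the AI bucket in this article.
AI_BUCKET = {"cs.LG", "cs.CV", "cs.CL", "cs.AI", "cs.NE", "stat.ML"}

def primary_category(categories: str) -> str:
    """Return the first listed category (assumed here to be the primary one)."""
    return categories.split()[0]

def count_bucket_by_primary(records):
    """Count each paper once, under its primary category, if that is in the bucket."""
    counts = Counter()
    for rec in records:
        primary = primary_category(rec["categories"])
        if primary in AI_BUCKET:
            counts[primary] += 1
    return counts

# Hypothetical records for illustration only, not real arXiv papers.
sample = [
    {"id": "a", "categories": "cs.LG stat.ML"},      # counted once, as cs.LG
    {"id": "b", "categories": "stat.ML cs.LG"},      # counted once, as stat.ML
    {"id": "c", "categories": "cs.CV cs.LG cs.AI"},  # counted once, as cs.CV
    {"id": "d", "categories": "math.OC cs.LG"},      # primary outside the bucket: skipped
]
counts = count_bucket_by_primary(sample)
print(dict(counts))
```

Cross-listed papers never count twice under this rule, which is why a cs.LG paper cross-listed to stat.ML adds to cs.LG only.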

2026 YTD · all arXiv: 73,322 (Jan–Mar 2026, from monthly CSV; partial year, not comparable to full-year chart rows)

AI bucket (primary): 9.8K → 51K (418% growth, 2015–2025)

cs.LG share of bucket: 35.5% → 40.5% (machine learning primary codes)

cs.CL share of bucket: 15.5% → 22.6% (NLP / language)

All arXiv submissions: 105K → 284K (official annual totals, sum of months)
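As a rough check on the summary figures, the sketch below recomputes growth from the rounded values shown here. Two assumptions apply: the all-arXiv figures are taken to span the same 2015–2025 window, and the inputs are the rounded numbers above. With those rounded inputs the bucket grows by about 420%, close to the reported 418%, which presumably comes from unrounded counts. The helper name `pct_growth` is illustrative.

```python
def pct_growth(start: float, end: float) -> float:
    """Percentage growth from start to end: (end - start) / start * 100."""
    return (end - start) / start * 100.0

# Rounded figures from the summary above.
bucket_growth = pct_growth(9_800, 51_000)     # AI bucket, 2015 -> 2025
arxiv_growth = pct_growth(105_000, 284_000)   # all arXiv, assumed same window

# Share changes are differences in percentage points, not percent growth.
cs_lg_shift = round(40.5 - 35.5, 1)
cs_cl_shift = round(22.6 - 15.5, 1)

print(f"AI bucket growth: about {bucket_growth:.0f}%")
print(f"All-arXiv growth: about {arxiv_growth:.0f}%")
print(f"cs.LG share shift: {cs_lg_shift:+.1f} points")
print(f"cs.CL share shift: {cs_cl_shift:+.1f} points")
```

Keeping percent growth (for counts) separate from percentage-point shifts (for shares) avoids reading the cs.LG change as "5% growth" when its count grew far more than that.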

What each label means

Codes are arXiv primary subject classes.

cs.LG · Machine learning: core ML (supervised/unsupervised/RL, methodology, robustness, fairness), covering general learning papers and many ML applications.
cs.CV · Computer vision: images/video, recognition, segmentation, scene understanding; the vision side of “AI” on arXiv.
cs.CL · Computation & language (NLP): natural language (models, retrieval, speech/text, NLP benchmarks); often where LLM-era work lands.
stat.ML · Statistics, ML: ML with a statistics framing (same research universe as cs.LG, different archive).
cs.AI · Artificial intelligence (narrow): classic AI topics (planning, KR, search); excludes ML/NLP/vision, which have their own codes.
cs.NE · Neural & evolutionary: neuro-inspired and evolutionary algorithms, neurodynamic models, related non-mainstream ML threads.

Mix within the AI bucket

Stacked bars show how primary submissions split across six AI-related categories each year. Segment width is proportional to the number of papers in that category.

AI bucket total by year (primary submissions):

2015: 9.8K
2016: 14K
2017: 18K
2018: 24K
2019: 30K
2020: 36K
2021: 38K
2022: 37K
2023: 40K
2024: 44K
2025: 51K

2015 vs 2025 — share of the AI bucket

Normalized to 100% within the bucket.

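Segment shares in chart order (the first is cs.LG and the third is cs.CL, matching the summary figures above):

2015: 35.5% · 31% · 15.5% · 6.1% · 7.2% · 4.7%
2025: 40.5% · 24.1% · 22.6% · 6.9% · 3.6% · 2.3%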
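The normalization can be checked directly: each row of shares should sum to 100%, and the shifts between the two years should net to zero. The sketch below uses the twelve percentages exactly as listed, in chart order, without assuming which category each unlabeled segment belongs to. The `normalize` helper and its input counts are hypothetical, included only to show how raw counts would become shares.

```python
# Segment shares in the order they appear in the chart (percent of the AI bucket).
shares_2015 = [35.5, 31.0, 15.5, 6.1, 7.2, 4.7]
shares_2025 = [40.5, 24.1, 22.6, 6.9, 3.6, 2.3]

def normalize(counts):
    """Convert raw counts to percentages that sum to 100."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]

# Each row should sum to 100 (rounded to absorb floating-point noise).
total_2015 = round(sum(shares_2015), 1)
total_2025 = round(sum(shares_2025), 1)

# Percentage-point shift per segment, 2015 -> 2025.
shifts = [round(b - a, 1) for a, b in zip(shares_2015, shares_2025)]

# Hypothetical counts, just to exercise normalize().
example = normalize([2, 1, 1])

print(total_2015, total_2025)
print(shifts)
print(example)
```

Because both rows are normalized, any segment that gains share must be offset by losses elsewhere, so the shifts sum to zero up to rounding.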

AI bucket as % of all CS primaries

Approximate share of computer-science primary submissions that fall into these six categories combined. Totals are illustrative, aligned with the same JSON.

2015: 37.6%
2016: 47.3%
2017: 55%
2018: 60.8%
2019: 64.9%
2020: 67.4%
2021: 68.5%
2022: 68.7%
2023: 68.2%
2024: 67.3%
2025: 67.3%
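Since the share is bucket count divided by CS primary count, the denominator can be backed out from the two published numbers. The sketch below does that for 2015 and 2025 using the rounded bucket totals (9.8K and 51K), so the results are only approximate, on the order of 26K and 76K CS primaries. The function name is illustrative, and these implied totals are derived here rather than reported by the source.

```python
def implied_total(part: float, share_pct: float) -> float:
    """Back out a total from a part and its percentage share of that total."""
    return part / (share_pct / 100.0)

# Rounded bucket totals and shares as listed in this article.
cs_2015 = implied_total(9_800, 37.6)
cs_2025 = implied_total(51_000, 67.3)

print(f"Implied CS primaries, 2015: about {cs_2015 / 1000:.0f}K")
print(f"Implied CS primaries, 2025: about {cs_2025 / 1000:.0f}K")
```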

Data Sources

arXiv OAI-PMH

Bulk metadata harvest (preferred for an up-to-date mirror). Categories follow the public taxonomy; each paper has a primary subject class.

OAI guide → · Category taxonomy →

arXiv reports

Official submission statistics by category and year — useful for cross-checking aggregates.

Submissions by category →

Monthly submissions CSV

All-arXiv yearly totals are the sum of calendar months from arXiv’s official monthly submission statistics; download the CSV from the link below.

Monthly submissions CSV → · Bulk data overview →
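Summing the monthly CSV by calendar year can be sketched as follows. The column names (`month` in `YYYY-MM` form, and `submissions`) are assumptions about the file layout and should be checked against the downloaded header. The sample rows are made up for illustration, and the helper `yearly_totals` is hypothetical.

```python
import csv
import io
from collections import defaultdict

def yearly_totals(csv_text, month_col="month", count_col="submissions"):
    """Sum monthly submission counts into calendar-year totals.

    Assumes month_col holds values like '2025-03' and count_col holds integers.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row[month_col][:4])   # first four characters are the year
        totals[year] += int(row[count_col])
    return dict(totals)

# Made-up sample rows, not real arXiv statistics.
sample_csv = """month,submissions
2024-11,10
2024-12,20
2025-01,30
2025-02,40
"""
totals = yearly_totals(sample_csv)
print(totals)
```

A partial year, such as Jan–Mar 2026 at the time of the data cut, comes out as a smaller total under the same code, so it should be flagged rather than compared with full-year rows.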