GH Archive
Records the public GitHub timeline, archives it, and makes it accessible for further analysis. Updated hourly. Available on Google BigQuery as a public dataset.
Public GitHub timeline data · As of March 15, 2026
GH Archive records every public event on GitHub (pushes, pull requests, issues, stars, forks, releases, and more) and archives them for analysis. The dataset spans 2011 to present, is updated hourly, and exceeds 17 TB on Google BigQuery (first 1 TB/month free). Below we explore language trends, developer behavior patterns, open source ecosystem health, and the explosive growth of AI/LLM tooling, all derived from this public timeline data.
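Because the archive is published as one file per hour, a small helper is enough to locate any slice of the timeline. The sketch below builds the download URL for a given hour, assuming the `https://data.gharchive.org/YYYY-MM-DD-H.json.gz` naming used on gharchive.org (the hour is not zero-padded). The function name `hourly_archive_url` is hypothetical.

```python
from datetime import datetime, timezone


def hourly_archive_url(ts: datetime) -> str:
    """Return the gharchive.org URL for the hourly file containing `ts`.

    `ts` must be timezone-aware; it is converted to UTC because the
    archive files are keyed by UTC hour.
    """
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    ts = ts.astimezone(timezone.utc)
    # The hour component is written without zero padding (0..23).
    return f"https://data.gharchive.org/{ts:%Y-%m-%d}-{ts.hour}.json.gz"


if __name__ == "__main__":
    print(hourly_archive_url(datetime(2025, 3, 1, 15, tzinfo=timezone.utc)))
```

Each file is gzip-compressed, newline-delimited JSON with one event per line, so it can be streamed without loading a whole hour into memory.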
New public repositories by primary language. Based on CreateEvent counts from GH Archive. Use the year tabs to see how the landscape shifted.
GitHub stars and new repos using each framework. Growth is YoY based on PushEvent and CreateEvent volumes.
Weekly aggregate of PushEvents, PullRequestEvents, and IssuesEvents. Tuesday is the most productive day globally; weekends drop ~55%.
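A weekday breakdown like this one can be approximated by bucketing events on their `created_at` timestamp. The following is a minimal sketch, not the dashboard's actual pipeline: it assumes each event is a dict with `type` and a UTC `created_at` string in the form `2025-03-04T12:00:00Z`, and it defines the weekend drop as one minus the ratio of the average weekend day to the average weekday, which is an assumption about how the ~55% figure was derived.

```python
from collections import Counter
from datetime import datetime

TRACKED_TYPES = {"PushEvent", "PullRequestEvent", "IssuesEvent"}
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def events_by_weekday(events):
    """Count tracked events per weekday name (UTC)."""
    counts = Counter()
    for ev in events:
        if ev.get("type") not in TRACKED_TYPES:
            continue
        created = datetime.strptime(ev["created_at"], "%Y-%m-%dT%H:%M:%SZ")
        counts[DAY_NAMES[created.weekday()]] += 1
    return counts


def weekend_drop(counts):
    """Fractional drop of the average weekend day vs. the average weekday."""
    weekday_avg = sum(counts[d] for d in DAY_NAMES[:5]) / 5
    weekend_avg = sum(counts[d] for d in DAY_NAMES[5:]) / 2
    if weekday_avg == 0:
        raise ValueError("no weekday events in sample")
    return 1 - weekend_avg / weekday_avg
```

Timestamps are in UTC, so a "Tuesday" here is a UTC Tuesday rather than the contributor's local day.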
Repo abandonment rates, contributor concentration, and the gap between stars and actual usage (fork/clone ratios). Derived from PushEvent recency and ForkEvent / WatchEvent ratios.
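As a rough illustration of the ForkEvent / WatchEvent ratio mentioned above, the sketch below tallies both event types per repository. It assumes the post-2015 event shape in which each record carries `type` and `repo.name`; the function name `fork_to_star_ratio` is hypothetical, and the dashboard's real metric definitions are not given in the source.

```python
from collections import Counter


def fork_to_star_ratio(events):
    """Return {repo_name: forks / stars} for repos starred at least once."""
    forks, stars = Counter(), Counter()
    for ev in events:
        repo = ev["repo"]["name"]
        if ev["type"] == "ForkEvent":
            forks[repo] += 1
        elif ev["type"] == "WatchEvent":
            # GitHub emits WatchEvent when a user stars a repository.
            stars[repo] += 1
    # Repos with zero stars in the sample are skipped to avoid dividing by zero.
    return {repo: forks[repo] / n for repo, n in stars.items()}
```

A ratio computed this way only reflects activity inside the sampled window, not lifetime totals for a repository.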
GitHub stars trajectory for key AI/LLM projects. Ghost bar shows Q1 2023 baseline; filled bar shows latest (Q1 2025). Sorted by current stars.
Every public event across all repos on GitHub. Sampled from 18 real gharchive.org hourly files (3 per month), scaled to monthly estimates. No keyword filtering: this is the full public GitHub timeline.
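The sampling arithmetic can be sketched as follows. This assumes the factor of 720 quoted in the source notes stands for 30 days × 24 hours, and that each monthly estimate is the mean event count of that month's sampled hours multiplied by that factor; neither detail is spelled out on the page. The function name is hypothetical.

```python
HOURS_PER_MONTH = 720  # assumed: 30 days x 24 hours


def monthly_estimate(hourly_counts):
    """Scale a handful of sampled hourly event counts to a monthly estimate."""
    if not hourly_counts:
        raise ValueError("need at least one sampled hour")
    mean_per_hour = sum(hourly_counts) / len(hourly_counts)
    return mean_per_hour * HOURS_PER_MONTH


if __name__ == "__main__":
    # Three sampled hours with 100, 200 and 300 events average 200 per hour.
    print(monthly_estimate([100, 200, 300]))  # 144000.0
```

With only three sampled hours per month, the estimate is sensitive to which hours were chosen, since activity varies strongly by time of day and day of week.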
GitHub removed CreateEvent ref_type=repository from their public Events API around Nov 2025, so new-repo creation can no longer be counted from GH Archive for later months. Branch creation activity (proxy) is shown below.
Source: gharchive.org, 18 hourly snapshots (3 per month), scaled ×720 · generated 4/5/2026
Sampled from 18 real GH Archive hourly snapshots (gharchive.org), streamed and parsed locally, scaled to monthly estimates. Tier 1 = repos named with ai-agent, mcp-server, crewai, langgraph, autogen, multi-agent, agentic. Broad AI adds langchain, ollama, rag, claude-, gpt-agent.
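The two keyword tiers can be expressed as a small classifier over repository names. This is a sketch under stated assumptions: the source lists the keywords but not the matching rule, so case-insensitive substring matching on the repository part of `owner/name` is a guess, and `classify_repo` is a hypothetical name.

```python
TIER1_KEYWORDS = ("ai-agent", "mcp-server", "crewai", "langgraph",
                  "autogen", "multi-agent", "agentic")
# "Broad AI" is Tier 1 plus these additional keywords.
BROAD_EXTRA_KEYWORDS = ("langchain", "ollama", "rag", "claude-", "gpt-agent")


def classify_repo(full_name):
    """Return "tier1", "broad", or None for an `owner/name` string.

    Returns the narrowest matching tier; every "tier1" repo also counts
    toward the broad total.
    """
    name = full_name.split("/")[-1].lower()
    if any(k in name for k in TIER1_KEYWORDS):
        return "tier1"
    if any(k in name for k in BROAD_EXTRA_KEYWORDS):
        return "broad"
    return None
```

Plain substring matching over-counts short keywords: `classify_repo("acme/storage-utils")` returns `"broad"` because "storage" contains "rag", so a stricter rule (for example, matching whole hyphen-separated tokens) would be needed for cleaner counts.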
Source: gharchive.org, 18 hourly snapshots sampled from real GH Archive files, scaled ×720 · generated 4/5/2026
Share of all GH Archive events by type. PushEvents dominate at ~42%; the long tail includes member additions, wiki edits, and more.
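A share-by-type breakdown is a straightforward tally over the `type` field of each event. The sketch below assumes events are already parsed into dicts; `event_type_shares` is a hypothetical helper, not part of any GH Archive tooling.

```python
from collections import Counter


def event_type_shares(events):
    """Return {event_type: fraction of all events}, largest share first."""
    counts = Counter(ev["type"] for ev in events)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {etype: n / total for etype, n in counts.most_common()}
```

Multiplying each value by 100 gives percentages comparable to the ~42% quoted for PushEvents.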
The underlying events API that GH Archive records. Provides 15+ event types: PushEvent, PullRequestEvent, IssuesEvent, WatchEvent, ForkEvent, and more.
Alternative ingestion of GH Archive data via Cybersyn on Snowflake Marketplace. Reportedly more reliable ingestion than BigQuery for some workloads.