Credits

Methodology

The research design for this year’s Octoverse combined survey and telemetry data to triangulate insights into how developers write code and build community. To protect developers’ privacy, the two datasets (survey and GitHub telemetry) were not linked; our theory-based predictive model was tested independently against survey responses and against GitHub developer behavior patterns.

Survey design

Our survey design was theory-based and cross-sectional. Our target population was developers who use GitHub, and we advertised the survey broadly through social media and an in-browser banner. No personally identifying information was collected, and responses were not linked to GitHub data.

Survey responses were evaluated using latent constructs. Each practice or outcome was measured with several Likert-type questions, and the resulting constructs were evaluated for validity and reliability using standard methods; they exhibited good psychometric properties. We then evaluated the hypothesized model using Partial Least Squares, a method appropriate for evaluating predictive relationships in exploratory research. All paths were statistically significant at the p < 0.01 level or stronger. Structural equation models were evaluated using only complete responses, of which there were over 12,000.
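The report does not name the reliability statistic it used. As an illustration only, the sketch below computes Cronbach's alpha, one widely used measure of internal consistency for a set of Likert-type questions that target the same construct. The function name and the sample scores are hypothetical and are not drawn from the Octoverse survey.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one construct.

    items: one list of respondent scores per survey question,
    all lists the same length (complete responses only).
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Each respondent's total score across the k questions.
    totals = [sum(scores) for scores in zip(*items)]
    # Sum of per-question sample variances.
    item_variance = sum(variance(column) for column in items)
    return (k / (k - 1)) * (1 - item_variance / variance(totals))


# Hypothetical 5-point Likert answers: three questions, five respondents.
example = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 5, 2, 2, 4],
]
alpha = cronbach_alpha(example)
```

Values closer to 1 indicate that the questions move together across respondents; the threshold treated as acceptable is a judgment call that varies by field.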

GitHub telemetry

The data reported for the telemetry analysis includes three types:

Open source

The data for this section comes from analyzing all GitHub activity, both public (including open source) and private, year over year. Where three-year comparisons are shown, the period of comparison is October 1, 2018 through September 30, 2021; for single-year periods, the analysis covers October 1, 2020 through September 30, 2021. The geographic distribution of active users included in the analysis is shown in the demographic section. We analyzed 1.2M repositories in this category.

At work

This data comes from analyzing paid organization accounts that meet the following criteria:

  • Created before October 1, 2018 with activity each month through September 2021
  • On a paid Team or Enterprise Cloud account

To allow for easier year-over-year comparisons, we normalize our analysis using per-user figures, unless noted otherwise. Only aggregate, anonymous data is reported. More than 37,000 organizations are included in our analysis, with demographic information reported in the section above. We analyzed 2.7M repositories in this category.
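The inclusion criteria and the per-user normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for the report: the record fields (`created`, `plan`, `active_months`), the plan labels, and the choice of October 2018 as the first month that must show activity are all hypothetical.

```python
from datetime import date

# Assumed cutoff and activity window, based on the criteria listed above.
CREATED_BEFORE = date(2018, 10, 1)
WINDOW_START = (2018, 10)  # assumption: activity is required from October 2018
WINDOW_END = (2021, 9)     # September 2021
PAID_PLANS = {"team", "enterprise_cloud"}  # hypothetical plan labels


def months_in_window(start=WINDOW_START, end=WINDOW_END):
    """Return every (year, month) pair from start to end, inclusive."""
    months = []
    year, month = start
    while (year, month) <= end:
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def is_eligible(org):
    """Check one organization record against both criteria."""
    required = set(months_in_window())
    return (
        org["created"] < CREATED_BEFORE
        and org["plan"] in PAID_PLANS
        and required <= set(org["active_months"])
    )


def per_user(total, active_users):
    """Normalize an aggregate count by the number of active users."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return total / active_users
```

Dividing by active users keeps a year-over-year comparison from being driven only by growth in an organization's headcount.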

Open source at work

This data comes from analyzing open source repositories owned by paid organization accounts that meet the following criteria:

  • Created before October 1, 2018 with activity each month through September 2021

We analyzed 100K (0.1M) repositories in this category.

Acknowledgements

The authors would like to thank several people who contributed to this year’s research. All contributions are listed alphabetically by type of contribution.

Authors: Nicole Forsgren, Eirini Kalliamvakou, Alyss Noland

Data scientists: Liz Redford, Shonte Stephenson

Subject Matter Expert reviewers: Greg Ceccarelli, Emily Gould, Jez Humble, Chandra Maddila, Caitie McCaffrey, Janice Neimer, Sarah Novotny, Ryan J. Salva, Dustin Smith, Heidi Waterhouse, Ashley Willis

Copyeditor: Cheryl Coupe

Designers: Siobhan Doyle, Gemma Busquets

Special contribution: Peter Cihon