Project: Programming Language C++
Author: Andrew Tomazos <firstname.lastname@example.org>
Abstract: We parsed 4,689,316,529 C/C++ tokens from 2,566,989 C/C++ source files taken from 11,423 open source packages of a popular Linux distribution. For each of the 50,325,647 distinct token spellings, we counted the number of occurrences and wrote each spelling and its count to a single data file. We make that data file available for download as the ACTCD16 dataset.
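
The counting step described in the abstract can be illustrated with a minimal sketch. This is not the tokenizer used to build ACTCD16: the names `TokenCounts`, `is_word_char`, and `count_tokens` are hypothetical, and the lexing rule is a deliberate simplification that ignores comments, string literals, multi-character operators, and preprocessing.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>

// Maps each distinct token spelling to its number of occurrences.
// (Hypothetical name, chosen for this illustration.)
using TokenCounts = std::unordered_map<std::string, std::size_t>;

// Assumption: a "word" token is a maximal run of letters, digits, or '_'.
inline bool is_word_char(unsigned char c) {
    return std::isalnum(c) != 0 || c == '_';
}

// Simplified tokenizer: whitespace separates tokens, word characters are
// grouped into one token, and every other character is a token by itself.
// Real C/C++ lexing is considerably more involved than this.
void count_tokens(std::string_view source, TokenCounts& counts) {
    std::size_t i = 0;
    while (i < source.size()) {
        const unsigned char c = static_cast<unsigned char>(source[i]);
        if (std::isspace(c) != 0) {
            ++i;  // skip whitespace between tokens
            continue;
        }
        const std::size_t start = i;
        if (is_word_char(c)) {
            // consume the whole identifier or number
            while (i < source.size() &&
                   is_word_char(static_cast<unsigned char>(source[i]))) {
                ++i;
            }
        } else {
            ++i;  // single-character punctuation token
        }
        ++counts[std::string(source.substr(start, i - start))];
    }
}
```

Calling `count_tokens` once per source file with the same `TokenCounts` object accumulates counts across files. Under this scheme, the number of distinct spellings is `counts.size()` and the total token count is the sum of the mapped values.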