Gtools

There was a section in my presentation about gtools, which is the suite of commands I coded for working with big data more efficiently. While gtools still has to work within the contraints of Stata, it's underlying code is in C and is much faster than Stata in many places, to the point where some big data tasks are much less of a bottleneck.

The original impetus for gtools was writing a faster collapse. While in Stata 17/MP, collapse has caught up to gcollapse, the package has expanded well beyond this original idea, and thare is a lot of functionality that remains much faster than any other Stata programs available. (Even in the case of collapse, gcollapse offers the merge and merge replace options, for example.)
Please visit gtools.readthedocs.io for detailed documentation and examples. Below I reproduce some of the tables with an overview of how gtools compares to other Stata commands, which you can also find on the official site.

Gtools commands with a Stata equivalent

Function	Replaces	Speedup (IC / MP)	Unsupported	Extras
gcollapse	collapse	-0.5 to 2 (Stata 17+); 4 to 100 (Stata 16 and earlier)		Quantiles, merge, labels, nunique, etc.
greshape	reshape	4 to 20 / 4 to 15	"advanced syntax"	`fast`, spread/gather (tidyr equiv)
gegen	egen	9 to 26 / 4 to 9 (+,.)	labels	Weights, quantiles, nunique, etc.
gcontract	contract	5 to 7 / 2.5 to 4
gisid	isid	8 to 30 / 4 to 14	`using`, `sort`	`if`, `in`
glevelsof	levelsof	3 to 13 / 2 to 7		Multiple variables, arbitrary levels
gduplicates	duplicates	8 to 16 / 3 to 10
gquantiles	xtile	10 to 30 / 13 to 25 (-)		`by()`, various (see usage)
	pctile	13 to 38 / 3 to 5 (-)		Ibid.
	_pctile	25 to 40 / 3 to 5		Ibid.
gstats tab	tabstat	10 to 50 / 5 to 30 (-)	See remarks	various (see usage)
gstats sum	sum, detail	10 to 20 / 5 to 10	See remarks	various (see usage)

(+) The upper end of the speed improvements are for quantiles (e.g. median, iqr, p90) and few groups. Weights have not been benchmarked.

(.) Only gegen group was benchmarked rigorously.

(-) Benchmarks computed 10 quantiles. When computing a large number of quantiles (e.g. thousands) pctile and xtile are prohibitively slow due to the way they are written; in that case gquantiles is hundreds or thousands of times faster, but this is an edge case.

Extra commands

Function	Similar (SSC/SJ)	Speedup (IC / MP)	Notes
fasterxtile	fastxtile	20 to 30 / 2.5 to 3.5	Allows `by()`
	egenmisc (SSC) (-)	8 to 25 / 2.5 to 6
	astile (SSC) (-)	8 to 12 / 3.5 to 6
gstats hdfe		(.)	Allows weights, `by()`
gstats winsor	winsor2	10 to 40 / 10 to 20	Allows weights
gunique	unique	4 to 26 / 4 to 12
gdistinct	distinct	4 to 26 / 4 to 12	Also saves results in matrix
gtop (gtoplevelsof)	groups, select()	(+)	See table notes (+)
gstats range	rangestat	10 to 20 / 10 to 20	Allows weights; no flex stats
gstats transform			Various statistical functions

(-) fastxtile from egenmisc and astile were benchmarked against gquantiles, xtile (fasterxtile) using by().

(+) While similar to the user command 'groups' with the 'select' option, gtoplevelsof does not really have an equivalent. It is several dozen times faster than 'groups, select', but that command was not written with the goal of gleaning the most common levels of a varlist. Rather, it has a plethora of features and that one is somewhat incidental. As such, the benchmark is not equivalent and gtoplevelsof does not attempt to implement the features of 'groups'

(.) Other than the dated 'hdfe' command, I do not know of a stata command that residualizes variables from a set of fixed effects. The 'hdfe' command, as far as I can tell, morphed into the 'reghdfe' package; the latter, however, is a fully-functioning regression command, while 'gstats hdfe' only residualizes a set of variables.