Once the derived variables had been created, a variety of checks were performed to ensure that they had been calculated correctly.
Checking activity and sports measures
The main activity-related derived variables were created for multiple composite activities (as described in the previous section).
Checks were first carried out at the activity level. To begin with, the logic of the syntax used to derive each measure was checked against the specification.
Next, selected activities were tested by cross-tabulating the raw (source) variables against the derived variables to confirm that the derived values matched the source data as expected.
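The cross-tabulation check can be illustrated with a minimal Python sketch. The original checks were run in SPSS; here the variable codes, the recode rule in `spec` and the helper names `crosstab` and `unexpected_cells` are all hypothetical. The idea is to count every (raw value, derived value) pair and flag any pair the specification does not allow.

```python
from collections import Counter

def crosstab(raw, derived):
    """Count each (raw value, derived value) pair, like a two-way frequency table."""
    return Counter(zip(raw, derived))

def unexpected_cells(table, expected_map):
    """Return pairs whose derived value differs from what the specification implies.

    expected_map (hypothetical) maps each raw code to the derived code it should produce.
    """
    return {pair: n for pair, n in table.items()
            if expected_map.get(pair[0]) != pair[1]}

# Hypothetical example: raw frequency codes recoded into a 0/1 participation flag.
raw = [1, 2, 3, None, 2, 1]
derived = [1, 1, 0, 0, 1, 0]          # the last case is deliberately miscoded
spec = {1: 1, 2: 1, 3: 0, None: 0}    # assumed recode rule from the specification

table = crosstab(raw, derived)
problems = unexpected_cells(table, spec)
print(problems)  # {(1, 0): 1}
```

Any non-empty result points to a cell of the cross-tab that should be empty, which is where an analyst would start investigating.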
Finally, multiple scenarios were tested to ensure that more complex questionnaire responses did not ‘break’ the routing.
This included checking cases where data for specific input variables were missing, out of range, imputed or capped. In addition, checks were conducted to ensure that other relevant answers fed into the participation measures correctly.
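Scenario testing of this kind can be sketched as a small table of edge cases run through a derivation. In this Python sketch the derivation `derive_participated_28d`, its valid range `MAX_SESSIONS` and the expected outputs are hypothetical assumptions, not the survey's actual rules. Imputed values are assumed to be treated like any other valid response.

```python
from typing import Optional

MAX_SESSIONS = 28  # hypothetical upper limit (and cap) for sessions in the last 28 days

def derive_participated_28d(sessions: Optional[int]) -> Optional[int]:
    """Hypothetical derivation: 1 if any session, 0 if none,
    None (missing) if the input is missing or out of range."""
    if sessions is None:
        return None
    if sessions < 0 or sessions > MAX_SESSIONS:
        return None
    return 1 if sessions > 0 else 0

# Edge-case scenarios: (description, input value, expected derived value)
SCENARIOS = [
    ("typical response", 4, 1),
    ("zero sessions", 0, 0),
    ("missing input", None, None),
    ("out of range (negative)", -1, None),
    ("out of range (too high)", 99, None),
    ("capped at maximum", MAX_SESSIONS, 1),
    ("imputed value (treated as valid)", 2, 1),
]

failures = [(desc, value, expected, derive_participated_28d(value))
            for desc, value, expected in SCENARIOS
            if derive_participated_28d(value) != expected]
print(failures or "all scenarios passed")
```

Keeping the scenarios in one table makes it easy to rerun the whole set after the syntax is corrected.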
The composite sports variables were only created once it was confirmed that the individual activity variables had been derived correctly.
Checks were also needed to ensure that the correct activities fed into each composite, since each composite would then be used to derive multiple participation variables.
Primarily, the SPSS syntax was checked against the specification (which was itself checked and signed off by the Sport England team) to ensure that composite variables were defined correctly.
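Comparing composite definitions with the specification amounts to a set comparison per composite. This Python sketch assumes both the specification and the syntax can be reduced to a mapping from composite name to activity codes. The names `SPEC`, `SYNTAX` and `compare_definitions`, and the codes themselves, are hypothetical.

```python
# Hypothetical composite definitions: composite name -> set of activity codes
SPEC = {
    "cycling": {"cycle_leisure", "cycle_travel", "bmx", "mountain_bike"},
    "running": {"road_run", "trail_run", "track_run"},
}
SYNTAX = {
    "cycling": {"cycle_leisure", "cycle_travel", "mountain_bike"},  # 'bmx' omitted
    "running": {"road_run", "trail_run", "track_run"},
}

def compare_definitions(spec, syntax):
    """Report activities missing from, or unexpectedly added to, each composite."""
    report = {}
    for name in sorted(spec.keys() | syntax.keys()):
        expected = spec.get(name, set())
        actual = syntax.get(name, set())
        missing, extra = expected - actual, actual - expected
        if missing or extra:
            report[name] = {"missing": sorted(missing), "extra": sorted(extra)}
    return report

print(compare_definitions(SPEC, SYNTAX))
# {'cycling': {'missing': ['bmx'], 'extra': []}}
```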
It was not possible to check every single participation measure for each composite. Instead, for one participation measure (12-monthly participation), all the composites were checked to ensure that all cases where a participant had mentioned an individual activity were counted towards the composite variable.
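The 12-monthly composite check can be sketched as a search for respondents who mentioned a member activity but were not counted in the composite. The respondent data, activity codes and the helper `uncounted_cases` below are hypothetical, and the sketch assumes a composite flag of 1 means "participated".

```python
# Hypothetical respondent data: respondent id -> activities done in the last 12 months
activities_12m = {
    101: {"road_run"},
    102: {"cycle_travel", "swim"},
    103: set(),
    104: {"trail_run"},
}
COMPOSITES = {"running": {"road_run", "trail_run", "track_run"}}

# Derived 12-month composite flags as produced by the (hypothetical) derivation.
derived_running_12m = {101: 1, 102: 0, 103: 0, 104: 0}  # 104 is wrongly coded 0

def uncounted_cases(activities, members, derived):
    """Respondents who mentioned a member activity but are not counted in the composite."""
    return sorted(rid for rid, acts in activities.items()
                  if acts & members and derived.get(rid) != 1)

print(uncounted_cases(activities_12m, COMPOSITES["running"], derived_running_12m))  # [104]
```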
Then, for a single composite, all the participation measures were checked to ensure that they had been created correctly.
Comparisons were made between different participation measures to check that the relationships between them were consistent with how each had been defined. Any inconsistencies found were investigated and corrected.
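One way to compare participation measures is to list pairs where, by definition, a narrower measure implies a broader one, and flag any case that breaks the implication. The measure names and nesting rules in this Python sketch (for example, that taking part in the last 28 days implies taking part in the last 12 months) are illustrative assumptions, not the survey's actual definitions.

```python
# Hypothetical per-respondent flags for three participation measures (1 = yes, 0 = no).
records = [
    {"id": 1, "twice_28d": 1, "any_28d": 1, "any_12m": 1},
    {"id": 2, "twice_28d": 0, "any_28d": 1, "any_12m": 1},
    {"id": 3, "twice_28d": 0, "any_28d": 1, "any_12m": 0},  # inconsistent case
    {"id": 4, "twice_28d": 0, "any_28d": 0, "any_12m": 0},
]

# Assumed (narrower, broader) pairs: narrower == 1 should imply broader == 1.
NESTED = [("twice_28d", "any_28d"), ("any_28d", "any_12m")]

def inconsistencies(rows, nested):
    """Return (id, narrower, broader) for every case that breaks a nesting rule."""
    return [(r["id"], narrow, broad)
            for r in rows
            for narrow, broad in nested
            if r[narrow] == 1 and r[broad] != 1]

print(inconsistencies(records, NESTED))  # [(3, 'any_28d', 'any_12m')]
```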
Where problems were found, the syntax was corrected, the variables recreated, and the checks repeated to ensure that the final data were correct.
Checking demographic variables
Demographic variables were checked primarily by cross-tabulation of the raw variables against the derived variables.
A sense check was applied to the variables to ensure that the frequencies ‘looked’ right, for example by checking IMD quartiles against local authority.
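A sense check is ultimately a judgement call, but the table behind it can be produced mechanically. This Python sketch computes the percentage of respondents in each IMD quartile within each local authority, for an analyst to review. The records and the helper `quartile_shares` are hypothetical, and quartiles are assumed to be coded 1 to 4.

```python
from collections import Counter, defaultdict

# Hypothetical respondent records: (local authority, IMD quartile coded 1-4)
rows = [("LA A", 1), ("LA A", 1), ("LA A", 2), ("LA B", 4), ("LA B", 3), ("LA B", 4)]

def quartile_shares(records):
    """Percentage of respondents in each IMD quartile, per local authority."""
    counts = defaultdict(Counter)
    for la, quartile in records:
        counts[la][quartile] += 1
    shares = {}
    for la, c in counts.items():
        total = sum(c.values())
        shares[la] = {q: round(100 * c[q] / total, 1) for q in range(1, 5)}
    return shares

for la, dist in sorted(quartile_shares(rows).items()):
    print(la, dist)
```

A skewed distribution is not an error in itself; the table simply makes it easy to spot an authority whose pattern looks implausible given what is known about the area.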
Finally, the demographic variables were checked against each other to ensure that they were internally consistent.
This included checking that age bands tallied across variables and that derived variables which used the same source data contained the same number of valid responses.
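Both internal-consistency checks can be sketched together in Python: every fine age band should sit inside exactly one broad band, and two variables derived from the same source should have the same number of valid responses. The variable names `age_fine` and `age_broad`, their bands and the helper `check_age_bands` are hypothetical; `None` stands for a missing response.

```python
from collections import defaultdict

# Hypothetical records: two age-band variables derived from the same raw age question.
records = [
    {"age_fine": "16-34", "age_broad": "16-54"},
    {"age_fine": "35-54", "age_broad": "16-54"},
    {"age_fine": "55-74", "age_broad": "55+"},
    {"age_fine": "75+", "age_broad": "55+"},
    {"age_fine": None, "age_broad": None},
]

def check_age_bands(rows, fine, broad):
    """Return fine bands that map to more than one broad band, plus valid counts."""
    mapping = defaultdict(set)
    for r in rows:
        if r[fine] is not None and r[broad] is not None:
            mapping[r[fine]].add(r[broad])
    conflicts = {f: sorted(b) for f, b in mapping.items() if len(b) > 1}
    valid = {v: sum(r[v] is not None for r in rows) for v in (fine, broad)}
    return conflicts, valid

conflicts, valid = check_age_bands(records, "age_fine", "age_broad")
print(conflicts)  # {} : each fine band sits inside exactly one broad band
print(valid)      # {'age_fine': 4, 'age_broad': 4} : same number of valid responses
```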