Data validation and management: Back-coding of other sports

Back-coding of other sports

Capturing open data on other sports and activities

In the online questionnaire, respondents were offered a list of about 180 different activities to choose from, many of which included several activities within them (e.g. squash and racquetball) and some of which appeared multiple times to aid respondents in finding them.

Nonetheless, respondents had still done some activities over the last year which were not in our list or which they had not found in it. In addition, the paper questionnaire offered only about 50 activities on its list.

Therefore, both the online and paper questionnaire offered a space for respondents to record other activities which they did in the last year and last 28 days and to provide details on the frequency, duration and intensity.

For this data to feed into the main data set, the answers needed to be coded into the categories of activities. These categories included those provided on the questionnaire as well as a few additional categories for activities which were mentioned but were not in the original list (e.g. high ropes courses).

Coding the answers

A coding scheme was created which included all the answers from the online and postal questionnaires, some additional generic responses (e.g. football where the type was not specified) and new activities not included in the original scheme.

In addition, a code was created for instances where multiple relevant activities were included in a single ‘other’ answer, so that they could feed into overall composites. Codes were also created for arts activities and for activities which were not relevant at all (to ensure they did not feed into the composites).

All the answers were brought together in Excel. A VLOOKUP was used to automatically code any answers which were worded exactly as the code was labelled.

All remaining other answers were manually coded against the code list. All manually coded answers were added to the master look-up list, which was used for a VLOOKUP in later rounds of back-coding.

Over time the master list became longer, and the most common activities could be automatically coded using the VLOOKUP; any answers which could not be matched automatically were manually coded.
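The look-up cycle described above can be sketched in Python. This is an illustration only, not the procedure actually used, which was carried out in Excel. The function names, the activity codes and the example answers are all hypothetical, and lower-casing and trimming answers before matching is an assumption about what counts as an exact match.

```python
def normalise(answer):
    """Assumption: ignore case and surrounding spaces when matching."""
    return answer.strip().lower()


def auto_code(answers, lookup):
    """Split answers into (answer, code) pairs and answers left for manual coding."""
    coded, remaining = [], []
    for answer in answers:
        key = normalise(answer)
        if key in lookup:
            coded.append((answer, lookup[key]))
        else:
            remaining.append(answer)
    return coded, remaining


# Hypothetical master look-up list: answer wording -> activity code.
master_lookup = {
    "football": "FOOTBALL_GENERIC",
    "high ropes course": "HIGH_ROPES",
}

# First round: two answers match the list, one is left for manual coding.
round_one = ["Football", "high ropes course", "tree top adventure"]
coded, remaining = auto_code(round_one, master_lookup)

# A coder assigns codes by hand; the decisions are added to the master list.
manual_decisions = {"tree top adventure": "HIGH_ROPES"}
for answer, code in manual_decisions.items():
    master_lookup[normalise(answer)] = code

# Later round: the same wording is now coded automatically.
coded_later, remaining_later = auto_code(["Tree top adventure"], master_lookup)
```

Because every manual decision is written back to the look-up list, the share of answers needing manual coding should fall in each later round.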

At the end of the process, all other answers had been assigned a code which indicated which type of activity they were, which could then be used in the derivation of the participation and composite variables.

Once codes had been assigned to all open-ended responses, the coded data was then merged back into the main dataset. To do this, the data was pulled into an SPSS file, then matched back onto the core data by serial and mode.

This combination of matching variables ensured that each case was unique in both the coding file and the core data, so that cases were matched back correctly.
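A minimal sketch of matching on the combined key, written in Python rather than SPSS. The field names `serial`, `mode` and `other_code`, the example records and the function name are hypothetical; the sketch assumes, as stated above, that each (serial, mode) pair appears at most once in each file, and it stops if that does not hold.

```python
def match_back(core_rows, coded_rows):
    """Attach each coded answer to its core record using (serial, mode)."""
    coded_by_key = {}
    for row in coded_rows:
        key = (row["serial"], row["mode"])
        if key in coded_by_key:
            raise ValueError(f"duplicate key in coded data: {key}")
        coded_by_key[key] = row["other_code"]

    seen = set()
    merged = []
    for row in core_rows:
        key = (row["serial"], row["mode"])
        if key in seen:
            raise ValueError(f"duplicate key in core data: {key}")
        seen.add(key)
        # Cases without an 'other' answer get None.
        merged.append({**row, "other_code": coded_by_key.pop(key, None)})

    # Anything left over has no core record to match onto.
    if coded_by_key:
        raise KeyError(f"coded answers with no core record: {sorted(coded_by_key)}")
    return merged


# Hypothetical records: the same serial under two modes stays distinct.
core = [
    {"serial": 1001, "mode": "online"},
    {"serial": 1001, "mode": "paper"},
]
coded_answers = [{"serial": 1001, "mode": "paper", "other_code": "HIGH_ROPES"}]

merged = match_back(core, coded_answers)
```

Matching on serial alone would not distinguish the two example records, which is the kind of ambiguity the combined key avoids.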

Once the coded data had been matched back onto the core data, a great deal of care had to be taken to ensure that derived variables correctly captured response data.

If the data had been added into the raw variables too soon, the derived variables would have been calculated incorrectly.

To guard against this, the back-coded ‘other’ data were treated separately, with derived variables created for them in the same way as for the standard activity variables.

Data was then back-coded as the very last step in the data processing. This ensured that the duration measure, for example, was created as the sum of the durations of the pre-coded activity and the back-coded activity.
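The final combination step might look like the following Python sketch. It assumes durations are held in minutes per activity code and are derived separately for the pre-coded and back-coded answers before being added together; the function name, codes and values are hypothetical.

```python
def combine_durations(precoded, backcoded):
    """Sum durations per activity code without altering either input."""
    combined = dict(precoded)
    for code, minutes in backcoded.items():
        combined[code] = combined.get(code, 0) + minutes
    return combined


# Hypothetical derived durations (minutes) for one respondent.
precoded_minutes = {"FOOTBALL_GENERIC": 90}
backcoded_minutes = {"FOOTBALL_GENERIC": 30, "HIGH_ROPES": 120}

total_minutes = combine_durations(precoded_minutes, backcoded_minutes)
```

Keeping the two inputs separate until this point means each derived measure is calculated once per source and only then summed.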

Any capping and imputation of activity lengths (as described in the following section) was also applied to the ‘other’ codes. This was done by creating the values for each activity, then applying these to the ‘other specify’ codes, per the specific activity mentioned.
