Stage 2 is now well under way: in the first month of testing we’ve got complete datasets on about 50 children (or 10% of our target sample). Once again, schools in Surrey are demonstrating just how fabulous they are. They are doing a stellar job of getting consent forms returned from families and have welcomed us with open arms. Some have even asked if we’ll be doing the screening programme again this summer, though I do pick up a sense of relief when I tell them that is most definitely not on the cards!
One thing teachers have been asking me is how we’ve selected the cohort for detailed assessment from the initial pool of 7532. This is an excellent question and one that occupied quite a bit of head space this summer. It also generated huge debate amongst the SCALES team, so I thought I’d share a little bit about the process.
The first decision we needed to make was whether we would exclude any cases before we selected the main cohort. Here we ran into some competing aims of the project. In the first instance, we want to have a truly representative picture of the population of children starting school. In that case, we don’t want to exclude anyone. However, the longer term aim is to follow up children with primary language impairments. In that case, it could be difficult to interpret findings if our ‘high-risk’ group were to consist primarily of children with existing diagnoses of other developmental disorders, or significant sensory impairments, or those who speak English as an additional language (EAL). (You may be interested to hear that ~800 children in our sample were reported to speak additional languages.) We also have a very practical consideration – the children need to be able to take part in our further assessments, otherwise we are wasting everyone’s time.
In the end, we made two exclusions. We separated children with EAL from the main cohort so that we could sample them separately. We are seeing a proportion of these children now, but are also applying for some additional funding to see more of them at a later date. The second decision was a little more difficult. To our astonishment, 1% of the sample were reported to have ‘no phrase speech’ at the end of their first year in school, and half of them were in mainstream classes! As this yielded the maximum score on the screen, quite a number of these children could have been in the final cohort. We want to know about them in lots of detail, but decided to see them all as a separate group, rather than just see a proportion of them in the final cohort (this will be yet more work for us, but is also completely fascinating to me).
The next thing to do when you have lots of data is to make a graph of it. This is a really useful way of spotting obvious errors in the dataset. You really hope you don’t end up with a distribution that looks like this, because it suggests something strange is going on. Instead, one hopes for a normal distribution (remember our bell shaped curve?) or perhaps one that is slightly skewed, assuming that most of the children we screen will have no difficulties whatsoever.
Now the next decision we need to make is where to draw the cut that will define the ‘high-risk’ group. This is a pretty arbitrary decision and a tricky one – make the cut too generous (say bottom 15%, or one standard deviation below the mean) and you are more likely to identify children with mild or transient difficulties. Make the cut too severe, and you are likely to end up with children who do have persistent language impairments, but also probably have more complex difficulties and global developmental delays.
We began by taking a cut at about 12%. Now obviously for a screen it is simpler if you can just use a single cut-off score, but it quickly became clear that this was going to be problematic for us. For a start, there were big gender differences – twice as many boys as girls were identified as being at-risk (though interestingly when we applied that cut-off score to the EAL children there were no gender differences). The next shocker was that almost half of the children in the high risk group were summer born. We’d jokingly talked before about the ‘summer born boy’ phenomenon, but there it was and we couldn’t ignore it.
So basically we had to create six mini-populations taking account of gender and season of birth. Then we looked at the distribution of scores within each of those six cells and used a cut-off score that would identify 14% of each of those groups as being ‘high-risk’.
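For anyone who likes to see this sort of thing spelled out, here is a minimal sketch of the per-cell cut-off idea in Python. It is not the code we actually ran: the field names (`gender`, `season`, `score`) and the function names are made up for illustration, and it assumes that a higher screen score means greater risk.

```python
from collections import defaultdict

def cell_cutoffs(children, percent=14):
    """For each (gender, season) cell, return the score that marks
    the top `percent` of that cell. Hypothetical helper, for illustration."""
    by_cell = defaultdict(list)
    for child in children:
        by_cell[(child["gender"], child["season"])].append(child["score"])

    cutoffs = {}
    for cell, scores in by_cell.items():
        ranked = sorted(scores, reverse=True)  # highest (riskiest) first
        # Integer arithmetic avoids floating-point surprises; at least 1 child.
        n_high = max(1, len(ranked) * percent // 100)
        cutoffs[cell] = ranked[n_high - 1]
    return cutoffs

def is_high_risk(child, cutoffs):
    """A child is 'high-risk' if they score at or above their own cell's cut-off."""
    return child["score"] >= cutoffs[(child["gender"], child["season"])]
```

One caveat with this sketch: if several children are tied at the cut-off score, slightly more than 14% of that cell will be flagged.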
We also had to make a decision about gender ratios in the final cohort. As with most developmental disorders, boys outnumber girls in clinical samples. And as I mentioned, this seemed to be the case in our sample as well (though interestingly, in the Iowa population study, boys only marginally outnumbered girls in the high-risk sample). So we could maintain those gender ratios in our final sample, but there is very little research on girls with language impairments or how gender affects clinical presentation of language impairment. So we decided to over-sample girls. This simply means that we elected to have equal numbers of boys and girls in our sample, regardless of what the proportion of boys is in the final high-risk sample. When we do our analyses, it is possible to ‘weight’ cases so that the results yield a truer estimate of what would be expected in the population. The advantage of doing it our way is that we should have sufficient numbers of both genders to say something sensible about possible sex differences.
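To give a flavour of what ‘weighting’ means, here is a small sketch in Python. It assumes the simplest possible scheme, where each child’s weight is the number of children in the population group they stand for divided by the number sampled from that group. The function names and the `stratum` and `value` fields are hypothetical, and the real analysis may well use something more sophisticated.

```python
def design_weights(population_counts, sample_counts):
    """Weight for each group = children in the population / children sampled.
    Both arguments are dicts keyed by group name (illustrative only)."""
    return {group: population_counts[group] / sample_counts[group]
            for group in sample_counts}

def weighted_mean(records, weights):
    """Mean of record['value'], with each record counted in proportion
    to the weight of its group (record['stratum'])."""
    total = sum(weights[r["stratum"]] * r["value"] for r in records)
    weight_sum = sum(weights[r["stratum"]] for r in records)
    return total / weight_sum
```

With invented numbers: if a group contained twice as many boys as girls but we assessed equal numbers of each, every boy would get twice the weight of every girl, so the weighted average is pulled back towards the boys’ scores, as it would be in the population.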
We then had a few discussions about socio-economic status (SES). There is some evidence that children from more deprived backgrounds are more likely to have difficulties with language and communication. It is therefore possible that our two risk groups might differ significantly on SES; we could try to ensure that this didn’t happen by matching groups on an SES variable. However, if SES is reliably associated with risk status, then matching for SES might eliminate some important differences that are worth exploring in more detail. We therefore decided not to adjust for SES but rather to see where the cases fell. We do have SES information on everyone, but have not looked at it yet in any detail.
To get the final numbers, we also had to guesstimate what our response rate might be. We want to assess 500 children – 300 high risk and 200 low risk, but we know that not everyone we invite will take part. Some families will have moved, some will not want to be involved and still others will just not return consent forms (I too am terribly remiss at looking in my daughter’s book bag and sending slips back to school!). So we need to invite more people than we want to see. On the other hand, when parents agree to take part they have a reasonable expectation that we will then see their child. And given the level of co-operation and enthusiasm that exists in Surrey, we were fairly confident that schools would be pro-active in ensuring parents got the information and followed up on it (we were right about that!). So we decided to invite just over 600 children in the hopes that 500 of them will say yes.
Having made all of these decisions, my head hurt! The wonderful Andrew Pickles (world’s greatest statistician) then assigned everyone we screened a random number, sorted them, and took a percentage from every possible group (e.g. summer girl, high risk; winter boy, low risk) to ensure the right numbers for the final 600. To further eliminate any testing biases, all of the schools were then randomly assigned to one of six testing blocks such that there would be ~100 children to assess in each block (which corresponds to a 6-week half-term). Within each block we book schools as and when and they invite the children on our behalf.
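I can’t show you Andrew’s actual code, but the general idea of a random draw within groups can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the field names (`id`, `season`, `gender`, `risk`), the use of a fixed quota per group rather than a percentage, and the round-robin dealing of schools into blocks, which balances the number of schools per block rather than the number of children.

```python
import random
from collections import defaultdict

def stratified_draw(children, quota, seed=0):
    """Give every child a random number, sort on it, then take children
    in that order until each group's quota is filled (illustrative only)."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    ordered = sorted(children, key=lambda c: c["id"])
    tagged = sorted(((rng.random(), c) for c in ordered), key=lambda t: t[0])

    taken = defaultdict(int)
    invited = []
    for _, child in tagged:
        group = (child["season"], child["gender"], child["risk"])
        if taken[group] < quota.get(group, 0):
            taken[group] += 1
            invited.append(child)
    return invited

def assign_blocks(schools, n_blocks=6, seed=0):
    """Shuffle the schools, then deal them out to blocks 1..n_blocks in turn."""
    rng = random.Random(seed)
    shuffled = sorted(schools)
    rng.shuffle(shuffled)
    return {school: i % n_blocks + 1 for i, school in enumerate(shuffled)}
```

Because the order is random, nobody gets to choose which children are invited, and fixing the seed means the same draw can be re-run and checked later.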
So there you have it. Once again, other people may have taken a different approach, but this is how we did it. It has been a very rigorous process of debate, weighing up pros and cons, looking up existing evidence and then hoping for the best. And as always, the screening data have raised several new questions and possibilities that we will hopefully be able to follow up in such a rare and wonderful sample. But for now, I must get some sleep as I am in school this week and don’t have time to tell you about the decisions we took in planning the final test battery….