Data management plan for data on adults with Autism and ADHD
Until the publication of the DSM-5 in 2013, Autism was an exclusionary criterion for the diagnosis of Attention Deficit Hyperactivity Disorder (ADHD), so the two diagnoses could not be made concurrently. The link between ADHD and Autism has not yet been fully researched.
Prior research has suggested that as many as 30 to 80% of patients with Autism may meet criteria for ADHD [i],[ii]. Many individuals with ADHD also show high levels of social deficits and ASD-type symptoms [i],[iii]. Conservative figures shared by the UK Adult ADHD Network are that 40% of people with autism have ADHD and 20-25% of people with ADHD have autism (especially if their IQ is normal or high).
ADHD is a major public health problem. Poorly managed ADHD leads to low educational achievement and to dropping out of mainstream education, sometimes into a pupil referral centre, in turn leading to accidents, juvenile offending, teenage pregnancies, smoking, drug, alcohol and substance abuse, gambling, and early death. Nevertheless, ADHD is not mentioned in the NHS Long Term Plan, although mental health, suicide, autism and learning difficulties are.
Currently, NHS England collects data on Autism but not on ADHD [iv]. Even so, there is no single database listing all the people in the UK who have autism, or indeed intellectual disability. The Learning Disability Mortality Review (LeDeR) programme was established in 2016 following the Confidential Inquiry into premature deaths of people with learning disabilities (CIPOLD). A study by LeDeR and CIPOLD looked at the mortality of people with intellectual disability and found that the number of people with intellectual disability held in GP records is an underestimate (missing data). The data were derived from the Clinical Practice Research Datalink (CPRD) and Office for National Statistics (ONS) death records, covering anonymised records from several hundred general practices (GP) in the UK over the immediately preceding four-year period, April 2010 to 2014. The CPRD covers 5% of the population of England (2.8 million people, 11.6 million person-years). A total of 15,000 patients with intellectual disability were identified: those recorded on GP databases as having intellectual disability, plus an additional 10% who were not labelled as such but had other identifiers, namely associated physical health diagnoses that by definition indicate intellectual disability. Furthermore, there are issues with the geographical distribution of data, as people with intellectual disabilities may be living in more deprived areas. Finally, there are very long waiting lists for autism diagnostic services (2-3 years), meaning that the data we collect will substantially underestimate prevalence, and recorded dates will lag behind the true ones.
The impact of my project will be to identify:
- Prevalence of Autism in adults
- Prevalence of ADHD in adults
- Comorbidity (symptoms or diagnosis) of Autism and ADHD in adults with either condition
Further goals could be:
- Time delay between referral and diagnosis of Autism
- Time delay between referral and diagnosis of ADHD
Data profiling and quality
The data required to identify the adult population with autism will be collected from clinical data sets (GP electronic patient records, psychiatry electronic patient records held by the mental health trusts, and social services records) and clinical coding data (LeDeR: review of deaths of people with intellectual disabilities; CIPOLD: Confidential Inquiry into premature deaths of people with intellectual disabilities; MHSDS: Mental Health Services Data Set; CPRD: Clinical Practice Research Datalink; and ONS: Office for National Statistics). Autism cases are defined by the presence of ICD-10 codes beginning with F84. Information from social services could identify people who use services for autism, e.g. community centres or residential homes.
All activity relating to patients of any age who receive care for a suspected or diagnosed mental health and wellbeing need, Learning Disability, autism or other neurodevelopmental conditions is within scope of the MHSDS. The MHSDS re-uses clinical and operational data for purposes other than direct patient care, for example: commissioning, service improvement and service design. It defines the data items, definitions and associated value sets extracted or derived from local information systems.
The data required to identify the adult population with ADHD will need to be collected from the same data sets; however, again some data may be missing. ADHD cases are defined by the presence of ICD-10 codes beginning with F90. There are very long waiting lists for ADHD diagnostic services, and some people seek private diagnoses, which are not necessarily communicated to their GP and are therefore not coded. One possibility is to look at prescribing databases to collect information on prescriptions of ADHD-specific medications, which by definition will have required a diagnosis of ADHD. Many people with substance use disorders or offending behaviour have undiagnosed ADHD. Again, because of the waiting lists, there will be a time delay and missing data.
Real-world data will be used, and free-text entries have to be collected as well as codes. Busy clinicians may not consistently code the diagnosis and may instead write free text in the patient notes, so data sets need to be examined to extract all the relevant data. Data will be findable through search keywords.
All the codes referring to the different historical names of autism must be collected: autism, autism spectrum disorder, autism spectrum condition, Asperger’s, Kanner’s, communication difficulties, pervasive developmental disorder not otherwise specified, etc. All the codes referring to the different historical names of ADHD must also be collected: Attention Deficit Disorder, hyperkinetic disorder, Disorder of Attention, Memory and Processing. Because of the long waiting lists, free-text entries that refer to waiting for assessments or to historical (childhood) diagnoses also need to be collected. It is possible that information from private diagnostic services (for autism and for ADHD) can be requested under the Freedom of Information Act. Finally, as these neurodevelopmental conditions are heritable, there may be an argument for examining family members’ records for the sake of completeness.
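The case definitions above (ICD-10 codes beginning with F84 for autism and F90 for ADHD) can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual method: the function names, the patient identifiers and the example records are all hypothetical, and real records would come from the data sets listed above.

```python
AUTISM_PREFIX = "F84"  # ICD-10 prefix for autism, as stated in the text
ADHD_PREFIX = "F90"    # ICD-10 prefix for ADHD, as stated in the text


def has_code(codes, prefix):
    """Return True if any ICD-10 code starts with the given prefix."""
    return any(code.strip().upper().startswith(prefix) for code in codes)


def comorbidity_summary(records):
    """Summarise autism, ADHD and their overlap.

    `records` maps a (pseudonymous) patient id to a list of ICD-10 codes.
    """
    autism = {pid for pid, codes in records.items() if has_code(codes, AUTISM_PREFIX)}
    adhd = {pid for pid, codes in records.items() if has_code(codes, ADHD_PREFIX)}
    both = autism & adhd
    return {
        "n_patients": len(records),
        "n_autism": len(autism),
        "n_adhd": len(adhd),
        "n_both": len(both),
        # Proportions are None when the denominator is empty
        "adhd_among_autism": len(both) / len(autism) if autism else None,
        "autism_among_adhd": len(both) / len(adhd) if adhd else None,
    }


# Made-up records for illustration only (hypothetical patients)
records = {
    "p1": ["F84.0", "F90.0"],
    "p2": ["F84.5"],
    "p3": ["F90.1", "F32.1"],
    "p4": ["J45.9"],
}
summary = comorbidity_summary(records)
print(summary)
```

Because the counts depend only on coded diagnoses, this sketch inherits every limitation described above: uncoded, private or pending diagnoses are invisible to it.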
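As a rough illustration of the keyword search over free text described above, the following Python sketch flags notes that mention one of the historical names or a pending assessment. The term lists are a subset of the names given in the text, and the function names, the waiting-list pattern and the example note are assumptions made for illustration. A simple keyword match like this does not handle negation ("no evidence of ADHD") or family history, so flagged notes would still need clinical review.

```python
import re

# Subsets of the historical names listed in the text; extend as needed.
AUTISM_TERMS = ["autism", "asperger", "kanner", "pervasive developmental disorder"]
ADHD_TERMS = ["adhd", "attention deficit", "hyperkinetic", "disorder of attention"]


def compile_terms(terms):
    """Build one case-insensitive pattern that matches any term at a word start."""
    pattern = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"\b(?:{pattern})", re.IGNORECASE)


AUTISM_RE = compile_terms(AUTISM_TERMS)
ADHD_RE = compile_terms(ADHD_TERMS)
# Hypothetical pattern for notes that refer to a pending assessment or referral
WAITING_RE = re.compile(r"\b(?:await\w*|waiting|referr\w*)\b", re.IGNORECASE)


def flag_note(text):
    """Return which kinds of mention appear in one free-text note."""
    return {
        "autism_mention": bool(AUTISM_RE.search(text)),
        "adhd_mention": bool(ADHD_RE.search(text)),
        "awaiting_assessment": bool(WAITING_RE.search(text)),
    }


# Made-up note for illustration only
note = "Pt reports childhood Asperger’s dx; awaiting ADHD assessment."
flags = flag_note(note)
print(flags)
```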
To increase the scope of the project from national to international, a federated query will be posed to the network of hospital disease registries and other data providers associated with the European Health Data and Evidence Network (EHDEN.EU). The query will be run at each participating site and only the results (frequency distributions, i.e. patient numbers) will be returned. There may be a need to consult colleagues who are fluent in the languages of the countries in question. The International Consortium for Health Outcomes Measurement (ICHOM.ORG) has a standard set on mental health – autism and neurodevelopmental disorders. This will be obtained, and data will be collected about symptoms and diagnoses of ADHD among adults with autism and vice versa.
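The principle of the federated query, where each site runs the query locally and returns only counts, can be sketched as follows. This is a toy illustration in plain Python and does not use or represent the EHDEN tooling; the function names, record layout and site data are hypothetical.

```python
from collections import Counter


def local_counts(site_records):
    """Run at each site: return aggregate counts only, never patient rows."""
    return {
        "n": len(site_records),
        "autism": sum(1 for r in site_records if r["autism"]),
        "adhd": sum(1 for r in site_records if r["adhd"]),
        "both": sum(1 for r in site_records if r["autism"] and r["adhd"]),
    }


def pool(site_results):
    """Run centrally: add up the per-site counts."""
    total = Counter()
    for result in site_results:
        total.update(result)  # Counter.update adds counts key by key
    return dict(total)


# Made-up site data for illustration only; these rows never leave their site
site_a = [
    {"autism": True, "adhd": True},
    {"autism": True, "adhd": False},
    {"autism": False, "adhd": True},
]
site_b = [
    {"autism": True, "adhd": True},
    {"autism": False, "adhd": False},
]

pooled = pool([local_counts(site_a), local_counts(site_b)])
print(pooled)
```

Only the dictionaries returned by `local_counts` cross site boundaries, which is the property the federated design relies on.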
Data transformation and processing
As outlined above, the various data sets will need to be combined to assemble a population of people with autism, as individual data sets will each be missing some data. The data sets will then need to be combined again, this time looking at clinical codes for ADHD (see above for the various codes). A specific purpose of processing the health data is to analyse in detail the patterns of “missingness” in the data (the manner in which data are missing from a sample of a population) and to suggest how missing data can be completed.
Data transformation begins with extracting the data from the Mental Health Services Data Set, the GP electronic patient records, the psychiatry electronic patient records held by the mental health trusts, local authority adult social services, LeDeR (review of deaths of people with intellectual disabilities), CIPOLD (Confidential Inquiry into premature deaths of people with intellectual disabilities), the Clinical Practice Research Datalink (CPRD) and the Office for National Statistics (ONS).
Data will first be checked to ensure that no identifiable data have accidentally been provided in any of the tables. Data tables will then be reviewed to ensure that they are complete and usable. Field completion will be checked and, if fields are not well populated, a new request for data will be made. To ensure that data conform to the defined schema, field names will be standardised and data will be cleansed. Corrupt or irregular data will be removed.
The type, format and range of values of the data tables will be checked. Data will be cleansed, i.e. duplicates and incorrectly named or incomplete fields will be identified and removed, and split data will be joined together. Data will be mapped and grouped into clinically relevant groups to improve the analysis. Data will be reshaped into a SQL database, ready for visualisation. A data model link will be created to allow interrogation of the data stored in the transformed data tables.
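Combining sources and then profiling missingness, as described above, might look like the sketch below. It assumes that records from different sources can be linked by a shared pseudonymous patient key, which is an assumption not settled in this plan; the field names, source names and records are hypothetical.

```python
from collections import Counter


def merge_sources(sources):
    """Union patient records across sources.

    A later source fills a field only if it is still missing (None).
    """
    merged = {}
    for records in sources.values():
        for pid, fields in records.items():
            target = merged.setdefault(pid, {})
            for key, value in fields.items():
                if target.get(key) is None:
                    target[key] = value
    return merged


def missingness_patterns(merged, fields):
    """Count how often each combination of missing fields occurs."""
    patterns = Counter()
    for record in merged.values():
        missing = tuple(f for f in fields if record.get(f) is None)
        patterns[missing] += 1
    return patterns


# Made-up data for illustration only
sources = {
    "gp": {
        "p1": {"autism_code": "F84.0", "adhd_code": None},
        "p2": {"autism_code": None, "adhd_code": "F90.0"},
    },
    "mhsds": {
        "p1": {"autism_code": "F84.0", "adhd_code": "F90.0"},
        "p3": {"autism_code": "F84.5", "adhd_code": None},
    },
}

merged = merge_sources(sources)
patterns = missingness_patterns(merged, ["autism_code", "adhd_code"])
print(patterns)
```

Here patient p1 has a missing ADHD code in the GP source that is completed from the second source, while p2 and p3 remain incomplete, giving one record per missingness pattern.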
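The cleansing and loading steps above can be illustrated with a small Python sketch using the standard-library `sqlite3` module as a stand-in for the project's SQL database. The table name, column names and sample rows are hypothetical, and a real pipeline would need many more checks (types, value ranges, clinical groupings).

```python
import sqlite3


def standardise_name(name):
    """Normalise a field name, e.g. 'Patient ID' -> 'patient_id'."""
    return name.strip().lower().replace(" ", "_")


def clean_rows(rows, required):
    """Standardise field names, trim text, drop incomplete rows and duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        row = {
            standardise_name(k): (v.strip() if isinstance(v, str) else v)
            for k, v in row.items()
        }
        if any(not row.get(field) for field in required):
            continue  # incomplete row: a required field is empty or absent
        key = tuple(row[field] for field in required)
        if key in seen:
            continue  # duplicate of a row already kept
        seen.add(key)
        cleaned.append(row)
    return cleaned


def load(rows, conn):
    """Load cleaned rows into a SQL table (SQLite used here as a stand-in)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS diagnosis "
        "(patient_id TEXT, icd10 TEXT, UNIQUE(patient_id, icd10))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO diagnosis VALUES (:patient_id, :icd10)", rows
    )
    conn.commit()


# Made-up rows for illustration only
raw = [
    {"Patient ID": "p1", "ICD10": "F84.0"},
    {"Patient ID": "p1", "ICD10": "F84.0 "},  # duplicate with trailing space
    {"Patient ID": "p2", "ICD10": ""},        # incomplete
    {"Patient ID": "p3", "ICD10": "F90.0"},
]

cleaned = clean_rows(raw, required=["patient_id", "icd10"])
conn = sqlite3.connect(":memory:")
load(cleaned, conn)
row_count = conn.execute("SELECT COUNT(*) FROM diagnosis").fetchone()[0]
print(row_count)
```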
Ethics, governance and security
Prior to any data processing, appropriate and compliant data sharing documentation will be drafted and signed with the participating organisations. Before the data are transformed, they must be replicated to a data warehouse architected for analytics. Most organisations choose a cloud data warehouse, which is much faster than an on-premises data warehouse. Data shared from an organisation’s system will be stored in a private, encrypted, access-restricted UK or international data server. All data will be de-identified, so that no NHS numbers, names, postcodes or dates of birth are included and no one can identify individuals from the data, even if they work with the data directly.
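One way to check the de-identification requirement above is to scan incoming tables for identifier fields and for text that looks like an NHS number (10 digits, often written in a 3-3-4 grouping). The sketch below is illustrative only: the field names and sample rows are hypothetical, and a pattern match of this kind can miss identifiers or raise false alarms, so it complements rather than replaces governance review.

```python
import re

# Hypothetical names for fields that should never appear in shared tables
IDENTIFIER_FIELDS = {"nhs_number", "name", "postcode", "date_of_birth"}
# Ten digits, optionally grouped 3-3-4 with spaces or hyphens
NHS_NUMBER_RE = re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b")


def find_identifier_leaks(rows):
    """Return (row index, field, reason) for each suspected identifier."""
    leaks = []
    for index, row in enumerate(rows):
        for field, value in row.items():
            if field.lower() in IDENTIFIER_FIELDS:
                leaks.append((index, field, "identifier field present"))
            elif isinstance(value, str) and NHS_NUMBER_RE.search(value):
                leaks.append((index, field, "possible NHS number in text"))
    return leaks


# Made-up rows for illustration only; the number below is not a real NHS number
rows = [
    {"patient_key": "a1", "note": "Seen in clinic, NHS no 123 456 7890"},
    {"patient_key": "a2", "note": "ADHD review"},
    {"patient_key": "a3", "postcode": "AB1 2CD"},
]

leaks = find_identifier_leaks(rows)
print(leaks)
```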
Information governance documentation will reflect the data specification and clinical validation of the extracted data. All the appropriate information governance will be respected and authorisation will be sought from all organisations involved in this project. Data request forms will be created and a secure link will be created to UK or International data servers.
Data will be accessible, and there will not be a need for a data access committee (to evaluate or approve access requests). Data will remain available and will be updated monthly. It is important that data are interoperable for research and for clinical care, and that versions are tracked.
In the international federated query, no data are moved outside their registry, so the approach would be compliant with the General Data Protection Regulation (GDPR) and would respect the privacy and security of personal information.
Data storage and architecture
A batch-processing Extract-Transform-Load (E-T-L) approach would have limitations: it would be time-consuming, too rigid to support ad hoc workflows, too brittle (especially with NHS legacy systems) and opaque. An ideal architecture would be flexible enough to add new parameters (e.g. a new source of data), transparent (accessible on a mental health dashboard), and robust enough not to break if steps are altered.
The Extract-Load-Transform (E-L-T) approach is notable for its speed and for its use of change data capture (CDC), a technology for replicating changes to data as they occur. This automates the Extract and the Load portions of the data pipeline. It supports machine learning operations (MLOps), the continuous delivery and automation of real-time data pipelines using machine learning and CDC technology. The T (transformation) is handled by data warehouse automation (DWA) which, being automatic, reduces human error and automatically designs any structures necessary.
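The idea behind change data capture can be illustrated with a deliberately simple Python sketch that compares two snapshots of a source table and replays only the differences into the warehouse copy. Production CDC tools typically read the database transaction log rather than comparing snapshots, so this is a simplified stand-in; the function names and records are hypothetical.

```python
def capture_changes(previous, current):
    """Compare two snapshots keyed by record id and emit change events."""
    events = []
    for key, row in current.items():
        if key not in previous:
            events.append(("insert", key, row))
        elif previous[key] != row:
            events.append(("update", key, row))
    for key in previous:
        if key not in current:
            events.append(("delete", key, None))
    return events


def apply_changes(target, events):
    """Replay change events onto the warehouse copy (the Load step)."""
    for operation, key, row in events:
        if operation == "delete":
            target.pop(key, None)
        else:
            target[key] = row
    return target


# Made-up snapshots for illustration only
previous = {"r1": {"icd10": "F84.0"}, "r2": {"icd10": "F90.0"}}
current = {"r1": {"icd10": "F84.5"}, "r3": {"icd10": "F90.1"}}

events = capture_changes(previous, current)
warehouse = apply_changes(dict(previous), events)
print(events)
```

Only the three change events are transferred, not the whole table, which is why this style of loading suits frequent updates.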
Conclusion
The scope of my project is to determine the prevalence of comorbid ADHD among adults with Autism, and vice versa (prevalence of comorbid Autism among adults with ADHD), at a national UK or international level. Data will need to be profiled, transformed, processed, interrogated and stored in an appropriate E-L-T system. The goal will be to make data findable, accessible, interoperable and re-usable.
[i] CADDRA – Canadian ADHD Resource Alliance: Canadian ADHD Practice Guidelines, 4.1 Edition, Toronto ON; CADDRA, 2020. p. 27
[ii] Rong Y, Yang C-J, Jin Y, Wang Y. (2021) Prevalence of attention-deficit/hyperactivity disorder in individuals with autism spectrum disorder: a meta-analysis. Res Autism Spectr Disord 83:101759. doi: 10.1016/j.rasd.2021.101759 accessed 8/12/22
[iii] Hours C, Recasens C, Baleyte J-M (2022) ASD and ADHD Comorbidity: What Are We Talking About? Front. Psychiatry 13:837424. doi: 10.3389/fpsyt.2022.837424 accessed 8/12/22
[iv] https://digital.nhs.uk/data-and-information/publications/statistical/autism-statistics/quarter-3-october-to-december-2020-21 accessed 8/12/22
[v] https://digital.nhs.uk/data-and-information/data-collections-and-data-sets/data-sets/mental-health-services-data-set/about accessed 8/12/22
[vi] https://digital.nhs.uk/data-and-information/publications/statistical/mental-health-services-monthly-statistics/performance-december-2020-provisional-january-2021#related-links accessed 8/12/22
[vii] Slaby I, Hain HS, Abrams D et al (2022) An electronic health record (EHR) phenotype algorithm to identify patients with ADHD and psychiatric comorbidities. Journal of Neurodevelopmental Disorders 14:37 https://doi.org/10.1186/s11689-022-09447-9 accessed 12/12/2022
12/12/2022 copyright