This transcript has been created automatically and may contain small mistakes. Ansgar Lange: Good morning everybody! My name is Ansgar and I'm pleased to have our founder and CTO on the call, Rocío Acuña Hidalgo. She has a background as a medical doctor and a Ph.D. in human genetics, and over a decade of experience in genomics technologies and clinical genetic analysis. And I think she will be talking in particular about how we are using white-box models for variant interpretation. So, the floor is yours and I am looking forward to your session! Rocío Acuña Hidalgo: Good morning and thank you for joining this morning's webinar. My name is Rocío, and I'm the chief technology officer and co-founder of Nostos Genomics. Today I will be speaking about how we can use white-box artificial intelligence models for variant interpretation and how AION, our platform for AI-driven variant interpretation, can help you. I will start with an outline of the different topics that we will be talking about today. Firstly, I'll give some background on genetic testing and variant interpretation. I'm sure that most people here are familiar with this, but I just want to give some background to make sure that we're all on the same page. Then I'll speak about the current applications of artificial intelligence and machine learning in health care. And finally, I'd like to bridge both topics by sharing insights into how machine learning can help with variant interpretation. So let's begin with genetic testing and variant interpretation. Well, we all know that DNA shapes our lives. And for millions of people around the world, this means that they live with a health condition that is caused by variants in their DNA. And while there are many different disorders in which genetics play a role, during this talk I will be focusing on monogenic diseases, which are those that are caused by variants in a single gene, and I will be referring to them as genetic diseases.
So first off, there are more than 350 million people worldwide who live with a genetic disease, and this is close to 5% of the world population. So even though we think of genetic diseases as individually rare, collectively they're actually quite common, and they represent quite a significant burden on quality of life and health for the world population. We currently know of more than 7,000 different genetic diseases, ranging from neurodevelopmental conditions, blindness, and deafness to skeletal disorders, inherited cancer, and neurodegenerative diseases. So it's a very wide array of diseases, very heterogeneous conditions, and they affect people of all ages. But an important point here is that a significant proportion of patients with genetic diseases are children. And a third important point is that the journey for a person who has a genetic disease, from the very first symptoms until the moment that they receive a diagnosis, is on average 6 to 8 years. This often includes multiple visits to medical doctors and specialists and having different clinical testing performed. And of course, this all comes at a very high personal cost for patients and very high financial costs for health systems, as time goes on and the quality of life of the patient decreases as well. Genomic medicine is an emerging medical field in which a person's genetic information is used to guide their clinical care. And this field is emerging mainly thanks to developments in sequencing technologies, which paved the way for this type of medical care to appear and have also substantially transformed the field of genetics. So the main point here, I think, is that as sequencing technologies have developed and the cost of sequencing has been dropping continuously for the last 15 years, genetic tests like exome sequencing or genome sequencing have become increasingly accessible.
First in a research setting, and then later on as a routine diagnostic test, as they are being used now in the clinic. As a result of these technological developments, thousands of new disease-gene associations have been established over the last 20 years. And this has changed the way that we understand how genetic variation can shape our health and how it can contribute to disease. This is one side. On the other side, through the increase in sequencing, there's been an exponential increase in the amount of genetic data available. This includes genetic variants that have been sequenced both in healthy and affected individuals. So databases such as gnomAD, the 100,000 Genomes Project, or ClinVar have highlighted the range and the diversity of the genetic variation that we see in humans and have really pointed out the challenge that we currently face in linking genetic variants to clinical outcomes. When we speak about sequencing, sequencing is actually just the very first step in genetic testing. After sequencing, the data that is produced by the sequencing machine undergoes primary and secondary analysis, including base calling, read mapping, and variant calling, to generate a VCF file. From here, this moves on to the tertiary analysis, which is variant interpretation. This is a step that takes a VCF file as input, which is essentially a list of all the genetic variants that have been identified in the sample. For an average exome, this is about 20,000 variants. And for an average genome, this will be 3 to 4 million variants. In this step, a human expert will take the information that is available for each variant to decide whether this variant is potentially disease-causing for the individual that is being analyzed. And the information can come from different sources: human data, molecular data, or computational data.
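To make the VCF input mentioned above a bit more tangible: it is essentially a tab-separated list of variants, one site per line. The following is a minimal illustrative sketch of reading one, not AION's pipeline or any specific tool:

```python
# Minimal illustrative sketch of reading a VCF file like the one
# described above. Per the VCF specification, data lines are
# tab-separated (CHROM POS ID REF ALT ...), and header or
# meta-information lines start with '#'.

def vcf_variants(lines):
    """Yield (chrom, pos, ref, alt) tuples from VCF text lines."""
    for line in lines:
        if line.startswith("#"):          # skip header/meta lines
            continue
        chrom, pos, _vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        # multi-allelic sites list several ALT alleles, comma-separated
        for allele in alt.split(","):
            yield chrom, int(pos), ref, allele
```

For an exome VCF, iterating like this yields on the order of 20,000 variants, every one of which tertiary analysis has to assess.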
Basically, this information is put together, integrated, and interpreted according to guidelines such as those from the American College of Medical Genetics, ACMG for short, aiming to classify the variants either as pathogenic, and therefore disease-causing, or benign, and therefore not relevant to a patient's phenotype. What often happens, though, is that there's not enough data available to make a confident decision as to whether a variant is pathogenic or benign. And so we end up with a third category, the variant of unknown significance or VUS, which, unfortunately, is quite a big category in which most variants end up. So this process is quite complex. It's a manual process, and it takes on average ten to sixteen hours of the variant analyst's time per case. The reason this takes so long is, first of all, that variant interpretation requires filtering out and assessing a really large number of variants. As I mentioned, tens of thousands for an average exome and millions for an average genome. The goal of this process is really to sift through all these variants to identify just one or two, or a handful, that are clinically relevant for patients. An image that is often evoked to represent this process is looking for a needle in a haystack, although some people have said that it's more like looking for a needle in a stack of needles, because it's very difficult to separate the signal from the noise in this data. The second reason why this process takes so long is that most genetic variants are either not classified at all, or they're classified as variants of unknown significance. To get a sense of this: the latest version of gnomAD contains more than 600 million short variants, whereas we only have 1.2 million variants classified in ClinVar, out of which half are of unknown significance.
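The ACMG-style logic just described, where evidence criteria are combined into a classification and anything inconclusive defaults to VUS, can be sketched in a deliberately simplified form. The rules below implement only a small subset of the 2015 ACMG/AMP combining criteria and are purely illustrative, not a complete or clinically usable classifier:

```python
# Toy illustration of rule-based ACMG-style classification. Evidence
# codes (e.g. PVS1, PS1, PM2, PP3, BA1, BS1) are combined; variants
# meeting neither the pathogenic nor the benign rules default to VUS.
# Only a few of the real combining rules are shown here.

def classify_acmg(criteria):
    """criteria: list of met evidence codes, e.g. ["PVS1", "PM2"]."""
    pvs = sum(c.startswith("PVS") for c in criteria)  # very strong pathogenic
    ps  = sum(c.startswith("PS") for c in criteria)   # strong pathogenic
    pm  = sum(c.startswith("PM") for c in criteria)   # moderate pathogenic
    ba  = sum(c.startswith("BA") for c in criteria)   # stand-alone benign
    bs  = sum(c.startswith("BS") for c in criteria)   # strong benign

    if pvs >= 1 and (ps >= 1 or pm >= 2):
        return "pathogenic"
    if (pvs >= 1 and pm == 1) or (ps >= 1 and pm >= 1):
        return "likely pathogenic"
    if ba >= 1 or bs >= 2:
        return "benign"
    # insufficient or conflicting evidence
    return "VUS"
```

In practice, most variants meet neither the pathogenic nor the benign rules, which is exactly why the VUS category ends up so large.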
These numbers show that only about one in a thousand of all the variants that have been observed in humans currently has a confident classification in public databases. So this means that a variant scientist can't simply look up a variant classification in a database, but needs to go through the process of classification for a large number of variants in each sample. Variant interpretation is therefore a largely manual process: it requires expert knowledge to be carried out, and it's extremely laborious. At the same time, variant interpretation is critical to the success of the test. On average, we see that between 30 and 40% of patients undergoing genetic testing receive a clear molecular diagnosis, although this diagnostic yield depends on the individual disease and the individual lab where the test is being carried out. What this means is that roughly 60 to 70% of patients who undergo genetic testing receive either negative or inconclusive results. There are many different reasons for such a high number of negative results, but one really important factor is the time that one has available to spend on a single case to try to identify the causal variants for each individual patient. This wraps up our introduction, and I'd like to change gears a little bit and move to our next topic: current applications for machine learning in health care. I would like to start this topic, first of all, by making a distinction between the terms artificial intelligence and machine learning, because we see that they're often used interchangeably. Artificial intelligence, or AI for short, is the broader discipline in which software or a computer performs a task that is usually done by a human because it requires a certain level of intelligence. So it's the bigger field, and we've represented it here [on the slide] as a larger circle that contains three subdomains: machine learning, computer vision, and natural language processing.
Today we will be focusing mostly on machine learning. But I do want to touch on the other two types of artificial intelligence and give some examples, because they have really exciting applications in healthcare currently. Machine learning, or ML for short, is in very simple terms the field of AI in which the aim is to teach software or a computer to perform a task by having it learn from patterns in the data that we provide it with. Computer vision, in very simple terms, is the field of AI in which one aims to teach software or a computer to interpret and to understand the visual world through image processing. Natural language processing, NLP for short, is a subdomain of artificial intelligence in which we aim to teach software or a computer to interpret and process human language. These three subdomains of artificial intelligence have their own characteristics. But of course, the distinction is not always as clear-cut as we show in this diagram [on the slides]; there certainly is a lot of overlap between the different fields. One example that I can give for this is self-driving cars. Self-driving cars use tools from machine learning by teaching the car, for example, how to drive autonomously. They use tools from computer vision by taking images with cameras out in the streets and decoding what is happening in a visual scene. And they use tools from natural language processing when, for example, the car is understanding spoken instructions from the driver. So there's a lot of overlap in how these tools are being used in different applications. But if we bring this back to today's topic, I do want to dive a little bit into some examples of the use of artificial intelligence for healthcare applications. One very exciting application for machine learning is predictive modeling.
Here, ML takes data that is provided on patients' clinical or molecular characteristics and makes predictions that can be for diagnostic or prognostic purposes. One tangible example of this would be, for a patient who has cancer, to use a machine learning algorithm to make a prediction based on this patient's clinical features, such as their sex and their age, and to integrate information from the molecular, tissue, and imaging characteristics of their tumor. Taking all of this information together, we can predict under which drug treatment the patient will have the best response, and then, based on this, design a personalized recommendation for a drug treatment, or for a combination of drug treatments, that is going to have the best outcome for an individual patient based on their own characteristics. I think this is the promise of personalized and precision medicine. If we're talking about computer vision, I think a really exciting application that is already starting to come into place in the clinic is to use algorithms for X-ray interpretation to identify abnormalities in the image. In this case, for example, an algorithm would look at this chest X-ray [on the slides] and tell us whether a patient has a lung infection or a lung tumor by just quickly analyzing the image. Finally, another really interesting application of artificial intelligence, in the domain of natural language processing, is its application to electronic health records, where we can use these algorithms to extract data and derive insights that can be useful for individual patient care. By extracting information that might be in manual notes that have been digitized, we can really complete the clinical picture of an individual patient and have more information on their trajectory and their full medical history. This can inform better medical care for these individual patients.
On a larger scale, in which one could extract information from electronic health records at a population level, this would allow us to derive really meaningful insights that can then be used for research on a larger population, for example for clinical trials or drug discovery. These are just a few examples. Something really important that ties all this together, and that is at the core of using artificial intelligence in a healthcare setting, is the need for trust and acceptance, both from healthcare professionals and also from patients. It's really important to understand the risks of using such approaches, so that if, for example, an algorithm has some biases in its predictions, we are aware of this and can avoid it leading to consequences in the real world. A really important element that ties together trust, acceptance, understanding of risks, detection of errors, and detection of biases is interpretability. Interpretability refers to the degree to which a human can understand how an algorithm comes to its predictions or classifications. When we talk about interpretability, one of the main ways in which we can make a model interpretable is through the choice of the model itself. And we typically distinguish between black box models and white box models. I'm going to give a little bit of an overview of each of these. Black box models are typically complex models that are capable of carrying out rather complex tasks. When you give them input, they carry out this complex task, and then they produce an output. These models are often really complex: they have thousands or millions of parameters or weights. And even when one is capable of seeing the structure of the model and the weights of the model, the inner mechanics of the model, which means the process through which the model went from input to output, are really difficult to understand without additional tools.
What often happens is that a second machine learning model, called a decoder, is deployed. This is used to explain the results once they've been produced by the model. So we have a first model that goes from inputs to outputs, and then we have a second model, the decoder, that takes the results once they've been produced and tries to decode and explain why these results have been produced. I want to give an example here [on the slide] of what a black box model looks like and how a decoder works. Here we have a black box model called CheXNet. It's a convolutional neural network of 121 layers. What it takes as input is images from chest X-rays. What it produces as output is a prediction as to whether the patient from whom this image was taken has an infection in their lungs or not. Here [on the slide] we can see that it predicts that the patient has a likelihood of having pneumonia. What has been used as a decoder for this model is an additional algorithm that transforms the signal into heatmaps. So once we have the input, which is the chest X-ray, and the output, which is the diagnosis of an 85% likelihood of having pneumonia, the decoder also produces for us this heatmap, which tells us which areas of the image were most significant in producing this prediction. We can see in this heatmap that this area of the chest is highlighted in red, indicating that it was the most important region in making this prediction. Now, while this is an example that works really well, that is because the output produced here is an image reflecting the input image quite closely, and I think computer vision lends itself well to this type of modeling. But essentially what we've done is let the model predict the patient's likelihood of having pneumonia, which is not directly interpretable, and then used a second tool to try to understand why this prediction was made.
This is in contrast with white box models. White box models are models that are directly interpretable without the need for additional tools. This is called inherently explainable AI, where a decoder is not necessary. This is especially important for health care applications, where trust and acceptance are important, and where it's crucial to understand the risks and to detect any potential errors and biases in the predictions that the models are making. White box models allow experts to verify the prediction by understanding the inner logic of the algorithms and understanding which of the features contributed to a prediction. For me, this evokes the image of a transparent engine, like the image that I'm showing here [on the slides]. This transparent engine allows us to see how the different pieces of the engine work. We could extend this metaphor and think of the human organism as a transparent engine. In this engine, through a series of biological mechanisms, alterations at the genetic level have downstream effects across different molecular, cellular, and tissue levels, thereby contributing to disease. So there is an underlying biological mechanism that explains why a genetic variant has an impact at the clinical level. And I think that this is an element that is really important to capture when doing variant interpretation, because it allows variant scientists to understand why predictions have been made and to map them onto biological processes. So I'd like to move now to our final topic for today, which is how we can use machine learning to help with variant interpretation. We've previously talked about variant interpretation as a laborious and complex task that is critical to the success of a genetic test. It's carried out by human experts in labs and hospitals. And we propose two ways in which machine learning can really enhance this process. I'll go a little bit deeper into this.
The first way in which machine learning can enhance this process and support the human experts who are doing variant interpretation in labs is the automation of variant interpretation. This means automating two steps, essentially: the first one being variant classification, which is the categorization of a variant as either pathogenic or benign, and which can be done according to the intrinsic molecular properties of variants. And in second place, variant prioritization, which can then use case-specific information, such as the clinical features of the patient or the segregation of variants in the patient's family, to rank variants according to their likelihood of being the cause of disease for a patient. I see particular value in automated variant interpretation for what we call "easy cases." These are cases of patients who have genetic diseases where the diagnosis is relatively straightforward. What this application has as a really strong benefit is helping variant scientists process these cases faster. This means lowering the turnaround time for tests and lowering the costs of processing tests, which for the patient translates into a shorter time to results and therefore also a shorter time to diagnosis. However, I think this also has a really important benefit for the variant scientist: by helping speed up the processing of these "easy cases", it helps free up their time and allows them to focus where their expertise is most needed and matters the most, which is the complex cases of patients who have genetic diseases that are not as easy to diagnose. And for this second point, what machine learning can do is really help prioritize candidate variants for manual assessment by variant scientists: taking information from different sources, integrating it, and using these insights to highlight variants of unknown significance (VUS) that would otherwise go undetected by classic variant filtering strategies.
The main benefit of this application is that it potentially allows for an increase in the diagnostic yield for patients who would have otherwise received negative or unclear results. And for the patients, this means that it also avoids unnecessary tests, which is, of course, beneficial to the patient, but also to the health care systems. In this aspect, I'd like to present AION, which is our platform for variant interpretation, fueled by white box explainable models that integrate molecular, clinical, and experimental data and allow us to discover pathogenic variants and provide molecular and clinical diagnoses for patients undergoing genetic testing. We've developed proprietary machine learning models to create a database that contains over 100 million classified variants across the coding regions of the genome. And this is 100 times larger than the largest publicly available databases such as ClinVar. I'd like now to give an overview of the workflow of AION and how it can be used for enhanced interpretation. As a first step, AION takes VCF files that contain the genetic data from patients as input. For the VCF files, we currently support files that have been generated through several secondary analysis pipelines, and we support the analysis of gene panels, whole exome sequencing, and whole genome sequencing data. Our customers and variant scientists can submit singleton cases, or they can submit trio cases to perform trio analysis and check for variant segregation in the trio. This information is, of course, included in the analysis of the variants and in the ranking and prioritization afterwards. In terms of clinical data, we receive it in the form of Human Phenotype Ontology (HPO) terms, and we allow input both as IDs, the numerical terms, or as HPO terms in natural human language. We also support synonyms.
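The flexible HPO input just described, accepting numeric IDs, preferred labels, or synonyms, can be pictured as a simple normalization step that maps everything to canonical HPO IDs. The tiny lookup table below is a hypothetical stand-in; a real implementation would load the full HPO ontology with all of its labels and synonyms:

```python
# Illustrative sketch (not AION's implementation) of normalizing
# pasted HPO input. The lookup table is a tiny hypothetical excerpt;
# the real ontology has tens of thousands of terms and synonyms.

HPO_LOOKUP = {
    "intellectual disability": "HP:0001249",
    "mental retardation": "HP:0001249",      # listed synonym in HPO
    "abnormal facial shape": "HP:0001999",
}

def normalize_hpo_terms(raw):
    """Map a comma-separated list of labels and/or IDs to HPO IDs."""
    ids = []
    for token in raw.split(","):
        term = token.strip()
        if term.upper().startswith("HP:"):   # already a numeric ID
            ids.append(term.upper())
        else:                                # label or synonym lookup
            hpo_id = HPO_LOOKUP.get(term.lower())
            if hpo_id:
                ids.append(hpo_id)
    return ids
```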
So if the exact term is not available, variant scientists can submit a close synonym to this term. We also allow for copy-pasting of lists of HPO terms: someone can copy these directly from a report and paste them into our user interface. Once the VCF file and the HPO terms have been submitted, the VCF file undergoes annotation through our annotation pipeline, where we add more than 100 different variant properties, including molecular consequences at the DNA, RNA, and protein level, information about evolutionary and human variation, clinical consequences, and ACMG criteria. After annotation, the VCF file undergoes classification, and we currently provide three different classifications, according to three different sources. First of all, classification according to ClinVar; we currently provide quarterly updates of ClinVar. The variant can also be classified according to the guidelines from the ACMG. And finally, there is a classification by our proprietary white box algorithm for variant classification, where we classify all variants in coding regions of the genome, plus or minus 20 bp into the introns. The classification will be as benign or pathogenic. But in addition to this benign/pathogenic classification, we also assign each variant a likelihood score that reflects the evidence that supports each prediction. To go a little bit deeper into these classifications, this is a screenshot [on the slide] from our user interface, where you can see that we provide the classification from ClinVar: in this case, the variant is pathogenic. For ACMG, the variant is likely pathogenic, and for our proprietary white box algorithm, the variant is pathogenic with 83% likelihood. "Likelihood" reflects the likelihood that a variant is either pathogenic or benign, given the evidence available for the variant. So it ranges from 100% likelihood that the variant is pathogenic to 100% likelihood that the variant is benign.
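One way to read the likelihood score just described can be sketched as follows: the classification comes from which side of 50% the score falls on, and the distance from 50% reflects the strength of the evidence. The thresholds and labels below are illustrative assumptions for the sketch, not AION's actual cut-offs:

```python
# Illustrative sketch of reading a pathogenicity likelihood score.
# Scores range from 0.0 (100% likelihood benign) to 1.0 (100%
# likelihood pathogenic); values near 0.5 mean little evidence either
# way. The evidence-strength bands here are assumed, not AION's.

def interpret_likelihood(pathogenic_likelihood):
    """Return (classification, evidence strength) for a score in [0, 1]."""
    label = "pathogenic" if pathogenic_likelihood >= 0.5 else "benign"
    distance = abs(pathogenic_likelihood - 0.5)  # 0.0 .. 0.5
    if distance >= 0.3:
        evidence = "strong"
    elif distance >= 0.1:
        evidence = "moderate"
    else:
        evidence = "weak"       # e.g. 51%: classified, but review carefully
    return label, evidence
```

Under these assumed bands, the 83% example from the slide reads as pathogenic with strong evidence, while a 51% score reads as pathogenic with weak evidence.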
The closer the value is to one of the two extremes, the stronger the evidence available to make this classification. And of course, the closer this number is to 50%, the less evidence there is to classify the variant as either clearly pathogenic or clearly benign. What we always want, of course, is for this value to be as high as possible. But when it is not, for instance in a case where we see 51%, this indicates to the variant scientist that even though the variant was classified as pathogenic, the evidence available to make this classification is not strong. This classification is achieved by integrating data from 30 different data sources into several different modules that summarize the impact that a variant has on different biological processes. And again, this is a screenshot from our user interface [on the slide], where we have four modules: the first reflecting [a low] frequency in a healthy population and the regional constraint to variation in the human population; the second being conservation scores, reflecting the constraint to variation throughout evolution; the third being alterations at the RNA level, with this score reflecting the likelihood of a variant being disruptive at the RNA level, in this case a splice alteration; and the fourth showing predictions related to amino acid changes and alterations at the protein level. So for each of these four modules, there is an intermediate prediction regarding its status and how much information and evidence is available to make this prediction.
This information is, as I mentioned, integrated into the final prediction, which is shown on the results page with an 'ML' label, and this output, both the classification and the likelihood, can be used to evaluate variants based on their molecular properties and the strength of the evidence available. Once the classification has taken place, we analyze all variants that are candidate pathogenic variants and move on to prioritization. For variant prioritization, we take into account both the genetic and the clinical data that has been provided for each case. Currently, the phenotype matching process, as I mentioned, is done based on HPO terms, and the algorithm itself is quite robust to imprecision in the terms submitted. This means that if we have a patient with a symptom that is not well described in the clinical notes, and we only receive the information that the patient has, for instance, a malformation in the digits, the algorithm will still be able to use this information for prioritization. So it's quite robust to imprecision and to incompleteness in the patients' descriptions. What the prioritizer does is output a ranked list of combinations of pathogenic variants with the associated diseases. This reflects a candidate molecular and clinical diagnosis for the patient, together with a likelihood for each of these. And this step, from the input of the VCF file and HPO terms to providing a list of top variants associated with a molecular and clinical diagnosis, takes on average about 2 minutes for an exome, instead of the 10 to 16 hours that we mentioned previously. At this stage, the variant scientists can go through the list of ranked variants, select which one is most relevant to the patient, and export it as part of the report. And as the last part of my presentation today, I would like to briefly walk you through our validation study, where we aimed to benchmark the clinical performance of the proprietary algorithms that are embedded in AION.
To do this, we wanted to compare the proportion of cases in which AION correctly identified the pathogenic variant with that of a rule-based approach such as the one from the ACMG guidelines. To test this, we used a dataset of over 5,000 examples from synthetic patients with monogenic diseases, reflecting a wide array of diseases within 17 different categories, which include intellectual disability, hereditary cancer, metabolic disorders, neurodegenerative diseases, and so on. And importantly, the variants that we used as part of our training dataset were excluded from this test dataset, to make sure that we were doing everything properly. These are the results [on the slides]. In this graph, in blue, you can see the proportion of cases solved by ACMG guidelines compared to the proportion of cases solved by AION, which is shown in pink. What we consider solved is that the disease-causing variant in the patient is identified as pathogenic, or, in the case of ACMG, as pathogenic or likely pathogenic. And what we see is that with ACMG guidelines this was the case for 61% of the cases, whereas with AION it was 96% of the cases. What we take from this is that AION consistently outperforms variant interpretation based on ACMG guidelines, with an average relative increase of a little more than 50% in the diagnostic yield. I would like to conclude my presentation today with the following take-home messages. We've talked about how sequencing technologies, but also computing technologies, are continuing to develop. Yet variant interpretation still remains a really complex and laborious process that is mostly done manually. And this is quite a critical point, because variant interpretation is a fundamental step in the success of genetic tests. We've also gone over many different applications for artificial intelligence and machine learning in health care.
But one of the main points is that trust and interpretability are crucial to being able to use artificial intelligence in a healthcare setting. And part of establishing trust and interpretability is being able to understand why a model has made a certain prediction. In this aspect, white box models make a really important contribution. We've also discussed how machine learning can help in variant interpretation, first through automated variant interpretation, especially for so-called "easy cases". By doing this, it also frees up variant scientists so that they can focus their time and use their expertise on analyzing candidate variants that have already been prioritized, and then potentially find disease-causing variants for patients. And finally, AION is our platform for AI-driven variant interpretation, which is fueled by white box machine learning algorithms that help classify and prioritize variants. This white box aspect provides interpretability in the results, which helps our customers verify the results while maintaining high performance, with a 50% higher diagnostic yield than the ACMG guidelines.
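As a closing footnote on the benchmark figures quoted in the talk (61% of cases solved with ACMG guidelines versus 96% with AION), the relative increase in diagnostic yield works out as follows:

```python
# Proportions of solved cases from the validation study, and the
# relative increase in diagnostic yield they imply.
acmg_solved = 0.61   # solved with the ACMG rule-based approach
aion_solved = 0.96   # solved with AION

relative_increase = (aion_solved - acmg_solved) / acmg_solved
print(f"{relative_increase:.0%}")   # prints 57%, i.e. a little above 50%
```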