MyPROSLE was developed with the aim of bringing molecular information from patients to clinical practice. For this, we have created a new methodology with which to summarize the transcriptome of individual patients in a series of biological functions or gene-modules, which were later used to predict up to a total of 111 different clinical outcomes, including clinical manifestations, laboratory measurements, cytokine levels, autoantibodies, cell percentages, nearby disease flares and response to drugs.
To understand a little better how everything works, let's see a brief introduction:
First key point is that the method should reflect the molecular portrait of the patients, for this we decided to use gene expression data because we consider that transcriptome is a good reflection of what is happening in a patient at the molecular level. But transcriptome contains thousands and thousands of genes, and so it is not easily interpretable. In fact, genes do not act alone, rather form regulatory networks acting in specific biological functions. Therefore, we decided to project the genes into 600 gene modules, or networks, previously described, related with immune functions. In this way, we reduce the dimensionality from thousands of genes to hundreds of gene-modules.
Now, we need to be able to quantify how deregulated each gene-module is in each patient with respect to healthy controls. For this, we create M-scores (formula 1). Briefly, the M-score of a gene-module in a patient is calculated by adding the z-score of all its genes with respect to healthy controls. That is, the further the value is from 0, the more dysregulated the gene module is with respect to the healthy state, both positive and negative.
Where xj is the expression of gene j in an individual patient, μxjH and σxjH are the mean of the expression and the standard deviation of gene j in healthy samples and n is the number of genes from the module i. The M-scores fit a centered normal distribution. By definition, a value of 1.65 and -1.65 in such distribution will cover 95 percent of the positive and negative values, respectively, which would correspond to a p-value of 0.05 for each tail. Therefore, we can consider statistically significant those M-scores that take values greater than 1.65 (in absolute value).
At this point, we had about 600 gene-modules, but not all of them have to be relevant to SLE. Therefore, we selected only the gene-modules that are significantly dysregulated in at least 10% of patients in at least 3 datasets. In this way, we selected gene-modules that may be relevant in small groups but that are consistent across dataset. 206 gene-modules were selected.
Finally, we performed a clustering to see how the different gene-modules are grouped, and we obtained 9 very clear clusters of gene-modules that summarized 9 main biological functions related with SLE, or SLE-signatures (Figure 1).
As conclusion of this part: We can summarize the molecular portrait of a patient into 206 functional gene-modules or in 9 main SLE-signatures. And the M-scores of each module/signature quantifies how dysregulated that biological pathway is with respect to the healthy state.
Next, we use the values of M-scores of a total of more than 6000 patients to build prediction models for different clinical outcomes, which can be applied to any new patient through this same website, for this, we are going to see the section How to use.
How to useFirst, go to the Analysis page. MyPROSLE works with patient expression data (from 1 patient), preferably already normalized and transformed to logarithm. You can upload your data in a tabulated table where the first column would be the genes (in Gene Symbol) and the following columns would correspond to the different patients. You can see the format in more detail in Figure 2 or download an example.
M-scores can be calculated using healthy controls as reference or by patient-patient similarity. If you do not enter data from healthy individuals, the second methodology is selected by default.
To test, we can click on the Run an example button (found in the Analysis section) and data for 15 Lupus patients will be automatically loaded.
Once the expression data of the patients (and healthy controls, optionally) have been selected/loaded, we click on the Run Analysis button.
First, MyPROSLE calculates the M-scores for each patient and for each of the 206 Gene-modules. At this point, we obtain a figure like figure 1 personalized for each patient, as well as a table with all the M-scores per patient and per gene-modules and also with the median M-scores based on the SLE-signature at which that each Gene-module belongs to (Section Personalized molecular profiling).
Once we have the M-scores for a patient, MyPROSLE applies the prediction models previously generated for different clinical outcomes to it. In section Predict clinical outcomes, the tables with the predictions for each clinical variable per patient can be viewed and downloaded. You can download the following table to obtain detailed information about how to interpret the results for each clinical variable, as well as to see the performance results of each of the applied models.Download supplementary table
Contact us if you have any doubt or suggestion.
TeamDaniel Toro-Domínguez 1
Contact: firstname.lastname@example.orgMarta E. Alarcón-Riquelme 1,2
Pedro Carmona-Sáez 1,3
Manuel Martínez-Bueno 1
Raúl López-Domínguez 1,3
Jordi Martorell-Marugán 1,3
Elena Carnero-Montoro 1
Guillermo Barturen 1
Daniel Goldman 4
Michelle Petri 4
1 GENYO. Centre for Genomics and Oncological Research: Pfizer, University of Granada, Andalusian Regional Government
2 Karolinska Institute
3 University of Granada
4 John Hopkins University School of Medicine
Personalized molecular profiling
Figure 1. Molecular portrait of the patient. Each bar represents a gene module and its M-score colored by SLE signature. SLE-signatures summarize the main biological functions involved in the development of the pathology.
Predict clinical outcomes
Figure 2. Clinical manifestations predicted as positive or present in the patient. The clinical manifestations were grouped by categories according to the main tissue or organ they affect. The color of each category (represented by an affected organ icon) depends on the number or percentage of positive clinical manifestations within each category. Review the right table or Figure 3 for more details on the predicted clinical manifestations in each case.
Figure 3. The figure shows in the x-axis the absence/Presence probability (from 0 to 1) of suffering each clinical manifestation for categorical variables, or the predicted value for numerical variables (in picogram/milliliter for cytokines and in percentage for cell types) for each patient. Descriptions of each clinical variable and predictive model performances can be downloaded from help tab.