TECH REVIEW

CWx (Cascaded Wx): Technology for Prognosis Biomarker Discovery

Deargen has announced research results on CWx, an AI technology that can discover biomarkers for disease prognosis. The results have been published in the journal Frontiers in Genetics.

Introduction

Prognostic biomarkers are an important concept in diagnosing and treating disease. Although many statistical and machine learning algorithms exist for biomarker discovery, we are focusing specifically on developing biomarker discovery technologies based on deep learning.

However, patient prognostic data include censored data (patients who have had no event, such as death or recurrence, by the end of their follow-up time), which general-purpose feature selection algorithms are not designed to handle. This makes accurate analysis difficult. Prognostic analysis becomes even harder in domains such as RNA-seq, where the feature space is large and the number of samples is small. To overcome these problems, we developed Cascaded Wx, a prognostic biomarker discovery algorithm.
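To make the censoring problem concrete, here is a minimal sketch of one standard way censored observations are used rather than discarded: the Kaplan-Meier survival estimate. It is not part of CWx; it only illustrates that a censored patient still counts as "at risk" up to their last follow-up time and then leaves the risk set without being counted as an event. The function name `kaplan_meier` and the `(times, events)` input layout are assumptions chosen for illustration.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, estimated survival probability) at each distinct event time.

    times:  follow-up time for each patient (e.g. in years)
    events: 1 if the event (e.g. death) was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group every patient whose follow-up ends at this same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Only observed events lower the survival estimate.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both event and censored patients leave the risk set after time t.
        n_at_risk -= leaving
    return curve
```

For example, `kaplan_meier([2, 3, 3, 5, 8], [1, 0, 1, 1, 0])` estimates survival of about 0.8, 0.6 and 0.3 at times 2, 3 and 5; the patients censored at times 3 and 8 never count as deaths, yet the one censored at 3 still contributes to the risk set up to that point.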

Deargen’s Prognosis-Related Gene Selection Algorithm

For prognostic data, we used RNA-seq transcriptome data for five cancer types from The Cancer Genome Atlas (TCGA), including lung adenocarcinoma (LUAD). If the major prognostic genes can be identified from transcriptome data, they can serve as biomarkers for many purposes, such as disease diagnosis, establishing treatment strategies, and proposing therapeutic target genes. We demonstrated the superiority of our model by comparing the performance of Cascaded Wx with that of 12 representative feature selection algorithms, including the Cox model and ElasticNet, which are the most widely used in prognostic analysis.

Method

The Cascaded Wx framework is built on the Wx algorithm. Our main motivation was as follows. Prognostic data are usually analyzed by dividing patients into a high-risk group and a low-risk group at a fixed time point (for example, death within three years). Such an analysis, however, makes it difficult to express the degree of risk. In addition, there were no algorithms designed to handle large feature spaces such as RNA-seq data. To overcome these two problems, we designed the following analysis framework.
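The limitation of fixed-time-point dichotomization can be seen in a small sketch of the conventional approach described above (not of CWx). The hypothetical function `dichotomize` labels each patient as high risk (event by the horizon), low risk (event-free through the horizon), or undeterminable (censored before the horizon).

```python
from typing import List, Optional


def dichotomize(times: List[float], events: List[int], horizon: float) -> List[Optional[int]]:
    """Label patients for a fixed-horizon analysis (hypothetical helper).

    1:    event observed at or before `horizon` (high risk)
    0:    event-free for at least `horizon` (low risk)
    None: censored before `horizon`, so the group cannot be determined
    """
    labels: List[Optional[int]] = []
    for t, e in zip(times, events):
        if e == 1 and t <= horizon:
            labels.append(1)
        elif t >= horizon:
            labels.append(0)
        else:
            labels.append(None)
    return labels
```

With `dichotomize([1, 2, 4, 5], [1, 0, 1, 0], horizon=3)` the result is `[1, None, 0, 0]`: the patient censored at year 2 cannot be labeled at all, and the patient who died at year 4 receives the same low-risk label as the patient still event-free at year 5, so the degree of risk is lost.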

Biomarker screening is performed in three steps, and the framework is designed so that both the number of patients and the feature space (the number of genes) shrink as the steps progress (17). At each step, a feature importance score (the Discriminative Index score) is computed with the Wx algorithm, and insignificant features are discarded. In the final step, the Wx algorithm is likewise used to select the final top-K genes.
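The three-step screening can be sketched as a loop that re-ranks the surviving genes on each step's patient subset and keeps only the top-scoring ones. This is a minimal sketch under stated assumptions: the article does not say how each step's patients and risk labels are chosen or how many genes are kept, so both are passed in as inputs. `placeholder_score` (absolute difference in mean expression between high- and low-risk patients) is a hypothetical stand-in for the Wx Discriminative Index score and does not reproduce the Wx computation; `cascade_select` and the data layout are likewise illustrative names.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical data layout: expr[patient_id][gene] -> expression value.
Expr = Dict[str, Dict[str, float]]
# One cascade step: (risk labels for the patients used in this step, number of genes to keep).
# labels[patient_id] is 1 (high risk) or 0 (low risk); both groups must be non-empty.
Step = Tuple[Dict[str, int], int]
ScoreFn = Callable[[Expr, Dict[str, int], str], float]


def placeholder_score(expr: Expr, labels: Dict[str, int], gene: str) -> float:
    """Hypothetical stand-in for the Wx Discriminative Index score (NOT the Wx formula):
    absolute difference in mean expression between high- and low-risk patients."""
    high = [expr[p][gene] for p, y in labels.items() if y == 1]
    low = [expr[p][gene] for p, y in labels.items() if y == 0]
    return abs(mean(high) - mean(low))


def cascade_select(expr: Expr, steps: List[Step],
                   score: ScoreFn = placeholder_score) -> List[str]:
    """Rank the surviving genes at each step and keep only the top ones.

    The gene set shrinks from step to step; the output of the last step is the
    final top-K list, where K is the keep count of the last step.
    """
    genes = sorted(next(iter(expr.values())))  # all genes, in a stable order
    for labels, keep in steps:
        # Score every surviving gene using only the patients in this step.
        ranked = sorted(genes, key=lambda g: score(expr, labels, g), reverse=True)
        genes = ranked[:keep]  # discard the lower-scoring genes
    return genes
```

Because the scorer is a parameter, a real Wx-based scorer could be passed as `score` without changing the cascade structure; only the ranking criterion would differ.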

Conclusion

We compared Cascaded Wx with key algorithms on the LUAD RNA-seq data. Using each algorithm's top-100 selected genes, we compared the C-index (range 0 to 1, higher is better) obtained with the top 1 to 100 genes. Our model, CWx, outperformed algorithms from several different categories, including the Cox model, ElasticNet, and DESeq2.
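A common definition of the C-index is Harrell's concordance index, which handles censoring by comparing only pairs of patients whose order of events is known; the article does not state which variant was used. The sketch below is a plain O(n²) version. It assumes one precomputed risk score per patient (how risk scores were built from the selected genes is not described) and, for simplicity, skips pairs with tied follow-up times. The function name is illustrative.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when patient i had an observed event and
    times[i] < times[j]. It is concordant when patient i, who had the event
    first, has the higher risk score; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ordered correctly, while 0.5 corresponds to random ordering, which is why higher is better in the comparison above.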

We also found that the top-ranked genes chosen by Cascaded Wx performed well not only in LUAD but also in other major cancer types.

Deargen’s opportunities

To assess the usefulness of Cascaded Wx as a prognostic gene selection framework, we are validating the markers it selected as prognostic biomarkers in patients with lung cancer. We also expect the CWx algorithm, built on Wx, to help uncover a variety of disease targets and biomarkers that have not been discovered until now.