Machine learning + lipid multiomics for accurate diagnosis and mechanism of pancreatic ductal adenocarcinoma

2022-05-08

Pancreatic ductal adenocarcinoma (PDAC) is one of the deadliest cancers, characterized by rapid progression, metastasis, and difficulty of diagnosis. However, there is currently no effective body-fluid-based assay for PDAC. The paper "Metabolic detection and systems analyses of pancreatic ductal adenocarcinoma through machine learning, lipidomics, and multi-omics", published in Science Advances (zone 1 in the Chinese Academy of Sciences journal ranking; impact factor: 14.136), describes the use of machine learning (ML) to analyze, train, test, and validate large-scale lipidomics results in stages. The authors arrive at a set of 17 characteristic lipid metabolites with good classification performance that can be used to diagnose PDAC. The study also demonstrates the potential of machine learning combined with metabolomics in disease diagnosis.

Study design

Results

1. Serum lipidomics of 333 PDAC patients (PDAC group) and 262 healthy subjects (control group, NC) detected 1416 metabolites in positive ion mode, belonging to 19 lipid classes, and 669 metabolites in negative ion mode, belonging to 16 lipid classes.

2. Classification of lipidomics results based on a machine learning algorithm

The 595 samples of the discovery cohort were divided into a training cohort of 495 samples (training set: 372; cross-validation set: 123) and a test set of 100 samples. A support vector machine (SVM, a machine learning classification algorithm) was used to classify the lipid metabolites detected in positive and negative ion modes. Over 5000 rounds of computation, the SVM classification model in positive ion mode reached an average accuracy of 82.26%, specificity of 98.05%, and sensitivity of 66.48% on the test set. In negative ion mode, the average accuracy, specificity, and sensitivity on the test set were 85.88%, 71.93%, and 99.83%, respectively. These results suggest that lipidomics combined with SVM is a promising method for detecting PDAC.

3. Machine learning to find lipid metabolite features that classify effectively

A greedy algorithm evaluates the top-ranked features (i.e., lipid metabolites) one by one. Each candidate feature is added to the selected feature set only if combining it with the previously selected features reaches a higher performance level. For example, in the n-th iteration the greedy algorithm starts from the set of previously selected features, adds the current feature to it, and then runs 500 rounds of four-fold cross-validation to evaluate the average performance. If the average performance is better than that of the previous feature set, the current feature complements the features already selected and is important for distinguishing PDAC from healthy controls, so it is retained in the feature set.

The classification model built from 27 characteristic lipid metabolites in positive ion mode reached an accuracy, specificity, and sensitivity of 93.61%, 89.92%, and 97.30% on the test set. The model built from 19 characteristic lipid metabolites in negative ion mode reached 90.40%, 83.15%, and 97.66% on the test set. The researchers also found that feature selection based on the greedy algorithm is more accurate than traditional feature selection. Starting from the characteristic metabolites found by the greedy algorithm, 12 and 8 lipid metabolites detected in positive and negative ion mode, respectively, were selected as the final characteristic lipid metabolites. The classification model based on the combination of 17 characteristic lipid metabolites achieves the best classification performance with the smallest number of features.

4. The classification model was validated in a large validation cohort.
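The greedy selection loop described in step 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, the function names (`centroid_accuracy`, `repeated_cv_score`, `greedy_select`) are hypothetical, and a simple nearest-centroid classifier stands in for the SVM so that the example needs only the Python standard library. The number of cross-validation repeats defaults to 20 here for speed, whereas the article reports 500.

```python
import random
from statistics import mean

def centroid_accuracy(train, test, features):
    """Accuracy of a nearest-centroid classifier (stand-in for the SVM)."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [mean(r[f] for r in rows) for f in features]
    correct = 0
    for x, y in test:
        # Squared Euclidean distance to each class centroid.
        dist = {
            label: sum((x[f] - c) ** 2 for f, c in zip(features, cen))
            for label, cen in centroids.items()
        }
        correct += min(dist, key=dist.get) == y
    return correct / len(test)

def repeated_cv_score(data, features, rng, repeats=20, folds=4):
    """Mean accuracy over `repeats` rounds of `folds`-fold cross-validation."""
    scores = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        for k in range(folds):
            test = shuffled[k::folds]
            train = [s for i, s in enumerate(shuffled) if i % folds != k]
            scores.append(centroid_accuracy(train, test, features))
    return mean(scores)

def greedy_select(data, ranked_features, rng, repeats=20):
    """Keep a feature only if adding it improves the average CV score."""
    selected, best = [], 0.0
    for f in ranked_features:
        score = repeated_cv_score(data, selected + [f], rng, repeats)
        if score > best:
            selected.append(f)
            best = score
    return selected, best

# Synthetic demo data (an assumption for illustration only):
# features 0 and 1 separate the two classes, feature 2 is pure noise.
rng = random.Random(0)
data = []
for label in (0, 1):
    for _ in range(40):
        x = [rng.gauss(2.0 * label, 1.0),
             rng.gauss(1.5 * label, 1.0),
             rng.gauss(0.0, 1.0)]
        data.append((x, label))

selected, best = greedy_select(data, ranked_features=[0, 1, 2], rng=rng)
print("selected features:", selected, "mean CV accuracy:", round(best, 3))
```

The candidate features are visited in a fixed ranked order and each one is tested exactly once, which is what makes the procedure greedy: a feature rejected early is never reconsidered. In a real pipeline the stand-in classifier would be replaced by an SVM and the score could be any of the metrics reported in the article.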
A targeted mass spectrometry method (MRM-based targeted lipidomics) was established to measure the 17 characteristic lipid metabolites in serum samples from 1898 individuals.

(1) Validation in the discovery cohort: the 595 samples of the discovery cohort described above were divided into a training set (n = 495) and a test set (n = 100) for internal validation. Multivariate binary logistic regression showed that sex and age had limited influence on the classification model, indicating that the model distinguishes PDAC from healthy controls independently of age and sex. The accuracy, specificity, and sensitivity of the classification model on the training set were 89.49%, 89.15%, and 89.75%, respectively. On the test set they were 86.00%, 80.00%, and 92.00%, respectively. The AUC reached 0.9591 on the training set and 0.9444 on the test set. These results demonstrate the accuracy and effectiveness of machine-learning-assisted PDAC detection based on lipid metabolites.

(2) Further validation in an independent cohort: the classification model was further validated in an independent cohort of 1003 serum samples (600 PDAC and 403 healthy controls). The AUC of the classification model was 0.9309, the accuracy 88.24%, the sensitivity 93.00%, and the specificity 81.43%. Of the 600 PDAC samples, 86.38% (406/470) of early PDAC samples (stage I-II) and 90% (113/130) of late PDAC samples (stage III-IV) were accurately detected. These results indicate that the machine-learning-assisted, lipid-metabolite-based method can effectively detect PDAC at all stages.

(3) Validation in a new clinical cohort: the researchers examined the performance of the machine-learning-assisted metabolic PDAC assay in a prospective, single-blind hospital cohort. The cohort included 130 cancer-free individuals who had undergone medical examination and 170 patients who had undergone pancreatic surgery, of whom 70 had benign pancreatic disease (cancer-free individuals) and 100 were diagnosed with PDAC. The accuracy, specificity, sensitivity, and AUC of the classification model were 85.00%, 81.00%, 93.00%, and 0.9389, respectively. In this cohort, the assay accurately detected 90.91% (50/55) of early PDAC samples (stage I-II) and 95.56% (43/45) of late PDAC samples (stage III-IV).

(4) Comparison with other PDAC detection methods: the machine-learning-assisted metabolic PDAC detection method had an AUC of 0.9309, an accuracy of 88.24%, a sensitivity of 93.00%, and a specificity of 81.43%. The classic PDAC biomarker carbohydrate antigen CA19-9 had an AUC of 0.8790, an accuracy of 83.00%, a sensitivity of 79.00%, and a specificity of 85.00%. CT scanning had an AUC of 0.7098, an accuracy of 86.67%, a sensitivity of 78.00%, and a specificity of 91.00%. Machine-learning-assisted metabolic PDAC detection was also more effective than CA19-9 and CT scanning at correctly classifying benign pancreatic disease. The method therefore has clinical application value, and combining it with CA19-9 or CT scanning may benefit the clinical diagnosis of PDAC.

5. Multi-omics analysis of lipid metabolism in PDAC

(1) Matrix-assisted laser desorption/ionization mass spectrometry imaging (MALDI-MSI) was used to detect the 17 characteristic lipid metabolites in 5 pairs of PDAC tumor and adjacent non-cancerous tissue samples. The direction of change of six characteristic lipid metabolites was consistent with the lipidomics results described above.

(2) Proteomic results from 10 PDAC tissues and 5 paired adjacent pancreatic tissues revealed dysregulation of multiple proteins and pathways involved in lipid metabolism.

(3) Analysis of single-cell RNA sequencing data from an open database, covering 24,178 pancreatic tissue cells from PDAC patients and 5280 normal pancreatic tissue cells, yielded 10 cell lineages. PDAC cells were distinguished from epithelial cells based on large-scale copy number variation (CNV). Glycerophospholipid metabolism was found to be the most significantly altered lipid-metabolism-related pathway in PDAC cells.

(4) Similar results were found in the TCGA-GTEx data set and in independent mRNA microarray results. Together, these results suggest that lipid metabolism is widely disordered in PDAC.

The authors developed a prototype approach that combines machine learning and metabolomics to improve targeted, metabolomics-based disease detection. The results show that the machine-learning-assisted metabolic PDAC detection method performs better than traditional methods, which demonstrates its potential for assisting PDAC diagnosis. Appropriate clinical application of this method may facilitate accurate diagnosis of PDAC patients and may guide more effective treatment.

With years of technical experience in target screening and validation services, the standardized, engineered, and systematic GRP platform provides scientific research services for clinician researchers in China and speeds up the translation of research results. The proteomics platform is equipped with multiple timsTOF Pro and Exploris 480 high-precision mass spectrometers and leading professional analysis software such as Spectronaut Pulsar and Mascot, and offers professional detection services such as 4D, DIA, TMT, PRM, and phosphoproteomics. Together with powerful machine learning algorithms, IPA analysis, proteogenomic analysis services, and solutions for systematic biomarker discovery, molecular typing, drug targets, and gene function research, the platform helps clinician researchers carry out research with less worry, less effort, and greater efficiency.
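As a consistency check on the figures reported for the prospective clinical cohort in validation step (3), the sketch below recomputes accuracy, sensitivity, and specificity from confusion-matrix counts. Two points are assumptions rather than statements from the article: all 200 non-PDAC individuals (130 cancer-free plus 70 with benign disease) are treated as the negative class, and the true-negative count of 162 is derived from the reported 81.00% specificity instead of being given directly. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy (in %) from confusion counts."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "accuracy": 100.0 * (tp + tn) / (tp + fn + tn + fp),
    }

# Clinical cohort: 100 PDAC patients and 200 non-PDAC individuals.
# True positives come from the article: 50 early + 43 late = 93 detected.
# True negatives are an assumption: 81.00% of 200 non-PDAC = 162.
metrics = binary_metrics(tp=93, fn=7, tn=162, fp=38)

# Stage-specific detection rates quoted in the article.
early_rate = 100.0 * 50 / 55
late_rate = 100.0 * 43 / 45

for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
print(f"early PDAC detected: {early_rate:.2f}%")
print(f"late PDAC detected: {late_rate:.2f}%")
```

Under these assumptions the counts reproduce the reported 93.00% sensitivity, 81.00% specificity, and 85.00% accuracy, as well as the stage-specific rates of 90.91% and 95.56%.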