Browsing by Author "Silander O"
- Identifying important microbial and genomic biomarkers for differentiating right- versus left-sided colorectal cancer using random forest models (BioMed Central Ltd, 2023-07-11)
  Kolisnik T; Sulit AK; Schmeier S; Frizelle F; Purcell R; Smith A; Silander O
  BACKGROUND: Colorectal cancer (CRC) is a heterogeneous disease, with subtypes that have different clinical behaviours and subsequent prognoses. There is a growing body of evidence suggesting that right-sided colorectal cancer (RCC) and left-sided colorectal cancer (LCC) also differ in treatment success and patient outcomes. Biomarkers that differentiate between RCC and LCC are not well-established. Here, we apply random forest (RF) machine learning methods to identify genomic or microbial biomarkers that differentiate RCC and LCC.
  METHODS: RNA-seq expression data for 58,677 coding and non-coding human genes and count data for 28,557 human unmapped reads were obtained from 308 patient CRC tumour samples. We created three RF models for datasets of human genes-only, microbes-only, and genes-and-microbes combined. We used a permutation test to identify features of significant importance. Finally, we used differential expression (DE) and paired Wilcoxon rank-sum tests to associate features with a particular side.
  RESULTS: RF model accuracy scores were 90%, 70%, and 87% with area under curve (AUC) of 0.9, 0.76, and 0.89 for the human genomic, microbial, and combined feature sets, respectively. 15 features were identified as significant in the model of genes-only, 54 microbes in the model of microbes-only, and 28 genes and 18 microbes in the model with genes-and-microbes combined. PRAC1 expression was the most important feature for differentiating RCC and LCC in the genes-only model, with HOXB13, SPAG16, HOXC4, and RNLS also playing a role. Ruminococcus gnavus and Clostridium acetireducens were the most important in the microbial-only model. MYOM3, HOXC4, Coprococcus eutactus, PRAC1, lncRNA AC012531.25, Ruminococcus gnavus, RNLS, HOXC6, SPAG16 and Fusobacterium nucleatum were most important in the combined model.
  CONCLUSIONS: Many of the identified genes and microbes among all models have previously established associations with CRC. However, the ability of RF models to account for inter-feature relationships within the underlying decision trees may yield a more sensitive and biologically interconnected set of genomic and microbial biomarkers.
- Tracking the international spread of SARS-CoV-2 lineages B.1.1.7 and B.1.351/501Y-V2 with grinch (F1000 Research Limited, 2021-09-17)
  O'Toole Á; Hill V; Pybus OG; Watts A; Bogoch II; Khan K; Messina JP; COVID-19 Genomics UK (COG-UK) consortium; Network for Genomic Surveillance in South Africa (NGS-SA); Brazil-UK CADDE Genomic Network; Tegally H; Lessells RR; Giandhari J; Pillay S; Tumedi KA; Nyepetsi G; Kebabonye M; Matsheka M; Mine M; Tokajian S; Hassan H; Salloum T; Merhi G; Koweyes J; Geoghegan JL; de Ligt J; Ren X; Storey M; Freed NE; Pattabiraman C; Prasad P; Desai AS; Vasanthapuram R; Schulz TF; Steinbrück L; Stadler T; Swiss Viollier Sequencing Consortium; Parisi A; Bianco A; García de Viedma D; Buenestado-Serrano S; Borges V; Isidro J; Duarte S; Gomes JP; Zuckerman NS; Mandelboim M; Mor O; Seemann T; Arnott A; Draper J; Gall M; Rawlinson W; Deveson I; Schlebusch S; McMahon J; Leong L; Lim CK; Chironna M; Loconsole D; Bal A; Josset L; Holmes E; St George K; Lasek-Nesselquist E; Sikkema RS; Oude Munnink B; Koopmans M; Brytting M; Sudha Rani V; Pavani S; Smura T; Heim A; Kurkela S; Umair M; Salman M; Bartolini B; Rueca M; Drosten C; Wolff T; Silander O; Eggink D; Reusken C; Vennema H; Park A; Carrington C; Sahadeo N; Carr M; Gonzalez G; SEARCH Alliance San Diego; National Virus Reference Laboratory; SeqCOVID-Spain; Danish Covid-19 Genome Consortium (DCGC); Communicable Diseases Genomic Network (CDGN); Dutch National SARS-CoV-2 surveillance program; Division of Emerging Infectious Diseases (KDCA); de Oliveira T; Faria N; Rambaut A; Kraemer MUG
  Late in 2020, two genetically distinct clusters of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) with mutations of biological concern were reported, one in the United Kingdom and one in South Africa. Using a combination of data from routine surveillance, genomic sequencing and international travel, we track the international dispersal of lineages B.1.1.7 and B.1.351 (variant 501Y-V2). We account for potential biases in genomic surveillance efforts by including passenger volumes from the locations where each lineage was first reported, London and South Africa respectively. Using the software tool grinch (global report investigating novel coronavirus haplotypes), we track the international spread of lineages of concern with automated daily reports. Further, we have built a custom tracking website (cov-lineages.org/global_report.html) which hosts this daily report and will continue to include novel SARS-CoV-2 lineages of concern as they are detected.