The increasing availability of public multiomics data, together with other infection and immunity research data, has revolutionised biomedical research. However, the COVID-19 pandemic highlighted how difficult it is to leverage this wealth of information rapidly for therapeutic development against emerging infectious diseases. In addition, developing therapeutics requires an in-depth understanding of pathogen-host interactions and cellular responses to infection.
To address these challenges, we have developed a scalable, integrated multiomics database that combines extensive public data with in-house datasets. Our database includes over 100 RNA-Seq studies, 1,500 CRISPR screens, protein-protein interactions, and host-virus protein-pulldown studies. Its relational SQL architecture enables complex queries across these diverse data types, facilitating the discovery of novel relationships. The platform generates knowledge graphs and interaction networks to prioritise targets for experimental validation.
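As an illustration of the kind of cross-data-type query a relational layout makes possible, the following is a minimal sketch using Python's built-in `sqlite3` module. The table names (`rnaseq_de`, `crispr_hit`, `virus_host_ppi`), column names, thresholds, and all rows are hypothetical placeholders chosen for this example; they are not the platform's actual schema or data.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE rnaseq_de (gene TEXT, study TEXT, log2fc REAL, padj REAL);
CREATE TABLE crispr_hit (gene TEXT, screen TEXT);
CREATE TABLE virus_host_ppi (host_gene TEXT, viral_protein TEXT);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up placeholder rows, not real results.
conn.executemany(
    "INSERT INTO rnaseq_de VALUES (?, ?, ?, ?)",
    [
        ("GENE_A", "study_1", 2.5, 0.001),
        ("GENE_B", "study_1", 0.2, 0.600),
        ("GENE_C", "study_2", -1.8, 0.010),
    ],
)
conn.executemany(
    "INSERT INTO crispr_hit VALUES (?, ?)",
    [("GENE_A", "screen_1"), ("GENE_B", "screen_1"), ("GENE_C", "screen_2")],
)
conn.executemany(
    "INSERT INTO virus_host_ppi VALUES (?, ?)",
    [("GENE_A", "viral_protein_1")],
)

# Genes that are significantly differentially expressed AND are CRISPR hits,
# annotated with the number of viral proteins they were pulled down with.
QUERY = """
SELECT d.gene,
       COUNT(DISTINCT c.screen)        AS n_screens,
       COUNT(DISTINCT p.viral_protein) AS n_viral_partners
FROM rnaseq_de AS d
JOIN crispr_hit AS c ON c.gene = d.gene
LEFT JOIN virus_host_ppi AS p ON p.host_gene = d.gene
WHERE d.padj < 0.05 AND ABS(d.log2fc) >= 1
GROUP BY d.gene
ORDER BY n_viral_partners DESC, d.gene
"""

rows = conn.execute(QUERY).fetchall()
for gene, n_screens, n_viral_partners in rows:
    print(gene, n_screens, n_viral_partners)
```

The inner join keeps only genes supported by both expression and CRISPR evidence, while the left join adds interaction evidence without discarding genes that lack it.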
We demonstrated the platform's utility by comparing its results with a published study on West Nile virus (WNV) antiviral targets in human macrophages. Our approach, which scores genes based on differential expression, CRISPR hits, drug targets, and viral protein interactions, ranked three of the four experimentally validated targets (AIM2, IFI27, and CCL5) among the top 25 candidates. Notably, we also identified DPP4, a known coronavirus receptor and the second-highest-scoring candidate, as a potential novel target for WNV and other flaviviruses.
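The abstract does not specify how the four evidence types are combined, so the following is only a sketch of one plausible scheme: an additive score with one point per evidence type, followed by ranking. The `Evidence` class, the equal `WEIGHTS`, the tie-breaking rule, and the placeholder gene names are all assumptions made for illustration and should not be read as the platform's actual scoring method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Per-gene evidence flags (hypothetical structure)."""
    gene: str
    differentially_expressed: bool
    crispr_hit: bool
    drug_target: bool
    viral_interactor: bool


# Assumed equal weights; the real weighting is not given in the source.
WEIGHTS = {
    "differentially_expressed": 1.0,
    "crispr_hit": 1.0,
    "drug_target": 1.0,
    "viral_interactor": 1.0,
}


def score(ev: Evidence) -> float:
    """Sum the weights of every evidence type that supports the gene."""
    return sum(w for field, w in WEIGHTS.items() if getattr(ev, field))


def rank(evidence: list[Evidence], top_n: int = 25) -> list[Evidence]:
    """Return the top_n genes by descending score, ties broken by gene name."""
    return sorted(evidence, key=lambda e: (-score(e), e.gene))[:top_n]


# Placeholder genes, not real results.
candidates = [
    Evidence("GENE_B", True, False, False, False),
    Evidence("GENE_A", True, True, True, True),
    Evidence("GENE_C", True, True, False, True),
]

ranked = rank(candidates)
for ev in ranked:
    print(ev.gene, score(ev))
```

Keeping the weights in a single dictionary makes it straightforward to test how sensitive the ranking is to the relative importance of each evidence type.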
By standardising data collection and integration, we have created a scalable platform that can adapt to emerging threats and serve as a valuable resource for pandemic preparedness. To enable users to build their own database, we provide scripts to build it from scratch, add new assets, and update it to the latest versions of online resources.