Date: 11 May 2018

Understanding protein function is crucial to unlocking the value of genomic data for biomedical research and innovation. Delivering personalised health and precision medicine requires a detailed understanding of how sequence variants affect proteins and, in turn, phenotype. The widening gap between the number of known proteins and the number with known functions has encouraged the development of methods that infer annotations automatically. Artificial intelligence (AI) and machine learning (ML) offer a large repertoire of algorithms and methodologies for building prediction models. Coupled with big data technologies for interactive analytics and data transformation, these AI/ML methods are valuable assets that could enhance the discovery of protein functions.

This tutorial will help you understand how to use Spark and interactive analytics to make sense of protein data and build machine learning models to infer protein functions.

Contact: Sangya Pundir - webinars@ebi.ac.uk

Keywords: UniProt: The Universal Protein Resource, Proteins (proteins), Spark analytics, Interactive analytics, Modelling, ML methods

Organizer: European Bioinformatics Institute (EBI)

Host institutions: EMBL-EBI

Capacity: 200

Scientific topics: Protein function prediction, Function analysis, Machine learning

