First Steps with SQL for Data Science

Overview

There are several reasons why learning SQL (Structured Query Language), or improving your existing SQL skills, can be beneficial for data science:

  1. SQL is the standard language for working with relational databases, which are a common source of data for data science projects. By learning SQL, you'll be able to easily retrieve, manipulate, and analyze the data stored in these databases.

  2. SQL allows you to efficiently work with large datasets. When working with data science projects, it is common to deal with datasets that are too large to fit into memory. SQL provides powerful tools for filtering, grouping, and aggregating large datasets, so you can work with subsets of the data that are small enough to fit into memory.

  3. SQL can help you to understand the underlying data better. Even though SQL is a programming language, it is declarative, unlike most general-purpose programming languages such as Python, C++ or Java. This means that the code you write tells the SQL engine what you want, and the engine takes care of how to get the results. Writing queries in this way helps you to better understand the dataset, its structure and its constraints.

  4. SQL is a good tool for data preparation, cleaning and validation. Since it is a powerful tool for manipulating and filtering data, you can use it to get your dataset into better shape before applying any statistical or machine learning models.

  5. SQL can be a valuable skill in the job market. Many companies store data in relational databases, and SQL knowledge is often a required or preferred skill for data science positions.
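
As an illustration of points 2 and 3, here is a minimal sketch that uses Python's built-in sqlite3 module to run a declarative query. The table "measurements", its columns and its values are invented for this example and are not part of the course material. The query only describes the desired result (the number of rows and the average value per gene, ignoring low values); the database engine decides how to filter and group the rows.

```python
import sqlite3

# Hypothetical in-memory database: the table name, columns and values
# are invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (sample TEXT, gene TEXT, expression REAL)"
)
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [
        ("s1", "BRCA1", 2.0),
        ("s2", "BRCA1", 4.0),
        ("s1", "TP53", 10.0),
        ("s2", "TP53", 0.5),
        ("s3", "TP53", 6.0),
    ],
)

# Declarative query: we state WHAT we want (count and mean per gene,
# keeping only values above 1.0), not HOW to loop over the rows.
query = """
    SELECT gene, COUNT(*) AS n, AVG(expression) AS mean_expression
    FROM measurements
    WHERE expression > 1.0
    GROUP BY gene
    ORDER BY gene
"""
rows = conn.execute(query).fetchall()
for gene, n, mean_expression in rows:
    print(gene, n, mean_expression)

conn.close()
```

Only the aggregated rows are returned to Python, so the full table never has to be loaded into memory by the client program.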

In summary, learning SQL is a useful addition to your data science toolkit: it allows you to work efficiently with large datasets and is a sought-after skill in the job market (Survey).

Audience

This course is addressed to data scientists, programmers, bioinformaticians, researchers and students.

Learning outcomes

At the end of the course, the participants are expected to:
* Understand what a relational database is
* Understand the basics of SQL
* Write and execute simple to complex SQL queries
* Retrieve data from a relational database using the SQL language
* Filter, order and group query results
* Join tables together
* Use set operations
* Use dates and times in SQL

Prerequisites

Knowledge / competencies

There are no prerequisites in terms of knowledge or competencies for this course.

Technical

You are required to bring your own laptop with administrative rights. You will be asked to download and install a database client prior to the tutorial.

Outline

Module 1
* What is SQL?
* Data models: structure and content
* Datatypes in SQL
* Presentation of our use case
* Retrieve data with SELECT
* Filtering with SQL
* Limiting your results with LIMIT
* Advanced filtering: IN, OR, AND and NOT

Module 2
* Wildcards
* Ordering your results with ORDER BY
* Aggregate functions: AVG, COUNT, MIN, MAX and SUM
* Grouping your results with GROUP BY

Module 3
* Subqueries
* Introduction to joining tables
* Cross joins
* Inner joins
* Left and right joins

Module 4
* Set operations with UNION, INTERSECT and EXCEPT
* Date and Time
* Views
* Conclusion

Application

The registration fee is 100 CHF for academics and 500 CHF for for-profit companies.

You will be informed by email of your registration confirmation. Upon receipt of the confirmation email, participants will be asked to confirm attendance by paying the fees within 5 days.

Applications close on 10/04/2024. The deadline for free-of-charge cancellation is also 10/04/2024. Cancellations after this date will not be reimbursed. Please note that participation in SIB courses is subject to our general conditions.

Venue and Time

This course will be streamed.

It will start at 9:00 and end around 17:00. Precise information will be provided to the participants in due time.

Additional information

Coordination: Grégoire Rossier

You are welcome to subscribe to the SIB courses mailing list, using the form here, to be informed of all future courses and workshops, as well as all important deadlines.

SIB abides by the ELIXIR Code of Conduct. Participants of SIB courses are also required to abide by the same code.

For more information, please contact training@sib.swiss.

Keywords: SQL, programming

Authors: Dillenn Terumalai, Florent Tassy, SIB Swiss Institute of Bioinformatics

