Data processing with Python (or R) (TDADPP)

Data processing refers to the management and analysis of data across its entire life cycle. In this course, students learn how to develop Python software that collects, organises and analyses data in order to gain an initial, exploratory understanding of it.


Audience (and prerequisites)

Anyone with a basic knowledge of the Python language who wants to explore its applications in data science.


Approaches (Objective)

Data Collection

  • Open data sources
  • API
  • Scraping


Data Representation

  • Data formats
  • Relational algebra
  • Database
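
As a rough illustration of how the relational algebra and database topics connect, the sketch below expresses a selection, a projection and a join as one SQL query against an in-memory SQLite database. It uses only Python's standard `sqlite3` module. The tables, column names and values are invented for this example and are not course material.

```python
import sqlite3

# Hypothetical toy schema and invented values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.execute("CREATE TABLE reading (city_id INTEGER, year INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [(1, "Turin", "IT"), (2, "Lyon", "FR"), (3, "Milan", "IT")],
)
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?)",
    [(1, 2020, 10), (2, 2020, 20), (3, 2020, 30)],
)

# Join (JOIN ... ON), selection (WHERE) and projection (column list) in one query.
rows = conn.execute(
    """
    SELECT city.name, reading.value
    FROM city
    JOIN reading ON reading.city_id = city.id
    WHERE city.country = 'IT'
    ORDER BY city.name
    """
).fetchall()
conn.close()

print(rows)
```

The query keeps only the Italian cities (selection), returns two of the available columns (projection), and pairs each city with its reading through the shared key (join).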


Data Quality Assessment

  • Data source and data fusion
  • Data volume
  • Data standards and data impact


Data wrangling with pandas

  • Indexing
  • Reshaping
  • Merging and joining
  • Cleaning and normalisation
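
The four topics above can be sketched on a small example, assuming pandas is installed. The data frames, column names and values are invented, and filling the missing revenue with 0.0 is only a placeholder policy chosen to keep the example short.

```python
import pandas as pd

# Invented toy data; every column name here is hypothetical.
sales = pd.DataFrame({
    "store": [" north", "South ", "north", "south"],
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "revenue": [100.0, None, 120.0, 90.0],
})
stores = pd.DataFrame({"store": ["north", "south"], "region": ["N", "S"]})

# Cleaning and normalisation: trim whitespace, lower-case the keys,
# and replace the missing revenue (placeholder policy, see lead-in).
sales["store"] = sales["store"].str.strip().str.lower()
sales["revenue"] = sales["revenue"].fillna(0.0)

# Merging and joining: attach each store's region with a left join.
merged = sales.merge(stores, on="store", how="left")

# Indexing: label-based selection with a boolean mask and a column list.
q2 = merged.loc[merged["quarter"] == "Q2", ["store", "revenue"]]

# Reshaping: one row per store, one column per quarter.
wide = merged.pivot(index="store", columns="quarter", values="revenue")

print(wide)
```

Normalising the join key before merging matters here: without the `strip` and `lower` step, " north" and "South " would not match any row in `stores`.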


Exploratory Data Analysis

  • Descriptive statistics
  • Data visualisation
  • Feature engineering
  • Clustering
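
As a minimal sketch of the first and third topics above, the example below computes a few descriptive statistics with Python's standard `statistics` module and then derives a standardised (z-score) feature from them. The sample values are invented, and z-scoring is just one simple instance of feature engineering, not necessarily the technique used in the course.

```python
import statistics

# Invented sample values, for illustration only.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Descriptive statistics: central tendency and spread.
mean = statistics.mean(values)
median = statistics.median(values)
stdev = statistics.pstdev(values)  # population standard deviation

# Feature engineering: standardise each value to a z-score.
z_scores = [(v - mean) / stdev for v in values]

print(mean, median, stdev)
print(z_scores)
```

Standardised features share a common scale, which is often a useful preprocessing step before distance-based methods such as clustering.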