Learning to Quantify

An AIDA course


    “Policy makers or computer scientists may be interested in
    finding the needle in the haystack (...), but social scientists
    are more commonly interested in characterizing the haystack.”

    (Daniel J. Hopkins and Gary King, 2010)

Learning to Quantify: Inferring Unbiased Estimators of Class Prevalence via Machine Learning

This course provides an introduction to, and an overview of, learning to quantify (a.k.a. “quantification”), i.e., the task of training estimators of class proportions in unlabeled data by means of supervised learning. In data science, learning to quantify is a task in its own right, related to classification yet different from it, since estimating class proportions by simply classifying all data and counting the labels assigned by the classifier is known to often return inaccurate (“biased”) class proportion estimates.
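To make the source of this bias concrete, the following minimal Python sketch computes the class proportion that “classify and count” is expected to return for a binary classifier, and then applies a standard correction from the quantification literature, often called “adjusted count”. The true positive rate and false positive rate used here are illustrative assumptions, not values from the course, and the correction assumes these rates are known and remain the same on the unlabeled data. The function names are hypothetical.

```python
def expected_cc_estimate(true_prevalence, tpr, fpr):
    """Expected fraction of items labeled positive by classify-and-count.

    Positives are detected at rate tpr; negatives are mislabeled at rate fpr.
    """
    return true_prevalence * tpr + (1.0 - true_prevalence) * fpr


def adjusted_count(cc_estimate, tpr, fpr):
    """Invert the relation above to correct a classify-and-count estimate."""
    if tpr <= fpr:
        raise ValueError("tpr must exceed fpr for the correction to be defined")
    corrected = (cc_estimate - fpr) / (tpr - fpr)
    # Clip to [0, 1] so the result is always a valid proportion.
    return min(1.0, max(0.0, corrected))


# Hypothetical classifier quality (assumed values, for illustration only).
TPR, FPR = 0.8, 0.1

for p in (0.05, 0.5, 0.9):
    cc = expected_cc_estimate(p, TPR, FPR)
    acc = adjusted_count(cc, TPR, FPR)
    print(f"true={p:.2f}  classify-and-count={cc:.3f}  adjusted={acc:.3f}")
```

With these assumed rates, a true prevalence of 0.05 leads to an expected classify-and-count estimate of 0.135: the error depends on the prevalence itself, which is why a classifier that is reasonably accurate on individual items can still be a poor estimator of class proportions.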

The course introduces learning to quantify by looking at the supervised learning methods that can be used to perform it, at the evaluation measures and evaluation protocols that should be used for evaluating the quality of the returned predictions, at the numerous fields of human activity in which the use of quantification techniques may provide improved results with respect to the naive use of classification techniques, and at advanced topics in quantification research.

The course is suitable for researchers, data scientists, and PhD students who want to come up to speed with the state of the art in learning to quantify, and also for researchers wishing to apply data science technologies to fields of human activity (e.g., the social sciences, political science, epidemiology, market research) that focus on aggregate (“macro”) data rather than on individual (“micro”) data. The course also includes a hands-on part, in which students will be guided through the implementation and/or use of quantification tools. This hands-on part includes a brief introduction to QuaPy, a Python-based open-source library for learning to quantify.

Structure of the course

The course will be held online (the link will be announced via email to all who have registered) and will be structured in four slots of two hours each. The course schedule is as follows:

14 March, 10.00 - 12.00
16 March, 10.00 - 12.00
21 March, 10.00 - 12.00
23 March, 10.00 - 12.00

Lecturers:


Alejandro Moreo is a Researcher at the Institute for the Science and Technologies of Information of the Italian National Council of Research (ISTI-CNR). He obtained a PhD in Computer Science and Information Technologies from the University of Granada, Spain, in 2013. His research interests lie at the interface of data mining and machine learning, with particular emphasis on deep learning, representation learning, and transfer learning for cross-lingual text classification. He has taught several programming courses at the University of Granada, a course on Text Mining for the MSc program in Data Science at the University of Pisa, a full-day course on text classification and sentiment analysis at AFIRM 2019, and half-day tutorials on learning to quantify at SIGIR 2019 and SocInfo 2020. A list of his publications can be consulted here. Contact him at alejandro.moreo@isti.cnr.it

Fabrizio Sebastiani is a Director of Research at the Institute for the Science and Technologies of Information of the Italian National Council of Research (ISTI-CNR); he was formerly an Associate Professor at the Department of Pure and Applied Mathematics of the University of Padova, Italy (2005/06) and a Principal Scientist at the Qatar Computing Research Institute (2014/16). His research interests lie at the interface of data mining, machine learning, information retrieval, and natural language processing, with particular emphasis on text classification, authorship analysis, technology-assisted review, and learning to quantify. He has done active research on the topics of this course for more than 12 years, and has extensive experience in delivering courses on them at graduate level, tutorials at international conferences, and courses at summer schools. A list of his publications can be consulted here. Contact him at fabrizio.sebastiani@isti.cnr.it

Online Material

The material of this course can be downloaded here.

Pre-requisites:

Basic notions of machine learning and probability (and some coding skills) will be very helpful.