Current Courses

  • 2018

    CS4145 Crowd Computing (with Nava Tintarev)

    Crowd Computing is an emerging field that sits at the intersection of computer science and data science. Crowd computing studies how large groups of people can solve complex tasks that are currently beyond the capabilities of artificial intelligence algorithms, and that cannot be solved by a single person alone. It involves the algorithmic engagement and coordination of people by means of Web-enabled platforms. These complex tasks are mainly focused on the creation, enrichment, and interpretation of data, making crowd computing a building block of data science. Examples of such tasks include the coordinated creation of data about real-world events when electronic sensors are not available; the annotation of existing data sets to create ground truth data for the training of machine learning algorithms; and the analysis and interpretation of Web data to identify inappropriate content (e.g., hate speech or fake news). Crowd computing is an essential tool for any data-driven company: from Facebook to Microsoft, from Google to IBM, from Spotify to Pandora, all major companies employ crowd computing to fulfil their data needs, both by involving employees and by reaching out to anonymous crowds through online marketplaces like Amazon Mechanical Turk and CrowdFlower.

    The objective of the Crowd Computing course is to introduce the scientific and technical underpinnings of crowd computing, and to investigate how it can be used for computer science applications (e.g., information retrieval, machine learning, next-generation interfaces, and data mining) and for real-world applications (e.g., cultural heritage preservation, online knowledge creation, and smart cities). The course is designed around one key challenge: the creation and consumption of (high-quality) data.

  • 2013 - Present

    TI1506 Web & Database Technology (with Claudia Hauff)

    Web & Database Technology is a first-year course that provides an introduction to Web and Database technology and programming. Topics related to Web technology and programming include: Introduction to the Web (low-level technical details); Introduction to Web development and app design; Front-end development: HTML5, CSS3, JavaScript; Back-end development: JavaScript; Mechanisms for authentication, security and sessions.
    Topics related to Database technology include: Introduction to Databases; Overview of Database Languages and Architectures; The basic relational model; SQL: Data Definition, Constraints, Updates, Queries, Views; Conceptual Data Modelling Using the Entity-Relationship Model; Mapping a Conceptual Design into a Logical Design.

Available Master Projects

My research group is open to students who want to do their thesis on subjects in the wider area of social data science and crowd computing. As a rough indication of possible thesis subjects, the topics below are open for new students. Many of these topics can be approached in a collaboration between academia and industry.

Urban Data Analytics for Smart Cities

In the context of the SocialGlass project, we are looking for master students with a passion for data science and an interest in improving the quality of life in our cities. In SocialGlass we develop new urban data science methods that can help address issues in domains such as transportation, crowdedness in the city, responsible energy consumption, urban planning, and business attractiveness. Examples of available MSc thesis topics include:

  • Developing (Deep) Machine Learning models to quantify and predict safety / crime rates in urban neighborhoods, using combinations of StreetView, social media, and socio-economic data
  • Developing (Deep) Machine Learning models to quantify and predict quality-of-life aspects (e.g., segregation and deprivation) in cities, using satellite imagery and social media data
  • Developing models and implementing systems to recommend new point-of-interest (POI) locations in cities

Analysing Individual Energy Consumption Behaviour using Social Media Data

Currently, energy consumption data is primarily gathered by (smart) energy meters at the household level. While such data is highly reliable and temporally complete, its acquisition requires access to the energy infrastructure; moreover, such data is semantically poor. The aim of this project is to explore the potential usefulness of social media as an alternative source of data about individuals' energy consumption behaviour. We focus on four components of energy lifestyle, namely Dwelling, Mobility, Food consumption, and Leisure. The output of this project will be a social media analysis pipeline for collecting and classifying energy-related social media posts (e.g., tweets) and, finally, generating an energy consumption profile for social media users.

Chatbots Able to Learn New Skills

Some chatbots retrieve information (e.g. “when is the next train to Amsterdam?”) or perform a transaction (e.g. “Purchase one ticket to Escher museum in The Hague”). Such chatbots are usually designed for a specific, narrow use case, and their functionality is hardcoded. Extending the functionality of such a chatbot requires a software developer to modify its codebase. We envision a chatbot system that can extend its functionality by learning from users, crowd workers, experts, or even automatically. Think about Wikipedia. Years ago, it lacked articles on many topics. Now, with the contributions of thousands of people around the world, it is hard to find a topic that is not covered there. Similarly, if thousands of people teach new skills to such a chatbot, it will be able to serve millions of users effectively in a wide range of domains.

Generating Chatbots based on APIs and DB schemas

Currently, it is possible to develop a chatbot semi-automatically based on a Q/A dataset or an API. We believe that a logical next step is to construct a chatbot automatically from a database schema or a REST API. This research will help to understand how to map a database schema or an API endpoint tree to a conversation tree, and will allow fast creation of chatbots.

Human Aided Bots - Dialogue Management

When we purchase a coffee, the conversation we have with the barista is quite standard. In contrast, a conversation with a colleague at work about solving a unique, complex problem is not predefined, and we adapt as it unfolds. Similarly, chatbots usually follow a dialogue quite well in the predefined domain they are designed for, but fail to do so in more complex and less predictable conversation scenarios. We aim to address this issue by designing methods and tools for modeling both fixed and open dialogues. Of special interest is understanding dialogues on the go, even if the chatbot was not initially designed for such dialogues.

Forecasting the Financial Status of SMEs with Social Data

The goal of this MSc project is to combine social media data (e.g., from Twitter and LinkedIn) with financial and macroeconomic data from public sources (e.g., Google Trends, Yahoo Finance, business magazine articles) to forecast the financial status of small and medium enterprises. The work will be performed in collaboration with the newly founded Exact Data Science core team. Your work will allow customers of Exact (i.e., the entrepreneurs) to leverage the power of big data for better data-driven decision making. As an intern at Exact, you will experience hands-on all phases of a typical data scientist's work: from data collection and transformation to feature engineering, from training predictive models to deploying them in actual production code.

Enterprise Crowdsourcing

While machine learning and artificial intelligence applications are gaining popularity, enterprises are devoting more and more attention to enterprise crowdsourcing as an effective technique for capitalizing on their available human resources and incorporating in-house, human-generated data. The aim of this thesis project, to be performed in collaboration with IBM Netherlands, is to advance the state of the art in enterprise crowdsourcing by studying how different task designs and participation incentives affect the quality and reliability of the employees' work.

Extracting Domain-Specific Entities and Relations from Text (Web Pages)

Extracting entities of interest (e.g., datasets, methods, evaluation metrics) and their relations (e.g., isUsedBy, ComparedWith, ..) from massive text corpora (e.g., Clueweb) is important for tasks such as enhancing semantic search and linking information across different sources. The aim of this project is to devise methods to automatically extract the entities of interest and the relations between them. Relevant MSc courses: Information Retrieval, Pattern Recognition.

Long-Tail Named Entity Extraction

This engineering-heavy MSc thesis focuses on implementing a framework for named entity recognition (NER) and named entity extraction (NEE) from natural-language text, with a focus on rare entities. In collaboration with our team, novel NER and NEE methods are developed, implemented, and evaluated on scientific publication corpora. The final result is released as a well-documented open source project.

Past Courses

  • 2013 - 2017

    IN4325 Information Retrieval

    Retrieving relevant information is one of the central activities in modern knowledge-driven societies. As the amount and variety of data increase at an unprecedented rate, access to relevant, possibly unstructured information is becoming more and more challenging. The World Wide Web is now the primary source of information for leisure and work activities. The real value of the Web can only be unlocked if the huge amount of available data can be found, analysed, and exploited so that each user can quickly find information that is both relevant and comprehensive for their needs.

    Information Retrieval (IR) is the discipline that deals with the representation, storage, and organisation of, and access to, information items. It is concerned with providing efficient access to large amounts of unstructured content, such as text, images, and videos. The objective of the IN4325 - Information Retrieval course is to introduce the scientific underpinnings of the field of Information Retrieval. The course aims to provide students with basic information retrieval concepts and more advanced techniques for efficient data processing, storage, and querying. Students are also provided with a rich and comprehensive catalogue of information search tools that can be exploited in the design and implementation of Web and Enterprise search engines.