Work and Play: An Experiment in Enterprise Gamification

Laurentiu Catalin Stanculescu, Alessandro Bozzon, Robert-Jan Sips, Geert-Jan Houben
Conference Paper Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing

Abstract

In recent years, gamification, “the use of game design elements in non-game contexts”, has drawn the attention of an increasing number of scientists. Although several studies highlighted the benefits of gamification in many applications, its potential in the enterprise environment still needs to be fully understood.

This work contributes to the studies in enterprise gamification with an experiment performed at a large multinational company. The experiment involved 206 employees for a period of 2 months. We describe a modular and extensible framework for enterprise gamification, designed to seamlessly integrate with existing enterprise-class Web systems. We studied how a gamified tool can help to foster employees' engagement with such systems, by making day-to-day tasks more stimulating.

We show how different game mechanics can help to achieve two business needs, namely social interaction and learning. To this end, we exploited the gamification framework to develop a Q&A Web application combined with learning, news sharing, and social connections capabilities. Results provide strong evidence of how a gamified experience can foster learning and social behaviour in employees, and provide new insights about the effectiveness of several game mechanics in an enterprise context.

Designing complex crowdsourcing applications covering multiple platforms and tasks

Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Andrea Mauri, Riccardo Volonterio
Journal Paper Journal of Web Engineering, Volume 14 Issue 5-6, November 2015. Pages 443-473. Rinton Press

Abstract

A number of emerging crowd-based applications cover very different scenarios, including opinion mining, multimedia data annotation, localised information gathering, marketing campaigns, expert response gathering, and so on. In most of these scenarios, applications can be decomposed into tasks that collectively produce their results; task interactions give rise to arbitrarily complex workflows.

In this paper we propose methods and tools for designing crowd-based workflows as interacting tasks. We describe the modelling concepts that are useful in this framework, including typical workflow patterns, whose function is to decompose a cognitively complex task into simple interacting tasks for cooperative solving.

We then discuss how workflows and patterns are managed by CrowdSearcher, a system for designing, deploying and monitoring applications on top of crowd-based systems, including social networks and crowdsourcing platforms. Tasks performed by humans consist of simple operations which apply to homogeneous objects; the complexity of aggregating and interpreting task results is embodied within the framework. We show our approach at work on a validation scenario and we report quantitative findings, which highlight the effect of workflow design on the final results.

On the impact of knowledge extraction and aggregation on crowdsourced annotation of visual artworks

Jasper Oosterman, Jie Yang, Alessandro Bozzon, Lora Aroyo, Geert-Jan Houben
Journal Paper Computer Networks. Volume 90, 29 October 2015, Pages 133–149. Elsevier.

Abstract

Cultural heritage institutions more and more provide online access to their collections. Collections containing visual artworks need detailed and thorough annotations of the represented visual objects (e.g. plants or animals) to enable human access and retrieval. To make these suitable for access and retrieval, visual artworks need detailed and thorough annotations of the visual classes. Crowdsourcing has proven a viable tool to cater for the pitfalls of automatic annotation techniques. However, differently from traditional photographic image annotation, the artwork annotation task requires workers to possess the knowledge and skills needed to identify and recognise the occurrences of visual classes. The extent to which crowdsourcing can be effectively applied for artwork annotation is still an open research question. Based on a real-life case study from Rijksmuseum Amsterdam, this paper investigates the performance of a crowd of workers drawn from the CrowdFlower platform. Our contributions include a detailed analysis of crowd annotations based on two annotation configurations and a comparison of these crowd annotations with the ones from trusted annotators. In this study we apply a novel method for the automatic aggregation of local (i.e. bounding box) annotations, and we study how different knowledge extraction and aggregation configurations affect the identification and recognition aspects of artwork annotation. Our work sheds new light on the process of crowdsourcing artwork annotations, and shows how techniques that are effective for photographic image annotation cannot be straightforwardly applied to artwork annotation, thus paving the way for new research in the area.

A Platform for Urban Analytics and Semantic Data Integration in City Planning

Achilleas Psyllidis, Alessandro Bozzon, Stefano Bocconi, Christiaan Titos Bolivar
Conference Paper Computer-Aided Architectural Design Futures. The Next City - New Technologies and the Future of the Built Environment. Volume 527 of the series Communications in Computer and Information Science. pp 21-36. Springer.

Abstract

This paper presents a novel web-based platform that supports the analysis, integration, and visualization of large-scale and heterogeneous urban data, with application to city planning and decision-making. Motivated by the non-scalable character of conventional urban analytics methods, as well as by the interoperability challenges present in contemporary data silos, the illustrated system – coined SocialGlass – leverages the combined potential of diverse urban data sources. These include sensor and social media streams (Twitter, Instagram, Foursquare), publicly available municipal records, and resources from knowledge repositories. Through data science, semantic integration, and crowdsourcing techniques the platform enables the mapping of demographic information, human movement patterns, place popularity, traffic conditions, as well as citizens’ and visitors’ opinions and preferences about specific venues in a city. The paper further demonstrates an implemented prototype of the platform and its deployment in real-world use cases for monitoring, analyzing, and assessing city-scale events.

Harnessing Engagement for Knowledge Creation Acceleration in Collaborative Q&A Systems

Jie Yang, Alessandro Bozzon, Geert-Jan Houben
Conference Paper User Modeling, Adaptation and Personalization. Volume 9146 of the series Lecture Notes in Computer Science pp 315-327. Springer.

Abstract

Thanks to reputation and gamification mechanisms, collaborative question answering systems coordinate the process of topical knowledge creation of thousands of users. While successful, these systems face many challenges: on one hand, the volume of submitted questions outgrows the number of new users willing, and able, to answer them. On the other hand, existing users need to be retained and optimally allocated. Previous work demonstrates the positive effects that two important aspects, namely engagement and expertise valorisation, can have on user quality and quantity of participation. The magnitude of their effect can greatly vary across users and across topics. In this paper we advocate for a more in-depth study of the interplay that exists between user engagement factors in question answering systems. Our working hypothesis is that the process of knowledge creation can be accelerated by better understanding and exploiting the combined effects of the interests and expertise of users, with their intrinsic and extrinsic motivations. We perform a study over 6 years of data from the StackOverflow platform. By defining metrics of expertise and (intrinsic and extrinsic) motivations, we show how they distribute and correlate across the platform’s users and topics. By means of an off-line question routing experiment, we show how topic-specific combinations of motivations and expertise can help accelerate the knowledge creation process.

The Inclusive Enterprise: Vision and Roadmap

Robert-Jan Sips, Alessandro Bozzon, Gerard Smit, Geert-Jan Houben
Conference Paper Engineering the Web in the Big Data Era Volume 9114 of the series Lecture Notes in Computer Science pp 621-624. Springer.

Abstract

For companies across the globe, building and sustaining a talent pipeline has become a top priority. Job satisfaction is a core reason for employee retention and has been shown to be more dependent on the organisational climate, which includes aspects such as working conditions, leadership and inclusion, than on variables such as structure, size, and pay, which are easy to quantify and control. This paper presents a vision of an “Inclusive Enterprise”, and elaborates on how a computer science approach can help to sense and foster inclusion and well-being in enterprise environments.

Social Glass: A Platform for Urban Analytics and Decision-making Through Heterogeneous Social Data (Demo)

Stefano Bocconi, Alessandro Bozzon, Achilleas Psyllidis, Christiaan Titos Bolivar, Geert-Jan Houben
Conference Paper Proceedings of the 24th International Conference on World Wide Web. Pages 175-178. ACM New York, NY, USA ©2015.

Abstract

This demo presents Social Glass, a novel web-based platform that supports the analysis, valorisation, integration, and visualisation of large-scale and heterogeneous urban data in the domains of city planning and decision-making. The platform systematically combines publicly available social datasets from municipalities together with social media streams (e.g. Twitter, Instagram and Foursquare) and resources from knowledge repositories. It further enables the mapping of demographic information, human movement patterns, place popularity, traffic conditions, as well as citizens' and visitors' opinions and preferences with regard to specific venues in the city. Social Glass will be demonstrated through several real-world case studies, that exemplify the framework's conceptual properties, and its potential value as a solution for urban analytics and city-scale event monitoring and assessment.

Locating domain-specific contents and experts on social bookmarking communities

Simon Kassing, Jasper Oosterman, Alessandro Bozzon, Geert-Jan Houben
Conference Paper Proceedings of the 30th Annual ACM Symposium on Applied Computing. Pages 747-752. ACM New York, NY, USA ©2015.

Abstract

Social bookmarking communities are now major content production platforms. There, millions of users interact every day on a great variety of knowledge domains, creating new contents, linking to existing ones, and engaging in constructive discussions. Relevant domain-specific content is often mixed with less useful contributions, and domain experts often have to find their way through lurkers and Web trolls. Such a diversity in topics and quality is a distinctive property of this class of Web sites. This diversity interferes with the ability to locate relevant content and users, and this hinders the usage of social bookmarking communities for tasks such as structured knowledge creation, or crowdsourcing. In this paper we investigate how relevant domain-specific content, in the form of submissions shared by (expert) users, can be effectively located in the social bookmarking platform reddit. We contribute with a framework process for the identification and characterisation of domain-specific content and knowledgeable users, and apply it to the reddit platform. Our work provides novel insights into the properties and dynamics of reddit, and represents an important step towards a better use of social bookmarking communities as a source of knowledge and expertise.

Harnessing Heterogeneous Social Data to Explore, Monitor, and Visualize Urban Dynamics

Achilleas Psyllidis, Alessandro Bozzon, Stefano Bocconi, Christiaan Titos Bolivar
Conference Paper Planning Support Systems and Smart Cities: Proceedings of the 14th international conference on computers in urban planning and urban management (CUPUM), CUPUM 2015 conference Cambridge (USA), July 7-10, 2015, Pages 1-22. MIT

Abstract

Understanding the complexity of urban dynamics requires the combination of information from multiple city data sources. Besides traditional urban data, geo-localized social media provide human-generated content, which may reflect in (near) real time the activities people undertake in cities. The challenge is to devise methods and tools that enable the integration and analysis of such heterogeneous sources of information. Motivated by this, we developed SocialGlass, a novel web-based application framework to explore, monitor, and visualize urban dynamics. By deploying our platform in three real-world use cases, the paper elaborates on the benefits and limitations of integrating social media with related city datasets. It further shows how the inherent spatiotemporal, demographic, and contextual diversities of social data influence the interpretations of (dynamic) urban phenomena.

E-WISE: An Expertise-Driven Recommendation Platform for Web Question Answering Systems (Demo)

Jie Yang, Alessandro Bozzon, Geert-Jan Houben
Conference Paper Engineering the Web in the Big Data Era. Volume 9114 of the series Lecture Notes in Computer Science. pp 691-694. Springer.

Abstract

This demo presents E-WISE, an expertise-driven recommendation platform built upon Web Question Answering (QA) systems to assist askers in the question-answering process. Although crowdsourcing knowledge (e.g., on-line question-answering) is becoming increasingly important, accelerating the process remains a big challenge. E-WISE blends recently developed methods for knowledge crowdsourcing acceleration, including 1) an edit suggestion component to improve question quality; 2) a question routing component that suggests a list of ranked answerers. Both components are automatic, while keeping a human in control: askers can make their decisions in selecting the right edits/answerers among the suggested ones, which guarantees the effectiveness of the suggesting components and provides feedback to the suggesting methods. E-WISE will be demonstrated through a case study on Stack Overflow, a popular QA system, to exemplify its functions and potential in on-line knowledge creation.

Recommending Venues Using Continuous Predictive Social Media Analytics

Marco Balduini, Alessandro Bozzon, Emanuele Della Valle, Yi Huang, Geert-Jan Houben
Journal Paper IEEE Internet Computing, vol.18, no. 5, pp. 28-35, Sept.-Oct. 2014

Abstract

The authors' Continuous Predictive Social Media Analytics system operates in real time on social media streams and graphs to recommend venues to visitors of geo- and temporally bounded city-scale events. By combining deductive and inductive stream reasoning techniques with visitor-modeling functionalities, this system semantically analyzes and links visitors' social network activities to produce high-quality link predictions when information about preferences is sparse. The authors demonstrate their system's quality with experiments on real-world data.

Modeling CrowdSourcing Scenarios in Socially-Enabled Human Computation Applications

Alessandro Bozzon, Piero Fraternali, Luca Galli, Roula Karam
Journal Paper Journal on Data Semantics. Volume 3, Issue 3, pp 169-188

Abstract

User models have been defined since the 1980s, mainly for the purpose of building context-based, user-adaptive applications. However, the advent of social networked media, serious games, and crowdsourcing/human computation platforms calls for a more pervasive notion of user model, capable of representing the multiple facets of social users and performers, including their social ties, interests, capabilities, activity history, and topical affinities. In this paper, we define a comprehensive model able to cater for all the aspects relevant for applications involving social networks and human computation; we capitalize on existing social user models and content description models, enhancing them with novel models for human computation and gaming activities representation. Finally, we report on our experiences in adopting the proposed model in the design and implementation of three socially enabled human computation platforms.

Textual and Content-Based Search in Repositories of Web Application Models

Bojana Bislimovska, Alessandro Bozzon, Marco Brambilla, and Piero Fraternali
Transaction Paper ACM Transactions on the Web (TWEB). Volume 8 Issue 2, March 2014, Article No. 11. ACM New York, NY, USA

Abstract

Model-driven engineering relies on collections of models, which are the primary artifacts for software development. To enable knowledge sharing and reuse, models need to be managed within repositories, where they can be retrieved upon users’ queries. This article examines two different techniques for indexing and searching model repositories, with a focus on Web development projects encoded in a domain-specific language. Keyword-based and content-based search (also known as query-by-example) are contrasted with respect to the architecture of the system, the processing of models and queries, and the way in which metamodel knowledge can be exploited to improve search. A thorough experimental evaluation is conducted to examine what parameter configurations lead to better accuracy and to offer an insight in what queries are addressed best by each system.

Method and system of management of queries for crowd searching

Stefano Ceri, Marco Brambilla, Alessandro Bozzon
Patent US Patent 8825701 B2. Publication date 2 Sep 2014. Filing date 16 Jul 2012.

Abstract

A computer-implemented method of management of queries for crowd searching is presented. In the method, an input model including input data and structured queries using query operators is mapped into an output model, which is obtained by modifying the input data and by adding the answers to the structured queries. Input data comprise at least one search object, which may be imported from a search system, at least one structured query operator corresponding to social interactions and a human-generated question related to the at least one search object. Mapping of the input model is according to a query task execution plan that defines a query task to be solved by a crowd, wherein mapping comprises selecting at least one social search engine running on a respective social platform and selecting one or more groups of responders interacting with the at least one social search engine and targeted to respond to the query task.

First International Workshop on User Interfaces for Crowdsourcing and Human Computation

Alessandro Bozzon, Lora Aroyo, Paolo Cremonesi
Workshop Paper Proceedings of the 2014 International Working Conference on Advanced Visual Interfaces Pages 398-400. ACM New York, NY, USA

Abstract

Recent years witnessed an explosion in the number and variety of data crowdsourcing initiatives. From OpenStreetMap to Amazon Mechanical Turk, developers and practitioners have been striving to create user interfaces able to effectively and efficiently support the creation, exploration, and analysis of crowdsourced information.

The extensive usage of crowdsourcing techniques brings a major change of paradigm with respect to traditional user interface for data collection and exploration, as effectiveness, speed, and interaction quality concerns play a central role in supporting very demanding incentives, including monetary ones.

The First International Workshop on User Interfaces for Crowdsourcing and Human Computation (CrowdUI 2014), co-located with the AVI 2014 conference, brought together researchers and practitioners from a wide range of areas interested in discussing the user interaction challenges posed by crowdsourcing systems.

Asking the right question in collaborative q&a systems

Jie Yang, Claudia Hauff, Alessandro Bozzon, Geert-Jan Houben
Conference Paper Proceedings of the 25th ACM conference on Hypertext and social media. Pages 179-189 ACM New York, NY, USA

Abstract

Collaborative Question Answering (cQA) platforms are a very popular repository of crowd-generated knowledge. By formulating questions, users express needs that other members of the cQA community try to collaboratively satisfy. Poorly formulated questions are less likely to receive useful responses, thus hindering the overall knowledge generation process. Users are often asked to reformulate their needs, adding specific details, providing examples, or simply clarifying the context of their requests.

Formulating a good question is a task that might require several interactions between the asker and other community members, thus delaying the actual answering and, possibly, decreasing the interest of the community in the issue. This paper contributes new insights to the study of cQA platforms by investigating the editing behaviour of users. We identify a number of editing actions, and provide a two-step approach for the automatic suggestion of the most likely editing actions to be performed for a newly created question.

We evaluated our approach in the context of the Stack Overflow cQA, demonstrating how, for given types of editing actions, it is possible to provide accurate reformulation suggestions.

Pattern-Based Specification of Crowdsourcing Applications (Best Paper Award)

Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Andrea Mauri, Riccardo Volonterio
Conference Paper Proceedings of the 14th International Conference, ICWE 2014. Toulouse, France, July 1-4, 2014. Pages 218-235. Lecture Notes in Computer Science. Springer.

Abstract

In many crowd-based applications, the interaction with performers is decomposed into several tasks that, collectively, produce the desired results. Task interactions give rise to arbitrarily complex workflows. In this paper we propose methods and tools for designing crowd-based workflows as interacting tasks. We describe the modelling concepts that are useful in such a framework, including typical workflow patterns, whose function is to decompose a cognitively complex task into simple interacting tasks so that the complex task is co-operatively solved.

We then discuss how workflows and patterns are managed by CrowdSearcher, a system for designing, deploying and monitoring applications on top of crowd-based systems, including social networks and crowdsourcing platforms. Tasks performed by humans consist of simple operations which apply to homogeneous objects; the complexity of aggregating and interpreting task results is embodied within the framework. We show our approach at work on a validation scenario and we report quantitative findings, which highlight the effect of workflow design on the final results.

Fashion 10000: an enriched social image dataset for fashion and clothing

Babak Loni, Lei Yen Cheung, Michael Riegler, Alessandro Bozzon, Luke Gottlieb, Martha Larson
Conference Paper Proceedings of the 5th ACM Multimedia Systems Conference. Pages 41-46. ACM New York, NY, USA.

Abstract

In this work, we present a new social image dataset related to the fashion and clothing domain. The dataset contains more than 32000 images, their context and social metadata. Furthermore the dataset is enriched with several types of annotations collected from the Amazon Mechanical Turk (AMT) crowdsourcing platform, which can serve as ground truth for various content analysis algorithms. This dataset has been successfully used at the Crowdsourcing task of the 2013 MediaEval Multimedia Benchmarking initiative. The dataset contributes to several research areas such as Crowdsourcing, multimedia content and context analysis as well as hybrid human/automatic approaches. In this paper, the dataset is described in detail and the dataset collection strategy, statistics, applications of the dataset and its contribution to MediaEval 2013 are discussed.

Sparrows and Owls: Characterisation of Expert Behaviour in StackOverflow

Jie Yang, Ke Tao, Alessandro Bozzon, Geert-Jan Houben
Conference Paper Proceedings of 22nd International Conference, UMAP 2014, Aalborg, Denmark, July 7-11, 2014. Pages 266-277, Lecture Notes in Computer Science, Volume 8538, Springer International Publishing.

Abstract

Question Answering platforms are becoming an important repository of crowd-generated knowledge. In these systems a relatively small subset of users is responsible for the majority of the contributions, and ultimately, for the success of the Q/A system itself. However, due to built-in incentivization mechanisms, standard expert identification methods often misclassify very active users as knowledgeable ones, and misjudge activeness for expertise.

This paper contributes a novel metric for expert identification, which provides a better characterisation of users’ expertise by focusing on the quality of their contributions. We identify two classes of relevant users, namely sparrows and owls, and we describe several behavioural properties in the context of the StackOverflow Q/A system. Our results contribute new insights to the study of expert behaviour in Q/A platforms, that are relevant to a variety of contexts and applications.

A study of the online profile of enterprise users in professional social networks

Alessandro Bozzon, Hariton Efstathiades, Geert-Jan Houben, Robert-Jan Sips
Conference Paper Proceedings of the companion publication of the 23rd international conference on World wide web companion Pages 487-492. International World Wide Web Conferences Steering Committee Republic and Canton of Geneva, Switzerland.

Abstract

Understanding the impact of corporate information publicly distributed on the Web is becoming more and more crucial. In this paper we report the result of a study that involved 130 IBM employees: we explored the correctness and extent of organisational information that can be observed from the online profiles of a company's employees. Our work contributes new insights to the study of social networks by showing that, even by considering a small fraction of the available online data, it is possible to discover accurate information about an organisation, its structure, and the factors that characterise the social reach of its employees.

Crowd vs. experts: nichesourcing for knowledge intensive tasks in cultural heritage (Poster)

Jasper Oosterman, Alessandro Bozzon, Geert-Jan Houben, Archana Nottamkandath, Chris Dijkshoorn, Lora Aroyo, Mieke H.R. Leyssen, Myriam C. Traub.
Conference Paper Proceedings of the companion publication of the 23rd international conference on World wide web companion Pages 567-568. International World Wide Web Conferences Steering Committee Republic and Canton of Geneva, Switzerland.

Abstract

The results of our exploratory study provide new insights to crowdsourcing knowledge intensive tasks. We designed and performed an annotation task on a print collection of the Rijksmuseum Amsterdam, involving experts and crowd workers in the domain-specific description of depicted flowers. We created a testbed to collect annotations from flower experts and crowd workers and analyzed these in regard to user agreement. The findings show promising results, demonstrating how, for given categories, nichesourcing can provide useful annotations by connecting crowdsourcing to domain expertise.

Crowdsourcing knowledge-intensive tasks in cultural heritage (Poster)

Jasper Oosterman, Archana Nottamkandath, Chris Dijkshoorn, Alessandro Bozzon, Geert-Jan Houben, Lora Aroyo
Conference Paper Proceedings of the 2014 ACM conference on Web science. Pages 267-268. ACM New York, NY, USA

Abstract

Large datasets such as Cultural Heritage collections require detailed annotations when digitised and made available online. Annotating different aspects of such collections requires a variety of knowledge and expertise which is not always possessed by the collection curators. Artwork annotation is an example of a knowledge intensive image annotation task, i.e. a task that demands annotators to have domain-specific knowledge in order to be successfully completed.

This paper describes the results of a study aimed at investigating the applicability of crowdsourcing techniques to knowledge intensive image annotation tasks. We observed a clear relationship between the annotation difficulty of an image, in terms of number of items to identify and annotate, and the performance of the recruited workers.

Towards a Top-K SPARQL Query Benchmark (Poster)

Shima Zahmatkesh, Emanuele Della Valle, Daniele Dell'Aglio, Alessandro Bozzon
Conference Paper Proceedings of the ISWC 2014 Posters & Demonstrations Track a track within the 13th International Semantic Web Conference, ISWC 2014, Riva del Garda, Italy, October 21, 2014. Pages 349-352. CEUR Workshop Proceedings 1272, CEUR-WS.org 2014

Abstract

The research on optimization of top-k SPARQL query would largely benefit from the establishment of a benchmark that allows comparing different approaches. For such a benchmark to be meaningful, at least two requirements should hold: 1) the benchmark should resemble reality as much as possible, and 2) it should stress the features of the top-k SPARQL queries both from a syntactic and performance perspective. In this paper we propose Top-k DBPSB: an extension of the DBpedia SPARQL benchmark (DBPSB), a benchmark known to resemble reality, with the capabilities required to compare SPARQL engines on top-k queries.

A Case Study of Active, Continuous and Predictive Social Media Analytics for Smart City

Marco Balduini, Stefano Bocconi, Alessandro Bozzon, Emanuele Della Valle, Yi Huang, Jasper Oosterman, Themis Palpanas, Mikalai Tsytsarau
Workshop Paper Proceedings of the Fifth Workshop on Semantics for Smarter Cities a Workshop at the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 19, 2014. Pages 31-46. CEUR Workshop Proceedings 1280, CEUR-WS.org 2014

Abstract

Imagine you are in Milano for the Design Week. You have just spent a couple of days attending a few nice events in the Brera district. Which of the other hundreds of events spread around Milano should you attend now? This paper presents a system able to recommend venues to the visitors of such a city-scale event based on the digital footprints they left on Social Media. By combining deductive and inductive stream reasoning techniques with visitor-modeling functionality, this system semantically analyses and links visitors’ social network activities to produce high-quality recommendations even when information about visitors’ preferences for venues and events is sparse.

cItyAM: Managing Big Urban Data for Analyzing and Modelling Cities

Alessandro Bozzon, Claudia Hauff and Geert-Jan Houben
Journal Paper ERCIM News, Volume 2014, 2014

Abstract

The cItyAM platform has been developed as part of a research initiative started by the Web Information Systems group at the Delft University of Technology in 2012. It was driven and inspired by several European- and national-level initiatives including the City Data Fusion EIT ICT Lab project (conducted in collaboration with Politecnico di Milano, CNR Pisa, SIEMENS, and Telecom Italia), the Amsterdam Institute for Advanced Metropolitan Solutions (AMS) initiative (which includes WAGENINGEN UR, MIT, TNO, and other societal and industrial partners), the SHINE project, the ImReal EU FP7 project, and the COMMIT SEALINCMedia project (conducted in collaboration with the VU Amsterdam, CWI, the Rijksmuseum, and others), as well as the experience gained from the spinoff activities around Twitcident. The goal of the cItyAM platform is to progress urban data management, by advancing our ability to “feel the pulse” of cities: it provides methodologies and systems for data fusion, analysis and visualization.

Web Information Retrieval

Stefano Ceri, Alessandro Bozzon, Marco Brambilla, Emanuele Della Valle, Piero Fraternali, Silvia Quarteroni
Book Springer Verlag | Series: Data-Centric Systems and Applications | 282 pages, 120 illustrations | ISBN: 978-3-642-39313-6

Offers a unique combination of both traditional and Web-specific techniques of information retrieval

With the proliferation of huge amounts of (heterogeneous) data on the Web, the importance of information retrieval (IR) has grown considerably over the last few years. Big players in the computer industry, such as Google, Microsoft and Yahoo!, are the primary contributors of technology for fast access to Web-based information; and searching capabilities are now integrated into most information systems, ranging from business management software and customer relationship systems to social networks and mobile phone applications.

Ceri and his co-authors aim at taking their readers from the foundations of modern information retrieval to the most advanced challenges of Web IR. To this end, their book is divided into three parts. The first part addresses the principles of IR and provides a systematic and compact description of basic information retrieval techniques (including binary, vector space and probabilistic models as well as natural language search processing) before focusing on its application to the Web. Part two addresses the foundational aspects of Web IR by discussing the general architecture of search engines (with a focus on the crawling and indexing processes), describing link analysis methods (specifically PageRank and HITS), addressing recommendation and diversification, and finally presenting advertising in search (the main source of revenue for search engines). The third and final part describes advanced aspects of Web search, each chapter providing a self-contained, up-to-date survey on current Web research directions. Topics in this part include meta-search and multi-domain search, semantic search, search in the context of multimedia data, and crowd search.

The book is ideally suited to courses on information retrieval, as it covers all Web-independent foundational aspects. Its presentation is self-contained and does not require prior background knowledge. It can also be used in the context of classic courses on data management, allowing the instructor to cover both structured and unstructured data in various formats. Its classroom use is facilitated by a set of slides, which can be downloaded from www.search-computing.org.

Order matters! Harnessing a world of orderings for reasoning over massive data

Emanuele Della Valle, Stefan Schlobach, Markus Krötzsch, Alessandro Bozzon, Stefano Ceri, Ian Horrocks
Journal Paper Semantic Web Journal. Volume 4, Number 2 / 2013. Pages: 219-231. IOS Press

Abstract

More and more applications require real-time processing of massive, dynamically generated, ordered data; order is an essential factor as it reflects recency or relevance. Semantic technologies risk being unable to meet the needs of such applications, as they are not equipped with the appropriate instruments for answering queries over massive, highly dynamic, ordered data sets. In this vision paper, we argue that some data management techniques should be exported to the context of semantic technologies, by integrating ordering with reasoning, and by using methods which are inspired by stream and rank-aware data management. We systematically explore the problem space, and point both to problems which have been successfully approached and to problems which still need fundamental research, in an attempt to stimulate and guide a paradigm shift in semantic technologies.

Exploratory search framework for Web data sources

Alessandro Bozzon, Stefano Ceri, Marco Brambilla, Davide Mazza
Journal Paper The VLDB Journal October 2013, Volume 22, Issue 5, pp 641-663

Abstract

Exploratory search is an information seeking behavior where users progressively learn about one or more topics of interest; it departs quite radically from traditional keyword-based query paradigms, as it combines querying and browsing of resources, and covers activities such as investigating, evaluating, comparing, and synthesizing retrieved information. In most cases, such activities are enabled by a conceptual description of information in terms of entities and their semantic relationships. Customized Web applications, where a few application entities and their relationships are embedded within the application logic, typically provide some support to exploratory search, which is, however, specific to a given domain.

In this paper, we describe a general-purpose exploratory search framework, i.e., a framework which is neutral to the application logic. Our contribution consists of the formalization of the exploratory search paradigm over Web data sources, accessed by means of services; extracted information is described by means of an entity-relationship schema, which masks the service implementations. Exploratory interaction is supported by a general-purpose user interface including a set of widgets for data exploration, from big tables to atomic tables, visual diagrams, and geographic maps; the user interaction is translated to queries defined in SeCoQL, a SQL-like language and protocol specifically designed for supporting exploratory search over data sources. We illustrate the software architecture of our prototype, which uses the interplay of a query and result management system with an orchestrator, capable of incrementally building queries and of walking through the past navigation history. The distinctive feature of the framework is the ability to extract top solutions, which combine top-ranked entity instances.

We evaluate exploratory search from the end-user perspective in the context of a cognitive model for search, by studying the user’s behavior and the effectiveness of exploratory search in terms of quality of results produced by the search process; we also compare the effectiveness of interaction in using our multi-domain search system with the use of various replicas of the system, each acting upon a single domain, and with the use of conventional search engines.

Choosing the right crowd: expert finding in social networks

Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Matteo Silvestri, Giuliano Vesci
Conference Paper Proceedings of the 16th International Conference on Extending Database Technology. Pages 637-648. ACM New York, NY, USA

Abstract

Expert selection is an important aspect of many Web applications, e.g., when they aim at matching contents, tasks or advertisement based on user profiles, possibly retrieved from social networks.

This paper focuses on selecting experts within the population of social networks, according to the information about the social activities of their users. We consider the following problem: given an expertise need (expressed for instance as a natural language query) and a set of social network members, who are the most knowledgeable people for addressing that need? We consider social networks both as a source of expertise information and as a route to reach expert users, and define models and methods for evaluating people's expertise by considering their profiles and by tracing their activities in social networks. For matching queries to social resources, we use both text analysis and semantic annotation. An extensive set of experiments shows that the analysis of social activities, social relationships, and socially shared contents helps improve the effectiveness of an expert finding system.

Reactive crowdsourcing

Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Andrea Mauri
Conference Paper Proceedings of the 22nd international conference on World Wide Web. Pages 153-164. International World Wide Web Conferences Steering Committee Republic and Canton of Geneva, Switzerland

Abstract

An essential aspect for building effective crowdsourcing computations is the ability of "controlling the crowd", i.e. of dynamically adapting the behaviour of the crowdsourcing systems in response to the quantity and quality of completed tasks or to the availability and reliability of performers. Most crowdsourcing systems only provide limited and predefined controls; in contrast, we present an approach to crowdsourcing which provides fine-level, powerful and flexible controls. We model each crowdsourcing application as a composition of elementary task types and we progressively transform these high-level specifications into the features of a reactive execution environment that supports task planning, assignment and completion as well as performer monitoring and exclusion. Controls are specified as active rules on top of data structures which are derived from the model of the application; rules can be added, dropped or modified, thus guaranteeing maximal flexibility with limited effort.

We also report on our prototype platform that implements the proposed framework and we show the results of our experimentations with different rule sets, demonstrating how simple changes to the rules can substantially affect time, effort and quality involved in crowdsourcing activities.

An Introduction to Human Computation and Games with a Purpose (Tutorial)

Alessandro Bozzon, Luca Galli
Conference Paper Proceedings of the 13th International Conference, ICWE 2013, Aalborg, Denmark, July 8-12, 2013. Pages 514-517. Lecture Notes in Computer Science. Series Volume 7977. Springer Berlin Heidelberg

Abstract

Crowdsourcing and human computation are novel disciplines that enable the design of computation processes that include humans as actors for task execution. In such a context, Games With a Purpose are an effective means to channel, in a constructive manner, the human brainpower required to perform tasks that computers are unable to perform, through computer games. This tutorial introduces the core research questions in human computation, with a specific focus on the techniques required to manage structured and unstructured data. The second half of the tutorial delves into the field of game design for serious tasks, with an emphasis on games for human computation purposes. Our goal is to provide participants with a wide, yet complete overview of the research landscape; we aim at giving practitioners a solid understanding of the best practices in designing and running human computation tasks, while providing academics with solid references and, possibly, promising ideas for their future research activities.

Personalized Nichesourcing: Acquisition of Qualitative Annotations from Niche Communities

Chris Dijkshoorn, Mieke H. R. Leyssen, Archana Nottamkandath, Jasper Oosterman, Myriam C. Traub, Lora Aroyo, Alessandro Bozzon, Wan Fokkink, Geert-Jan Houben, Henrike Hovelmann, Lizzy Jongma, Jacco van Ossenbruggen, Guus Schreiber, Jan Wielemaker
Workshop Paper Late-Breaking Results, Project Papers and Workshop Proceedings of the 21st Conference on User Modeling, Adaptation, and Personalization., Rome, Italy, June 10-14, 2013. CEUR Workshop Proceedings 997, CEUR-WS.org 2013

Abstract

Diversity and profundity of the topics in cultural heritage collections make experts from outside the institution indispensable for acquiring qualitative and comprehensive annotations. We define the concept of nichesourcing and present challenges in the process of obtaining qualitative annotations from people in these niches. We believe that experts provide better annotations if this process is personalized. We present a framework called Accurator, which makes it possible to realize and evaluate strategies and applications for personalized nichesourcing.

Web Data Management through Crowdsourcing Upon Social Networks

Marco Brambilla, Alessandro Bozzon
Conference Paper 2012 IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM 2012), pp. 1123-1127.

Abstract

Retrieval and management of Web data is becoming a more and more complex problem, due to the amount of information to be dealt with, to the diversity of the information sources and of the data formats, and to the evolving expectations of users. In particular, some tasks such as quality assessment, opinion making, and sense extraction cannot be completely delegated to automatic procedures. More and more users rely on social interaction to complete and validate the results of their online activities. For instance, scouting "interesting" results, or suggesting new, unexpected search directions in information seeking processes occurs most of the time outside the search systems and processes, possibly instrumented and mediated by a social network. In this paper we propose a paradigm that embodies crowds and social network communities as first-class sources for information management and extraction on the Web. Our approach aims at filling the gap between traditional Web systems (CMS, search engines and others), which operate upon world-wide information, and social systems, capable of interacting with real people, in real time, to capture their opinions, suggestions, and emotions by leveraging crowdsourcing practices and making them viable upon a social network. This enormously enriches the data manipulation experience for the user.

Diversification for Multi-domain Result Sets

Alessandro Bozzon, Marco Brambilla, Piero Fraternali, Marco Tagliasacchi
Conference Paper Proceedings of 12th International Conference, ICWE 2012, Berlin, Germany, July 23-27, 2012. Pages 137-152. Lecture Notes in Computer Science. Series Volume 7387. Springer Berlin Heidelberg

Abstract

Multi-domain search answers queries spanning multiple entities, like “Find a hotel in Milan close to a concert venue, a museum and a good restaurant”, by producing ranked sets of entity combinations that maximize relevance, measured by a function expressing the user’s preferences. Due to the combinatorial nature of results, good entity instances (e.g., five-star hotels) tend to appear repeatedly in top-ranked combinations. To improve the quality of the result set, it is important to balance relevance with diversity, which promotes different, yet almost equally relevant, entities in the top-k combinations. This paper explores two different notions of diversity for multi-domain result sets, compares experimentally alternative algorithms for the trade-off between relevance and diversity, and performs a user study for evaluating the utility of diversification in multi-domain queries.

Modeling End-Users as Contributors in Human Computation Applications

Roula Karam, Piero Fraternali, Alessandro Bozzon, Luca Galli
Conference Paper Proceedings of 2nd International Conference, MEDI 2012, Poitiers, France, October 3-5, 2012. Pages 3-15. Lecture Notes in Computer Science. Series Volume 7602. Springer Berlin Heidelberg

Abstract

User models have been defined since the ’80s, mainly for the purpose of building context-based, user-adaptive applications. However, the advent of social networked media, serious games, and crowdsourcing platforms calls for a more pervasive notion of user model, capable of representing the multiple facets of a social user, including his social ties, capabilities, activity history, and topical affinities. In this paper, we overview several user models proposed recently to address the platform-independent representation of users embedded in a social context, and discuss the features of the CUbRIK user model, which is designed to support multi-platform human computation applications where users are called as collaborators in the resolution of complex tasks found in the multimedia information retrieval field.

Efficient Execution of Top-K SPARQL Queries

Sara Magliacane, Alessandro Bozzon, Emanuele Della Valle
Conference Paper Proceedings of 11th International Semantic Web Conference, Boston, MA, USA, November 11-15, 2012. Pages 344-360. Lecture Notes in Computer Science. Series Volume 7649. Springer Berlin Heidelberg

Abstract

Top-k queries, i.e. queries returning the top k results ordered by a user-defined scoring function, are an important category of queries. Order is an important property of data that can be exploited to speed up query processing. State-of-the-art SPARQL engines underuse order, and top-k queries are mostly managed with a materialize-then-sort processing scheme that computes all the matching solutions (e.g. thousands) even if only a limited number k (e.g. ten) are requested. The SPARQL-RANK algebra is an extended SPARQL algebra that treats order as a first class citizen, enabling efficient split-and-interleave processing schemes that can be adopted to improve the performance of top-k SPARQL queries. In this paper we propose an incremental execution model for SPARQL-RANK queries, we compare the performance of alternative physical operators, and we propose a rank-aware join algorithm optimized for native RDF stores. Experiments conducted with an open source implementation of a SPARQL-RANK query engine based on ARQ show that the evaluation of top-k queries can be sped up by orders of magnitude.

Answering search queries with CrowdSearcher

Alessandro Bozzon, Marco Brambilla, Stefano Ceri
Conference Paper Proceedings of the 21st international conference on World Wide Web. Pages 1009-1018. ACM New York, NY, USA

Abstract

Web users are increasingly relying on social interaction to complete and validate the results of their search activities. While search systems are superior machines to get world-wide information, the opinions collected within friends and expert/local communities can ultimately determine our decisions: human curiosity and creativity are often capable of going much beyond the capabilities of search systems in scouting "interesting" results, or suggesting new, unexpected search directions. Such personalized interaction occurs most of the time outside the search systems and processes, possibly instrumented and mediated by a social network; when such interaction is completed and users resort to the use of search systems, they do it through new queries, loosely related to the previous search or to the social interaction. In this paper we propose CrowdSearcher, a novel search paradigm that embodies crowds as first-class sources for the information seeking process.

CrowdSearcher aims at filling the gap between generalized search systems, which operate upon world-wide information - including facts and recommendations as crawled and indexed by computerized systems - and social systems, capable of interacting with real people, in real time, to capture their opinions, suggestions, and emotions. The technical contribution of this paper is the discussion of a model and architecture for integrating computerized search with human interaction, by showing how search systems can drive and encapsulate social systems. In particular we show how social platforms, such as Facebook, LinkedIn and Twitter, can be used for crowdsourcing search-related tasks; we demonstrate our approach with several prototypes and we report on our experiment upon real user communities.

A Model-Driven Approach for Crowdsourcing Search

Alessandro Bozzon, Marco Brambilla, Andrea Mauri
Workshop Paper Proceedings of the First International Workshop on Crowdsourcing Web Search, Lyon, France, April 17, 2012. Pages 31-35. CEUR Workshop Proceedings 842, CEUR-WS.org 2012

Abstract

Even though search systems are very efficient in retrieving world-wide information, they cannot capture some peculiar aspects and features of user needs, such as subjective opinions and recommendations, or information that requires local or domain-specific expertise. In this kind of scenario, the human opinion provided by an expert or knowledgeable user can be more useful than any factual information retrieved by a search engine.

In this paper we propose a model-driven approach for the specification of crowd-search tasks, i.e. activities where real people – in real time – take part in the generalized search process that involves search engines. In particular we define two models: the “Query Task Model”, representing the meta-model of the query that is submitted to the crowd and the associated answers; and the “User Interaction Model”, which shows how the user can interact with the query model to fulfill her needs. Our solution allows for a top-down design approach, from the crowd-search task design, down to the crowd answering system design. Our approach also grants automatic code generation thus leading to quick prototyping of search applications based on human responses collected over social networking or crowdsourcing platforms.

A Framework for Crowdsourced Multimedia Processing and Querying

Alessandro Bozzon, Ilio Catallo, Eleonora Ciceri, Piero Fraternali, Davide Martinenghi, Marco Tagliasacchi
Workshop Paper Proceedings of the First International Workshop on Crowdsourcing Web Search, Lyon, France, April 17, 2012. Pages 42-47. CEUR Workshop Proceedings 842, CEUR-WS.org 2012

Abstract

This paper introduces a conceptual and architectural framework for addressing the design, execution and verification of tasks by a crowd of performers. The proposed framework is substantiated by an ongoing application to a problem of trademark logo detection in video collections. Preliminary results show that the contribution of crowds can improve the recall of state-of-the-art traditional algorithms, with no loss in terms of precision. However, task-to-executor matching, as expected, has an important influence on the task performance.

The CUBRIK project: human-enhanced time-aware multimedia search

Piero Fraternali, Marco Tagliasacchi, Davide Martinenghi, Alessandro Bozzon, Ilio Catallo, Eleonora Ciceri, Francesco Saverio Nucci, Vincenzo Croce, Ismail Sengör Altingövde, Wolf Siberski, Fausto Giunchiglia, Wolfgang Nejdl, Martha Larson, Ebroul Izquierdo, Petros Daras, Otto Chrons, Ralph Traphöner, Björn Decker, John Lomas, Patrick Aichroth, Jasminko Novak, Ghislain Sillaume, Fernando Sánchez-Figueroa, Carolina Salas-Parra.
Workshop Paper Proceedings of the 21st international conference companion on World Wide Web. Pages 259-262. ACM New York, NY, USA

Abstract

The CUBRIK Project is an Integrated Project of the 7th Framework Programme that aims at contributing to the multimedia search domain by opening the architecture of multimedia search engines to the integration of open source and third party content annotation and query processing components, and by exploiting the contribution of humans and communities in all the phases of multimedia search, from content processing to query processing and relevance feedback processing. The CUBRIK presentation will showcase the architectural concept and scientific background of the project and demonstrate an initial scenario of a human-enhanced content and query processing pipeline.

A Framework for Integrating, Exploring, and Searching Location-Based Web Data

Alessandro Bozzon, Stefano Ceri, Marco Brambilla, Silvia Quarteroni
Journal Paper IEEE Internet Computing, vol.15, no. 6, pp. 24-31, Nov.-Dec. 2011.

Abstract

This article presents the adaptation of a general search computing framework for exploratory search over Web data as suggested by the specificity of location-based data services. The result is a conceptual model of geographic entities, the spatial functions operating on them, and a special-purpose exploratory interface that lets users search combinations of georeferenced objects directly on a map. Such modifications help the general framework provide ranked extraction of relevant objects and their combinations, custom ranking functions, and cost-based access to location-based services.

Diversification for multi-domain result sets (Poster)

Alessandro Bozzon, Marco Brambilla, Piero Fraternali, Marco Tagliasacchi
Conference Paper Proceedings of the 20th ACM international conference on Information and knowledge management. Pages 1905-1908. ACM New York, NY, USA

Abstract

Multi-domain search answers queries spanning multiple entities, like "Find an affordable house in a city with low criminality index, good schools and medical services", by producing ranked sets of entity combinations that maximize relevance, measured by a function expressing the user's preferences. Due to the combinatorial nature of results, good entity instances (e.g., inexpensive houses) tend to appear repeatedly in top-ranked combinations. To improve the quality of the result set, it is important to balance relevance (i.e., high values of the ranking function) with diversity, which promotes different, yet almost equally relevant, entities in the top-k combinations. This paper explores two different notions of diversity for multi-domain result sets, compares experimentally alternative algorithms for the trade-off between relevance and diversity, and performs a user study for evaluating the utility of diversification in multi-domain queries.

The Anatomy of a Multi-domain Search Infrastructure (Invited)

Stefano Ceri, Alessandro Bozzon, Marco Brambilla
Conference Paper Proceedings of the 11th International Conference, ICWE 2011, Paphos, Cyprus, June 20-24, 2011. Pages 1-12. Lecture Notes in Computer Science, Volume 6757, Springer Berlin Heidelberg

Abstract

Current search engines do not support queries that require a complex combination of information. Problems such as “Which theatre offers an at least-three-stars action movie in London close to a good Italian restaurant” can only be solved by asking multiple queries, possibly to different search engines, and then manually combining results, thereby performing “data integration in the brain.” While searching the Web is the preferred method for accessing information in everyday practice, users expect that search systems will soon be capable of mastering complex queries. However, combining information requires a drastic change of perspective: a new generation of search computing systems is needed, capable of going beyond the capabilities of current search engines. In this paper we show how search computing should open to modular composition, as many other kinds of software computations. We first motivate our work by describing our vision, and then describe how the challenges of multi-domain search are addressed by a prototype framework, whose internal “anatomy” is disclosed.

Graph-Based Search over Web Application Model Repositories

Bojana Bislimovska, Alessandro Bozzon, Marco Brambilla, Piero Fraternali
Conference Paper Proceedings of the 11th International Conference, ICWE 2011, Paphos, Cyprus, June 20-24, 2011. Pages 90-104. Lecture Notes in Computer Science, Volume 6757, Springer Berlin Heidelberg

Abstract

Model Driven Development may attain substantial productivity gains by exploiting a high level of reuse, across the projects of the same organization or public model repositories. For reuse to take place, developers must be able to perform effective searches across vast collections of models, locate model fragments of potential interest, evaluate the usefulness of the retrieved artifacts and eventually incorporate them in their projects. Given the variety of Web modeling languages, from general purpose to domain specific, from computation independent to platform independent, it is important to implement a search framework capable of harnessing the power of models and of flexibly adapting to the syntax and semantics of the modeling language. In this paper, we explore the use of graph-based similarity search as a tool for expressing queries over model repositories, uniformly represented as collections of labeled graphs. We discuss how the search approach can be parametrized and the impact of the parameters on the perceived quality of the search results.

Exploratory Multi-domain Search on Web Data Sources with Liquid Queries (Demo)

Davide Francesco Barbieri, Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Chiara Pasini, Luca Tettamanti, Salvatore Vadacca, Riccardo Volonterio, Srđan Zagorac
Conference Paper Proceedings of the 11th International Conference, ICWE 2011, Paphos, Cyprus, June 20-24, 2011. Pages 363-366. Lecture Notes in Computer Science, Volume 6757, Springer Berlin Heidelberg

Abstract

We demonstrate Liquid Queries, a novel user interaction paradigm for exploratory multi-domain search upon structured information collected from heterogeneous data sources. Liquid Queries support an exploratory search approach by providing a set of interaction primitives for multi-domain query formulation, result visualization and query refinement, with commands for perusing the result set, changing the visualization of data based on their type (e.g., geographical) and interacting with the remote search services.

Model-Based Dynamic and Adaptive Visualization for Multi-domain Search Results (Poster)

Alessandro Bozzon, Marco Brambilla, Luca Cioria, Piero Fraternali, Maristella Matera
Conference Paper Proceedings of the 11th International Conference, ICWE 2011, Paphos, Cyprus, June 20-24, 2011. Pages 367-370. Lecture Notes in Computer Science, Volume 6757, Springer Berlin Heidelberg

Abstract

Search systems are becoming increasingly sophisticated in their capacity of building result sets that are not mere lists of documents but articulated combinations of concepts retrieved from different domains. This paper investigates the models for the result sets and the visualization spaces, and model-to-model transformations to dynamically suggest optimized visualizations for multi-domain search results.

A Constraint Programming Approach to Automatic Layout Definition for Search Results (Poster)

Alessandro Bozzon, Marco Brambilla, Laura Cigardi, Sara Comai
Conference Paper Proceedings of the 11th International Conference, ICWE 2011, Paphos, Cyprus, June 20-24, 2011. Pages 371-374. Lecture Notes in Computer Science, Volume 6757, Springer Berlin Heidelberg

Abstract

In this paper we describe a general framework based on constraint programming techniques to address the automatic layout definition problem for Web search result pages, considering heterogeneous result item types (e.g., web links, images, videos, maps, etc.). Starting from the entity type(s) specified in the search query and the result types deemed more relevant for the given entity type, we define an optimization problem and a set of constraints that grant the optimal positioning of results in the page, modeled as a grid with assigned weights depending on the visibility.

Search computing: multi-domain search on ranked data (Demo)

Alessandro Bozzon, Daniele Braga, Marco Brambilla, Stefano Ceri, Francesco Corcoglioniti, Piero Fraternali, Salvatore Vadacca
Conference Paper Proceedings of the 2011 ACM SIGMOD International Conference on Management of data. Pages 1267-1270. ACM New York, NY, USA

Abstract

We demonstrate the Search Computing framework for multi-domain queries upon ranked data collected from Web sources. Search Computing answers queries like "Find a good Jazz concert close to a specified location, a good restaurant and a hotel at walking distance" and fills the gap between generic and domain-specific search engines, by proposing new methods, techniques, interfaces, and tools for building search-based applications spanning multiple data services. The main enabling technology is an execution engine supporting methods for rank-join execution upon ranked data sources, abstracted and wrapped by means of a unifying service model. The demo walks through the interface for formulating multi-domain queries and follows the steps of the query engine that builds the result, with the help of run-time monitors that clearly explain the system's behavior. Once results are extracted, the demonstration shows several approaches for visualizing results and exploring the information space.

Exploratory search in multi-domain information spaces with liquid query (Demo)

Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Piero Fraternali, Salvatore Vadacca
Conference Paper Proceedings of the 20th international conference companion on World wide web. Pages 189-192. ACM New York, NY, USA

Abstract

Search Computing (SeCo) aims at building search applications that bridge the gap between general-purpose and vertical search engines. SeCo queries extract ranked information about several interconnected domains, such as "hotels", "restaurants" or "concerts", by interacting with Web data sources which are wrapped as search services; an example query is: "Find a good Jazz concert close to the user's current location, together with close-by good restaurants and hotels". The SeCo system supports the deployment of search applications, by providing a generic software architecture and the tools for service and query registration, for query formulation and execution, and for result browsing.

In this demo paper, we focus on the Liquid Query (LQ) interface which supports the iteration over query formulation, result visualization and query refinement, with commands for perusing the result set, changing the visualization of data based on their type (e.g., geographical or temporal) and interacting with the remote search services. It also supports an exploratory search approach, where the user starts by accessing one data source (e.g., an event listing for finding interesting concerts), then is assisted in progressively joining other correlated sources in an interactive exploration of the search space. The exploration paths can be chosen on the fly and the navigation history can be browsed back and forth for cross-checking the retrieved options.

A Conceptual Framework for Linked Data Exploration

Alessandro Bozzon, Marco Brambilla, Emanuele Della Valle, Piero Fraternali, Chiara Pasini
Workshop Paper Workshops, Doctoral Symposium, and Tutorials, Held at ICWE 2011, Paphos, Cyprus, June 20-21, 2011. Revised Selected Papers. Pages 109-118. Lecture Notes in Computer Science, Volume 7059. Springer Berlin Heidelberg

Abstract

An increasing number of open data sets are becoming available on the Web as Linked Data (LD), and many efforts have been devoted to showing the potential of LD applications from the technical point of view. However, less attention has been paid to the analysis of the information seeking requirements from the user point of view. In this paper we examine the Information Seeking Process and we propose a general framework that addresses all its requirements in the context of LD-based applications. We support seamless integration of both Linked and non-Linked data sources and we allow designers to define complex, rank-aware result construction and exploration rules based on rank aggregation and multiple many-to-many data navigation.

Understanding Web Data Sources for Search and Exploration

Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Emanuele Della Valle, Silvia Quarteroni
Workshop Paper Workshop on Discovering Meaning On the Go in Large Heterogeneous Data 2011 (LHD-11). Held at the Twenty-second International Joint Conference on Artificial Intelligence (IJCAI-11). Barcelona, Spain, July 16, 2011

Abstract

An increasing number of open data sets are becoming available on the Web as Linked Data (LD), and many efforts have been devoted to showing the potential of LD applications from the technical point of view. However, less attention has been paid to the analysis of the information seeking requirements from the user point of view. In this paper we examine the Information Seeking Process and we propose a general framework that addresses all its requirements in the context of LD-based applications. We support seamless integration of both Linked and non-Linked data sources and we allow designers to define complex, rank-aware result construction and exploration rules based on rank aggregation and multiple many-to-many data navigation.

Dynamic Visualizations for Multi-Domain Search Results

Alessandro Bozzon, Luca Cioria, Piero Fraternali, Maristella Matera
Workshop Paper Proceedings of the First International Workshop on Searching and Integrating New Web Data Sources - Very Large Data Search, Seattle, WA, USA, September 2, 2011. CEUR Workshop Proceedings 880, CEUR-WS.org 2011

Abstract

Search systems are becoming increasingly sophisticated in their capacity to build results that are not mere lists of documents but articulated sets of concepts retrieved from different domains. As the search result sets exhibit more structure, techniques are required to visualize the retrieved objects in a way that facilitates the immediate understanding of their properties and relationships. This paper investigates the use of models to represent both result sets and visualization spaces, and of model-to-model transformations to dynamically suggest an optimized result visualization for multi-domain search.

Search Computing Meets Data Extraction

Alessandro Bozzon, Tim Furche, Giorgio Orsi, Chiara Pasini, Luca Tettamanti, Salvatore Vadacca
Workshop Paper Proceedings of the First International Workshop on Searching and Integrating New Web Data Sources - Very Large Data Search, Seattle, WA, USA, September 2, 2011. CEUR Workshop Proceedings 880, CEUR-WS.org 2011

Abstract

Thanks to the Web, access to an increasing wealth and variety of information has become near instantaneous. To make informed decisions, however, we often need to access data from many different sources and integrate different types of information. Manually collecting data from scores of web sites and combining that data remains a daunting task. The ERC projects SeCo (Search Computing) and DIADEM (Domain-centric Intelligent Automated Data Extraction Methodology) address two aspects of this problem: SeCo supports complex search processes drawing on data from multiple domains with a user interface capable of refining and exploring the search results. DIADEM aims to automatically extract structured data from a domain’s websites. In this paper, we outline a first approach for integrating SeCo and DIADEM. We discuss how to use the DIADEM methodology to automatically turn nearly any website from a given domain into a SeCo search service. We describe how such services can be registered and exploited by the SeCo framework in combination with services from other domains (and possibly developed with other methodologies).

Search Computing: Managing Complex Search Queries

Stefano Ceri, Adnan Abid, Mamoun Abu Helou, Davide Barbieri, Alessandro Bozzon, Daniele Braga, Marco Brambilla, Alessandro Campi, Francesco Corcoglioniti, Emanuele Della Valle, Davide Eynard, Piero Fraternali, Michael Grossniklaus, Davide Martinenghi, Stefania Ronchi, Marco Tagliasacchi, Salvatore Vadacca
Journal Paper IEEE Internet Computing, vol. 14, no. 6, pp. 14-22, November/December 2010

Abstract

Search computing focuses on building answers to complex search queries (for example, "Where can I attend an interesting conference in my field near a sunny beach?") by interacting with a constellation of cooperating search services, and using result ranking and joining as the dominant factors for service composition. The service computing paradigm has so far been neutral to the specific features of search applications and services. To address this weakness, search computing advocates a new approach in which search, join, and ranking are the central aspects for service composition.

Engineering rich internet applications with a model-driven approach

Piero Fraternali, Sara Comai, Alessandro Bozzon, Giovanni Toffetti Carughi
Transaction Paper ACM Transactions on the Web (TWEB). Volume 4 Issue 2, April 2010. Article No. 7. ACM New York, NY, USA

Abstract

Rich Internet Applications (RIAs) have introduced powerful novel functionalities into the Web architecture, borrowed from client-server and desktop applications. The resulting platforms allow designers to improve the user's experience, by exploiting client-side data and computation, bidirectional client-server communication, synchronous and asynchronous events, and rich interface widgets. However, the rapid evolution of RIA technologies challenges the Model-Driven Development methodologies that have been successfully applied in the past decade to traditional Web solutions. This paper illustrates an evolutionary approach for incorporating a wealth of RIA features into an existing Web engineering methodology and notation. The experience demonstrates that it is possible to model RIA application requirements at a high level using a platform-independent notation, and generate the client-side and server-side code automatically. The resulting approach is evaluated in terms of expressive power, ease of use, and implementability.