Submissions/Mass collaboration data projects and policies

From Wikimania 2012 • Washington, D.C., USA

This is a rejected submission for Wikimania 2012.

Submission no.


Title of the submission
Mass collaboration data projects and policies
Type of submission (workshop, tutorial, panel, presentation)
Author of the submission
Tyng-Ruey Chuang
E-mail address
Tyng-Ruey Chuang
Country of origin
Affiliation, if any (organization, company etc.)
Academia Sinica
Personal homepage or blog
Abstract (at least 300 words to describe your proposal)

Collaborations on content generation and reuse can both produce and consume various types of resources, including, among others, text, code, datasets, public domain repositories, activity logs, and community records. These practices are increasingly multi-modal and involve many actors. How do people frame jointly created resources to allow for reuse and to encourage collaboration? Often there is anxiety about balancing freedom (of content reuse) against fairness (no free riders). For many projects that produce copyrightable works, public licenses such as the GNU GPL and the CC BY-SA license have been instrumental in maintaining a boundary of collaboration and in regulating the use of joint output.

For data-intensive projects, we are at an early stage of framing the various data sharing issues. Rights in datasets are not homogeneous across jurisdictional boundaries, and practices differ. Some projects consider their datasets to be in the public domain. Some projects require attribution or citation but otherwise leave their datasets free to use. Some insist on data integrity, while others impose share-alike conditions on data reuse. In cases where public licenses are used to release datasets, some projects may at the same time ask for contributor agreements and/or specify terms of service. There seems to be ample room for discussion, especially on the practical means (legal tools or not) for governing the production and reuse of data in collaborative projects.

However, mass collaboration around creating and curating data is not a new phenomenon; examples range from Project Scoresheet to MusicBrainz to Freebase to OpenStreetMap to DBpedia and more. Furthermore, Linked Data is making it more possible than ever for more latent forms of mass collaboration around data to occur, and for mass collaboration projects to ingest, curate, and improve external datasets.

By organizing this panel, we hope to elicit discussions on data sharing issues, and to help develop conceptual tools for data-intensive projects. We propose to do the following:

  • a background discussion about subject matter, i.e. what types of work are copyrightable. We will discuss how the language of "authors" and their "writings" in the copyright clause of the US Constitution has expanded to cover various types of work while leaving out others, what the situation is elsewhere in the world, and how various public licenses address the issue.
  • an informal survey of various data communities on their data sharing practices: observations on how they maintain the boundaries of collaboration, what tools they use to constrain or encourage data sharing, and what actions they take in the face of non-conformity. Also, how their objectives differ, e.g., to replace a heretofore proprietary dataset, to create a new dataset for a particular project, field, or the universe, to exploit datasets created as a side effect of mass collaboration, and others.
  • some initial thoughts on sharing content collections of a mixed nature (including e.g. datasets and copyrightable works): What restrictions may one reasonably expect to impose on users? Why and how?
  • a review of some of the practices and tools in collaborative data collection, data extraction from content collections, data aggregation, etc. What is to be expected of such practices, and what should be built into such tools to support data sharing and to encourage collaboration? What of the above should Wikidata be cognizant of (or not!)?


GLAM: Galleries, Libraries, Archives, and Museums & cultural outreach, but could fit as well in WikiCulture and Community or Wikis and the Public Sector
Length of presentation/talk
60 Minutes
Will you attend Wikimania if your submission is not accepted?
Some of the panelists will attend, but probably not all.
Slides or further information (optional)
Special request as to time of presentations

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers decide which sessions are of high interest. Sign with four tildes (~~~~).

  1. Tbayer (WMF) (talk) 07:47, 19 March 2012 (UTC)
  2. Mindspillage (talk) 17:56, 19 March 2012 (UTC)
  3. Juttavd (talk) 00:10, 20 March 2012 (UTC)
  4. Daniel Mietchen - WiR/OS (talk) 09:03, 22 March 2012 (UTC)
  5. Pundit (talk) 22:29, 4 April 2012 (UTC)
  6. Your name here!