In the world of open source software, licenses like GPL, LGPL, MIT, etc., are generally viewed as a good thing, as they allow the authors of the software to place limited restrictions on the re-use of software according to their preference, whilst still being able to publish the source code. Similarly, for other creative works and open access publishing, the Creative Commons licenses are generally viewed as beneficial, because they allow authors to protect the integrity of their work, if desired, along with their right to attribution, without otherwise limiting access to or re-use of their work.

So what about scientific data? In MalariaGEN, we are developing policies for “community projects” where partners from independent research institutions around the world submit samples for sequencing. Ultimately, we would like to make all of the data derived from sequencing those samples available to the scientific research community, but we would also like to protect our partners' investment in collecting those samples by ensuring they are attributed when data are re-used. So, I thought, surely the best way to do this is to publish the data under a CC-like license, right?

It turns out this is not the current consensus. Science Commons have published a Protocol for Implementing Open Access Data, which (in section 5) has a good explanation of why using intellectual property rights (i.e., licenses) to enforce norms of attribution or share-alike is a bad idea. So the protocol states that:

[…] to facilitate data integration and open access data sharing, any implementation of this protocol MUST waive all rights necessary for data extraction and re-use […] and MUST NOT apply any obligations on the user of the data or database such as “copyleft” or “share alike”, or even the legal requirement to provide attribution.

This is consistent with policies adopted by major scientific data publishers like the European Nucleotide Archive (ENA), which is part of the International Nucleotide Sequence Database (INSD) collaboration, e.g.:

The INSD will not attach statements to records that restrict access to the data, limit the use of the information in these records, or prohibit certain types of publications based on these records. Specifically, no use restrictions or licensing requirements will be included in any sequence data records, and no restrictions or licensing fees will be placed on the redistribution or use of the database by any party.

However, the Science Commons protocol also says that:

Any implementation SHOULD define a non-legally binding set of citation norms in clear, lay-readable language.