Delightful data days

I spent a few days in Paris, France, with my research data colleagues: almost 600 participants from 38 countries, who gathered for the 6th RDA Plenary. This plenary of the RDA (Research Data Alliance) focused on the need to work with enterprises, and had climate change as its underlying theme.

That was the reason that Barbara Ryan (Secretariat Director, Group on Earth Observations) gave a keynote on the first day. She did not just focus on climate change per se, but explained how she managed to get their data open, and the effect that this has had on usage. “Countries have borders, earth observations have not.”

We were all impressed with the statement that Axelle Lemaire (Minister of State for Digital Technology, French Ministry of Economy, Industry and Digital Technology) made at the start of the conference. She preferred to use the metaphor of light instead of oil, when talking about data. Data is not a fossil source that might run dry, data is around in many forms, sometimes a bit diffused, but crucial and it needs to be shared to create value. She told us that France will launch a public consultation on 26 September about “the Digital Bill”. A delightful presentation.

On Wednesday evening we had a social dinner on a boat.

On the Plenary day I attended, 23 September (there were three days in total), I was especially curious to see how the working groups that I had attended before had progressed. So I attended the Publishing Data Workflows and the Data Citation groups. The first group gave us a link to their article, and sample cases where Dataverse, Dryad or figshare is used in the publisher’s data workflow. Future work will concentrate on moving forward in the research process, and on analysing how processes for data publishing might work there. The working group invites everybody to share their best practices, thoughts and comments.

I think that we as libraries should realize that this is indeed what publishers are doing now (note also the press release, announced at the RDA meeting, about Mendeley Data and DANS). If we support our researchers with their data management plans and data stewardship, we can advise them how to keep, store and share their data, without giving the content away. I found the remark by William Gunn from Mendeley at the workshop a day before reassuring: “All types of content providers need to focus on value-added services and not paywalls”.

The Data Citation Working Group will shortly report on their 14 recommendations. The idea of RDA was that working groups only work for 18 months on a certain topic, after which the group dissolves and new groups emerge. The difficulty here is that people like to continue their work, either because they feel committed to their legacy, or because there are many more ideas to explore or recommendations to make. New for me in this session was the “query store” as a middle man: you need to be able to reproduce your queries, so you give them a persistent identifier, but you also need to be able to retrieve the same data with that query, so you version your data with a timestamp. I also learned that data can be watermarked or carry fingerprints, as a protection layer (this related to data from social insurance providers for doctors and hospitals). Another term that was often used was “snapshot”: a version is a snapshot of your database. And I think it was Stefan Proll (but perhaps it was somebody who asked him some questions) who said: “If users do not cite your data, cite your users”.
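The query store idea can be made a bit more concrete with a minimal sketch in Python. It is only an illustration under stated assumptions, not the working group's implementation: the class names `VersionedTable` and `QueryStore`, the `hypothetical-pid:` prefix and the checksum step are all made up here. The table never overwrites a value but closes the old version with a timestamp, and the store keeps each query together with the time at which it was run, so the same result can be retrieved later.

```python
import hashlib
import uuid
from datetime import datetime, timezone


class VersionedTable:
    """Append-only table: old values are closed with a timestamp, never deleted."""

    def __init__(self):
        self._rows = []  # each row: key, value, valid_from, valid_to

    def upsert(self, key, value, at):
        # Close the currently valid version of this key, then add the new one.
        for row in self._rows:
            if row["key"] == key and row["valid_to"] is None:
                row["valid_to"] = at
        self._rows.append(
            {"key": key, "value": value, "valid_from": at, "valid_to": None}
        )

    def snapshot(self, at):
        # The state of the table as it was at time `at`.
        return {
            r["key"]: r["value"]
            for r in self._rows
            if r["valid_from"] <= at and (r["valid_to"] is None or at < r["valid_to"])
        }


def checksum(result):
    # Hash of the sorted result set, used to check that a re-run matches.
    return hashlib.sha256(repr(sorted(result.items())).encode("utf-8")).hexdigest()


class QueryStore:
    """Stores a query with its execution timestamp under an identifier."""

    def __init__(self, table):
        self._table = table
        self._queries = {}

    def _run(self, predicate, at):
        snap = self._table.snapshot(at)
        return {k: v for k, v in snap.items() if predicate(k, v)}

    def register(self, predicate, at):
        # Hypothetical identifier scheme; a real service would mint a proper PID.
        pid = f"hypothetical-pid:{uuid.uuid4()}"
        result = self._run(predicate, at)
        self._queries[pid] = {
            "predicate": predicate,
            "timestamp": at,
            "checksum": checksum(result),
        }
        return pid

    def resolve(self, pid):
        # Re-run the stored query against the data as it was at query time.
        q = self._queries[pid]
        result = self._run(q["predicate"], q["timestamp"])
        if checksum(result) != q["checksum"]:
            raise ValueError("result no longer matches the stored checksum")
        return result


if __name__ == "__main__":
    t1 = datetime(2015, 9, 21, tzinfo=timezone.utc)
    t2 = datetime(2015, 9, 22, tzinfo=timezone.utc)
    t3 = datetime(2015, 9, 23, tzinfo=timezone.utc)

    table = VersionedTable()
    table.upsert("station-a", 11.5, t1)
    store = QueryStore(table)
    pid = store.register(lambda key, value: value > 10, t2)

    table.upsert("station-a", 9.0, t3)  # the data changes after the query
    print(store.resolve(pid))           # still the data as cited
```

Resolving the identifier after the update still returns the value that was valid when the query was registered, which is the "snapshot" behaviour mentioned in the session.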

It was a nice dinner, and I never knew that there were that many dinner boats on the Seine.

I already referred to the workshop on the day preceding the RDA Plenary, which was on e-Infrastructures & RDA for data intensive science. There was some overlap between these two days, but one repeat I did not mind at all: a very nice tool called RD Switchboard, presented by Amir Aryani from ANDS (Australia). This switchboard connects datasets on the basis of co-authorship or other collaboration (e.g. via funding). Paolo Manghi showed that they already work together with the RD Switchboard by finding connections, via the OpenAIRE database, between publications and projects and between publications and data.

Mark Parsons, the secretary general of RDA, talked (amongst other funny stuff) about infrastructures during the opening session of the preceding day: how we went from systems to networks to networked infrastructures. Infrastructures are about bridges, both social and technical, and that is what RDA wants to do: create bridges, and be open! “Preserve the freedom to tinker, that is why choice for open source is important.”

My Paris RDA trip started even a day before that, with the persistent identifiers workshop, organized by DataCite and ePIC. ePIC stands for Persistent Identifiers for eResearch, and works on data in the full research cycle (what they call referable data), whereas DataCite provides identifiers for citable data. At the workshop there were presentations about identifiers such as ARK, DOI, Handle, ORCID and ISNI. For domain-specific work, identifiers are often needed as well: Anne Cambon-Thomsen started a journal for descriptions of bioresources, and Kerstin Lehnert introduced the IGSN, the geosample number.

And we are not there yet: we want to use identifiers for more physical objects, we should always make sure that we refer to the PID in the metadata, and, according to Peter Wissenburg, we should also use identifiers for the metadata itself. It is obvious that the most important thing is that these persistent identifiers are linked across platforms, and that we have an open scholarly infrastructure. A project about this has just started: “Technical and Human Infrastructure for Open Research” (THOR). Tobias Weigl even wanted to take it further: “We need an operational transition process. Go from one pid to the other. That is not possible yet.”

New for me, in the presentation by Laura Paglioni from ORCID, was that review information will be added to your ORCID profile; she also showed that there is already a data flow between CrossRef, DataCite and ORCID.

So even though I could not attend the full Plenary, enough inspiration as a take-away!

International Data Week

This time I am wrapping up the “International Data Week” in Amsterdam, with the RDA 4th Plenary (Reaping the fruits) as the main event on 22-24 September 2014, and a range of satellite events on data taking place in the same week. Just a (very) short impression!

Robert-Jan Smits kicked off the RDA meeting on Monday, with 520 attendees present, by saying that only 10-30% of scientific articles can be reproduced. He urged the community to change their culture, and “treat your data as you treat your publications”.

The video by Neelie Kroes contained a few nice phrases, e.g. “Open science depends on open minds, and it can grow if we build it upon trust”.

Barend Mons gave a very entertaining keynote on “Bringing Data to Broadway”, and introduced his FAIR play, to make research findable, accessible, interoperable and reusable. Barend referred to his Data FAIRPORT. Do not say open all the time, perhaps call it fair science (I will give this suggestion at the end of the EC public consultation on Science 2.0!).

He showed us that data loss is real and significant, while data growth is staggering. We should realize how important data stewardship is: educate, reward and keep data scientists. Professionalize data stewardship! 5% of research funding should go to data stewardship; it is really worth the money. So reward the data steward, and introduce a research object impact factor. And do not forget: “Knowledge is like laughter, it increases when shared”.

I could only attend this first day partially and then the third day. The RDA always holds a lot of parallel sessions, similar to the previous plenaries, where the interest groups and working groups talk about their challenges and progress.

The working group on workflows (part of the interest group Publishing Data) is in the midst of a workflow analysis, and they called for people to look at their Excel sheet and add new workflows or columns to address. A few examples of workflows were presented. Martina Stockhause opened a discussion on versions of data, where her suggestion was to have a high-level persistent identifier based on a collection, and then allow for changes within it. We thought that her discussion would be addressed by the group on Dynamic Data (I cannot find the correct link to this group though!).

The closing panel on the third day gave an overview of the data situation in Brazil, Japan, Canada and the US. A few interesting, some slightly contradictory, observations:

  • Should we refer to open data, or should we offer a variety of ways in which access can be arranged, realizing that the private sector wants to exploit their data?
  • Do not create artificial silos between research and industry.
  • Data requires us to think in objects and connections, and we should work on improving services.
  • Beware of “going in the rathole of sustainability”. In the end it is of course far more expensive not to invest in infrastructure.

In the coming six months (up to the next plenary, in San Diego) the RDA will focus on adoption, on using and eating the fruits, and they will be clustering the interest groups and working groups. I think that this is a sensible thing to do.

One of the remarks of the panel was that you need a national infrastructure to be able to participate in a global infrastructure, and that we should exchange best practices. I am proud that we managed in the Netherlands to have Research Data Netherlands, a coalition in which three data archives now share their experience and work together on realizing sustainable data archiving.

On 24 September 2014 Research Data Netherlands (RDNL), the collaborative partnership between 3TU.Datacentrum and DANS, welcomed SURFsara.

Talking about the processes is useful and necessary, but it was very rewarding to have presentations by six researchers during the Dutch Data Prize Award on 24 September.

On Thursday the RECODE Workshop took place (and, as said, there were many, many more interesting events this week). RECODE aims to have their final conference in Athens in January 2015. People at the workshop were invited to comment on the draft recommendations document of work package 5.

The group wants to produce evidence-based policy recommendations. They have identified four stakeholder groups: funders, research institutions, data managers and publishers (the question was raised whether researchers should be added as a stakeholder). To give a quick idea:

  • Funders: Develop, implement, monitor and evaluate open access to research data. (During the panel later on, we discussed whether there was a funder that supports reusing data; that could be an addition to this short list.)
  • Research institutions: Develop data management strategies, develop reward systems, develop training programs and support awareness-raising.
  • Data managers: Develop mission and responsibilities, develop sustainable business models, achieve trustworthiness of repositories and content, and develop data management services.
  • Publishers: Adopt policies for deposit of data and require data submissions in certified repositories.

Daniel Spichtinger (from the European Commission, DG Research and Innovation) took part in the workshop and told us about the European Commission’s pilot for open access to research data. A few things were new for me. Apparently the deposit in repositories is mandatory, but there is no requirement to have it in a trusted repository. The opt-outs for opening up your data have a wide range: there may be a conflict with the need to protect results, a confidentiality issue, a possible risk for national security, protection of personal data, and more. Another new thing for me was that, apart from the selected areas (in the Excellence, Industrial Leadership or Societal Challenges programmes), all projects might join the pilot on a voluntary basis. Further, the data management plans are mandatory, but are not part of the project evaluation; they are required 6 months after the project starts. At the end Daniel gave a nice quote: “This pilot gives you a chance to coshape policy on opening up research data.” We also now know the uptake so far (out of 3054 proposals): the opt-out rate is 24% in the core areas, and the opt-in rate is 27% in other areas.

I am ending my post here, but our team, especially the product group Research Data Services, was of course present in (almost) full strength. Apart from helping the main organisation, DANS, and sponsoring the programme as 3TU.Datacentrum (which we coordinate), we followed or contributed to Libraries for research data, Data publication, Long tail data, and workshops on technique, training, policy and certification. A very busy week indeed!
