More from TU:Librarian

Posts tagged rda

Better data, better decisions

At first I thought this was my fourth RDA Plenary, but I think it was my fifth: after Göteborg, Dublin, Paris and Amsterdam, now Montreal.

RDA is not a normal conference with a division between plenary lectures and parallel sessions in blocks of thematic topics. No, it is all about birds-of-a-feather sessions, interest groups or working groups, depending on the status (approved or not) and maturity of the group. You need to select which groups or sessions to attend, especially if you are not personally involved in one of these groups.

Is open science starting a revolution? I went to the museum of fine arts to see what happened in the sixties.

That is why the morning session of the second day (I missed the first day because of the DataCite Board Meeting and a meet-up with Stephanie Gagnon from University of Montreal) was useful for me. It was a quick overview of working groups that were in the middle or at the end of their 18-month period. I am bringing a few things home to my @tudelftlibrary colleagues, e.g.

  • Datacubes, data arrays: is that something we are working with? For me these were new words, but I am of course not a data librarian.
  • Take a look at http://www.typeregistry.org.
  • Materials Resource Registries (to make it easier to find and share resources about materials science). Examples at NIST and ChiMaD. Note that the software can easily be used for other disciplines. Of course this draws my interest, being a materials science engineer myself.
  • David Wilcox with his research data repository interoperability group is looking for adopters.
  • Anne E Thessen is improving the metadata schema so that curation history can easily be found, curators can get credit, and valuable work can be found instead of being repeated.

After the Library session (where I presented our RISE self-assessment on behalf of our own Library Research Data Services group), I attended the Make Data Count BOF session. A lot of our colleagues from DataCite were there. This is interesting and useful work: developing a hub for all data-level metrics, so that usage tracking becomes easier across all research communities. The first draft COUNTER Code of Practice for Research Data has been created and is open for comments. I invite everybody to give their input to this valuable work.

On the third and last day I attended the interest group on education and training for research data handling. There was an overview of available courses and training on research data handling for support staff, or for engaging or guiding researchers. It was obvious from the eight or so brief talks that there is a great deal out there, so our own proposal (from Ellen Verbakel @tudelftlibrary together with Irina Kuchina @EIFL) to create a data supporter curriculum based on the research life cycle seems wise. We want to define the learning goals and competences for the data supporter. The idea is to develop a more unified education that takes all existing education and training on handling research data into account. And by the way, a tip for ourselves: we should not bypass the Library Carpentry efforts, because they stimulate real hands-on work. After reviewing the content of existing courses, we would like to identify and describe the modules missing from them. After that we will need to define which modules are mandatory in a course for data supporters. We will also consider introducing different levels, a question that was asked at the workshop. Perhaps I should consider doing a Library Carpentry training myself!?

The theme of the conference was “better data, better decisions”. That is 100% true of course: the better we describe and maintain the data we preserve, the more findable, interoperable and reusable they are, and as a result every user, including the data producer, can make better decisions.
I mentioned my meeting with Stephanie Gagnon, which was also about better data and better decisions. I read about the big deal cancellations in this blog, and our license manager Marina Lebedeva contacted Stephanie. Being in Montreal was too much of a coincidence, and it was very nice to talk to her. Going through her presentation says it all. From downloads, citations and mentions you can arrive at the essential journals per discipline for your institute, and that should be the basis of your negotiations. We promised to stay in touch, to bring open access into the equation in Canada/Montreal. Montreal has a good press at the moment in relation to open science, so I am pretty hopeful we will be able to join forces.

Delightful data days

I spent a few days in Paris, France, with my research data colleagues, almost 600 participants from 38 countries, who gathered for the 6th RDA Plenary. This RDA (Research Data Alliance) plenary focused on the need to work with enterprises, and had climate change as its underlying theme.

That was the reason that Barbara Ryan (Secretariat Director, Group on Earth Observations) gave a keynote on the first day. She did not just focus on climate change per se, but explained how she managed to get their data open, and the effect that this has had on usage. “Countries have borders, earth observations have not.”

We were all impressed with the statement that Axelle Lemaire (Minister of State for Digital Technology, French Ministry of Economy, Industry and Digital Technology) made at the start of the conference. She preferred to use the metaphor of light instead of oil, when talking about data. Data is not a fossil source that might run dry, data is around in many forms, sometimes a bit diffused, but crucial and it needs to be shared to create value. She told us that France will launch a public consultation on 26 September about “the Digital Bill”. A delightful presentation.

On Wednesday evening we had a social dinner on a boat.

On the Plenary day I attended, 23 September (there were three days in total), I was especially curious to see how working groups that I had attended before had progressed. So I attended the Publishing Data Workflows and the Data Citation groups. The first group gave us a link to their article, and sample cases where either Dataverse, Dryad or figshare are used in the publisher’s data workflow. Future work will concentrate on moving to earlier stages of the research process, and analyse how processes for data publishing might work there. The working group invites everybody to share their best practices, thoughts and comments.

I think that we as libraries should realize that this is indeed what publishers are doing now (just notice also the press release announced at the RDA meeting about Mendeley Data and DANS). If we support our researchers with their data management plans and data stewardship, we can advise them how to keep, store and share their data, without giving the content away. I thought that the remark by William Gunn from Mendeley at the workshop the day before was reassuring: “All types of content providers need to focus on value-added services and not paywalls”.

The Data Citation Working Group will shortly report on their 14 recommendations. The idea of RDA was that working groups only work for 18 months on a certain topic, after which the group dissolves and new groups emerge. The difficulty here is that people like to continue their work, either because they feel committed to their legacy, or because there are many more ideas or recommendations to explore or make. New for me in this session was the “query store” as a middle man: you need to be able to reproduce your queries, so you give them a persistent identifier, but you also need to be able to retrieve the same data with that query, so you version your data with a timestamp. I also learned that data can be watermarked or carry fingerprints as a protection layer (this related to data from social insurance providers for doctors and hospitals). Another term often used was a “snapshot”: a version is a snapshot of your database. And I think it was Stefan Proll (but perhaps it was somebody who asked him some questions) who said: “If users do not cite your data, cite your users”.
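To make the query store idea a bit more concrete, here is a minimal Python sketch. It is an illustration under my own assumptions, not the working group's implementation: the class names (`VersionedTable`, `QueryStore`), the toy "value at least X" query, the `query:` identifier format and the use of a SHA-256 hash as a fingerprint are all hypothetical. It shows the two ingredients mentioned in the session: data versioned with timestamps, and a stored query with its own identifier that can be re-run against the data as it was at that time.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


class VersionedTable:
    """Toy table that never overwrites: each row version has a validity interval."""

    def __init__(self) -> None:
        self.rows: List[dict] = []

    def upsert(self, key: str, value: int, ts: int) -> None:
        # Close the currently valid version of this key, then append the new one.
        for row in self.rows:
            if row["key"] == key and row["valid_to"] is None:
                row["valid_to"] = ts
        self.rows.append({"key": key, "value": value, "valid_from": ts, "valid_to": None})

    def snapshot(self, ts: int) -> Dict[str, int]:
        # The state of the table as it was at timestamp ts.
        return {
            r["key"]: r["value"]
            for r in self.rows
            if r["valid_from"] <= ts and (r["valid_to"] is None or ts < r["valid_to"])
        }


def fingerprint(result: List[Tuple[str, int]]) -> str:
    # Hash of the result set, used to check that a re-run returns the same data.
    return hashlib.sha256(repr(result).encode("utf-8")).hexdigest()


class QueryStore:
    """Stores a query together with its execution timestamp under an identifier."""

    def __init__(self, table: VersionedTable) -> None:
        self.table = table
        self.queries: Dict[str, Tuple[int, int, str]] = {}

    def _run(self, min_value: int, ts: int) -> List[Tuple[str, int]]:
        snap = self.table.snapshot(ts)
        return sorted((k, v) for k, v in snap.items() if v >= min_value)

    def register(self, min_value: int, ts: int) -> str:
        result = self._run(min_value, ts)
        digest = hashlib.sha256(f"{min_value}|{ts}".encode("utf-8")).hexdigest()[:12]
        pid = "query:" + digest  # hypothetical identifier format
        self.queries[pid] = (min_value, ts, fingerprint(result))
        return pid

    def resolve(self, pid: str) -> List[Tuple[str, int]]:
        min_value, ts, expected = self.queries[pid]
        result = self._run(min_value, ts)
        if fingerprint(result) != expected:
            raise ValueError("re-executed query no longer matches the stored fingerprint")
        return result


# Usage: cite a query, change the data afterwards, and resolve the citation again.
table = VersionedTable()
table.upsert("a", 10, ts=1)
table.upsert("b", 5, ts=2)
store = QueryStore(table)
pid = store.register(min_value=5, ts=2)
table.upsert("b", 1, ts=3)   # later update
table.upsert("c", 50, ts=4)  # later addition
cited = store.resolve(pid)   # still the data as it was at ts=2
```

Because old row versions are closed rather than overwritten, resolving the identifier later still returns the data as it was when the query was first run, even though the current state of the table has moved on.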

It was a nice dinner, and I never knew that there were that many dinner boats on the Seine.

I already referred to the workshop on the day preceding the RDA, which was on e-Infrastructures & RDA for data intensive science. There was some overlap between these two days, and one overlap I did not mind at all: a very nice tool called RD Switchboard, presented by Amir Aryani from ANDS (Australia). This switchboard connects datasets on the basis of co-authorship or other collaboration (e.g. via funding). Paolo Manghi showed that they already work together with the RD Switchboard by finding connections, via the OpenAIRE database, between publications and projects and between publications and data.

Mark Parsons, the secretary general of RDA, talked (amongst other funny stuff) about infrastructures during the opening session of the preceding day: how we went from systems, to networks, to networked infrastructures. Infrastructures are about bridges, both social and technical, and that is what RDA wants to do: create bridges, and be open! “Preserve the freedom to tinker, that is why choice for open source is important.”

My Paris RDA trip started even a day before that, with the persistent identifiers workshop, organized by DataCite and ePIC. ePIC stands for persistent identifiers for eResearch, and works on data in the full research cycle (what they call referrable data), whereas DataCite provides identifiers for citeable data. At the workshop there were presentations about identifiers such as ARK, DOI, Handle, ORCID and ISNI. Domain-specific work often needs identifiers as well: Anne Cambon-Thomsen started a journal for descriptions of bioresources, and Kerstin Lehnert introduced the IGSN, the geosample number.

And we are not there yet. We want to use identifiers for more physical objects, we should always make sure that we refer to the PID in the metadata, and according to Peter Wissenburg we should also use identifiers for the metadata. It is obvious that the most important thing is that these persistent identifiers are linked across platforms, and that we have an open scholarly infrastructure. A project about this has just started: “Technical and Human Infrastructure for Open Research” (THOR). Tobias Weigl even wanted to take it further: “We need an operational transition process. Go from one pid to the other. That is not possible yet.”

New for me, in the presentation by Laura Paglioni from ORCID, was that review information will be added to your ORCID profile. She also showed that there is already a dataflow between CrossRef, DataCite and ORCID.

So even though I could not attend the full Plenary, enough inspiration as a take-away!

International Data Week

This time I am wrapping up the “International Data Week” in Amsterdam, with the RDA 4th plenary (Reaping the fruits) as the main event on 22-24 September 2014, and a range of satellite events on data taking place in the same week. Just a (very) short impression!

Robert-Jan Smits kicked off the RDA meeting on Monday, where 520 attendees were present, by saying that only 10-30% of scientific articles can be reproduced. He urged the community to change their culture, and “treat your data as you treat your publications”.

The video by Neelie Kroes contained a few nice phrases, e.g. “Open science depends on open minds, and it can grow if we build it upon trust”.

Barend Mons gave a very entertaining keynote on “Bringing Data to Broadway”, and introduced his FAIR play, to make research findable, accessible, interoperable and reusable. Barend referred to his Data FAIRPORT. Do not say open all the time, perhaps call it fair science (I will give this suggestion at the end of the EC public consultation on Science 2.0!).

He showed us that data loss is real and significant, while data growth is staggering. We should realize how important data stewardship is: educate, reward and keep data scientists. Professionalize data stewardship! 5% of research funding should go to data stewardship; it is really worth the money. So reward the data steward, and introduce a research object impact factor. And do not forget: “Knowledge is like laughter, it increases when shared”.

I could only attend part of this first day, and then the third day. As at the previous plenaries, the RDA holds a lot of parallel sessions, where the interest groups and working groups talk about their challenges and progress.

The working group on workflows (part of the interest group Publishing Data) is in the midst of a workflow analysis, and they called for people to look at their Excel sheet and add new workflows or columns to address. A few examples of workflows were presented. Martina Stockhause opened a discussion on versions of data, where her suggestion was to have a high-level persistent identifier based on a collection, and then allow for changes within it. We thought that her discussion would be addressed by the group on Dynamic Data (I cannot find the correct link to this group though!).

The closing panel on the third day gave an overview of the data situation in Brazil, Japan, Canada and the US. A few interesting, and some slightly contradictory, observations:

  • Should we refer to open data, or should we allow for a variety of ways in which access can be arranged, realizing that the private sector wants to exploit its data?
  • Do not create artificial silos between research and industry.
  • Data requires us to think in objects and connections, and we should work on improving services.
  • Beware of “going in the rathole of sustainability”. In the end it is of course far more expensive not to invest in infrastructure.

In the coming six months (until the next plenary, in San Diego) the RDA will focus on adoption, on using and eating the fruits, and they will be clustering the interest groups and working groups. I think that this is a sensible thing to do.

One of the remarks of the panel was that you need a national infrastructure to be able to participate in a global infrastructure, and that we should exchange best practices. I am proud that in the Netherlands we managed to set up Research Data Netherlands, a coalition in which three data archives now share their experience and work together on realizing sustainable data archiving.

On 24 September 2014 Research Data Netherlands (RDNL), the collaborative partnership between 3TU.Datacentrum and DANS, welcomed SURFsara.

Talking about the processes is useful and necessary, but it was very rewarding to have presentations of six researchers during the Dutch Data Prize Award on 24 September.

On Thursday the RECODE Workshop took place (and, as said, there were many, many more interesting events this week). RECODE aims to have their final conference in Athens in January 2015. People at the workshop were invited to comment on the draft recommendations document of work package 5.

The group wants to produce evidence-based policy recommendations. They have identified four stakeholder groups: funders, research institutions, data managers and publishers (the question was raised whether researchers should be added as a stakeholder). To give a quick idea:

  • Funders: Develop, implement, monitor and evaluate open access to research data. (During the panel later on, we discussed whether there was a funder that supports reusing data; that could be an addition to this short list.)
  • Research institutions: Develop data management strategies, develop reward systems, develop training programs and support awareness-raising.
  • Data managers: Develop mission and responsibilities, develop sustainable business models, achieve trustworthiness of repositories and content, and develop data management services.
  • Publishers: Set policies for deposit of data and require data submissions in certified repositories.

Daniel Spichtinger (from European Commission, DG Research and Innovation) took part in the workshop and told us about the European Commission’s pilot for open access to research data. A few things were new for me. Apparently deposit in repositories is mandatory, but there is no requirement to use a trusted repository. The opt-outs for opening up your data cover a wide range: there may be a conflict with the need to protect results, a confidentiality issue or possible risk for national security, protection of personal data, and more. Another new thing for me was that, apart from the selected areas (in the Excellence, Industrial Leadership or Societal Challenges programmes), all projects may join the pilot on a voluntary basis. Further, the data management plans are mandatory, but are not part of the project evaluation; they are required 6 months after the project starts. At the end Daniel gave a nice quote: “This pilot gives you a chance to coshape policy on opening up research data.” We also now know the uptake so far (out of 3054 proposals): opt-out is 24% in the core areas, and opt-in is 27% in other areas.

I am ending my post here, but our team, especially the product group Research Data Services, was of course present in (almost) full strength. Apart from helping the main organiser DANS and sponsoring the programme as 3TU.Datacentrum (which we coordinate), we followed or contributed to Libraries for research data, Data publication, Long tail data and workshops on technique, training, policy and certification. A very busy week indeed!

© 2011 TU Delft