Moving towards a culture of data citation

The first plenary of the Research Data Alliance (RDA) took place in March 2013. I attended that meeting in Göteborg, Sweden. On 26-28 March 2014 Dublin, Ireland, was the venue for the third plenary. It coincided with a lot of satellite meetings, and DataCite also held its General Assembly and strategy meeting on the two days before the RDA. So I combined the two, and was present at the DataCite gatherings (pretty straightforward, being a Board Member of DataCite) and at the first day of the RDA.

I would have loved to watch a rugby game here at the Croke Park stadium / conference centre. The combination of a conference and a game is probably not very practical. And I was told that this is not the rugby season.

It is not really possible to say much about everything that happened at the RDA. The remainder of the conference consisted of a lot of interest group and working group sessions; only the first morning was a real plenary. I thought that the introductory talk by Mark Ferguson (DG of Science Foundation Ireland) was interesting. He made a few statements that would be worth checking (I would love to have his sources!):
– The most highly-cited papers find their origin in:

  1. collaboration between academia and industry
  2. international collaboration
  3. national collaboration

“Isolated” research is at the bottom of the list. I can imagine that there are a few parameters influencing this ranking, e.g. the discipline or the sort of peer groups you work with. Another statement was about the hit rate for patents, where he claimed that jointly funded projects give a better chance, and that the patent is often not attributed to the first (original) research(er). For Ferguson, this is a reason to promote open innovation.

The panel about data policy was more a range of short presentations, which was in a way OK. I have to look at the ideas that Mercé Crosas (Director of Data Science, Harvard University) put forward. As the initiator of the Dataverse Network, she showed us their guidelines for data publishing. Moreover, she referred to guidelines for connecting journals to data, where integration between journals and data is encouraged through Dataverse. That is a different use of Dataverse than the one I knew about.

In retrospect, though this is based on a very short (1-day) presence, I had expected more real activity and results (after 1 year) to come out of the RDA groups and workshops. The problem of course is that one can only attend one session at a time. I am eager to hear what is happening in all these groups, but it is difficult to get a “quick-and-dirty” overview.

The people I talked to were very positive about the excellent networking opportunities. Everybody you want to talk to is at the RDA! Finally, to conclude this very short report: I heard (at least) one very interesting idea at the Data Publishing Interest Group introductory session, and that was the idea by Laure Haak (ORCID) to assign DOIs to data management plans. That could supply a missing link in the chain from project to data to publication. Simon Hodson (CODATA), who hammered home at the plenary panel that RDA should be about putting all the available principles into practice (I could not agree more!), showed the very good cycle created by ANDS (one of the co-organisers of this plenary) for building a culture of data citation: create, use, measure and reward.


And to end with the beginning: apart from a lot of good discussions and nice get-togethers, three things stood out from the DataCite meetings:
– DataCite will endorse the data citation principles that were recently published by Force11.
– DataCite has entered into an agreement with Databib / Re3data. The first step is that these two data repository registries will merge their projects into one service, which will be managed under the auspices of DataCite by the end of 2015.
– DataCite and RDA have signed a memorandum of understanding, so that both organisations can intensify their dialogue, and actively work on promoting data citation as an important element in the scholarly workflow.

I normally have no recollection at all of what art is being displayed in hotel rooms or corridors. This time my attention was drawn to some of the pictures. Is it because of an intriguing scenery, the black-and-white of the tulip or the typical birds? I actually do not know. They are shown at the Gresham hotel and at the Croke Park hotel, both in Dublin.

Bring in the data!

From time to time I also attend sessions that are not specifically related to library stuff. There was a library-related reason though that made me go to Eindhoven for the 3TU Conference on Innovation and Technology on December 6, 2013, and that was signing off the consortium agreement of our 3TU.datacentrum (finally!).


3TU.Federation Chair Dirk-Jan van den Berg referred to our 3TU.datacentrum as “a beacon of transparency”.


So that gave me the opportunity to have a (brief) peek at other sessions. Our (Delft) Kees Vuik introduced the session on “Invisible mathematics: three tangible results”. I liked the “Intel stamp” that was used throughout this session: “Math Inside”. Perhaps one would not realize this, but in so many topics, e.g. optimising queuing (in a shop; for a helpdesk; or as part of a service bus), simulating maritime circumstances for large vessels, or thinking about your local electricity supply, the mathematics “inside” remains invisible, though it is an essential part of the project. It reminded me of my studies in Materials Science. We had a similar problem, because – apart from fundamental research – in the applied scene materials scientists facilitate other disciplines: essential, yes, but somewhat invisible.

At the innovation market I found some other interesting stuff. What to think of LikeLines? Via a navigable heat map, users can jump to interesting regions in the videos they are watching. Or the INSYGHTLab, where they work on multi-camera experiments for 3D reconstructions, to get to highly interactive screens.

I should also mention Federico Toschi – he spoke about “Fluid dynamics challenges for energy and health”, and showed us how understanding fluid dynamics is essential for health issues such as the rheology of blood in our vessels. Here we have a link to one of his datasets in our 3TU.datacentrum, which brings me back to the beginning – we can be proud to have our agreement finalized. Bring in the data!

Apples (are not the only fruit)

We just can’t get enough – talking about data

On 8 May 2013 3TU.Datacentrum launched its partnership with DANS. The establishment of the coalition Research Data Netherlands will bring together knowledge and expertise about research data. And above all it has the intention to unite research libraries, archives or other organisations that keep (trustworthy) data repositories.

Open Access has been around for quite some time, but in the past months more and more one-pagers and position papers have been written, and more and more network sessions and hearings have been organised.

One reason, amongst many others, for the current increased attention to Open Access is that it has the potential to provide all stakeholders with evidence of the high standards of quality and integrity which the scientific system has traditionally imposed on itself.

That is why I quote the position paper, signed by five Dutch universities, that asks to seriously consider the positive impact of Open Access on the use, re-use, and citation of scientific data. These five universities (Delft University of Technology, Erasmus University Rotterdam, Leiden University, TU Eindhoven and University of Twente) cooperate in multidisciplinary research that covers all societal challenges mentioned in Horizon 2020. Universities want their research to be shared with society, so that it is available for new research, insights and innovation.

“In order to bridge the innovation divide in Europe, Open Access to data should be actively pursued, as sharing data can foster the advancement of excellent researchers, with due respect, however, for the legitimate commercial, national security and privacy interests. Open Access to research data must be encouraged to combat scientific misconduct and to foster the professionalization of researchers. Also in this Age of Big Data the rich universe of research data could be accessible,”

The momentum for this position paper, and others, was provided by the EC public consultation on Open Research Data.

For me personally it is essential that research data created in the public domain should be kept there. As publishers are changing their business and expanding it to the current research domain and evaluation metrics, we Libraries should also step up.

It is not just about finding that one apple in the jungle (citing a post that a researcher and chair of one of our library committees brought to my attention), but also about bringing the university “fruits” back for easy pickings 😉

Librarying, the new buzzword? A report from LIBER2013

This week I attended, for the first time in my Library Life, the LIBER Congress (which took place in Munich, Germany). It was a strange week, in which I was of course heavily occupied with some possible governmental budget measures, and met with a lot of my Library colleagues.

The main reason for coming over was the workshop LIBER organised on the “10 recommendations on Research Data Management, what’s next” on Wednesday 26 June, and the first face-to-face meeting of the steering committee of Scholarly Communication and Research Infrastructures. I am a member of that committee and my colleague @jprombouts, head of our Research Data Services, is a member of its working group. Research (Information) Infrastructures and the Future Role of Libraries was the main theme of the Conference, which took place for the remainder of that Wednesday, Thursday and Friday.

The main keynote lectures were therefore focused on this topic, and for me the two keynotes on Thursday were the most relevant to this theme. Liz Lyon, director of UKOLN, University of Bath, showed us that universities could regard themselves as data publishers and that they should take responsibility for their own data products. Libraries can help researchers publish their data, with curation, discoverability, citation, formats and metrics. Librarians should more and more be part of the Lab teams. Carlos Morais Pires from the European Commission made a nice comparison between engineers and librarians: “Engineers stop when things start to mean something”. He also gave an overview of the things (formal communications / recommendations) the EC has done, and what is to be expected in relation to Horizon 2020. Geoffrey Boulton’s speech on Friday was similar to the one I happened to hear at the 10th anniversary meeting of LERU, in Brussels, last November. The discussion afterwards, though, made me send tweets again, because I could not agree more: talk to researchers and ask them what they need, instead of going out to spread the word about what you do. And see the library as a function, not an entity. Or as I phrased it myself: “librarying, it is a verb, it is active, it is dynamic, it is not a thing!”

The Bavarian State Library at Munich ..

At our own workshop Jeroen informed the participants (some 80 – 100 people) about our 3TU.datacentrum and our collaboration with DANS in Research Data Netherlands, and emphasized that it is important to “think big, start small and act now”. He took care of recommendations 5, 6, 7, and 8, which focused on collaboration and services. These recommendations were finalized and prioritized during the LIBER conference in Tartu last year. Wolfram Horsten, from Oxford University Library, focused on Policy & Infrastructure (recommendations 4, 9, and 10) and showed us that it is good to start with a research data policy, and that you should have a centrally led approach (either with the library in the lead, or together with more supporting services), but that you should also let the local initiatives flow. “Partnering!” was his final word in the discussion at the end. Rob Grim, the chair of the working group, from Tilburg University Library and IT Services, completed the ten recommendations with 1, 2 and 3, and emphasized that libraries should work on having skilled people. He also referred to the implementation plan the working group will now start working on.

Where the Conference reception took place …

Of course there were also other sessions worth attending. I will just pick two, to be a bit selective (as we also heard from Boulton, that is our role!).

The title “Meeting the needs of PhD candidates: Services, networks and relevance” attracted too many people for the size of the room. Eystein Gullbekk, Oslo Library, showed us their PhD on Track, which was launched one month ago. It provides modules under three tracks: Review and discover / Share and publish / Evaluation and ranking. The principles they used during development of the modules were: Illustrate / Demonstrate / Explain / Provoke. Gullbekk’s second part focused on “being relevant”, where he used the principle of the actor network: viewing a topic from different aspects. For example, a publication can act as apprenticeship or as accreditation, and you should realize that there can be conflicts of interest, so take different enacted realities into account and make them visible (both conflicts and resistance).

Birte Christensen-Dalsgaard, from the Royal Library Denmark, presented a successful crowdsourcing project: “Denmark seen from the air”. The Royal Library has 18 million photographs, so what to do with these, and how can they be made useful? This project used the collections of photographs of farms “seen from the air”. It started with 200 k negatives from the area “Fyn”. The idea was to get more data about the precise location of each farm. At one point in the project, a “farm was moved” every minute. At this moment 87% of pins have been geotagged. So, as said, pretty successful! Some take-aways from Birte: appeal to your roots, and keep adding new material, to attract people again.

So was it a valuable experience? Yes, I would say so; the mixture of meeting people and hearing interesting stories makes it worthwhile, of course. And now back to my librarying 😉

Beautiful surroundings indeed!

Research data: let the flowers grow!

Fran Berman: let the flowers grow!

From March 18-20 my colleague @jprombouts and I attended the launch event of the Research Data Alliance. Obviously much discussion took place on governance issues. However, I also learned some stuff in Göteborg, and took a few ideas back home.

The launch was kicked off by Neelie Kroes, who put forward the necessity of forming this alliance: the EU is supporting open science, and wants to make science work better for all of us, with ownership and cooperation of scientists themselves. Another interesting contribution came from Peter Fox (Tetherless World Constellation / Rensselaer Polytechnic Institute). He gave us five considerations:

  1. Work as you’ve succeeded > what would it all look like 10 years from now?
  2. It is not <just> about data
  3. It is about the alliance
  4. Be aware of vertical integration opportunity and needs
  5. The culture around data has to change

Peter told the RDA to be ready, be dynamic, be active, and urged us to bring together Head, Heart and Gut. It is difficult to avoid blogging only quotes that have already been sent around via Twitter! I liked e.g. Francine Berman’s remark that it is not just your data, it is other people’s data as well.

Together with Ross Wilkinson and John Wood she forms the RDA Council (to be expanded), where they represent the original founders from the US, Australia and Europe. The RDA is being formed, or perhaps a better word would be moulded, by its members into the right shape. Two working groups have been approved so far, i.e., on data type registries and PID information types. The Council emphasized that RDA is about connecting data, people and disciplines. Of course the world consists of more than Australia, the US and Europe, so there were also presentations about progress made in the field of research data from Canada, India, South Africa and China.

Upon arrival in Göteborg, we saw these very nice trees and benches.

We could digest really interesting content at the start of the second day. First we had Manfred Laubichler from Arizona State University telling us about digital HPS (History and Philosophy of Science). He showed us that it is indeed not only about data, but also about the methods you need to deploy the data. Laubichler gave us an example from evolutionary population ecology. We saw that the researcher Bradshaw had changed his mind when we compared his statement from 1948 with the one from 1965. To understand why that happened you need to understand the scientific context of the whole field. So he concluded that these computational approaches require cyberinfrastructure, open and transparent (big) data and linkable repositories. Philip Bourne (from UCSD, and later in the same week also present at Force11) taught us some lessons:

  1. It is all about trust (“trust in the data is perhaps our biggest achievement”), so listen to your community & engage them in every part of the process
  2. Data quality begets trust (support for versioning hence the copy of record, all versions accessible)
  3. It is all about people (curators are the unsung heroes)
  4. It is NOT all about institutions. No data standards body has directly influenced PDB, the Protein Data Bank
  5. It is about Openness. PDB should be more transparent about data usage.

Further interesting stuff from Bourne: the thought that all data are created equal must end, we need to understand how data are used, reductionism is not a dirty word, we should do more with the long tail, and we should stop looking at funding agencies. And to conclude: “Think about the questions we wish to answer rather than simply being able to retrieve the data.”

The remainder of the launch meeting was perhaps what it is really all about: established and perhaps-to-be-established working groups gathered for afternoon sessions on the second day and reported back on these the next day. At the next plenary RDA meeting (mid-September in Washington) we will be able to tell what real actions have been taken up by all these groups, what plans are still valid and where new things have been added.

We too, both at 3TU.datacentrum and TU Delft Library, need to work on our ambition and see where we can align it with all the RDA initiatives. Will we be able to take part in the yet-to-be-approved engagement working group and the publishing data interest group (with its subgroup on citation of dynamic data), while our DANS colleagues chair the to-be-approved certification working group and more interesting stuff is going on, e.g. in Preservation, PID information, Terminology, etc.? What we know for sure is that we cannot do everything. There were three things, though, that I brought back with me to think about further:

  1. Should we (copying Research Data Canada) start a Research Data Netherlands initiative, where we make sure that there is a voice for the Netherlands in several associations, alliances and working groups, and where we think about an efficient division of workload across topics and disciplines?
  2. Would it be an idea (perhaps for the to-be-approved RDA Working Group on Preservation?) to start working on a retention table, so that we take the advice to work on “reductionism” into account?
  3. Is Dataverse Network (which we will also offer to our Delft, Eindhoven and Twente scientists) the thing that is “just as easy” for our scientists to use as Dropbox? Mind you, there is a FileSender option, offered by SURFnet; I am not sure whether we knew about this in Delft!

There was so much more, and I guess I should stop, but not without two more quotes:

  1. Scott Brim: We should get the horse to drink; the desperate need is there, but it is only clear to us
  2. We could view RDA as a greenhouse to let the flowers grow.


Tweetweeeentwee (two-two-one-two): A day in the life … at the LLC

2 February 2012: Social media café, 9.30 – 11.00. Marketing & Communication, Education and Student Affairs, and TU Delft Library are jointly running the social media café, today for the second time. A blog has also been set up. I ask Willem, Marion and Liesbeth whether this weblog (mtlibrary) really suits me. Good fun to spar: you can ask anything about social media and eat tasty cookies. By the way, I am advised to start doing it differently 😉

2 February 2012: Is it a tulip?, 17.55. For a few weeks now we have had the exhibition Designing Universal Knowledge by Gerlinde Schuller in the hall. It is a way to mark history together, and if you are talking about user-generated content or the wisdom of the crowd, this is of course a fine example. Most contributions are fun and, according to Gerlinde, useful to her as well.

2 February 2012: The Genius of Anthonie van Leeuwenhoek, until 6 February. Besides this exhibition we also have our mystery object, about which a little more has now been revealed … what could it be?

2 February 2012: the last real day of the exam period. Things are getting quieter, and coffee corner The Cone is closing again, as is our mathematics and mechanics help desk.

2 February 2012: More designing universal knowledge, 17.57. “People are made to be loved and cared about. And things are made to be used. The confusion in our worldwide society nowadays is that people are being used and things are being loved and cared about.”

2 February 2012: From project to product: 3TU.datacentrum, 16.55. In the blue room of the Library Learning Centre we celebrate the transition from project to product. Project leader Jeroen Rombouts hands over the baton to product manager Madeleine de Smaele. 3TU.datacentrum is the prime facility for long-term access to scientific research data. We serve (inter)national programmes and projects of the three technical universities in the Netherlands in the research themes of Energy, Environment, Infrastructures, Mobility and Health.


