Posts tagged research data
I am reporting on LIBER and on Helsinki again, so it had better be good. A few days of both is a good way to spend your time. At LIBER 2016, “Opening paths to knowledge” (the 45th edition), the usual topics were on the agenda. The best short speech of day 1 for me was the one during the conference dinner, by the deputy mayor. He referred to the Helsinki open data site and called Helsinki a city of transparency. In times when populism rules, it is necessary to know your facts and to advocate for the better argument. This is why it is so important to share your data and your knowledge. It had been a long day and I could not take notes, so the quotes are not perfect, but I thought it was a very good dinner speech. Copying from the website: “Imagine a city where public decision-making is easy for all to follow and comment on using any digital channel. A solution to this challenge is being sought in Helsinki, which has long been working to unlock the data reserves related to municipal decision-making.”
The first day also started with data. The topic of the pre-conference workshop I attended was “skills for supporting research data”. There were a lot of examples of libraries starting training for staff and for researchers (at different levels), with a lot of variety in topics, in formats (flipped classroom, MOOCs, mixes of offline and online) and in experiences. The conclusion Wolfram Horstmann drew at the end was that our role in research data skills training is established; what remains is at what level and in what detail we can or want to do this. Useful links (besides, of course, our own training Essentials 4 Data Support) are the overview of existing education models by DataONE, and the MOOC developed by the University of North Carolina at Chapel Hill and the University of Edinburgh.
Another topic at LIBER was libraries in publishing (or should we say “releasing results”, as was suggested during the conference). I liked the presentation from Göttingen. Margo Bargheer and Birgit Schmidt found a few answers when preparing their paper. Research libraries are on a mission: they work on more transparency, more participation, open access and more accuracy. Libraries can help researchers to “be good, and avoid the bad”. I liked their references to the Open Science peer reviewer oath and the Singapore statement on research integrity, and their training for junior scientists answered the question we asked ourselves in the pre-workshop (when is the right time to start training). Talking about outreach, on the last day we had a presentation about altmetrics. Susanna Kirsi Nykyri and Valtteri Reino Vainikka, from Helsinki University Library, shared their experience with PlumX from EBSCO. I really appreciated their reservations and conclusions at the end. Altmetrics are not the answer for everyone; as always, it is discipline-related. As a library you may have a lot of extra work; the choice of platform is essential, and success depends on language, complete metadata, and the use of identifiers and source lists. ORCID seems to be of great help (though ORCID accounts also need to be kept up to date).
Of course I could not attend every session (my colleague Zofia Dzwig also attended LIBER and went to other presentations), but I was enticed to go to the “user-centred” session on day 2, and it was good that I did, because this was a very nice session. The one I highlight here is from Cambridge University Library. Sue Mehrer and Andy Priestner gave an impressive presentation. Bear in mind (quoting Margaret Mead): “What people say, what people do, and what they say they do are entirely different things”, and try to benchmark yourself against services that people encounter in their daily life. A good idea, according to Sue and Andy, is tested via an MVP (minimum viable product), which gives you the opportunity to fail forward (learn and improve). Their Futurelib prototypes (70% complete) are often not brought to a final version. When I later spoke to Andy, he mentioned that this is simply how it is in a time when things change so rapidly: we are living in beta forever. Their staffing is just 1.5 FTE; depending on the topic, they involve other employees and hire extra resources. All sessions made clear that innovation is dynamic, that changes need to be evaluated, and that users need to be asked about their experiences on a regular basis. However, beware that you check what your users do (not what they say). To give some credit to the other two presentations in this session: keep on listening, reviewing and challenging (Penny Hicks); and if you go out and ask your users, bring in an outside view, and do not present yourself as a library (Eva Dahlbäck and Martin Wincent).
And of course open science and open access were present at the conference. Ralf Schimmer gave a keynote, but did not bring a new view or a “how” roadmap for his transformation paper.
A bit before the wrap-up I had to leave. Thank you LIBER, organisers and participants, for yet another conference worth attending!
I spent a few days in Paris, France, with my research data colleagues: almost 600 participants from 38 countries, who gathered for the 6th RDA Plenary. This RDA (Research Data Alliance) plenary focused on the need to work with enterprises, and had climate change as an underlying theme.
That was the reason that Barbara Ryan (Secretariat Director, Group on Earth Observations) held a keynote on the first day. She did not just focus on climate change per se, but explained how she managed to get their data opened, and the effect this has had on usage. “Countries have borders, earth observations have not.”
We were all impressed with the statement that Axelle Lemaire (Minister of State for Digital Technology, French Ministry of Economy, Industry and Digital Technology) made at the start of the conference. She preferred to use the metaphor of light instead of oil, when talking about data. Data is not a fossil source that might run dry, data is around in many forms, sometimes a bit diffused, but crucial and it needs to be shared to create value. She told us that France will launch a public consultation on 26 September about “the Digital Bill”. A delightful presentation.
At the Plenary day I attended (there were three days in total) on 23 September, I was especially curious to see how working groups that I attended before had progressed. So I attended the Publishing Data Workflows and the Data Citation groups. The first group gave us a link to their article, and sample cases where either Dataverse, Dryad or figshare is used in the publisher’s data workflow. Future work will concentrate on moving forward in the research process, analysing how processes for data publishing might work there. The working group invites everybody to share their best practices, thoughts and comments.
I think that we as libraries should realize that this is indeed what publishers are doing now (also note the press release announced at the RDA meeting about Mendeley Data and DANS). If we support our researchers with their data management plans and data stewardship, we can advise them how to keep, store and share their data, without giving the content away. I thought the remark by William Gunn from Mendeley at the workshop the day before was reassuring: “All types of content providers need to focus on value-added services and not paywalls”.
The Data Citation Working Group will shortly report on their 14 recommendations. The idea of RDA was that working groups work for only 18 months on a certain topic, after which the group dissolves and new groups emerge. The difficulty here is that people like to continue their work, either because they feel committed to their legacy, or because there are many more ideas or recommendations to explore or make. New for me in this session was the “query store” as a middleman: you need to be able to reproduce your queries, so you give them a persistent identifier, but you also need to be able to retrieve the same data with that query, so you version your data with timestamps. I also learned that data can be watermarked or carry fingerprints as a protection layer (this related to data from social insurance providers for doctors and hospitals). Another term often used was “snapshot”: a version is a snapshot of your database. And I think it was Stefan Pröll (or perhaps someone asking him a question) who said: “If users do not cite your data, cite your users”.
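The query-store idea can be sketched in a few lines of code: records carry a version timestamp, and a cited query is stored under its own persistent identifier together with its execution time, so resolving that identifier later reproduces the original result set even after new data has arrived. This is only a toy illustration of the concept; the class, the handle-style identifier and the records are all made up, not any working group’s actual implementation.

```python
class QueryStore:
    """Toy sketch of a 'query store': version data with timestamps and
    give cited queries their own persistent identifier (PID)."""

    def __init__(self):
        self.records = []   # (added_at, payload): the timestamp versions the data
        self.queries = {}   # pid -> (predicate, executed_at)

    def add(self, payload, added_at):
        self.records.append((added_at, payload))

    def cite(self, pid, predicate, executed_at):
        # The query itself gets a PID, stored together with its execution time.
        self.queries[pid] = (predicate, executed_at)
        return self.resolve(pid)

    def resolve(self, pid):
        predicate, executed_at = self.queries[pid]
        # Only records that already existed at execution time are returned,
        # so the cited result set stays reproducible after later additions.
        return [p for (t, p) in self.records if t <= executed_at and predicate(p)]


store = QueryStore()
store.add({"station": "A", "temp": 21}, added_at=1)
store.add({"station": "B", "temp": 17}, added_at=2)
first = store.cite("hdl:9999/query-1", lambda r: r["temp"] > 15, executed_at=2)
store.add({"station": "C", "temp": 19}, added_at=3)  # arrives after the citation
assert store.resolve("hdl:9999/query-1") == first    # still the same two records
```

Real systems would of course use a database with system-versioned tables rather than a list, but the principle (PID for the query, timestamps for the data) is the same.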
I already referred to the workshop on the day preceding the RDA, on e-infrastructures & RDA for data-intensive science. There was some overlap between these two days; one overlap I did not mind at all: a very nice tool called RD-Switchboard, presented by Amir Aryani from ANDS (Australia). This switchboard connects datasets on the basis of co-authorship or other collaboration (e.g. via funding). Paolo Manghi showed that they already work together with the RD-Switchboard, finding connections via the OpenAIRE database between publications and projects, and between publications and data.
Mark Parsons, the secretary general of RDA, talked (amongst other entertaining things) about infrastructures during the opening session of the preceding day: how we went from systems, to networks, to networked infrastructures. Infrastructures are about bridges, both social and technical, and that is what RDA wants to do: create bridges, and be open! “Preserve the freedom to tinker, that is why the choice for open source is important.”
My Paris RDA trip started even a day before that, with the persistent identifiers workshop, organized by DataCite and ePIC. ePIC stands for Persistent Identifiers for eResearch, and works on data across the full research cycle (what they call referable data), whereas DataCite provides identifiers for citable data. At the workshop there were presentations about identifiers such as ARK, DOI, Handle, ORCID and ISNI. Domain-specific identifiers are often needed too: Anne Cambon-Thomsen started a journal for descriptions of bioresources, and Kerstin Lehnert introduced the IGSN, the geosample number.
And we are not there yet: we want to use identifiers for more physical objects, we should always make sure that we refer to the PID in the metadata, and according to Peter Wittenburg we should also use identifiers for the metadata itself. It is obvious that the most important thing is that these persistent identifiers are linked across platforms, and that we have an open scholarly infrastructure. A project about this has just started: THOR, “Technical and Human Infrastructure for Open Research”. Tobias Weigel even wanted to take it further: “We need an operational transition process. Go from one PID to the other. That is not possible yet.”
New for me in the presentation by Laura Paglione from ORCID was that they will add peer-review information to your ORCID profile, and she showed that there is already a data flow between Crossref, DataCite and ORCID.
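Identifier hygiene came up repeatedly at the workshop, and some identifiers help with it themselves: ORCID iDs carry a final check character computed with the ISO 7064 MOD 11-2 algorithm (the same family that ISNI uses), so most typos can be caught before any lookup. A minimal validator sketch; the sample iD below is a commonly used example identifier, not that of anyone mentioned here.

```python
def orcid_check_digit(base15: str) -> str:
    """ISO 7064 MOD 11-2 check digit over the first 15 digits of an
    ORCID iD; a result of 10 is written as 'X'."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    digits = orcid.replace("-", "")
    if len(digits) != 16:
        return False
    # Only the last character may be 'X'; the first 15 are always digits.
    return orcid_check_digit(digits[:15]) == digits[15]

assert is_valid_orcid("0000-0002-1825-0097")      # well-known sample iD
assert not is_valid_orcid("0000-0002-1825-0096")  # one digit off fails
```

A cheap client-side check like this does not replace resolving the identifier, but it stops obviously broken iDs from entering your metadata in the first place.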
So even though I could not attend the full Plenary, enough inspiration as a take-away!
This time I am wrapping up the “international data week” in Amsterdam, with the RDA 4th Plenary (“Reaping the fruits”) as the main event on 22-24 September 2014, and a range of satellite events on data taking place the same week. Just a (very) short impression!
Robert-Jan Smits kicked off the RDA meeting on Monday, where 520 attendees were present, by saying that only 10-30% of scientific articles can be reproduced. He urged the community to change its culture, and to “treat your data as you treat your publications”.
The video by Neelie Kroes contained a few nice phrases, e.g. “Open science depends on open minds, and it can grow if we build it upon trust”.
Barend Mons held a very entertaining keynote on “Bringing Data to Broadway”, and introduced his FAIR play, to make research findable, accessible, interoperable and reusable. Barend referred to his Data FAIRport. Do not say “open” all the time; perhaps call it FAIR science (I will give this suggestion at the end of the EC public consultation on Science 2.0!).
He showed us that data loss is real and significant, while data growth is staggering. We should realize how important data stewardship is: educate, reward and keep data scientists. Professionalize data stewardship! 5% of research funding should go to data stewardship; it is really worth the money. So reward the data steward, and introduce a research object impact factor. And do not forget: “Knowledge is like laughter, it increases when shared”.
I could only attend this first day partially and then the third day. The RDA always holds a lot of parallel sessions, similar to the previous plenaries, where the interest groups and working groups talk about their challenges and progress.
The working group on workflows (part of the interest group Publishing Data) is in the midst of a workflow analysis, and they called on people to look at their Excel sheet and add new workflows or columns to address. A few examples of workflows were presented. Martina Stockhause opened a discussion on versions of data; her suggestion was to have a high-level persistent identifier based on a collection, and then allow for changes within it. We thought that her question would be addressed by the group on Dynamic Data (I cannot find the correct link to this group though!).
The closing panel on the third day gave an overview of the data situation in Brazil, Japan, Canada and the US. A few interesting, some slightly contradictory, observations:
- Should we speak of open data, or of a variety of ways in which access can be arranged, realizing that the private sector wants to exploit its data?
- Do not create artificial silos between research and industry.
- Data requires us to think in objects and connections, and we should work on improving services.
- Beware of “going in the rathole of sustainability”. In the end it is of course far more expensive not to invest in infrastructure.
The coming six months (to the next plenary, in San Diego) the RDA will focus on adoption, to be using and eating the fruits, and they will be clustering the interest groups and working groups. I think that this is a sensible thing to do.
One of the remarks of the panel was that you need a national infrastructure to be able to participate in a global infrastructure, and that we should exchange best practices. I am proud that we managed in the Netherlands to have Research Data Netherlands, a coalition where now three data archives are sharing their experience and work together on realizing sustainable data archiving.
Talking about the processes is useful and necessary, but it was very rewarding to have presentations of six researchers during the Dutch Data Prize Award on 24 September.
On Thursday the RECODE workshop had a meeting (and, as said, there were many more interesting events this week). RECODE aims to hold its final conference in Athens in January 2015. People at the workshop were invited to comment on the draft recommendations document of work package 5.
The group wants to produce evidence-based policy recommendations. They have identified four stakeholder groups: funders, research institutions, data managers and publishers (the question was raised whether researchers should be added as a stakeholder). To give a quick idea:
- Funders: Develop, implement, monitor and evaluate open access to research data. (During the panel later on, we discussed whether there was a funder that supports reusing data, that could be an addition to this short list.)
- Research institutions: Develop data management strategies, develop reward systems, develop training programs and support awareness-raising.
- Data managers: Develop mission and responsibilities, develop sustainable business models, achieve trustworthiness of repositories and content, and develop data management services.
- Publishers: Develop policies for the deposit of data and require data submissions in certified repositories.
Daniel Spichtinger (from the European Commission, DG Research and Innovation) took part in the workshop and told us about the European Commission’s pilot for open access to research data. A few things were new to me: apparently deposit in repositories is mandatory, but there is no requirement that it be a trusted repository. The opt-outs for opening up your data have a wide range: there may be a conflict with the protection of results, a confidentiality issue or a possible risk to national security, protection of personal data, and more. Also new to me was that, apart from the selected areas (in the Excellence, Industrial Leadership or Societal Challenges programmes), all projects may join the pilot on a voluntary basis. Furthermore, data management plans are mandatory but not part of the project evaluation; they are required 6 months after the project starts. At the end Daniel gave a nice quote: “This pilot gives you a chance to co-shape policy on opening up research data.” We also now know the uptake so far (out of 3054 proposals): 24% opted out in the core areas, and 27% opted in, in the other areas.
I am ending my post here, but our team, especially the product group Research Data Services, was of course present in (almost) full strength. Apart from helping the main organiser DANS and sponsoring the programme as 3TU.Datacentrum (which we coordinate), we followed or contributed to Libraries for research data, Data publication, Long tail data, and workshops on technique, training, policy and certification. A very busy week indeed!
LIBER 2014 was held in Riga this year, obviously for two reasons (or perhaps three): it is the European Capital of Culture this year, and the new National Library (the “castle of light”) opened this year and hosted the conference. And perhaps also because we could have the former President of Latvia give a wonderful speech about “the power of the word”. For three days around 350 participants from 36 countries gathered, talking about or listening to a variety of subjects, all under the main theme of this year’s conference: “Research Libraries in the 2020 Information Landscape”. I am picking out just a few topics. This year (see last year’s blog) I attended the conference with my colleague Will Roestenburg.
On Tuesday NEREUS (an information hub of libraries supporting research and education in the social sciences) organised an Open Access workshop on open access policies in practice and lessons learned. Five institutes presented their open access policy, mainly focusing on research papers or proceedings and including deposit in the institutional repository. These repositories have names like WRAP (Warwick, UK), Lirias (Leuven, Belgium) or RepositoriUM (Minho, Portugal). Main takeaways from this session were that you need marketing & advocacy skills in your library, you need to think about how to position your CRIS, repository and personal pages, and you need to diversify your message, because researchers (and their disciplines) differ, and stakeholders (researcher, student, institute, public, companies) differ too. Institutional mandates come in (and prove to be) very handy for increasing your success, but you still need to implement the mandate and spread the word.
On Friday there was also a track on open access. Inge Werner told us about the new strategy for OA publishing at Utrecht University Library: from services to partnering. Their idea is to work as a greenhouse, and after helping journals through their first phase (though this may take 6 years), to have them transferred to a commercial open access publisher. The main challenge for the library is that they really need to educate the editors that publishing cannot be done for free; although the library still sponsors a substantial part of the publishing costs, that will no longer be the case after the transfer. It is good that we (as libraries) test different models with our main shared goal: making research “reachable”.
Maurizio Lunghi presented the results of the APARSEN project on Thursday morning. Without becoming too technical: the idea of the interoperability framework is that it connects all sorts of persistent identifiers (PIs), without trying to make any of them redundant or obsolete. It is pictured as a ring of trust (if all PI domains expose their content as linked open data, LOD, in the same way). A demonstrator is available online, and I have the feeling that this is a very useful development.
Innovation, Flow & Friction
Rachel Frick, Council on Library and Information Resources, USA, gave a nice keynote on Wednesday afternoon. She started off by telling us where she comes from, and referred to DPLA, the Digital Public Library of America. How to minimize friction and maximize flow? We live in a mash-up culture, crossing national and international boundaries, and we know that the network changes everything. We should not wait until people find what we have (after we have at least digitized the most interesting material that is not born digital), but enrich Wikipedia, make our metadata part of the network, and expose our dark matter to the light, as true leaders and practitioners of openness ourselves.
Lorraine Joanne Beard and Nick Campbell, from the University of Manchester, UK, explained how the library links to the university strategy. They also confirmed that the library should be vocal and explain how it can help the university reach its goals. The Eureka example that their Innovation group initiated was a nice one: in a Dragons’ Den-style event, students’ ideas were selected by a professional jury, and the winner got some money and the realisation of his or her idea. Several themes that emerged in the contest were picked up. The Manchester representatives told us to put ideas into practice, and to take more risks.
Eva Dahlbäck, from Stockholm University Library, Sweden, told us how they internally developed the web-based software Viola, which helps staff fetch any requested material from the closed physical stacks, with a smartphone as the device.
One of the plenary lectures on Wednesday was about the e-Book Phenomenon and its impact (by Prof. Thomas Daniel Wilson, University of Borås, Sweden). What I liked (it was a pity that he could not finish his talk due to time constraints) was his remark that e-book development has the potential to make an impact on every stakeholder. His suggestion for universities was to produce open-access textbooks, because now you can tailor the textbook to the course (instead of the other way around). Examples he mentioned were the Florida Distance Learning Consortium and Intermediate Algebra (see http://collegeopentextbooks.org/), representing the very best of Open Educational Resources.
Zooniverse, figshare, Distributed Proofreaders, Metadata Games: these are just a few examples of crowdsourcing. Elena Simperl (University of Southampton, UK) gave us a lot to learn about it. With crowdsourcing you have a problem and solve it through an open call, using a large network of potential contributors. You can have macrotasks (e.g. innovation), microtasks (e.g. tagging, with many people working in parallel), crowdfunding, or contests. Of course it is a nice opportunity to engage with your customers (though you need to understand what drives participation). As Simperl said, computers are sometimes better than humans; this is the age of social machines. Improve information technology, but do not overdo crowdsourcing. Let people do the creative work, and the machines the administration. In her conclusion she said that creativity remains a task for (the staff of) the library, and we should be glad when crowdsourcing “frees up” time to spend on creativity.
Research data management, what works?
This workshop in the morning of July 2 was organized by the LIBER working group / steering committee on Scholarly Communication and Research Infrastructures. I was moderating the second part and thanks to some good suggestions made by Marina Noordegraaf, we had a very interactive session about training and skills, and encouraged people to start research dating. In short the main take-away messages were that you need to remember that the research groups are not all the same, that you need to be brave (again!) and go out to the researchers, and that we should take advantage of our own network, and learn from each other.
Arlette Piquet from ETH Libraries and Collections, Zurich, Switzerland, showed the next day how they are dealing with data curation. Starting with a survey among researchers in 2011, they defined a time path and approach, and decided to work from one solution, Ex Libris Rosetta (including administrative data).
On Tuesday morning we had some time to stroll around in (a very rainy) Riga. We visited the Dome, or Riga Cathedral, which is very famous for its organ (for which Franz Liszt, although he never visited, wrote a chorale, “Nun danket alle Gott”). Especially the old cloister corridors with Riga heritage were worth the visit. Afterwards we drank a coffee at a lovely place called Sweet Day Café.
It was March 2013 when the first plenary of the Research Data Alliance took place; I attended that meeting in Göteborg, Sweden. On 26-28 March 2014 Dublin, Ireland, was the venue for the third plenary. It coincided with a lot of satellite meetings, and DataCite also held its General Assembly and strategy meeting on the two days before the RDA. So I combined the two (pretty straightforward for me, being a Board Member of DataCite) and was present at the DataCite gatherings and the first day of the RDA.
It is not really possible to report on everything that happened at the RDA. There were a lot of interest group and working group sessions for the remainder of the conference; the first morning was the only truly plenary one. I thought the introductory talk by Mark Ferguson (Director General of Science Foundation Ireland) was interesting. He made a few statements that would be worth checking (I would love to have his sources!):
– The most highly-cited papers find their origin in:
- collaboration between academia /industry
- international collaboration
- national collaboration
“Isolated” research is at the bottom of the list. I can imagine that there are a few parameters influencing this ranking, e.g. the discipline or the sort of peer groups you work with. Another statement was about the hit rate for patents: he claimed that jointly funded projects give a better chance, and that a patent is often not attributed to the first (original) researcher. For Ferguson, a reason to promote open innovation.
The panel about data policy was more a series of short presentations, which was in a way OK. I have to look at the ideas that Mercè Crosas (Director of Data Science, Harvard University) put forward. Being the initiator of the Dataverse Network, she showed us their guidelines for data publishing. Moreover, she referred to guidelines for connecting journals to data, where integration between journals and data is encouraged through Dataverse. That is a different use of Dataverse than I knew about.
In retrospect (though this is based on a very short, one-day presence), I had expected more real activity and results to come out of the RDA groups and workshops after one year. The problem of course is that one can only attend one session at a time. I am eager to hear what is happening in all these groups, but it is difficult to get a “quick-and-dirty” overview.
The people I talked to were very positive about the excellent networking opportunities. Everybody you want to talk to is at the RDA! Finally, to conclude this very short report: I heard (at least) one very interesting idea at the Data Publishing Interest Group introductory session, namely the idea by Laure Haak (ORCID) to assign DOIs to data management plans. That could close a missing link in the chain from project to data to publication. Simon Hodson (CODATA), who hammered home at the plenary panel that RDA should be about putting all the available principles into practice (I could not agree more!), showed the very good cycle created by ANDS (one of the co-organisers of this plenary) for building a culture of data citation: create, use, measure and reward.
And to end with the beginning: apart from a lot of good discussions and nice get-togethers three things stood out from the DataCite meetings:
– DataCite will endorse the data citation principles that were recently published by Force11.
– DataCite has entered into an agreement with Databib / re3data. The first step is that the two data repository registries will merge into one service, to be managed under the auspices of DataCite by the end of 2015.
– DataCite and RDA have signed a memorandum of understanding, so that both organisations can intensify their dialogue, and actively work on promoting data citation as an important element in the scholarly workflow.
Flying back to Amsterdam after two days at the APE conference and one day at a DataCite Board meeting in Berlin, I try to capture what APE has brought me this year. This was the fourth or perhaps fifth year that I attended the conference. It is organized each year by Arnoud de Kemp, and this was its 9th edition.
I will not report in chronological order, but just take a few strands out. The topic of the meeting was “Redefining the Scientific Record, The Future of the Article, Big Data & Metrics”, and participants were (mainly) publishers, some researchers involved in funding or publishing, and library, governmental or funding agency representatives. A lot of Dutch people attended APE2014, of course also due to the keynote speech of Sander Dekker, our State Secretary, of the Ministry of Education, Culture and Science, at the start of the conference.
All presentations can be viewed via the recorded live stream.
Peer review under discussion
David Black, Secretary General of ICSU (International Council for Science), and originally a researcher in organic chemistry, claimed that the interdependence between curators and creators will remain in this digital era. According to Black, authors will in future send their findings to repositories (standardized, subject-focused and international) instead of to primary journals. Anybody should be allowed to add comments to papers submitted to such a repository. Peer review becomes an open evaluation, and takes place after publication. From this repository secondary publications can be selected (and that could still be a role for the major publishing houses). Reputation building is not merely based on these publications, but also on local contributions, your presence at conferences and individual (personal) references. An important condition for this to be a success is that authors take responsibility for their own work (be aware of what you submit). Jan Velterop pointed me via Twitter to a recent blog he wrote about this.
ScienceOpen, also present at the conference, mentioned that they are already supporting scientists along these lines and offer public post-publication peer review.
From the publishers
APE is really a conference for and with publishers. Let me highlight two presentations from them. The first was by H. Frederick Dylla, Executive Director and CEO of the American Institute of Physics. He talked about CHORUS, which started in September 2013. CHORUS stands for Clearinghouse for the Open Research of the US and provides public access to manuscripts/articles reporting on federally funded research, using existing infrastructure. The second was by Eefke Smit. Apart from offering some nice poetry and examples, e.g. the Atlas of Digital Damages, she asked all publishing participants to make sure they are aware that they need preservation strategies for their content (outsourcing, normalisation, migration and emulation). In the Keepers Registry, as we later learned from Peter Burnhill (EDINA and Head of Edinburgh University Data Library), 22,000 e-serial titles are being preserved with “archival intent”. Knowing that 113,000 titles (ISSNs) are registered, that means only about 19% is safe. Eefke therefore called on us to solve the identifier soup and to make sure we are creating the connections to the future, so that those who come after us can make their connections back to us.
Talks worth watching if you have a spare moment include the one by Jason Swedlow, Professor of Quantitative Cell Biology at the University of Dundee and President of Glencoe Software, about OMERO: the Open Microscopy Environment. OMERO deals with (the storage of) images. Swedlow introduced the ubiquitous image problem: is it a pretty picture, a measurement or a resource? According to Swedlow his tool brings in a driver for integrity, and publishes trusted scientific data.
Paul Groth, from the Department of Computer Science & the Network Institute at the VU University Amsterdam, wondered what impact really is. Policy makers want to know whether you are doing good science. Evidence up till now has been limited to the publication (the article), and has not included slides, videos, code, data, or the fact that you might have different types of story to tell (citation is not always the driver). Altmetrics captures activity in online tools and environments. Paul gave us a few examples (ImpactStory; Open PHACTS – published AND discussed AND cited; LISC 2013 – where the results of a workshop were saved as short wrap-ups in figshare).
Mike Taylor, from labs.elsevier.com, presented seven reasons why one could be bothered about altmetrics and gave us an insight into the work in progress at Elsevier. Would it not be great to have real-time information on what others are reading right now?
And just to quickly wrap this up, the “dotcoms-to-watch” session was a good addition to the other lectures. The company Kudos makes an effort to match people to the right articles. ReadCube told us about ReadCube Instant PDF (keeping users online and engaged) and ReadCube Access, an eCommerce system for libraries with access restrictions in exchange for lower prices (rent, buy or download). The latter was recently launched with the University of Utah as development partner.
A moral appeal
As said, one of the exciting moments was the keynote by Sander Dekker. His speech is available in full text at Science Guide. Dekker is convinced that the digital world will be a game-changer in scientific publishing. Above all, he sees open access as a moral obligation and a matter of principle: the whole of society will benefit from free and open access. A true challenge for all stakeholders, so let us hope we will indeed be able to put the flags out! It is interesting to connect this opening keynote with the closing one by David Sweeney (from HEFCE). Sweeney, who experiences the UK situation as a big funder, sees three things we should be doing: address the double-dipping issue; do what you are allowed to do (publishers are doing their part, how about academia and funders?); and test whether we really need embargo periods.
It is perhaps enough to repeat two of the Einstein quotes that Bernhard A. Sabel (giving a guest lecture on The Psychology of Innovation) presented; they are always so true:
“Great spirits have always encountered violent opposition from mediocre minds”
“Everything that is really great and inspiring is created by the individual who can work in freedom”
And three to take away from Sabel:
“Old technology or lawyers are in committees”
“Get the brightest people, do not compromise on people – ever”
“Sometimes it is better to be sufficiently ignorant”
At the closing panel there were representatives from a funding agency, publishers and research communities. The panel drew no real conclusion, but the voice of librarians was clearly not included, which is perhaps typical. Libraries were a logical partner as license brokers, and now that open access is growing to become a commodity (why not;-), we should (and will) find new challenges. Susan Reilly from LIBER used a few keywords in her lecture earlier that day: libraries should focus on digital preservation, improve findability and integrity, and aim at resource sharing and collaboration.
Data were addressed at the conference, but brought no new insights for me. As a DataCite representative, though, it was clear to me that we should keep pushing the necessity of using persistent identifiers to make your data findable, citable and usable. Partners such as Brian Hole from Ubiquity Press might be able to help us. In one of his last slides he said that they help universities set up a data repository, and by doing so give power to the university presses.
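To make the point about citability concrete: once a dataset has a persistent identifier such as a DataCite DOI, a human-readable citation can be assembled from a handful of metadata fields. A minimal sketch in Python (the function, the metadata and the DOI below are hypothetical examples for illustration, not a real repository record):

```python
# Assemble a DataCite-style data citation from basic metadata fields.
# The record below is made up, purely for illustration.

def format_data_citation(creators, year, title, publisher, doi):
    """Return a citation string of the form:
    Creator1; Creator2 (Year): Title. Publisher. https://doi.org/DOI
    """
    authors = "; ".join(creators)
    return f"{authors} ({year}): {title}. {publisher}. https://doi.org/{doi}"

citation = format_data_citation(
    creators=["Jansen, A.B.", "de Vries, C."],
    year=2014,
    title="Example measurement dataset",
    publisher="3TU.Datacentrum",
    doi="10.4121/uuid:00000000-0000-0000-0000-000000000000",
)
print(citation)
```

Because the DOI resolves via the https://doi.org/ proxy, the same identifier keeps the citation usable even if the dataset moves to another landing page.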
Finally, I would like to repeat the reference Peter Burnhill made to the State of the Union: 2014 should be the Year of Action!
From time to time I also attend sessions that are not specifically related to library stuff. There was a library-related reason though that made me go to Eindhoven for the 3TU Conference on Innovation and Technology on December 6, 2013, and that was signing off the consortium agreement of our 3TU.datacentrum (finally!).
So that gave me the opportunity to take a (brief) peek into other sessions. Our (Delft) Kees Vuik introduced the session on “Invisible mathematics: three tangible results”. I liked the “Intel stamp” that was used throughout this session: “Math Inside”. One might not realize it, but in so many topics, such as optimising queues (in a shop, for a helpdesk, or as part of a service bus), simulating maritime conditions for large vessels, or thinking about your local electricity supply, the mathematics “inside” remains invisible, even though it is an essential part of the project. It reminded me of my own Materials Science studies. We had a similar problem, because – apart from fundamental research – in the applied scene materials scientists facilitate other disciplines: essential, yes, but somewhat invisible.
At the innovation market I found some other interesting things. What to think of LikeLines? Via a navigable heat map, users can jump to interesting regions in the videos they are watching. Or the INSYGHTLab, where they work on multi-camera experiments for 3D reconstructions, to get to highly interactive screens.
I should also mention Federico Toschi – he spoke about “Fluid dynamics challenges for energy and health”, and showed us how understanding fluid dynamics is essential for health issues such as the rheology of blood in our vessels. Here we have a link to one of his datasets in our 3TU.datacentrum, which brings me back to the beginning – we can be proud to have our agreement finalized. Bring in the data!
We just can’t get enough – talking about data
On 8 May 2013, 3TU.Datacentrum launched its partnership with DANS. The establishment of the coalition Research Data Netherlands will bring together knowledge and expertise about research data. Above all, it intends to unite research libraries, archives and other organisations that keep (trustworthy) data repositories.
Open Access has been around for quite some time, but in the past months more and more one-pagers, position papers, network sessions and hearings have been written and organised.
One reason, amongst many others, for the current increased attention to Open Access is that it has the potential to provide all stakeholders with evidence of the high standards of quality and integrity which the scientific system has traditionally imposed on itself.
That is why I quote the position paper, signed by five Dutch universities, that asks us to seriously consider the positive impact of Open Access on the use, re-use and citation of scientific data. These five universities (Delft University of Technology, Erasmus University Rotterdam, Leiden University, TU Eindhoven and University of Twente) cooperate in multidisciplinary research that covers all societal challenges mentioned in Horizon 2020. The universities want their research to be shared with society, so that it is available for new research, insights and innovation.
“In order to bridge the innovation divide in Europe, Open Access to data should be actively pursued, as sharing data can foster the advancement of excellent researchers, with due respect, however, for the legitimate commercial, national security and privacy interests. Open Access to research data must be encouraged to combat scientific misconduct and to foster the professionalization of researchers. Also in this Age of Big Data the rich universe of research data could be accessible,”
The momentum for this position paper, and others, was provided by the EC public consultation on Open Research Data.
For me personally it is essential that research data created in the public domain are kept there. As publishers are changing their business and expanding it into the current research domain and evaluation metrics, we as libraries should also step up.
It is not just about finding that one apple in the jungle (citing a post that a researcher and chair of one of our library committees brought to my attention), but also about bringing the university “fruits” back for easy picking 😉
This week I attended, for the first time in my Library Life, the LIBER Congress (which took place in Munich, Germany). It was a strange week, in which I was of course heavily occupied with some possible governmental budget measures, and met with a lot of my library colleagues.
The main reason for coming over was the workshop LIBER organised on the “10 recommendations on Research Data Management, what’s next” on Wednesday 26 June, and the first face-to-face meeting of the steering committee on Scholarly Communication and Research Infrastructures. I am a member of that committee and my colleague @jprombouts, head of our Research Data Services, is a member of its working group. Research (Information) Infrastructures and the Future Role of Libraries was the main theme of the conference that took up the remainder of that Wednesday, Thursday and Friday.
The main keynote lectures were therefore focused on this topic, and for me the two keynotes on Thursday fitted the theme best. Liz Lyon, director of UKOLN, University of Bath, showed us that universities could regard themselves as data publishers and should take responsibility for their own data products. Libraries can help researchers publish their data, with curation, discoverability, citation, formats and metrics. Librarians should more and more be part of the lab teams. Carlos Morais Pires from the European Commission made a nice comparison between engineers and librarians: “Engineers stop when things start to mean something”. He also gave an overview of what the EC has done (formal communications / recommendations), and what is to be expected in relation to Horizon 2020. Geoffrey Boulton’s speech on Friday was similar to the one I happened to hear at the 10th anniversary meeting of LERU, in Brussels, last November. The discussion afterwards, though, made me send tweets again, because I could not agree more: talk to researchers and ask them what they need instead of going out to spread the word about what you do. And see the library as a function, not an entity. Or as I phrased it myself: “librarying, it is a verb, it is active, it is dynamic, it is not a thing!”
At our own workshop, Jeroen informed the participants (some 80 to 100 people) about our 3TU.datacentrum and our collaboration with DANS in Research Data Netherlands, and emphasized that it is important to “think big, start small and act now”. He took care of recommendations 5, 6, 7 and 8, which focus on collaboration and services. These recommendations were finalized and prioritized during the LIBER conference in Tartu last year. Wolfram Horstmann, from Oxford University Library, focused on Policy & Infrastructure (recommendations 4, 9 and 10) and showed us that it is good to start with a research data policy and a centrally led approach (either with the library in the lead, or together with other supporting services), but that you should also let local initiatives flow. Partnering! was his final word in the discussion at the end. Rob Grim, the chair of the working group, from Tilburg University Library and IT Services, completed the ten recommendations with 1, 2 and 3, and emphasized that libraries should work on having skilled people. He also referred to the implementation plan the working group will now start working on.
Of course there were other sessions worth attending too. I will just pick two, to be a bit selective (as we heard from Boulton, that is our role!).
The title “Meeting the needs of PhD candidates: Services, networks and relevance” attracted too many people for the size of the room. Eystein Gullbekk, Oslo University Library, showed us their PhD on Track, which was launched one month ago. It provides modules under three tracks: Review and discover / Share and publish / Evaluation and ranking. The principles they used while developing the modules were: Illustrate / Demonstrate / Explain / Provoke. Gullbekk’s second part focused on “being relevant”, where he used the principle of the actor network. Viewing a topic from different aspects, e.g. a publication can be enacted as apprenticeship or as accreditation, you should realize that there can be conflicts of interest, so take the different enacted realities into account and make them visible (both conflicts and resistance).
Birte Christensen-Dalsgaard, from the Royal Library of Denmark, presented a successful crowdsourcing project: “Denmark seen from the air”. The Royal Library has 18 million photographs, so what to do with these, and how can they be made useful? This project used the collections of photographs of farms “seen from the air”, starting with 200,000 negatives from the Fyn area. The idea was to get more data about the precise location of each farm. At one point in the project, a “farm was moved” every minute. At this moment 87% of the pins have been geotagged. So, as said, pretty successful! Some take-aways from Birte: appeal to your roots, and keep adding new material to attract people again.
So was it a valuable experience? Yes, I would say so; the mixture of meeting people and hearing interesting stories keeps it going, of course. And now back to my librarying 😉
From March 18 to 20 I joined, with my colleague @jprombouts, the launch event of the Research Data Alliance. Obviously much discussion took place on governance issues. However, I also learned some things in Gothenburg, and took a few ideas back home.
The launch was kicked off by Neelie Kroes, who put forward the necessity of forming this alliance: the EU is supporting open science, and wants to make science work better for all of us, with ownership and cooperation of scientists themselves. Another interesting contribution came from Peter Fox (Tetherless World Constellation / Rensselaer Polytechnic Institute). He gave us five considerations:
- Work as you’ve succeeded > what would it all look like 10 years from now?
- It is not <just> about data
- It is about the alliance
- Be aware of vertical integration opportunity and needs
- The culture around data has to change
Peter told the RDA to be ready, be dynamic, be active, and urged us to bring together head, heart and gut. It is difficult to avoid blogging only quotes that have already been sent around via Twitter! I liked, for example, Francine Berman’s remark that it is not just your data, it is other people’s data as well.
Together with Ross Wilkinson and John Wood she forms the RDA Council (to be expanded), where they represent the original founding regions: the US, Australia and Europe. The RDA is being formed, or perhaps a better word would be moulded, by its members into the right shape. Two working groups have been approved so far, on data type registries and PID information types. The Council emphasized that RDA is about connecting data, people and disciplines. Of course the world consists of more than Australia, the US and Europe, so there were also presentations about progress made in the field of research data in Canada, India, South Africa and China.
We could digest really interesting content at the start of the second day. First, Manfred Laubichler from Arizona State University told us about the digital HPS (History and Philosophy of Science), http://digitalhps.org. He showed us that it is indeed not only about data, but also about the methods you need to deploy the data. Laubichler gave us an example from evolutionary population ecology: we saw that the researcher Bradshaw changed his mind when comparing his statement from 1948 with the one from 1965, and to understand why that happened you need to understand the scientific context of the whole field. He concluded that these computational approaches require cyberinfrastructure, open and transparent (big) data, and linkable repositories. Philip Bourne (from UCSD, and later that same week also present at Force11) taught us some lessons:
- It is all about trust (“trust in the data is perhaps our biggest achievement”), so listen to your community & engage them in every part of the process
- Data quality begets trust (support versioning, hence the copy of record, with all versions accessible)
- It is all about people (curators are the unsung heroes)
- It is NOT all about institutions. No data standards body has directly influenced PDB, the protein databank
- It is about openness. PDB should be more transparent about data usage.
Further interesting points from Bourne: the thought that all data are created equal must end; we need to understand how data are used; reductionism is not a dirty word; we should do more with the long tail; and we should stop looking at funding agencies. And to conclude: “Think about the questions we wish to answer rather than simply being able to retrieve the data.”
The remainder of the launch meeting was perhaps what it is really all about: established and perhaps-to-be-established working groups gathered for afternoon sessions on the second day and reported back the next day. At the next plenary RDA meeting (mid-September, in Washington) we will be able to tell what real actions have been taken up by all these groups, which plans are still valid and where new things have been added.
For us too, both at 3TU.datacentrum and TU Delft Library, it is necessary to work on our ambition and see where we can streamline it with all the RDA initiatives. Will we be able to take part in the yet-to-be-approved engagement working group and the publishing data interest group (with its subgroup on citation of dynamic data), while our DANS colleagues chair the to-be-approved certification working group and more interesting things are going on, e.g. in preservation, PID information, terminology, etc.? What we know for sure is that we cannot do everything. There were three things though that I brought back with me for some further thought:
- Should we (copying Research Data Canada) start a Research Data Netherlands initiative? Where we make sure that there is a voice for the Netherlands in several associations, alliances, working groups and that we think about an efficient workload and division in topics, disciplines?
- Would it be an idea (perhaps for the to-be-approved RDA Working Group on Preservation?) to start working on a retention table, so that we take the advice to work on “reductionism” into account?
- Is Dataverse Network (which we will also offer to our Delft, Eindhoven and Twente scientists) the thing that is “just as easy” for our scientists to use as Dropbox? Mind you, there is a FileSender option offered by SURFnet; I am not sure whether we knew about this in Delft!
There was so much more, but I guess I should stop, though not without two more quotes:
- Scott Brim: We should get the horse to drink, the desperate need is there, but it is only clear to us
- We could view RDA as a greenhouse to let the flowers grow.