Monday, 29 October 2007

Data management - learning from commercial sector?

Computing (25 Oct 07) has an article on data management, featuring BAE Systems as one of a series of case studies. BAE estimates "80% of networked employees were wasting an average of 30 minutes a day retrieving information, while 60% were spending an hour or more duplicating the work of others".

The article acknowledges the cultural barriers to using/sharing data and suggests policies are put in place to establish guidelines and principles, as well as training and mentoring to help develop the collaborative and information management skills required.

One of the case studies, Denton Wilde Sapte, cautions "People are so wrapped up in the technical whizz-bangs that they forget that IT is really all about information delivery".

"Organisations are recognising that some pieces of their information have more fundamental value than other parts, although that value might not be realisable today. For certain items of information its maximum value will only be achieved at some point in the future, so companies need to invest in good archiving, storage, search and retrieval systems today" - Ian Charlesworth, Ovum, quoted in the article.

NIH funds research data sharing project

"Researchers at ICPSR have been awarded a two-year grant by the National Library of Medicine, National Institutes of Health (NIH) for a project entitled, Barriers and Opportunities for Sharing Research Data. The project will investigate the extent of research data sharing in the social sciences and assess whether research data sharing is related to other aspects of the scientific process including scientific publication."

Rethinking the publication process

Interesting post from Peter Murray-Rust's blog:
Considers preprints, Creative Commons license, using non-copyrighted images, managing citations.

OGC News October 2007

Latest issue of OGC News has a couple of interesting links:

- info on and link to their Spatial Data Quality survey, which will inform the Spatial Data Quality Working Group's attempts to define a framework and grammar for the certification and communication of spatial data quality

- a slideshow demonstrating the use of OGC standards for earth observation.

Wednesday, 24 October 2007

Project Management blogs

"Semantic Web vision: where are we?"

Thanks to Alan Rector for pointing this out:
The Semantic Web Vision: Where Are We?
"The aim of this article is to present a snapshot that can capture key trends in the Semantic Web, such as application domains, tools, systems, languages and techniques being used, and a projection on when organizations will put their full-blown systems into production."

"Democratization of innovation"

Thanks to Bill St Arnaud for pointing this out in CANews - the Economist recently ran an article on democratization of innovation, focusing on the way technologies such as Web 2.0 are driving a trend away from centralised R&D towards more distributed, participative methods:

JISC Inform

Latest issue features article on visualisation and podcast with Prof Roy Kalawsky:
Also articles on Go-Geo and on open source.

Very topical mashup

LA Fire Dept are using this mashup to show the spread of the wildfires in LA and the locations of evacuation centres and other support facilities:


Been meaning to look at this for ages....
Interesting way of sharing data - also they are launching a Private version presumably if you want to be careful who you share with. Some issues re quality tho - e.g. how could you be sure of provenance?

Monday, 15 October 2007

Web2.0 reports

Techwatch report

Results and analysis of the Web 2.0 services survey undertaken by the SPIRE project

Guardian Online : Visualisation

ManyEyes is particularly interesting

Also a link to an article listing some really good visualisation tools, including search - I really like the Visual Thesaurus.

BBC News : Drive advance fuels terabyte era
"Hard drives currently have a one terabyte limit. A single hard drive with four terabytes of storage (4TB) could be a reality by 2011, thanks to a nanotechnology breakthrough by Japanese firm Hitachi..."

Related story in New Scientist

Friday, 12 October 2007

Gartner's Top 10 strategic technologies for 2008

Thanks to Bill St Arnaud for pointing to this on his blog:
At the Gartner Expo this week, the following were discussed as the top 10 technologies organisations can't afford to ignore...

  1. Green IT
  2. Unified communications (interesting for VRE programme)
  3. Business Process Management (to support SOA)
  4. Metadata management
  5. Virtualisation 2.0
  6. Mashups and composite applications
  7. Web platform and Web-Oriented Architecture
  8. Computing fabrics
  9. Real World Web
  10. Social software

Liz Lyon presentation on data curation

Slides from recent presentation by Liz Lyon to a NERC data management workshop

Thursday, 11 October 2007

Sainsbury Review of Science and Innovation


Computing (11 October 2007): Lisa Kelly "Web 2.0 taps the wisdom of crowds":

"Crowdsourcing is an internet-enabled upgrade of the original focus group concept, according to Dell vice president Bob Pearson". The article goes on to show how several big companies, including L'Oreal, Kimberly Clark and Dell, are using crowdsourcing to develop products.

Wikipedia entry on crowdsourcing

Software on demand

Computing (11 October 2007) this week features a story by Tom Young, "Online software is in demand", about recently launched products that "are hosted and accessed in real time rather than being installed on in-house systems". Adobe, IBM, Google and Yahoo are all either developing or releasing such products in an attempt to compete with Microsoft's dominance. The software-on-demand model offers a number of benefits around updates, licensing, virus protection and flexibility.

Tuesday, 9 October 2007

Sunday, 7 October 2007


Thanks Frederique for pointing to this - Chris Mackie mentioned this in a meeting earlier in the year but I hadn't followed it up since...

From their website:

"SEASR (Software Environment for the Advancement of Scholarly Research) is being developed by the National Center for Supercomputing Applications in cooperation with the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign.

SEASR aims to:

  • assist scholars in accessing and analyzing existing large information sources more readily and with greater refinement;
  • give scholars increased portability of large information stores for on-demand computing; and
  • empower collaboration among researchers by enhancing and innovating scholarly communities’ and their resources’ virtual research environments.

How will we do it? The SEASR development team will construct software bridges to move information from the unstructured and semi-structured data world to the structured data world by leveraging two well-known research and development frameworks: NCSA’s Data-To-Knowledge (D2K) and IBM’s Unstructured Information Management Architecture (UIMA). SEASR will focus on developing, integrating, deploying, and sustaining a set of reusable and expandable software components and a supporting framework, benefiting a broad set of data-mining applications for scholars in the humanities.

SEASR’s technical goals include supporting:

  • the development of a state-of-the-art software environment for unstructured data management and analysis of digital libraries, repositories and archives, as well as educational platforms; and
  • the continued development, expansion, and maintenance of end-to-end software system: user interfaces, workflow engines, data management, analysis and visualization tools, collaborative tools, and other software integrated into a complete environment."

Internet evolution

"The site for news, analysis and opinion about the future of the Internet"

The Future of Scholarly Communication : workshop report

Report of a joint NSF/JISC workshop:

Some highlights:
  • Access to research : "Success stories, such as TREC for information retrieval research [Voorhees] or the Human Genome Project [HGP], have devoted substantial expertise to creating the necessary infrastructure and managing the datasets with a very clear understanding of how they fit the research practices in their fields."
  • Access to research : "Cyberscholarship needs superdata centers, which combine the storage and organization of vast amounts of data with substantial computing power to analyze it. Building such centers requires investment and long-term commitment on the part of an organization or discipline. While equipment can be purchased, expertise takes longer to establish. Superdata centers and the researchers who use them will need several years before they become truly effective."
  • Value-added services : "As our systems grow more sophisticated, we will see applications that support not just links between authors and papers but relationships between users, data and information repositories, and communities. What is required is a mechanism to support these relationships that leads to information exchange, adaptation and recombination – which, in itself, will constitute a new type of data repository."
  • Also a reference to the need for summarisation on p 10 referencing humanities research in particular.
The CyberScholarship roadmap includes automated metadata generation; provenance establishment; source validation; annotation tools; and contextual semantics.

NSF calls

Sustainable Digital Data Preservation and Access Network Partners (DataNet)

Cyber-Enabled Discovery and Innovation (CDI)

Friday, 5 October 2007

Participative web conference

The blog from the Participative Web conference this week has been a really good read. Richard Ackerman has posted on citizen science - "It is clear that the rapid pace of change is pushing those involved with science infrastructure to think about ways to interact with a broader public, to take advantage of the energy and creativity of the general population, promoting greater understanding of and participation in science."

Richard has also given a summary of a talk from Andrew Herbert from Microsoft - interesting points include: how sensor networks will enable real-world data to be used in simulations; how to get the right balance of skills in the research workforce.

There are also some handy guides to some of the key themes of the conference, including eScience. The blog links to transcripts as well as the webcast, and to some videos (including two relevant to eScience, by Andrew Herbert and Walter Stewart).

GRH makes the news!

Not particularly relevant to eResearch but had to include it..! Gloucestershire Royal Hospital makes the news in New Scientist:
"Brian Witcombe, a radiologist at Gloucestershire Royal NHS Trust received the Ig Nobel prize in medicine for his study of sword swallowing and its side effects."
Ig Nobel home page:

Semantic image retrieval

A story, "New search tool gets the picture", covers a new search tool developed by Southampton Uni. The story mentions the limitations of the tool (e.g. that it is difficult to expand and may not cope with the variety of images on the web) but also its strengths (e.g. dealing with language, producing more discriminating search results).

Thursday, 4 October 2007


This week's Computing (4 Oct) mentions 5 information management technologies to watch out for in the next 3 years:

Wednesday, 3 October 2007

Role of libraries

An interesting post yesterday on Science Library Pad:
It mentions how librarians should accept that some services might be better done through technology or even by other organisations. Instead, they should focus on where they can really add value e.g. managing scientific data, curating digital information like blog posts. The post mentions a recent event organised jointly by University of Washington Libraries and Microsoft, Global Research Library 2020.

Tuesday, 2 October 2007

Tips for conference bloggers

Really useful tips on blogging from a conference


From OGF Grid Connections Oct 07 newsletter:

"In terms of user communities, OGF is pursuing a collaboration with the Open Geospatial Consortium (OGC). OGC has a suite of tools for managing and presenting geospatial data -- anything that goes on a map -- and wants very much to extend their tools with the capability for distributed resource management, i.e., grids. I should also note that there is a Web 2.0 workshop at OGF-21 that covers social networking, semantic grids, and sensors. The fact that half of all Web 2.0 services registered at are geospatially related, and that Google is sending KML through the OGC standardization process, indicates that there is a huge potential for grids in this arena."

OGF21 is later this month - worth seeing what comes out of the following workshops:

Web 2.0 - features presentations on research and commercial applications of Web 2.0 technology including HPC, Cyberinfrastructure, Semantic Research, Social Networking
Geospatial - a collaboration with the OGC, covering topics such as grid-enabling the OGC's Web Processing Service and an NSF proposal on Community-based Data Interoperability Networks
GridNet2 - highlighting the work of UK eScience at the OGF and in related standards bodies

Monday, 1 October 2007

Blue Ribbon Task Force on Sustainable Digital Preservation and Access

"JISC is supporting an international initiative, led by US-based organisations the National Science Foundation (NSF) and the Andrew W. Mellon Foundation, to address the issue of economic sustainability in digital preservation.
A Task Force to be co-chaired by Fran Berman, director of the San Diego Supercomputer Center at the University of California and a pioneer in data ‘cyberinfrastructure’, and Brian Lavoie, an economist and research scientist with OCLC, will receive support from the Library of Congress, the National Archives and Records Administration and the Council on Library and Information Resources, along with JISC.
The Blue Ribbon Task Force on Sustainable Digital Preservation and Access is expected to meet over the next two years to gather testimony from experts in preparation for the Task Force's Final Report. Though significant progress has been made to overcome the technical challenges of achieving persistent access to digital resources, the economic challenges remain."