Friday, 14 December 2007
"Research today is data-intensive, compute-intensive, collaborative and multidisciplinary. Researchers are becoming 'extreme information workers', looking for subtle signals in great volumes of data. Technologies are emerging that enable a step change in handling scientific data: visualisation, analysis and processing, and also data management and preservation.
Where we have failed, so far, is in making it as easy to use the technology for handling these challenges as it is to use the web. Some communities, like particle physicists, are quite happy with technically complex middleware, but most are unwilling to work with these sorts of tools.
The new technologies of the social web may be the key to empowering researchers in the new data-centric world. It’s already happening in some communities. [...]
Beyond the lab, academic publishing is on the verge of a revolution. [...]
In the new world of e-Research, everything is connected in the cloud – the web-based applications and data stores out there in the internet. There will increasingly be services and tools in the cloud, accessed through simple interfaces via a web browser. [...]"
Art or science? Describes the key skills and competencies a project manager needs to succeed, e.g. managing uncertainty and managing expectations.
What an elephant is like: on making the most of project meetings http://www.bcs.org/server.php?show=ConWebDoc.16426
"There is a renewed focus on campus infrastructure to support research programs. Developments include: policy, technical and economic influences that are leading to a partial re-centralization of computing functions; radically new high performance network and distributed computing technologies; a rethinking of storage functionality and economics; requirements for long-term data management, curation and preservation; and growing faculty demands for informatics support services. An additional dimension of these needs involves information and technology intensive collaborations among groups at multiple campuses (sometimes characterized as collaboratories or virtual organizations). Complementing the organizationally oriented work on e-research already described, CNI is also concerned with the institutional and cross-institutional development of technical infrastructure, with a particular focus on large-scale storage and data management, and on collaboration tools and environments."
The idea of an Executive Roundtable is an interesting way to engage senior stakeholders:
"The Executive Roundtable assembles executive teams (usually the chief librarian and chief information technology officer) from about ten institutions for a focused two-to-three hour discussion of a specific topic of interest on the morning of the first day of the Task Force meeting. Past topics have included institutional repositories, learning management system strategies, identity management, learning spaces, funding innovation, and infrastructure to support research, which brought together vice presidents or vice provosts of research, in addition to the usual Roundtable organizational representatives from libraries and information technology."
"In the 2007-2008 program year CNI will continue to engage e-research developments both in the sciences and the humanities. The US National Science Foundation is launching major programs addressing data curation (the DataNet initiative, and also the Community-based Data Interoperability Networks program), and we will be highlighting these in our Task Force meetings."
"CNI is concerned with questions about availability of data related to scholarly work, and has been engaged in a number of discussions around open access, open science, and open data as they relate to this question, as well as discussions about disciplinary norms for data sharing. We will also continue to explore and document the ways in which data and computationally intensive scholarship are altering the nature of scholarly communication; the issues here include the legal and technical barriers to large-scale text and data mining; appropriate organizational, policy and technical strategies for linking articles and underlying data; and ways to construct scholarly works that are amenable to various combinations of human and machine use."
"As part of our ongoing exploration of the institutional implications of the emergence of e-science and e-research, we will continue to look at organizational and staffing questions. These include: how to appropriately combine and balance centralized and departmental support resources to most effectively support faculty and students; new information technology/library collaborations required by the e-research environment; and the staffing needs of data curation programs. In this endeavor we will work closely with ARL, where an e-science task force has recently mapped out a number of similar questions from a library perspective, and with the EDUCAUSE Cyberinfrastructure Task Force."
Thursday, 13 December 2007
The Andrew W Mellon Foundation announced its Awards for Technology Collaboration (http://matc.mellon.org/winners/2007-matc-awardees-announced/) which "honor not‐for‐profit organizations for leadership in the collaborative development of open source software tools with application to scholarship in the arts and humanities, as well as cultural-heritage not‐for‐profit activities".
Monday, 10 December 2007
"At the University of California, San Diego (UCSD), the University Libraries are [...] working collaboratively with the San Diego Supercomputer Center to build an intersect of personnel, expertise, and services to provide long-term preservation of and access to research data that enables domain scientists and researchers to carry-out longitudinal complex data analysis to support interdisciplinary research. This critical partnership is providing new opportunities to the UCSD community and when linked with opportunities being developed for a University of California (UC) system-wide grid service platform, it will truly transform the way discovery and access intersect at UCSD and within the UC system."
Professor Wood talks about the current work of the JSR to develop a high-level strategy to deliver real results, focusing on fewer, bigger projects rather than many smaller ones. The data deluge is a key concern: the amount of data generated by research is expected to rise almost exponentially. There are implications for institutions, not least the costs involved. Professor Wood described a move from libraries of physical materials to virtual data stores. Some of the areas needing clarification are: getting the middleware right; agreeing approaches to metadata; and linking datasets effectively. Professor Wood is engaged with discussions at an EU level but feels one of the key roles of JSR is to communicate the urgency of the data deluge problem.
Alongside the work of JSR, JISC is engaging with the Research Councils on the infrastructure needed to support research. Professor Wood also chairs the JISC Scholarly Communications Group, which is now looking at various media and how these may be linked in a holistic way to support researchers. From an institutional perspective, the impact of JSR (and indeed sometimes JISC) is somewhat hidden from researchers. They will have heard of ja.net, maybe even JISCmail, but may be unfamiliar with JISC itself.
Regarding the future of JSR, Professor Wood sees a need to focus on larger projects, quoting the examples of the Digital Curation Centre (http://www.dcc.ac.uk/) and the National Centre for Text Mining (http://www.nactem.ac.uk/), both now starting to show results. It is vital to look at what researchers need; otherwise there is a risk of different groups adopting different approaches. There is also a need to engage at an international level to ensure interoperability and so enable international collaboration.
Professor Wood explains the need to look ahead 10 years in order to develop a vision. He outlines 4 issues in particular which JSR must tackle:
- what sort of middleware should we support as standard?
- what software development do we need to maximise the infrastructure we have?
- what are the priorities for tackling data storage and supporting/sustaining repositories?
- what training is required to enable research communities to understand what is available?
Thursday, 6 December 2007
Wednesday, 5 December 2007
Tuesday, 4 December 2007
When E-Mail Is Outsourced
This article looks at some of the issues institutions are now facing (albeit from a US perspective) in deciding how to move ahead with email and other services. Microsoft and Google have both marketed to the higher education sector and offer the benefits of integration. But the choice facing institutions is not simple and raises a number of questions relating to:
- role of IT services
- privacy and ownership of data
- the value of an ".edu" or ".ac.uk" email address
- capacity to innovate
- support required
- ability to influence priorities for development
- Have eScience Programme outputs reached the level of sustainability needed? How long is a reasonable length of time to expect a step change?
- Can we have generic tools given that research itself is not generic?
- How much tinkering of tools and software do researchers actually want to do?
- Although we have a culture of sharing software, there isn't the same culture of sharing data (file sharing does not equal data sharing!). The problem is not technological but cultural
- In time, will current students bring their social networking skills into research?
- What can be learned from Athens to Shibboleth move in terms of running a development programme to inform a production programme?
- Sustainability = continuing efforts, changing culture, reducing duplication, encouraging sharing and discussion, open collaboration. Must not forget the broader sustainability agenda (e.g. HEFCE shared services programme)
- The software engineering needs to be sound and built on a solid framework. Academia is perhaps not geared to developing robust software and middleware; funding agencies generally haven't funded software development; career progression and reward for those developing software is difficult; there are staff recruitment and retention issues; sustainability is not even on the radar screen of many HEIs and most academics
- One option is the spinoff company - in this instance, it is important to establish trust between company and university. It takes time to get technology to market. The DTI/Technology Strategy Board follow-on is a great bridge. Keep the team together as far as possible
- The team needs a mix of scientific, domain, financial and business experience
- Sustainability depends on users, but there is a need to promote a long-term view (the vision of integrating compute, data and collaboration is not easy for researchers in a hurry with a short-term view); new ways of working take researchers out of their comfort zones
- If you want to continue to innovate, maintaining what you have becomes more difficult – issues of scalability, competition for support. There is a tension between maintaining an infrastructure and innovating
- Sustainability lessons – work with user community; constantly innovate and deliver; develop modular easy to use software; strong promotion, personal ownership; vision - many new ideas
- Innovation has 2 strands – new technical capability, new science communities
- Is there a role for Full Economic Costing in sustainability?
- Need to get across that software and data are facilities and therefore need managing
- What is the role of institutions in helping to sustain a project?
Slides will be available from http://www.jisc.ac.uk/whatwedo/programmes/programme_einfrastructure/modelsofsustainability
Friday, 30 November 2007
Tuesday, 27 November 2007
Rob Lemmens from the International Institute for Geo-Information Science and Earth Observation talked about end-user tools. He contrasted two approaches: corporate/national Spatial Data Infrastructures (SDIs), which are centralised, and Web 2.0, which is community driven. SDIs are based on stricter rules for annotation, and their accuracy tends to be higher than that of Web 2.0 tools, although this is changing. Rob outlined the need for a semantic interoperability framework (a combination of ontologies, their relationships and methods for ontology-based description of information sources such as data sets and services) and a semantic interoperability infrastructure (comprising the framework, the tools to maintain and use it, and the information sources produced within it). Rob's presentation also included a slide giving a good summary of the characteristics of an ontology, and a demonstration of ontology visualisation (the same tool ASSERT is using for clustering?). Rob concluded by summarising what the geospatial community can learn and take from Web 2.0, for example tagging/tag clouds, tools for building ontologies (community tagging, e.g. Google Image Labeler) and instant feedback (e.g. password strength bars when selecting a new password); on the negative side, community-driven tagging can lead to weak semantics. Rob suggests combining the best of the SDI and Web 2.0 worlds: mapping the SDI and Web 2.0 ontologies to create dynamic annotations of geo sources, thus improving discovery.
Ulrich Bugel from Fraunhofer Institut IITB presented on ontology-based discovery and annotation of resources in geospatial applications. Ulrich talked about the ORCHESTRA project (http://www.eu-orchestra.org/), which aims to design and implement an open service-oriented architecture to improve interoperability in a risk management setting (e.g. how big is the risk of a forest fire in a certain region of the Pyrenees in a given season?). This question has spatial references (cross-border, cross-administration), temporal references (time series and prognostics), a thematic reference (forest fire) and a conceptual reference (what is risk?). ORCHESTRA will build a service network to address these sorts of questions. Interoperability is discussed on three levels: syntactic (encodings), structural (schemas, interfaces) and semantic (meaning). The project has produced the Reference Model for the ORCHESTRA Architecture (RM-OA), drawing on standards from OGC, OASIS, W3C, ISO 191xx and ISO RM-ODP; the Reference Model went through many iterations, which led to it gaining Best Practice status at OGC. The ORCHESTRA Architecture comprises a number of semantic services: the Annotation Service, which automatically generates meta-information from sources and relates it to elements of an ontology; the Ontology Access Service, which enables high-level access and queries to ontologies; the Knowledge Base Service; and the Semantic Catalogue Service.
Ian Holt from Ordnance Survey presented on geospatial semantics research at OS. Unsurprisingly, OS has one of the largest geospatial databases, with 400 million features and over 2000 concepts. The benefits of semantics research include quality control and better classification; semantic web enablement, semi-automated data integration, and data and product repurposing; and data mining, i.e. benefits both to OS and to its customers. OS has developed a topographic domain ontology which provides a framework for specifying content (www.ordnancesurvey.co.uk/ontology). It has developed ontologies for hydrology, administrative geography, and buildings and places, and is working on addresses, settlements and land forms, with supporting modules on mereology, spatial relations and network topology. A distinction is drawn between a conceptual ontology (knowledge represented in a form understandable by people) and a computational ontology (knowledge represented in a form understandable by computers). A controlled natural language called Rabbit has been developed: structured English, compilable to OWL. OS is also part of the OWL 1.1 task force to develop a controlled natural language syntax. A project currently underway with Leeds University is developing a plug-in for Protégé which allows natural language descriptions and, in the back end, will translate them into an OWL model. The first release is scheduled for December, with a further release planned for March 08. Ian also talked about experimental work to semantically describe gazetteers, using an RDF version (downloadable?) to represent the data and an OWL ontology to describe the concepts. This work covers administrative regions, with work underway to include cities etc. Through this work, OS has experienced some problems with RDF, e.g. it may degrade performance (they have >10 billion triples), and how much of it is really needed? Ian described some work on semantic data integration, e.g. "find all addresses with a taxable value over £500,000 in Southampton", looking at how to merge ontologies (i.e. creating another ontology rather than interoperability between the two). Ian briefly covered some lessons learned: ontologies are never perfect and can't offer complete descriptions of any domain, and automatic tools are used as far as possible. Ian also described work on linking ontologies to databases using D2RQ, which maps SPARQL queries to SQL, creating "virtual" RDF. Conclusions: domain experts need to be at the centre of the process; technology transfer is difficult, and the benefits of semantics in products and applications must be clarified.
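To make the idea of "virtual" RDF a bit more concrete, here is a minimal Python sketch. It assumes a hypothetical `address` table and made-up `example.org` URIs, and it is not D2RQ's mapping language or API: it only shows rows being exposed as triples on demand (nothing is materialised), and a query over those properties (a list of property/operator/value constraints standing in for a SPARQL graph pattern) being rewritten into a single SQL query, using the Southampton taxable-value question as the example.

```python
import sqlite3

# Illustrative namespaces (hypothetical, not OS or D2RQ URIs).
BASE = "http://example.org/address/"
ONT = "http://example.org/ont#"

# Hypothetical mapping from RDF property URI to a column of the address table.
PROPERTY_TO_COLUMN = {
    ONT + "town": "town",
    ONT + "taxableValue": "taxable_value",
}


def virtual_triples(conn):
    """Yield (subject, predicate, object) triples computed from rows on demand."""
    columns = ", ".join(PROPERTY_TO_COLUMN.values())
    for row in conn.execute(f"SELECT id, {columns} FROM address ORDER BY id"):
        subject = BASE + str(row[0])  # subject URI minted from the primary key
        for prop, value in zip(PROPERTY_TO_COLUMN, row[1:]):
            yield (subject, prop, value)


def constraints_to_sql(constraints):
    """Rewrite (property, operator, value) constraints on one subject as SQL."""
    clauses, params = [], []
    for prop, op, value in constraints:
        if op not in ("=", "<", ">"):
            raise ValueError(f"unsupported operator: {op!r}")
        clauses.append(f"{PROPERTY_TO_COLUMN[prop]} {op} ?")
        params.append(value)  # values are bound as parameters, never pasted in
    sql = "SELECT id FROM address WHERE " + " AND ".join(clauses) + " ORDER BY id"
    return sql, params


def query(conn, constraints):
    """Answer the query in SQL, returning subject URIs as the RDF view would."""
    sql, params = constraints_to_sql(constraints)
    return [BASE + str(row[0]) for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE address (id INTEGER PRIMARY KEY, town TEXT, taxable_value INTEGER)"
)
conn.executemany(
    "INSERT INTO address VALUES (?, ?, ?)",
    [(1, "Southampton", 650000), (2, "Southampton", 240000), (3, "Winchester", 720000)],
)

# "Find all addresses with a taxable value over 500,000 in Southampton"
print(query(conn, [(ONT + "town", "=", "Southampton"), (ONT + "taxableValue", ">", 500000)]))
# prints ['http://example.org/address/1']
```

The design point is that the database stays the source of truth: the RDF view is computed from it at query time, which is one way to sidestep the performance concerns of storing billions of triples.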
Alun Preece from Cardiff University presented on an ontology-based approach to assigning sensors to tasks. The idea is to bridge the gap between people out in the field who need to make decisions (e.g. in disaster management) and the data/information produced by networks of sensors and other sources. Issues tackled: data orchestration (determining, locating and characterising the resources required); reactive source deployment (repurposing, moving and redeploying resources); and push/pull data delivery. The approach is ontology-centric and involves semantic matchmaking. Proof-of-concept work includes the SAM (Sensor Assignment for Missions) software prototype and its integration with a sensor network. This work is funded by the US/UK to support military applications, specifically intelligence, surveillance and reconnaissance (ISR) requirements. The work uses ontologies to specify the ISR requirements of a mission (e.g. night surveillance, intruder detection) and the ISR capabilities provided by different asset types, then uses semantic reasoning to compare mission requirements with capabilities and decide whether the requirements are satisfied. For example, if a mission requires Unmanned Aerial Vehicles (UAVs), the ontology would specify the different types of UAV and the requirements of the mission (e.g. high altitude to fly above weather, endurance), and the semantic matchmaking (exact, subsuming, overlapping, disjoint) then leads to a preferred choice. The project has engaged with domain experts to get the information into the ontology and to share conceptualisations. Alun showed the Mission and Means Framework Ontology, a high-level ontology which is fleshed out with more specific concepts.
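The four match degrees can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that requirements and capabilities are flat sets of hypothetical capability labels and that the degrees rank exact > subsuming > overlapping > disjoint. SAM itself reasons over ontologies, so this is a set-based analogue of semantic matchmaking, not the project's algorithm.

```python
from enum import IntEnum


class Match(IntEnum):
    """Match degrees; a higher value is treated as a better match (an assumption)."""
    DISJOINT = 0
    OVERLAPPING = 1
    SUBSUMING = 2
    EXACT = 3


def match_degree(required: frozenset, offered: frozenset) -> Match:
    """Compare what a mission requires with what an asset type offers."""
    if offered == required:
        return Match.EXACT
    if offered >= required:  # asset covers every requirement, and more
        return Match.SUBSUMING
    if offered & required:  # asset covers some, but not all, requirements
        return Match.OVERLAPPING
    return Match.DISJOINT


# Hypothetical asset types and capability labels (not taken from SAM).
ASSETS = {
    "HighAltitudeUAV": frozenset({"high_altitude", "endurance", "imaging"}),
    "MiniUAV": frozenset({"endurance", "imaging"}),
    "GroundSensor": frozenset({"acoustic", "seismic"}),
}


def rank_assets(required, assets):
    """Return (asset, degree) pairs, best match first, ties broken by name."""
    scored = [(name, match_degree(required, caps)) for name, caps in assets.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


mission = frozenset({"high_altitude", "endurance"})
for name, degree in rank_assets(mission, ASSETS):
    print(name, degree.name)
```

For the mission above this prints `HighAltitudeUAV SUBSUMING`, then `MiniUAV OVERLAPPING`, then `GroundSensor DISJOINT`, so the high-altitude UAV would be the preferred choice.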
Slides from the workshop will be uploaded to http://www.nesc.ac.uk/action/esi/contribution.cfm?Title=832
Wednesday, 21 November 2007
"Most clearly among our three case studies, the area of Web services demonstrates the manner in which interoperability can stimulate large-scale innovation."
Friday, 16 November 2007
"Bill Pike (Pacific Northwest National Laboratory), in his presentation on integrating knowledge models into the scientific analysis process [...] described the challenge of trying to capture scientific knowledge as it is created, with workflow models that describe the process of discovery. In this way, the knowledge of what was discovered can be connected with the knowledge of how the discovery was made."
"If future generations of scientists are to understand the work of the present, we have to make sure they have access to the processes by which our knowledge is being formed. The big problem is that, if you include all the information about all the people, organisations, tools, resources and situations that feed into a particular piece of knowledge, the sheer quantity of data will rapidly become overwhelming. We need to find ways to filter this knowledge to create sensible structures... "
"One method for explicitly representing knowledge was presented by Alberto Canas (Institute for Human and Machine Cognition). The concept maps that he discussed are less ambiguous than natural language, but not as formal as symbolic logic. Designed to be read by humans, not machines, they have proved useful for finding holes and misconceptions in knowledge, and for understanding how an expert thinks. These maps are composed of concepts joined up by linking phrases to form propositions: the logical structure expressed in these linking phrases is what distinguishes concept maps from similar-looking, but less structured descriptions such as "mind maps". "
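The proposition structure described in that quote can be sketched as a small Python data structure. The class and method names are hypothetical, and the example propositions are paraphrased from the quote itself; the only point being made is that each proposition is a (concept, linking phrase, concept) triple that reads as a sentence.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Proposition:
    """Two concepts joined by a linking phrase."""
    source: str
    linking_phrase: str
    target: str

    def as_sentence(self) -> str:
        return f"{self.source} {self.linking_phrase} {self.target}"


@dataclass
class ConceptMap:
    propositions: list = field(default_factory=list)

    def add(self, source: str, linking_phrase: str, target: str) -> None:
        self.propositions.append(Proposition(source, linking_phrase, target))

    def concepts(self) -> list:
        """All concepts, in order of first appearance."""
        seen = []
        for p in self.propositions:
            for concept in (p.source, p.target):
                if concept not in seen:
                    seen.append(concept)
        return seen

    def about(self, concept: str) -> list:
        """Propositions in which the concept appears on either side."""
        return [p for p in self.propositions if concept in (p.source, p.target)]


cmap = ConceptMap()
cmap.add("concept maps", "are composed of", "concepts")
cmap.add("concepts", "are joined by", "linking phrases")
cmap.add("linking phrases", "express", "logical structure")
cmap.add("logical structure", "distinguishes concept maps from", "mind maps")

for p in cmap.about("linking phrases"):
    print(p.as_sentence())
```

Because every link carries a phrase, each edge can be read back as a proposition, which is the property the quote uses to separate concept maps from mind maps.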
"...analyst Gartner predicts that four out of five companies will have taken the SOA route by 2010...SOA involves a fundamental change to the way firms think about IT - namely, as a series of interoperable business services, rather than as discrete IT systems."
The article also quotes Nick Masterton-Jones, IT Director of Vocalink: "I think SCA is something we're going to see a lot more of in the coming three years." SCA (Service Component Architecture) is "an open SOA promoted by major Java vendors to bridge the gap between people who understand the business domain and people who understand system design".
Monday, 12 November 2007
OECD Principles (2007): http://www.oecd.org/document/55/0,3343,en_2649_37417_38500791_1_1_1_37417,00.html
RIN's Stewardship of Digital Research Data (2007): http://www.rin.ac.uk/data-principles
MRC's Guidelines on data sharing: http://www.mrc.ac.uk/PolicyGuidance/EthicsAndGovernance/DataSharing/PolicyonDataSharingandPreservation/index.htm
BBSRC's Guidelines on data sharing: http://www.bbsrc.ac.uk/support/guidelines/datasharing/context.html
Plus some interesting outputs from JISC-funded projects:
- Liz Lyon's Dealing with Data report: http://www.jisc.ac.uk/whatwedo/programmes/programme_digital_repositories/project_dealing_with_data.aspx. A very comprehensive overview with a list of recommendations which is now being reviewed by JISC.
- GRADE project: http://edina.ac.uk/projects/grade/Grade_reportRSSv2.pdf. Found that researchers most commonly use USB sticks and email to share small datasets. Also noted that, as well as enabling sharing/preservation, a national repository would enable the UK to contribute to European and other international initiatives.
- DISC-UK Datashare: http://www.disc-uk.org/docs/state-of-the-art-review.pdf. One interesting finding reported is from Australian colleagues who found that a single repository wasn't proving effective and they subsequently moved towards two distinct repositories: one to enable collaboration on work-in-progress and one for published outputs/datasets.
There's a lot in these links about the wider context; how things look now; barriers to data sharing (e.g. trust, IPR, time); discussion on possible solutions (e.g. social software models, reward/recognition, mandates).
Friday, 9 November 2007
- Savas Parastatidis from Microsoft talking about "the cloud" http://www.ogf.org/OGF21/materials/1031/2007.10.15%20-%20OGF%20-%20Web%202.0-Cloud%20Era%20and%20its%20Impact%20on%20how%20we%20do%20Research.pdf
- Dave de Roure talking about the JISC-funded myExperiment (VRE2) http://www.ogf.org/OGF21/materials/1030/OGF21myExperiment.ppt
And of course not forgetting the geospatial stuff...
"1.) Integrate OGC's Web Processing Service (WPS) with a range of "back-end" processing environments to enable large-scale processing. The WPS could also be used as a front-end to interface to multiple grid infrastructures, such as TeraGrid, NAREGI, EGEE, and the UK's National Grid Service. This would be an application driver for both grid and data interoperability issues.
2.) Integration of WPS with workflow management tools. OGF’s SAGA draft standard is where multiple WPS calls could be managed.
3.) Integration of OGC Federated Catalogues/Data Repositories with grid data movement tools. OGF’s GridFTP is one possibility that supports secure, third-party transfers that are useful when moving data from a repository to a remote service.
However, the real goal is not just to do science, but to greatly enhance things like operational hurricane forecasting, location-based services, and anything to do with putting data on a map. WPS is just a starting point for the collaboration. As the two organizations engage and build mutual understanding of technical requirements and approaches, many other things will be possible. "
Thursday, 8 November 2007
Key issues and lessons
- Projects should use wikis/websites to enable tracking of work through the development lifecycle
- Be prepared to adapt templates for project documentation
- As a Programme Manager, you may need more regular/frequent engagement with projects - the 6-monthly progress report is not going to be sufficient
Useful links (in no particular order)
"The LIFE Project has recently published a revised model for lifecycle costing of digital objects." The project team is looking for comments via the project blog.
More info at:
"King's College London is pleased to announce the establishment of the KCL Centre for e-Research. Based in Information Systems and Services, the Centre will lead on building an e-research environment and data management infrastructure at King's, seeking to harness the potential of IT to enhance research and teaching practice across the College. The Centre also has a remit to make a significant contribution to national, European and international agendas for e-research, and in particular to carry forward in a new context the work of the AHDS across the arts and humanities.
Planning for the new Centre began on 1st October 2007 and a major launch event is planned for Spring 2008. Further information and news about the Centre and its activities will be released over the coming months."
Wednesday, 7 November 2007
- a news item, Search and aggregators set to dominate, on the recent Outsell Information Industry Outlook report:
"Watson Healy said 2008 would be 'year of the wiki', with Web 2.0 technology replacing complex portals and knowledge management, and that 'a critical mass of information professionals would take charge of wikis, blogs or other 2.0 technologies on behalf of their organisations'."
- an item, PubMed recasts rules for open access re-use, on the new guidelines recently agreed by the UK PubMed Central Publishers Panel:
"Under the terms of the statement of principles, open access (OA) published articles can be copied and the text data mined for further research, as long as the original author is fully attributed".
Tuesday, 6 November 2007
There've been several publications from this programme of work recently:
- User needs study: How JISC could support Business and Community Engagement
- Evaluation report: JISC Services and the third stream
- Final report: Study of Customer Relationship Management issues in UK HE institutions
- Study: The use of publicly-funded infrastructure, services, and intellectual property for BCE
- Business and Community Engagement: An overview of JISC activities
Friday, 2 November 2007
Good to see NaCTeM :-) A good overview of the current services and a run-through of their roadmap:
"NaCTeM's text mining tools and services offer numerous benefits to a wide range of users. These range from considerable reductions in time and effort for finding and linking pertinent information from large scale textual resources, to customised solutions in semantic data analysis and knowledge management. Enhancing metadata is one of the important benefits of deploying text mining services. TM is being used for subject classification, creation of taxonomies, controlled vocabularies, ontology building and Semantic Web activities. As NaCTeM enters into its second phase we are aiming for improved levels of collaboration with Semantic Grid and Digital Library initiatives and contributions to bridging the gap between the library world and the e-Science world through an improved facility for constructing metadata descriptions from textual descriptions via TM."
Other interesting snippets:
- SURFshare programme covering the research lifecycle http://www.surffoundation.nl/smartsite.dws?ch=ENG&id=5463
- a discussion on the use of Google as a repository: "Repositories, libraries and Google complement each other in helping to provide a broad range of services to information seekers. This union begins with an effective advocacy campaign to boost repository content; here it is described, stored and managed; search engines, like Google, can then locate and present items in response to a search request. Relying on Google to provide search and discovery of this hidden material misses out a valuable step, that of making it available in the first instance. That is why university libraries need Google and Google needs university libraries."
- feedback from the ECDL conference, including a workshop on a European repository ecology, featuring a neat diagram showing how presentations are disseminated after a conference using a mix of Web 2.0, repositories and journals http://www.ariadne.ac.uk/issue53/ecdl-2007-rpt/#10
Wednesday, 31 October 2007
Monday, 29 October 2007
The article acknowledges the cultural barriers to using/sharing data and suggests policies are put in place to establish guidelines and principles, as well as training and mentoring to help develop the collaborative and information management skills required.
One of the case studies, Denton Wilde Sapte, cautions "People are so wrapped up in the technical whizz-bangs that they forget that IT is really all about information delivery".
"Organisations are recognising that some pieces of their information have more fundamental value than other parts, although that value might not be realisable today. For certain items of information its maximum value will only be achieved at some point in the future, so companies need to invest in good archiving, storage, search and retrieval systems today" Ian Charlesworth, Ovum, quoted in the article.
- info on and link to their Spatial Data Quality survey, which will inform the Spatial Data Quality Working Group's attempts to define a framework and grammar for the certification and communication of spatial data quality
- a slideshow demonstrating the use of OGC standards for earth observation.
Wednesday, 24 October 2007
The Semantic Web Vision: Where Are We?
"The aim of this article is to present a snapshot that can capture key trends in the Semantic Web, such as application domains, tools, systems, languages and techniques being used, and a projection on when organizations will put their full-blown systems into production."
Interesting way of sharing data - they are also launching a private version, presumably for when you want to be careful who you share with. Some issues re quality though - e.g. how could you be sure of provenance?
Tuesday, 16 October 2007
Monday, 15 October 2007
Results and analysis of the Web 2.0 services survey undertaken by the SPIRE project
ManyEyes is particularly interesting
Also a link to a really good article listing visualisation tools, including search tools - I really like the Visual Thesaurus.
"Hard drives currently have a one terabyte limit. A single hard drive with four terabytes of storage (4TB) could be a reality by 2011, thanks to a nanotechnology breakthrough by Japanese firm Hitachi..."
Related story in New Scientist http://technology.newscientist.com/article.ns?id=dn12755&feedId=online-news_rss20
Friday, 12 October 2007
Thanks to Bill St Arnaud for pointing to this on his blog:
At the Gartner Expo this week, the following were discussed as the top 10 technologies organisations can't afford to ignore...
- Green IT
- Unified communications (interesting for VRE programme)
- Business Process Management (to support SOA)
- Metadata management
- Virtualisation 2.0
- Mashups and composite applications
- Web platform and Web-Oriented Architecture
- Computing fabrics
- Real World Web
- Social software
Thursday, 11 October 2007
"Crowdsourcing is an internet-enabled upgrade of the original focus group concept, according to Dell vice president Bob Pearson". The article goes on to show how several big companies, including L'Oreal, Kimberly-Clark and Dell, are using crowdsourcing to develop products.
Wikipedia entry on crowdsourcing
Tuesday, 9 October 2007
Sunday, 7 October 2007
From their website:
"SEASR (Software Environment for the Advancement of Scholarly Research) is being developed by the National Center for Supercomputing Applications in cooperation with the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign.
SEASR aims to:
- assist scholars in accessing and analyzing existing large information sources more readily and with greater refinement;
- give scholars increased portability of large information stores for on-demand computing; and
- empower collaboration among researchers by enhancing and innovating scholarly communities’ and their resources’ virtual research environments.
How will we do it? The SEASR development team will construct software bridges to move information from the unstructured and semi-structured data world to the structured data world by leveraging two well-known research and development frameworks: NCSA’s Data-To-Knowledge (D2K) and IBM’s Unstructured Information Management Architecture (UIMA). SEASR will focus on developing, integrating, deploying, and sustaining a set of reusable and expandable software components and a supporting framework, benefiting a broad set of data-mining applications for scholars in the humanities.
SEASR’s technical goals include supporting:
- the development of a state-of-the-art software environment for unstructured data management and analysis of digital libraries, repositories and archives, as well as educational platforms; and
- the continued development, expansion, and maintenance of end-to-end software system: user interfaces, workflow engines, data management, analysis and visualization tools, collaborative tools, and other software integrated into a complete environment."
- Access to research : "Success stories, such as TREC for information retrieval research [Voorhees] or the Human Genome Project [HGP], have devoted substantial expertise to creating the necessary infrastructure and managing the datasets with a very clear understanding of how they fit the research practices in their fields."
- Access to research : "Cyberscholarship needs superdata centers, which combine the storage and organization of vast amounts of data with substantial computing power to analyze it. Building such centers requires investment and long-term commitment on the part of an organization or discipline. While equipment can be purchased, expertise takes longer to establish. Superdata centers and the researchers who use them will need several years before they become truly effective."
- Value-added services : "As our systems grow more sophisticated, we will see applications that support not just links between authors and papers but relationships between users, data and information repositories, and communities. What is required is a mechanism to support these relationships that leads to information exchange, adaptation and recombination – which, in itself, will constitute a new type of data repository."
- Also a reference on p. 10 to the need for summarisation, with particular reference to humanities research.
Cyber-Enabled Discovery and Innovation (CDI)
Friday, 5 October 2007
Richard has also summarised a talk by Andrew Herbert of Microsoft. Interesting points include how sensor networks will enable real-world data to be used in simulations, and how to get the right balance of skills in the research workforce.
There are also some handy guides to some of the key themes of the conference, including eScience. The blog links to transcripts as well as the webcast, and to some videos (including two relevant to eScience, by Andrew Herbert and Walter Stewart).
"Brian Witcombe, a radiologist at Gloucestershire Royal NHS Trust received the Ig Nobel prize in medicine for his study of sword swallowing and its side effects."
Ig Nobel home page: http://improbable.com/ig/
Thursday, 4 October 2007
This week's Computing (4 Oct) mentions 5 information management technologies to watch out for in the next 3 years:
- consistency and interoperability via emerging standards: JSR, XQuery, JDBC, SDO
- UIMA: http://domino.research.ibm.com/comm/research_projects.nsf/pages/uima.index.html
- automated email archiving and retrieval systems
- extended enterprise search facilities
- unified interface for information management
Wednesday, 3 October 2007
It suggests that librarians should accept that some services might be better delivered through technology, or even by other organisations, and should instead focus on where they can really add value, e.g. managing scientific data and curating digital information such as blog posts. The post also mentions a recent event, Global Research Library 2020, organised jointly by the University of Washington Libraries and Microsoft.
Tuesday, 2 October 2007
OGF21 is later this month - worth seeing what comes out of the following workshops:
Web 2.0 - features presentations on research and commercial applications of Web 2.0 technology including HPC, Cyberinfrastructure, Semantic Research, Social Networking
Geospatial - a collaboration with the OGC, covering topics such as grid-enabling the OGC's Web Processing Service and an NSF proposal on Community-based Data Interoperability Networks
GridNet2 - highlighting the work of UK eScience at the OGF and in related standards bodies
Monday, 1 October 2007
"A Task Force to be co-chaired by Fran Berman, director of the San Diego Supercomputer Center at the University of California and a pioneer in data ‘cyberinfrastructure’, and Brian Lavoie, an economist and research scientist with OCLC, will receive support from the Library of Congress, the National Archives and Records Administration and the Council on Library and Information Resources, along with JISC.
The Blue Ribbon Task Force on Sustainable Digital Preservation and Access is expected to meet over the next two years to gather testimony from experts in preparation for the Task Force's Final Report. Though significant progress has been made to overcome the technical challenges of achieving persistent access to digital resources, the economic challenges remain."
Friday, 28 September 2007
"Digital data are increasingly both the products of research and the starting point for new research and education activities. The ability to re-purpose data – to use it in innovative ways and combinations not envisioned by those who created the data – requires that it be possible to find and understand data of many types and from many sources. Interoperability (the ability of two or more systems or components to exchange information and to use the information that has been exchanged) is fundamental to meeting this requirement. This NSF crosscutting program supports community efforts to provide for broad interoperability through the development of mechanisms such as robust data and metadata conventions, ontologies, and taxonomies."
"PCs sold in the European Union (EU) should not come with an operating system already installed on them, according to a new report. The publication created by the Globalisation Institute and submitted to the European commission (EC) suggests that it is not in the interests of consumers to keep selling systems that are bundled with Windows."
Some of the specific issues considered in the report are:
- Audit and certification: including a brief overview of recent work and current instruments. The recommendation is to use the lightweight version of the DRAMBORA toolkit, due this month, as part of an annual self-audit cycle. It is acknowledged that short-term staff contracts and funding cycles have a detrimental effect on the organisational aspects which need to be in place to ensure sustainability. A further recommendation is to explore LOCKSS or CLOCKSS "to engage the crystallography community in the preservation of its valuable data".
- Open Archival Information System (OAIS) standard: the report recommends that eBank develop a formal deposit, ingest, validation and dissemination policy, and that work on Representation Information look beyond the eCrystals repository to the whole crystallography domain
- Metadata: the report recommends further exploration of provenance information, since versioning is currently the only such information stored, and of how preservation metadata can be generated, extracted and maintained automatically
Thursday, 27 September 2007
CODATA Data Science journal - "Open Data for Global Science"
CT Watch Aug 07 - "The Coming Revolution in Scholarly Communications and Cyberinfrastructure"
Tuesday, 25 September 2007
Also useful is their Research & Innovation Facts and Figures, which includes income by subject area (the clear leader is clinical medicine) and trends in government expenditure on R&D
Cyberinfrastructure, Data, and Libraries, Part 1 : A Cyberinfrastructure Primer for Librarians http://www.dlib.org/dlib/september07/gold/09gold-pt1.html
Cyberinfrastructure, Data, and Libraries, Part 2 : Libraries and the Data Challenge: Roles and Actions for Libraries http://www.dlib.org/dlib/september07/gold/09gold-pt2.html
I really like Gartner's Hype Cycle of Emerging Technologies, quoted on the page. A quick search turned up a 2007 version, but it's not available for free :-(
Added 15/10/07: Yahoo also have a mashup service, MapMixer.
Friday, 21 September 2007
Anyway, Random House have included excerpts of The Long Tail on the web here.
Thursday, 20 September 2007
Wednesday, 19 September 2007
Article in the International Herald Tribune about Microsoft's failed attempt to get their open document format, Office Open XML, recognised as an international standard.
OS MasterMap goes online for universities and colleges across Britain
Tens of thousands of students, staff and researchers at universities and further education colleges across Britain have online access to the country’s most advanced digital mapping from this month....
Peter Murray has written up some of the day's presentations on his blog.
Conrad Taylor, introducing the day, covered issues around mark-up and tagging, referring to the difficulties of marking up audio/video and unstructured text; time constraints; and difficulties of subject classification.
Tony Rose talked about information retrieval and some of the innovative approaches out there:
- semantic searching - as demonstrated by Hakia and Lexxe
- natural language processing - as demonstrated by Powerset and Lexxe
- disambiguation - as demonstrated by Quintura
- assigning value to documents - as demonstrated by Google
He sees the future of search as addressing the following:
- rich media search
- multi/cross lingual search
- vertical search
- search agents
- specialised content search
- human UI
- social search
- answer engines
- mobile search
Tom Khazaba from SPSS talked about their products for text and data mining and the various applications they're used for (CRM, risk analysis, crime prevention etc.). He stressed that the results of text analysis have to be fitted into business processes and mentioned briefly how Credit Suisse have achieved this. He listed the keys to success for text/data mining solutions:
- ease of use
- supports the whole process
- comprehensive toolkit - i.e. features visualisation, modelling etc., so all you need is in one place
- openness - using existing infrastructure
- performance and scalability
- flexible deployment
Dan Rickman introduced geospatial information systems. He referred to the importance of metadata and ontologies for handling large volumes of unstructured data. Geospatial information also has a temporal aspect, as many applications will view an area over time. He mentioned Ordnance Survey's work on the Digital National Framework, which has several principles:
- capture information at the highest resolution possible
- capture information once and use many times
- use existing proven standards etc