The Natural Language Processing (NLP) group at the University of Sheffield is one of the largest and most successful research groups in language and information in the EU. The group is based in the Department of Computer Science, which includes world-class teams in the areas of speech, language, knowledge and information processing, biotechnology, and machine learning for medical informatics. The Department of Computer Science was founded in 1982 and has since established national and international renown for many aspects of its teaching and research. It was awarded a top Grade 5 in the most recent nationwide Research Assessment Exercise. The Department collaborates internationally in both research and teaching; companies that support its work include Daimler-Chrysler, GlaxoSmithKline, Motorola and Nokia.
The Natural Language Processing Group has focused on robust engineering of open source software and on scientific method (quantitative evaluation and repeatability) in artificial intelligence. The group has extensive experience in infrastructure for Human Language Technologies (HLT), information extraction, text summarisation, digital library support, question answering, terminology extraction, machine learning methods for HLT, and HLT methods for Knowledge Management and the Semantic Web. The University of Sheffield (USFD) has a world-leading research record in human language technologies, built up through national and international research projects in these areas. USFD brings 15 years of experience in developing world-class information extraction technology to the project and has relevant experience in extracting semantics from both textual and multimedia content. In a recent project with the UK National Archives, USFD was tasked with semantically annotating the UK Government web archive, producing a rich index in support of new search paradigms. Previously, USFD was a partner in the PrestoSpace project, which focused on preserving Europe's audiovisual legacy. USFD's role in that project was to enrich multimedia archives with semantic metadata used for indexing, searching, and supporting findability.
Within Research Area 2 (RA-2), USFD will develop methods that enable the condensation and consolidation of textual content. Methods for identifying textual and semantic similarity and redundancy will be employed to reduce the volume of information being archived. Information condensation based on state-of-the-art methods for single- and multi-document summarisation will also be deployed. USFD is also involved in RA-3, where it leads WP 6, which deals with information contextualisation.
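As a purely illustrative sketch of what redundancy detection based on textual similarity can look like, the following Python fragment drops near-duplicate documents using Jaccard similarity over word trigrams. It is not the method USFD will deploy: the function names, the trigram size, and the 0.8 threshold are all assumptions chosen for the example, and it covers only surface-level textual similarity, not semantic similarity.

```python
import re


def shingles(text, n=3):
    """Return the set of word n-grams (shingles) found in a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        # Very short texts are represented by a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_redundant(documents, threshold=0.8, n=3):
    """Keep a document only if it is not too similar to one already kept.

    The threshold is a hypothetical value for illustration; a real system
    would tune it on the archive's own data.
    """
    kept = []
    kept_shingles = []
    for doc in documents:
        current = shingles(doc, n)
        if all(jaccard(current, seen) < threshold for seen in kept_shingles):
            kept.append(doc)
            kept_shingles.append(current)
    return kept
```

Calling `drop_redundant` on a list of texts returns the subset that would be archived under these assumptions; texts differing only in punctuation or case are treated as duplicates because tokenisation discards both.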
Prof. Hamish Cunningham has been a software engineer, systems architect, and (for the last 15 years) a researcher in NLP and knowledge technologies. He has participated in numerous national and European R&D projects, published widely, and contributed to the organising and programme committees of a wide variety of conferences and workshops. He currently leads a research team of fifteen staff and conducts research in four areas: Software Architecture for Language Engineering (he managed the development of the General Architecture for Text Engineering (GATE), which is used as infrastructure in many projects); Information Extraction (he participated in the MUC, ACE, Pascal, and NTCIR evaluation programmes, developed a novel annotation graph-based finite state transduction IE front-end, and worked on IE for indexing multimedia, e.g. in the PrestoSpace and MUMIS projects, and for biodiversity research support in flora analysis and synthesis); Knowledge Management and the Semantic Web (applying language technology to the automation of web semantics in NeOn, SEKT, KnowledgeWeb, the Advanced Knowledge Technologies (AKT) project, and elsewhere); and Digital Libraries (the use of language technology to support richer annotation, indexing, and collaborative modeling of culture, history, and science, e.g. in the PrestoSpace project). Prof. Cunningham is an editorial board member of three journals and a scientific board member of the Information Retrieval Facility.
Dr. Valentin Tablan is a Research Fellow who has been active within the NLP group for over 11 years, where he is one of the project leads for the GATE family of products. His PhD work focused on Information Extraction, and his MSc specialised in parallel and distributed processing. At the University of Sheffield he has been involved with EU-funded projects since 2000, leading the Sheffield technical work for the PrestoSpace IP project and the Media Campaign STREP. The work in both PrestoSpace and Media Campaign centred on generating semantic metadata for multimedia content. In both cases, novel multi-source approaches were devised to help mitigate the information loss inherent in automatic speech-to-text transcription. He was the engineering lead for SAM, a commercially commissioned project that focused on applying GATE-based semantic annotation on a very large scale. More recently he has been working on hybrid indexing systems that rely on automatically extracted semantic metadata. These support multiple parallel search paradigms that allow richer queries to be formulated and more precise results to be found.
Dr. Mark A. Greenwood is a Research Associate who has been active within the NLP group for over a decade and, for the last four years, has been a core contributor to the GATE family of products. His PhD work focused on open-domain question answering, which included research into aspects of information retrieval, information extraction, semantic annotation, and summarisation. While working at the University of Sheffield he has been involved in a number of EU-funded projects, including X-Media, LarKC, Khresmoi, and TrendMiner, where his work has mostly focused on information extraction and semantic annotation. He was also Sheffield's lead researcher for the UK National Archives government web archive semantic annotation project, which involved ontology evolution as well as semantic annotation, indexing, and search.