
University of Brighton 2013 studentships closed for applications at 4pm on Thursday 11 April 2013

If you have submitted an application and do not hear from us within three weeks, we regret that your application has not been successful. The 2013 studentships have attracted an unprecedented number of high-quality applications, and competition is understandably strong. The University of Brighton will offer research studentships in 2014 and will again welcome applications.

Interviews for shortlisted candidates will take place between 7 and 17 May 2013. All those invited to interview will be informed of the outcome by 12 June 2013.

If you have a question regarding your application, please contact our Doctoral College on +44 (0)1273 642915 or at doctoralcollegedean@brighton.ac.uk.


Case-based text mining: the application of case-based reasoning to the probabilistic inheritance models of text analysis

 


Application deadline is 4pm, 11th April 2013


The amount of information now stored electronically is vast, and growing rapidly year by year. As well as the overt knowledge contained in the individual documents and databases, this global information store contains a huge amount of hidden knowledge - knowledge that can only be discovered by looking for patterns across many individual data items, a process known as 'data mining'. And while advanced methods exist for data mining well-organised, structured information in database records, processing of information that is unstructured, and in particular information stored as text, is much more challenging.

Discovering new knowledge from large numbers of text documents is known as 'text mining'. It uses advanced computer science techniques from Natural Language Processing, Knowledge Representation and Statistical Machine Learning to analyse documents and uncover new information contained in the collection as a whole, but not necessarily in any individual document.

Text mining is a rapidly developing field with a wide range of applications. Much work to date has focused on bio-medical text mining, where the analysis of many scientific papers and abstracts, each reporting results independently, has led to the discovery of new links, such as potential new uses for drugs, or the identification of genes associated with particular conditions such as osteoporosis [1]. But text mining also has applications across a very wide range of fields: medical research, cultural heritage, history, political analysis, customer relations, newswire monitoring, social media and online gaming. In short, it applies anywhere people provide large numbers of related texts from which we want to extract new knowledge. The potential economic and research benefits of text mining have recently been highlighted in reports from the UK government [2] and JISC [3], as well as internationally.
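As a rough illustration of how a link can be present in a collection without appearing in any single document, the sketch below looks for pairs of terms that never occur together but share an intermediate term. It is a toy under stated assumptions, not a description of any particular text mining system: the term names are hypothetical placeholders, and the terms are assumed to have already been extracted from each document.

```python
from collections import defaultdict
from itertools import combinations

# Toy collection: each "document" is the set of terms already extracted
# from one paper. All term names are hypothetical placeholders.
documents = [
    {"drug_x", "protein_y"},
    {"protein_y", "condition_z"},
    {"drug_x", "protein_w"},
]


def cooccurrence(docs):
    """Map each term to the set of terms it shares a document with."""
    links = defaultdict(set)
    for terms in docs:
        for a, b in combinations(sorted(terms), 2):
            links[a].add(b)
            links[b].add(a)
    return links


def candidate_links(docs):
    """Find term pairs that never co-occur but share a bridging term.

    Returns a dict mapping (term_a, term_c) to the sorted list of
    intermediate terms that connect them.
    """
    links = cooccurrence(docs)
    candidates = {}
    for a, c in combinations(sorted(links), 2):
        if c in links[a]:
            continue  # already linked directly in some document
        bridges = links[a] & links[c]
        if bridges:
            candidates[(a, c)] = sorted(bridges)
    return candidates


found = candidate_links(documents)
for (a, c), bridges in found.items():
    print(f"{a} -- {c} via {bridges}")
```

No single document above mentions both drug_x and condition_z, yet the collection as a whole suggests a connection through protein_y. Candidates found this way are only hypotheses and would still need checking by a domain expert.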

In our current work on text mining, we are exploring the combination of standard ‘maximum entropy’ statistical pipelines with specialised inference engines which use probabilistic inheritance models. Currently we build these inheritance models largely by hand, an approach which is only viable for relatively small-scale problems. This studentship will explore the potential for applying techniques from Artificial Intelligence and 'Big Data' processing [4] to automate inheritance model development on a larger scale. In particular, you will explore the application of Case-based Reasoning (CBR), a machine learning method that allows for the reuse and adaptation of domain knowledge encoded in a case base of past problems, to develop methods for creating larger inheritance models semi-automatically. This will require investigation into the use of High Performance Computing and agent-based techniques to deal with large amounts of unstructured data, for example using the Hadoop framework. The aim is to develop a more effective hybrid (inferential/statistical) approach to text mining which delivers improved performance and coverage, and more rapid deployment to new domains.
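For readers unfamiliar with Case-based Reasoning, the sketch below shows the basic cycle in miniature: retrieve the most similar past case, propose its solution for the new problem, and retain the result once it has been confirmed. It is a minimal sketch under our own assumptions, not the method this project will develop: the feature names, the idea of treating a parent node in an inheritance model as the "solution", the Jaccard similarity measure and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    features: frozenset  # description of a past problem
    solution: str        # decision made for it (hypothetical: a parent node)


def jaccard(a, b):
    """Set-overlap similarity between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class CaseBase:
    def __init__(self, cases):
        self.cases = list(cases)

    def retrieve(self, features):
        """Return (most similar case, similarity), or (None, 0.0) if empty."""
        if not self.cases:
            return None, 0.0
        best = max(self.cases, key=lambda c: jaccard(c.features, features))
        return best, jaccard(best.features, features)

    def propose(self, features, threshold=0.5):
        """Reuse the nearest case's solution if it is similar enough.

        Returns None when no case is close enough, so the decision can be
        left to a human (the 'semi-automatic' part).
        """
        case, score = self.retrieve(frozenset(features))
        if case is None or score < threshold:
            return None
        return case.solution

    def retain(self, features, solution):
        """Store a confirmed solution so it can be reused later."""
        self.cases.append(Case(frozenset(features), solution))


# Hypothetical case base: feature sets of terms already placed in a model.
case_base = CaseBase([
    Case(frozenset({"has_dose", "treats", "compound"}), "Drug"),
    Case(frozenset({"encodes", "locus", "expressed_in"}), "Gene"),
])

new_term = {"treats", "compound", "has_side_effect"}
proposal = case_base.propose(new_term)
if proposal is not None:
    case_base.retain(new_term, proposal)  # assume a reviewer confirmed it
print(proposal)
```

Keeping `propose` and `retain` separate reflects the usual revise step in CBR: a proposed solution is checked before it enters the case base, so that errors are not reused in later decisions.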

This project has the potential to make a significant contribution to the fields of probabilistic inheritance modelling, text and data mining technology, machine learning, case-based reasoning and text mining applications.

[1] "Text Mining and Data Analytics in Call for Evidence Responses." UK Government, 2011.
[2] I. Hargreaves. "Digital Opportunity: A Review of Intellectual Property and Growth." 2011.
[3] D. McDonald. "Value and Benefits of Text Mining." Joint Information Systems Committee, 2012.
[4] "Big data: The next frontier for innovation, competition, and productivity." McKinsey Global Institute, 2011.


Contact the Doctoral College

For more information about this project, or to be put in contact with a supervisor, please contact the Doctoral College.

+44 (0)1273 642915

doctoralcollegedean@brighton.ac.uk
