LORE / Writing Research Proposals

This page was last updated on Monday, August 12, 2002

This page gives some good advice on how to write research proposals, something that all PhD students doing their research in "The Lab on Reengineering (LORE)" will do at least once during their career. The page contains a few patterns and an annotated example.


Busy Reviewer

>> Problem

How can you make sure that the reviewer will do a proper job of assessing your research proposal?

This problem is difficult because:

Yet, solving this problem is feasible because:

>> Solution

Convince the reviewer that you have identified an important problem and a promising solution. Exploit the content and lay-out of your proposal to make those two ideas stand out, so that they will be noticed during a quick and dirty assessment.

Here are some hints to help you achieve these goals.

Convincing Case

>> Problem

How can you write a research proposal which convinces the reviewer to reward it with a grant?

This problem is difficult because:

Yet, solving this problem is feasible because:

>> Solution

Prepare a convincing case by illustrating what you do and do not know, and making clear how your research will fill the gap.

Here are some hints to help you achieve these goals.


Below is an annotated sample of a research proposal I wrote last year. For the moment, I do not know whether it will be accepted, which makes it a good example, because it is in some sense similar to what you will be writing.

Reconstruction of Software Evolution Processes

>>> This title summarizes the problem we want to solve

Abstract. In modern software engineering, researchers regard a software system as an organic life form that must continue to evolve to remain successful. Unfortunately, little is known about how successful software systems have evolved, and consequently little has been learned from previous experience. Therefore, this project will reconstruct evolution processes of existing software systems by exploiting techniques to detect duplication in large amounts of data. This way, we will acquire a better understanding of the origins of successful software systems and in the long run hope to improve current software development methods.

>>> This abstract adheres to the 4-line abstract rule; the first line addresses WHO should be interested in this research; the second line summarizes WHAT problem we want to solve; the third line summarizes HOW we think to tackle the problem; and the fourth line states WHY this research is important.


Several scientific studies concerning large scale software systems have shown that more than 80% of the total budget of a software project is spent during system maintenance. What may seem surprising at first is that this percentage is increasing: "the more modern methods you use in building software, the more time you spend maintaining the resulting product" [1]. The explanation for this observation is that modern software, more than its traditional counterparts, "undergoes continual change or becomes progressively less useful" [2]. Not surprisingly, the recent trend towards agile software development processes recognises change as the only constant factor in software development [3], [4]. Today, little is known about how changes affect software systems. Of course, there is the principle of software entropy, stating that an existing well-designed program gradually loses its structure and eventually turns into chaos [2]. But on the other hand, experienced software engineers are well aware of this entropy phenomenon, and take the appropriate countermeasures in the form of refactoring [5]. Unfortunately, these countermeasures are seldom documented, and currently we lack concrete information about how successful software systems avoid the software entropy chaos. To cope with this lack of information, this project proposes a kind of software palaeontology approach. By comparing different releases of existing source code (the fossil remains of software systems) and analysing the differences, we will reconstruct past evolution processes. This way, we can learn how software systems gracefully adapt to changing requirements.

>>> The beginning of the above paragraph is the problem statement. It states (a) WHO is suffering the problem (software developers); (b) WHAT the problem is (software maintenance); (c) WHY it is necessary to solve it (80% of the budget goes to software maintenance); (d) WHY it is difficult to solve (we don't know how changes affect the system).
>>> The paragraph makes good use of references. It starts from empirical observations [1], refers to some early work in the field [2], and to some well-known recent contributions [3], [4], [5].
>>> The paragraph addresses the reviewer because it makes an analogy with other research (palaeontology). In this case this was very important, because the reviewers for this proposal are people from other scientific fields (physics, biology, mathematics, ...) who don't know about computer science.

Research Method

To investigate how software systems evolve we plan an empirical validation, consisting of two steps.

>>> The research method is a key part of the proposal, which is emphasized in the lay-out by the title and the boldfaced keywords step 1, ....

Step 1. First, we will reconstruct where a system has changed by comparing several releases of its source code. To this end, we plan to exploit existing techniques for detecting duplicated code, mainly because we want to compare many releases of large scale systems (i.e., > 500 KLOC). However, these techniques will be used in a completely new way: rather than looking for matches which represent duplicated code, we will be looking for mismatches which represent places where a program has changed. Moreover, we plan to use dot plots to help us interpret the large amounts of data that will be generated. Dot plots are a visualisation technique originally developed for investigating similarities in DNA sequences, but later adopted for analyzing code duplication [6]. Figure 1 shows how we use such a dot plot to compare different releases of a software system.

>>> sound plan: concrete description of what we will do and why we want to do it. 

We have first-hand experience with the three most important techniques for detecting duplicated code, namely simple line matching [6], parameterized line matching [7] and metric fingerprints [8]. Based on this experience, we believe that it is feasible to develop a scalable technique for reconstructing the software evolution process.

>>> Builds on the research experience present in the lab. 

Figure 1: Example of a dot-plot showing the changes between two releases.
The figure shows two matrices representing subsequent releases of the same program. Each row and each column in such a matrix represents a line of code, and a dot implies that the corresponding lines are duplicates of each other. Thus, the perfect diagonal in the left-hand matrix shows that the first release is an exact copy of itself. However, in the right-hand matrix the diagonal is broken in several locations, revealing added or deleted lines.

>>> A picture to summarise the solution for a busy reviewer. 
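To make step 1 concrete, here is a minimal sketch in Python of how simple line matching yields such a dot-plot matrix, and how the mismatches (rows with no match in the other release) point at changed code. The function names and the whitespace normalization are our own illustrative choices, not part of the proposal or of the cited techniques:

```python
def dot_plot(release_a, release_b):
    """Build a dot-plot matrix for two releases, each given as a list of
    source code lines. Cell (i, j) is True when line i of release_a matches
    line j of release_b after normalizing whitespace (simple line matching).
    Blank lines are ignored, since they would match everywhere."""
    norm_a = [" ".join(line.split()) for line in release_a]
    norm_b = [" ".join(line.split()) for line in release_b]
    return [[a == b and a != "" for b in norm_b] for a in norm_a]


def changed_lines(release_a, release_b):
    """Indices of lines in release_a with no match anywhere in release_b:
    these mismatches are candidate deletions or modifications."""
    plot = dot_plot(release_a, release_b)
    return [i for i, row in enumerate(plot) if not any(row)]


old = ["int x = 1;", "int y = 2;", "return x + y;"]
new = ["int x = 1;", "int y = 3;", "return x + y;"]

# Comparing a release with itself gives the perfect diagonal of Figure 1.
self_plot = dot_plot(old, old)
# Comparing the two releases reveals where the program changed.
print(changed_lines(old, new))  # line 1 was modified
```

Real releases of course require the scalable variants cited above [6], [7], [8]; this sketch only illustrates the principle of reading mismatches as change.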

Step 2. Second, we will validate our techniques on a number of case studies, including systems developed in industry. Special care will be taken during the selection of the case studies in order to ensure that the results can be generalised towards other systems. This is especially relevant because Godfrey et al. [9] have reported that open source development seems less vulnerable to the principle of software entropy.

>>> sound plan: concrete description of what we will do and why we want to do it. 

We are currently involved in the development of a benchmark for comparing various techniques dealing with software evolution [10]. The benchmark consists of a number of criteria to assess whether a given experiment is (a) representative (e.g., process being used, implementation language, scale, ...) and (b) replicable (e.g., source code is accessible, documentation is available). We are currently seeking a consensus in the software evolution community, among other things by organizing workshops to discuss the benchmark (see for instance the Empirical Research on Software Evolution Workshop, organised in conjunction with the ECOOP'2002 Conference).

>>> Builds on the research experience present in the lab. 

Innovative Aspect

>>> This section argues why we believe this research is worthwhile. As it is a key part of the proposal, we emphasize it in the lay-out by the title.
>>> We show how the research we plan to do builds on the available literature. Moreover, we emphasise the expertise available in the lab. 

This project is innovative because it applies techniques for the detection of duplicated code to a problem area where these techniques have not been used before, namely recovering which changes have been made to a program.
The latter problem has been studied for quite a while: Lehman [2] was among the first, and the field has remained active ever since. For instance, Kemerer and Slaughter analyse change logs to detect typical trends [11], De Hondt extracts information by tagging source code and comparing the different versions [12], and Ball et al. annotate code views with colours showing code age [13].
In fact, we are involved in a research network which among others investigates techniques for reconstructing past software evolution processes [14]. One partner, the Technical University of Vienna, investigates coupling between change requests and change logs [15] and uses a three-dimensional visual representation for examining a system's software release history [16]. Another partner, the University of Berne, visualises class size as a way to analyse long evolution processes [17]. We ourselves use metrics to recover the refactorings that have been applied to a program [18].

This project is also innovative because it explicitly aims for empirical validation. Software engineering is quite a young discipline, and more empirical research is needed for it to become a mature research field [19]. Currently, the large research institutes (e.g., the Software Engineering Institute, SEI) carry out this kind of research, but there is a definite trend in academia to engage in empirical validation as well.


>>> References are sorted by order of appearance in the text. This is typical if there are not too many, because it allows the reader to follow the flow of argumentation.

[1] R. Glass, "Maintenance: Less is not More", IEEE Software July/August 1998.

[2] M. Lehman and L. Belady, "Program Evolution: Processes of Software Change", Academic Press, 1985.

[3] J. Highsmith, "Adaptive Software Development", Dorset House Publishing, 1999.

[4] K. Beck, "Extreme Programming Explained", Addison-Wesley, 1999.

[5] M. Fowler et. al., "Refactoring: Improving the Design of Existing Code", Addison-Wesley, 1999.

[6] Stéphane Ducasse, Matthias Rieger and Serge Demeyer, "A Language Independent Approach for Detecting Duplicated Code", Proceedings ICSM'99 (International Conference on Software Maintenance), IEEE, September 1999.

[7] Gerd van den Heuvel, "Parameterized Matching: a Technique for the Detection of Duplicated Code", Masters Thesis - University of Antwerp, 2001.

[8] Filip Van Rysselberghe, "Finding Duplicated Code via Metric Fingerprints", Masters Thesis - University of Antwerp, 2002.

[9] @Paper about open source development being something else@-

[10] Serge Demeyer, Tom Mens and Michel Wermelinger, "Towards a Software Evolution Benchmark", Proceedings IWPSE'2001 (International Workshop on Principles of Software Evolution), September 2001.

[11] C. Kemerer and S. Slaughter, "An empirical approach to studying software evolution", IEEE Trans. Software Engineering, 25(4):493-509, July/August 1999.

[12] K. De Hondt, "A Novel Approach to Architectural Recovery in Evolving Object-Oriented Systems", Ph.D. Dissertation, Vrije Universiteit Brussel - Department of Computer Science, December 1998.

[13] T. Ball and S. Eick, "Software Visualization in the Large," IEEE Computer, Vol 29(4), April 1996.

[14] Foundations of Software Evolution. Scientific Research Network funded by the Fund for Scientific Research in Flanders. See http://prog.vub.ac.be/poolresearch/FFSE/.

[15] Harald Gall, Karin Hajek and Mehdi Jazayeri, "Detection of Logical Coupling Based on Product Release History," ICSM'98 Proceedings (International Conference on Software Maintenance), IEEE Press, 1998.

[16] Mehdi Jazayeri, Claudio Riva and Harald Gall, "Visualizing Software Release Histories: The Use of Color and Third Dimension," Proceedings ICSM'99 (International Conference on Software Maintenance), IEEE Press, 1999.

[17] Michele Lanza and Stéphane Ducasse, "A Categorization of Classes based on the Visualisation of their Internal Structure: the Class Blueprint", Proceedings OOPSLA'2001 (International Conference on Object-Oriented Programming, Systems, Languages, and Applications), ACM Press, 2001.

[18] Serge Demeyer, Stéphane Ducasse, Oscar Nierstrasz, "Finding Refactorings via Change Metrics", Proceedings OOPSLA'2000 (International Conference on Object-Oriented Programming, Systems, Languages, and Applications), ACM Press, 2000.

[19] Norman Fenton, Shari Lawrence Pfleeger and Robert Glass, "Science and Substance: A Challenge to Software Engineers", IEEE Software, Vol 11(4), July 1994.


>>> Above we argued the why and the what of our research. Now it is time to dive into the details.

This proposal serves as a pilot project preparing a follow-up project where the reconstruction techniques will be investigated in realistic circumstances.
During the two years covered by this proposal we will (a) develop prototype instruments for gathering and analysing empirical data and test these prototypes on public domain software; (b) set up criteria for assessing whether an experiment is representative and replicable; (c) convince companies to participate in a joint research project.

>>> First we recapitulate the major goals of the project. These should advance the state of the art in your field.

Together, these activities must lead to a follow-up project sponsored by a Belgian research fund (IWT). This follow-up project will consist of a field study where the prototype instruments will be empirically validated on industrial software systems. An explicit goal of the latter project will be a PhD dissertation and a number of papers reporting the scientific results of both projects.

>>> You may also list some side effects of the research which are interesting for the funding agency. In this case, the funding agency is the university, and they want to hear that the project will result in papers and a PhD.


>>> The actual plan is a key part so again emphasized in the lay-out.
>>> The plan has logical steps, clear milestones

Months 1 - 6

Months 7 - 12

Months 13 - 18

Months 19 - 24

Year 3 - 5 [This period is outside the scope of this project but specifies its ultimate goal]

Future Funding

>>> In the end we provide arguments to convince the funding agency that this research is in line with their objectives

This project is an initial investment in a starting research group (in existence since January 1st, 2000). After five years of incubation time, where the main priority is gathering enough critical mass, the group will acquire more than half of its funding outside the university. This is feasible because computer science is a popular research area with plenty of opportunities for research projects. More concretely, this project will give rise to a research project financed externally (milestone 4). Also, based on our experience with the European research programme (the project promoter has served as a co-ordinator for an ESPRIT project concerning software evolution; see attachment 1), we are confident that first-hand experience with evolving software systems will earn us a seat in a future IST project consortium.

Mid-term Goal

>>> Besides another argument that this research is in line with the funding agency's objectives, this is also an ambitious longer term goal, which shows that there is a vision for what to do after the project has finished

The mid-term goal is to build an internationally renowned research group working in the field of Software Engineering, more particularly Software Reengineering. Among other things, the group will publish its results via the proper academic channels, i.e. journals and respected scientific conferences.

This project proposal fits well with the kind of research that is becoming typical within software engineering. Indeed, by keeping close contact with the software industry, research groups find themselves in the ideal position to make authentic observations necessary for empirical validation of research hypotheses. Such empirical validation is recognised as being a sign of maturity necessary in a young discipline like software engineering. As a side effect, contact with industry permits computer science departments to get feedback on their teaching curriculum, a necessity in such a young and rapidly changing discipline.