LORE Home


LORE / Research / PhD Directions

This page was last updated on Wednesday, August 28, 2002

This page describes possible directions for PhD students doing their research in "The Lab on Reengineering (LORE)". Each of these descriptions has been carefully selected to adhere to the open/closed principle: (a) it is sufficiently open to allow for the necessary deviations and explorations that are inevitable during the 3-4 years of PhD research; (b) it is sufficiently closed to produce tangible results (i.e., publications) in a short amount of time (i.e., 3-6 months).

Formal Basis for Refactoring Tools

Despite the existence of industrial-strength refactoring tools (VisualWorks, TogetherJ, JBuilder, Eclipse), the core refactoring principle --changing a program's structure while preserving its behaviour-- is poorly defined. This is mainly because most definitions of the behaviour of an object concentrate on run-time aspects (e.g., state-charts, pre- and post-conditions, observable input-output), while refactoring tools must necessarily restrict themselves to the static description specified in the source code.

Recently, we investigated the feasibility of graph rewriting as a formal foundation for defining behaviour preservation (see paper [Mens02a]). Based on a small experiment specifying two refactorings (namely "encapsulate field" and "pull up method"), we concluded that graph rewriting is a suitable formalism for investigating refactorings. However, many open issues remain, and this thesis will explore them.
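To give a flavour of the idea, the following sketch represents a program's structure as a labelled graph and expresses "pull up method" as a rewrite rule guarded by an explicit precondition. This is an illustration only: the class and function names are invented here, and the real formalism in [Mens02a] is considerably richer (it must also check, e.g., references to instance variables).

```python
# Illustrative sketch (not the [Mens02a] formalism): program structure as a
# labelled graph; "pull up method" as a guarded graph rewrite.

class Graph:
    def __init__(self):
        self.nodes = set()   # e.g. ("class", "Shape"), ("method", "Circle.area")
        self.edges = set()   # e.g. ("contains", "Circle", "Circle.area")

    def add_class(self, name, superclass=None):
        self.nodes.add(("class", name))
        if superclass:
            self.edges.add(("inherits", name, superclass))

    def add_method(self, cls, method):
        self.nodes.add(("method", f"{cls}.{method}"))
        self.edges.add(("contains", cls, f"{cls}.{method}"))

    def methods_of(self, cls):
        return {m.split(".", 1)[1] for kind, m in self.nodes
                if kind == "method" and m.startswith(cls + ".")}

def pull_up_method(g, subclass, superclass, method):
    """Rewrite rule: move `method` from subclass to superclass.
    Precondition: the superclass must not already define the method,
    otherwise dynamic dispatch -- and hence behaviour -- would change."""
    if method in g.methods_of(superclass):
        raise ValueError("precondition violated: method exists in superclass")
    g.nodes.discard(("method", f"{subclass}.{method}"))
    g.edges.discard(("contains", subclass, f"{subclass}.{method}"))
    g.add_method(superclass, method)
```

Running the rule on a small hierarchy (a `Circle` subclass of `Shape` owning an `area` method) moves the method node and its "contains" edge up one level, while the guard rejects the rewrite whenever the superclass already understands the message.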

Short term goal:

Mid term goals:

Literature


UML-Consistency Based on Refactoring

Today, the UML is considered the lingua franca for expressing analysis and design models of software systems. Nevertheless, typical software systems are too complex to capture in a single model, hence UML provides different diagrams (i.e., class diagrams, sequence diagrams, state-charts) to express different views on a single system.

The presence of different diagrams brings up the issue of consistency maintenance. Indeed, modifications in one diagram may affect others, and software engineers expect their CASE tools to help them maintain consistency. The current generation of UML tools provides such support only at a trivial level: when a class or method is renamed, all occurrences of that method in other diagrams are automatically renamed as well. However, tool support fails for more difficult modifications: a method that is pulled up into one of the superclasses may invalidate certain sequence diagrams or state-charts, and current UML tools simply ignore these potential inconsistencies.

Refactoring --transforming the internal structure of an object-oriented program without changing its external behaviour-- may provide a way to handle such consistency problems, precisely because of its behaviour-preserving nature. Indeed, as long as the precondition of the refactoring is satisfied (for instance, a precondition for "pull up method" states that the pulled-up method may not override an existing method), we know that the external behaviour remains unchanged, and hence can predict the effect the refactoring will have on sequence diagrams or state-charts.
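As a toy illustration of the kind of check such a tool could perform, the sketch below treats a sequence diagram as a list of (receiver class, message) pairs and declares it consistent with a class model when every message is understood by its receiver, possibly via inheritance. All names and the representation are assumptions made here for illustration.

```python
# Illustrative sketch: is a sequence diagram still valid against a class model?
# `classes` maps a class name to (superclass_name_or_None, set_of_methods).

def understands(classes, cls, message):
    """Walk the superclass chain looking for the message."""
    while cls is not None:
        superclass, methods = classes[cls]
        if message in methods:
            return True
        cls = superclass
    return False

def diagram_consistent(classes, diagram):
    """`diagram` is a list of (receiver_class, message) pairs."""
    return all(understands(classes, cls, msg) for cls, msg in diagram)
```

For example, a diagram sending "area" to a Circle remains consistent after "pull up method" moves "area" from Circle to its superclass Shape, because the inheritance lookup still resolves the message -- which is exactly what the behaviour-preservation argument predicts.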

Short term goal:

Mid term goals:

Literature


Refactoring and Program Quality

The notion of refactoring --transforming the source code of an object-oriented program without changing its external behaviour-- has been embraced by many object-oriented software developers as a way to accommodate changing requirements. The key idea is to redistribute instance variables and methods across the class hierarchy in order to prepare the software for future extensions. If applied well, refactoring improves the maintainability of the software; however, it is unclear how other quality factors are affected. For instance, refactorings typically split methods and classes into small chunks, which increases the complexity of the inheritance tree. Also, many refactorings introduce polymorphism, which may have a negative impact on performance. This thesis will study the effect that refactorings may have on program quality by measuring both internal (complexity, coupling/cohesion) and external (performance, mean time to repair) quality attributes.
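Two of the simplest internal metrics such a study could track are methods-per-class and depth of inheritance tree (DIT), both computable by introspection. The sketch below, with invented example classes, shows how an "extract method"-style split is immediately visible in the numbers; a real study would of course use a much larger metrics suite.

```python
# Minimal sketch: two internal quality metrics measured via Python
# introspection. Example classes below are invented for illustration.

import inspect

def dit(cls):
    """Depth of inheritance tree: number of ancestors up to `object`."""
    return len(cls.__mro__) - 1

def methods_per_class(cls):
    """Count the Python-level functions defined on (or inherited by) cls."""
    return len(inspect.getmembers(cls, inspect.isfunction))

class Before:
    # one long method doing two things
    def area_and_perimeter(self, r):
        return (3.14159 * r * r, 2 * 3.14159 * r)

class After:
    # refactored: responsibilities split into two small methods
    def area(self, r):
        return 3.14159 * r * r

    def perimeter(self, r):
        return 2 * 3.14159 * r
```

Here `methods_per_class(Before)` is 1 while `methods_per_class(After)` is 2: the refactoring improved readability but increased the raw method count, which is precisely the kind of trade-off between quality attributes this thesis would quantify.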

Short term goal:

Mid term goals:

Literature


Reconstruction of Software Evolution Processes

Recent software development processes (such as eXtreme Programming) recognise change as the only constant factor in software development. Successful software systems are those that gracefully evolve and adapt to changing requirements, a phenomenon quite similar to Darwin's principle of survival of the fittest. Unfortunately, little is known about how successful software systems have evolved in the past. To cope with this lack of information, this thesis will investigate a kind of software palaeontology. The idea is to reconstruct how software systems have evolved by comparing different releases of existing source code (the fossil remains of software systems) and analysing the differences.

The actual comparison and analysis of changes will be done by means of a visualisation technique known as dotplots, as used for instance in Duploc [Duca99b]. The following figure is an example of a dotplot showing two matrices representing subsequent releases of the same program. Each column and row in such a matrix represents a line of code, and a dot indicates that the corresponding lines are duplicates of each other. Thus, the perfect diagonal (a) on the left-hand side shows that the first release is an exact copy of itself. On the right-hand side, however, the diagonal is broken in several places, revealing added (b) or deleted (c) lines.
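The core computation behind such a dotplot is straightforward, as the sketch below shows (function names are illustrative, and real tools like Duploc add normalisation and scalable rendering). Comparing a release against itself yields the perfect diagonal; comparing it against a later release shifts the diagonal wherever lines were added or deleted.

```python
# Minimal dotplot sketch in the spirit of Duploc [Duca99b].

def dotplot(release_a, release_b):
    """Return the set of dots (i, j) where line i of A equals line j of B."""
    normalise = lambda s: " ".join(s.split())   # ignore whitespace differences
    a = [normalise(line) for line in release_a]
    b = [normalise(line) for line in release_b]
    return {(i, j)
            for i, la in enumerate(a)
            for j, lb in enumerate(b)
            if la and la == lb}                 # skip empty lines

def render(dots, rows, cols):
    """Crude ASCII rendering: '#' marks a duplicate pair."""
    return "\n".join(
        "".join("#" if (i, j) in dots else "." for j in range(cols))
        for i in range(rows))
```

If release 2 inserts one new line into release 1, the dotplot of the two releases shows the diagonal jumping down by one column at the insertion point -- the visual signature of an addition described above.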

Short term goal:

Mid term goals:

Literature


Identifying Duplication in Program Traces

Several surveys of software organisations have shown that the major part of the total software development cost is devoted to software maintenance, and that the bulk of this cost is attributable to implementing new requirements. As a result, the initially clean design of a software system quickly deteriorates, the documentation becomes obsolete, and the running system becomes the only reliable specification of what a program is supposed to do.

Most reverse engineering techniques use the source code as the principal starting point. However, such approaches are inherently limited: what you get out is a collection of modules, classes, and maybe an inheritance hierarchy, but this does not tell you which objects exist at run-time and how they collaborate to provide the intended behaviour. For the latter, it is acknowledged that program traces provide a much more accurate description. Unfortunately, this accuracy comes at a cost: program traces consist of a huge number of trivial events (object x sent message y to object z), and clustering these events into higher-level views has gained considerable interest [Rich02a]. This thesis will explore whether techniques for identifying code duplication can be used to identify clusters in program traces.
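One simple way to transfer a clone-detection idea to traces is to look for repeated n-grams of events: a sequence of messages that recurs verbatim is a candidate for a higher-level collaboration. The sketch below assumes a trace is a flat list of event strings; the representation and names are chosen here for illustration only.

```python
# Sketch: clone detection applied to a trace (a list of event strings such
# as "sender.message(receiver)"). Repeated n-grams hint at recurring
# collaborations that can be collapsed into one higher-level view.

from collections import defaultdict

def repeated_ngrams(trace, n):
    """Return each event n-gram occurring more than once, with its positions."""
    positions = defaultdict(list)
    for i in range(len(trace) - n + 1):
        positions[tuple(trace[i:i + n])].append(i)
    return {gram: pos for gram, pos in positions.items() if len(pos) > 1}
```

On a trace where the subsequence load-then-parse appears twice, the function reports that trigram once, together with both starting positions -- exactly the information a clustering step would need to fold the repetitions into a single abstract event.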

Short term goal:

Mid term goals:

Literature


Exploiting Reflection to Manage Program Trace Explosion

Reflection is provided by many of today's programming languages (e.g., java.lang.reflect for Java, the CLOS meta-object protocol, the Smalltalk meta-class hierarchy). The basic idea is to have built-in language concepts that allow a program to dynamically inspect --and sometimes even alter-- its own inner workings. Among others, this has been exploited by tools such as debuggers and profilers, because it allows for a detailed view of what is happening inside a program.

Another area where reflection has been used is reverse engineering. One experiment describes how the Smalltalk reflection facilities have been used to infer the actual type of an object receiving a message, as part of verifying the precondition of a refactoring [BranXXa]. Unfortunately, the technique does not scale well: it is very good for a detailed view of the messages exchanged between a few objects, but analysing complete program traces quickly becomes impossible due to the sheer number of objects and messages involved [Rich02a]. Nevertheless, one does not necessarily have to generate a complete program trace. Precisely because we use reflection, it is possible to focus on only the relevant part of a program trace.
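The filtering idea can be sketched with Python's reflective tracing hook (`sys.settrace`) standing in for the Smalltalk facilities: instead of recording every event, a predicate keeps only calls received by the classes under study, so the trace stays manageable. The example classes and the function name are invented for illustration.

```python
# Sketch: selective tracing via Python's reflective hook sys.settrace.
# Only calls whose receiver belongs to an "interesting" class are recorded,
# avoiding the trace explosion described above.

import sys

def selective_trace(func, interesting):
    """Run `func`, recording only calls whose `self` is in `interesting`."""
    events = []

    def tracer(frame, event, arg):
        if event == "call":
            receiver = frame.f_locals.get("self")
            if type(receiver).__name__ in interesting:
                events.append((type(receiver).__name__, frame.f_code.co_name))
        return tracer

    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(None)   # always uninstall the hook
    return events
```

Tracing a run where a Parser delegates to a Lexer, with only `{"Parser"}` marked interesting, yields just the Parser's calls: the reflective filter discards the uninteresting part of the trace at recording time rather than afterwards.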

@@--argue for the presence of black-box test-cases here--@@

Short term goal:

Mid term goals:

Literature


Refactorings and Reengineering Patterns for Web-systems

It is not the age of a system that makes it a legacy system; it is the rate at which it changes [Deme02a]. Web-based systems change very rapidly. Moreover, they are often poorly designed, both because being first on the market brings a business advantage and because they are often built by people who are not software engineers.

This thesis will learn from how people have refactored and reengineered web-based systems.

Literature


Adequate Black-Box Test Coverage for Refactoring

The notion of refactoring --transforming the source code of an object-oriented program without changing its external behaviour-- has been embraced by many object-oriented software developers as a way to accommodate changing requirements. The key idea is to redistribute instance variables and methods across the class hierarchy in order to prepare the software for future extensions.

While refactoring operations are defined to preserve behaviour, it is generally accepted that one needs good regression tests to guarantee that no new defects are introduced [Fowl99a]. To satisfy this goal, the regression tests should provide adequate coverage for all the execution paths present in the code [Bind00a]. Unfortunately, code coverage presents us with a problem in a refactoring context, because it
(a) lacks completeness: it is close to impossible to identify all possible execution paths due to run-time polymorphism;
(b) lacks abstraction: several refactorings will alter the execution paths in the code, thus breaking the test suite itself.

Therefore, black-box testing seems a more realistic approach to regression testing for refactoring purposes. With black-box testing, one uses design models --such as use-cases, state-charts, sequence diagrams, and activity diagrams-- of the program behaviour and defines test suites that cover all execution paths therein. If these design models succeed in representing the program behaviour at a higher level (i.e., the "what" and not the "how"), it is indeed possible to define test suites that are (a) complete enough to guarantee that no new defects are introduced, yet (b) abstract enough to ensure that the test suite itself remains valid.
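The contrast can be made concrete with a tiny invented example: a black-box test pins down only observable input/output, so the very same test runs unchanged against a function before and after an "extract method" refactoring, even though the execution paths (and hence any path-coverage criterion) differ between the two versions.

```python
# Sketch: the same black-box test survives an "extract method" refactoring.
# The shopping-cart example, names, and VAT rate are invented here.

def total_before(prices, vat=0.21):
    # original: one monolithic function
    subtotal = sum(prices)
    return round(subtotal * (1 + vat), 2)

def total_after(prices, vat=0.21):
    # refactored: helper extracted; paths differ, behaviour does not
    def apply_vat(amount):
        return round(amount * (1 + vat), 2)
    return apply_vat(sum(prices))

def black_box_test(total):
    """Checks only the 'what': observable input/output pairs."""
    assert total([10.0, 20.0]) == 36.3
    assert total([]) == 0.0
    return True
```

A white-box test that asserted which internal helpers were called would break on `total_after`; the black-box test above passes for both versions, which is the abstraction property claimed in (b).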

Unfortunately, there is today little or no empirical evidence that black-box testing indeed provides adequate test coverage for refactoring purposes. Worse, it is not yet known which design models provide optimal coverage. Therefore, this project will compare black-box testing techniques based on different design models --namely use-cases, state-charts, sequence diagrams, and activity diagrams-- to find out
(a) whether they provide adequate test coverage for refactoring purposes,
(b) which design model provides optimal coverage under which circumstances.

Short term goals

Mid term goals

Literature

