Detecting duplicate code in PHP files


Duplicated code is common in most projects and is a prime candidate for factoring out into a new class or function. Copy-and-paste coding is a widespread habit among programmers, and too much of it inflates the code base and turns maintenance into a nightmare. PHPCPD (PHP Copy/Paste Detector) is a tool, distributed through PEAR, that makes it easier to detect duplicate code in PHP projects. Below is a short tutorial on the PHPCPD package.

1. Installing phpcpd
We will use the PEAR installer for this. First, the PEAR channel that distributes phpcpd needs to be registered with the local PEAR environment; this tells PEAR where to download the install files from.

c:\> pear channel-discover pear.phpunit.de
Adding Channel "pear.phpunit.de" succeeded
Discovery of channel "pear.phpunit.de" succeeded

After this is done the PEAR installer is ready to install phpcpd.

c:\> pear install phpunit/phpcpd
downloading phpcpd-1.1.1.tgz ...
Starting to download phpcpd-1.1.1.tgz (8,078 bytes)
.....done: 8,078 bytes
install ok: channel://pear.phpunit.de/phpcpd-1.1.1

After installation you will find the PHPCPD source files in the PEAR directory.

2. Running your first check
You can check for duplicate code in an individual file or in a whole directory. Here is our first check, run on the admin subdirectory of a project, along with its results.

c:\localhost\project> phpcpd ./admin
phpcpd 1.1.1 by Sebastian Bergmann.
 
Found 3 exact clones with 50 duplicated lines in 6 files:
 
  - .\messages.php:95-105
    .\messagesgroup.php:112-122

  - .\viewschedules.php:14-23
    .\tutorbookings.php:14-23

  - .\ampieexport.php:4-35
    .\amcolumnexport.php:4-35
 
0.35% duplicated lines out of 14456 total lines of code.
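
The percentage on the last line is just the duplicated lines divided by the total lines. As a quick sanity check, here is a small Python sketch that recomputes it from the two figures in the report above. The function name is made up for illustration and is not part of phpcpd.

```python
def duplication_percentage(duplicated_lines, total_lines):
    """Return duplicated lines as a percentage of all lines, rounded to 2 places."""
    return round(100.0 * duplicated_lines / total_lines, 2)


# Figures taken from the report above: 50 duplicated lines out of 14456.
print(duplication_percentage(50, 14456))  # prints 0.35
```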

3. Options
By default phpcpd looks for clones of at least 5 identical lines and 70 identical tokens. Any duplicate that has fewer than 5 lines or fewer than 70 tokens is ignored. To override these thresholds, use the --min-lines and --min-tokens switches as below.

c:\localhost\project> phpcpd --min-lines 4 --min-tokens 40 ./admin
phpcpd 1.1.1 by Sebastian Bergmann.
 
Found 9 exact clones with 187 duplicated lines in 14 files:
 
  - .\action\action.updatestudent.php:15-27
    .\action\action.updatetutor.php:15-27

  - .\adminFunctions.php:91-98
    .\adminFunctions.php:124-131

  - .\messages.php:95-118
    .\messagesgroup.php:112-135

  - .\viewschedules.php:14-84
    .\tutorbookings.php:14-84

  - .\viewschedules.php:167-185
    .\tutorbookings.php:145-163

  - .\tutors.php:14-20
    .\dimdim.php:14-20

  - .\ampieexport.php:4-44
    .\amcolumnexport.php:4-44

  - .\geoip\geoip.inc.php:236-241
    .\geoip\geoip.inc.php:272-277

  - .\studentschedule.php:14-20
    .\Copy of onetomanyschedule.php:14-20
 
1.29% duplicated lines out of 14457 total lines of code.
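
To get a feel for why lowering the minimum turns up more clones, here is a rough Python sketch of the idea behind a minimum-lines threshold. It is not how phpcpd is implemented: phpcpd also compares tokens, while this simplified version only compares whitespace-stripped lines, and the function name and sample files are invented for the example.

```python
from collections import defaultdict


def find_repeated_windows(sources, min_lines=5):
    """Find every run of `min_lines` consecutive lines that occurs more than once.

    `sources` maps a file name to its text. Lines are stripped of surrounding
    whitespace before comparison, and windows made only of blank lines are skipped.
    Returns a list of location lists; each location is (file name, 1-based start line).
    """
    seen = defaultdict(list)
    for name, text in sources.items():
        lines = [line.strip() for line in text.splitlines()]
        for start in range(len(lines) - min_lines + 1):
            window = tuple(lines[start:start + min_lines])
            if any(window):  # ignore windows that contain only blank lines
                seen[window].append((name, start + 1))
    return [places for places in seen.values() if len(places) > 1]


# Two tiny made-up files that share a four-line block.
sources = {
    "a.php": "<?php\n$x = 1;\n$y = 2;\n$z = $x + $y;\necho $z;\n",
    "b.php": "<?php\n// other file\n$x = 1;\n$y = 2;\n$z = $x + $y;\necho $z;\n",
}

print(find_repeated_windows(sources, min_lines=4))
print(find_repeated_windows(sources, min_lines=5))
```

With a threshold of 4 the shared block is reported (starting at line 2 of a.php and line 3 of b.php); with the threshold at 5 nothing is found, because the two files never agree for five lines in a row.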

The report generated by phpcpd can also be exported in the PMD-CPD XML format. The following command scans the admin directory and saves the report to the file projectPhpcpd.xml.

c:\localhost\project> phpcpd --log-pmd projectPhpcpd.xml ./admin
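
Once you have the XML file, other tools or your own scripts can read it. Below is a minimal Python sketch that lists the clones in such a report. It assumes the usual PMD-CPD layout, with a `pmd-cpd` root element, one `duplication` element per clone carrying `lines` and `tokens` attributes, and `file` children carrying `path` and `line` attributes. Check these names against your own report before relying on them. The sample document is hand-written for illustration and its token count is invented.

```python
import xml.etree.ElementTree as ET


def summarize_cpd_report(xml_text):
    """Return one (lines, tokens, [(path, start_line), ...]) tuple per clone.

    Assumes <duplication lines=".." tokens=".."> elements that contain
    <file path=".." line=".."/> children, as in PMD-CPD style reports.
    """
    root = ET.fromstring(xml_text)
    clones = []
    for dup in root.iter("duplication"):
        places = [(f.get("path"), int(f.get("line"))) for f in dup.findall("file")]
        clones.append((int(dup.get("lines")), int(dup.get("tokens")), places))
    return clones


# Hand-written sample; the tokens value is made up for this example.
SAMPLE = """<pmd-cpd>
  <duplication lines="11" tokens="80">
    <file path="messages.php" line="95"/>
    <file path="messagesgroup.php" line="112"/>
    <codefragment>...</codefragment>
  </duplication>
</pmd-cpd>"""

for lines, tokens, places in summarize_cpd_report(SAMPLE):
    print(lines, "lines,", tokens, "tokens:", places)
```

To read a real report, pass the contents of the file instead of `SAMPLE`, for example `summarize_cpd_report(open("projectPhpcpd.xml", encoding="utf-8").read())`.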

Most PHP source files have the .php extension, and by default phpcpd only scans files with that extension. To scan files with other extensions as well, use the --suffixes option, which takes a comma-separated list of extensions.

c:\localhost\project> phpcpd --suffixes php,php5 ./admin
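
Conceptually, a suffix list like this is just a filter on file names. The Python sketch below shows one way such a filter could work. It is only an illustration, not phpcpd's actual code; both function names are hypothetical, and the case-sensitive matching is an assumption.

```python
import os


def parse_suffixes(option):
    """Split a comma-separated option value such as "php,php5" into a list."""
    return [part.strip() for part in option.split(",") if part.strip()]


def files_with_suffixes(root, suffixes=("php",)):
    """Yield paths under `root` whose file name ends in one of the given suffixes."""
    wanted = tuple("." + suffix for suffix in suffixes)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(wanted):  # case-sensitive match (an assumption)
                yield os.path.join(dirpath, name)
```

For example, `list(files_with_suffixes("./admin", parse_suffixes("php,php5")))` would return every .php and .php5 file below the admin directory.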

Concluding thoughts
There is also a Java program called PMD that can detect duplicate code. The main advantage of a PEAR package, however, is that you can integrate it into your own project or use it with phpUnderControl.

This site is a digital habitat of Sameer Borate, a freelance web developer working in PHP, MySQL and WordPress. I also provide web scraping services, website design and development, and integration of various open source APIs. Contact me at metapix[at]gmail.com for any new project requirements and price quotes.

9 Responses

3

james t.

April 8th, 2009 at 10:37 pm

nice article!

5

Mahbub

April 18th, 2009 at 10:36 pm

Man! You & I are using the same theme :) .

Nice to see the tool but i hate pear. Will try sometime.

7

Ira Baxter

January 23rd, 2010 at 9:52 pm

Semantic Designs has been building copy/paste detectors for big systems since 1998. At the website http://www.semanticdesigns.com/Products/Clone you can read about how clone detection works, and see examples of clone analysis on several hundred thousand lines of PHP code (a BBS system).

8

Joaco

May 12th, 2010 at 12:19 pm

Great article! Was wondering if there are any flags for excluding or ignoring specific file/folder(s).

9

Binh Duong

September 26th, 2012 at 4:20 am

Great !
