Detecting duplicate code in PHP files


Posted in: testing, tools | 6 Apr 2009

Duplicated code is common in projects and is a prime candidate for factoring out into a new class or function. Copy-and-paste coding is a common practice among programmers, and too much of it leads to bloated code and maintenance nightmares. PHPCPD (PHP Copy/Paste Detector) is a PEAR tool that makes it easier to detect duplicate code in PHP projects. Below is a short tutorial on the PHPCPD package.

1. Installing phpcpd
We will use the PEAR installer for this. First, the PEAR channel that distributes phpcpd needs to be registered with the local PEAR environment. This tells PEAR where to download the install files from.

c:\> pear channel-discover pear.phpunit.de
Adding Channel "pear.phpunit.de" succeeded
Discovery of channel "pear.phpunit.de" succeeded

Once the channel is registered, the PEAR installer is ready to install phpcpd.

c:\> pear install phpunit/phpcpd
downloading phpcpd-1.1.1.tgz ...
Starting to download phpcpd-1.1.1.tgz (8,078 bytes)
.....done: 8,078 bytes
install ok: channel://pear.phpunit.de/phpcpd-1.1.1

After installation you will find the PHPCPD source files in the PEAR directory.

2. Running your first check
You can check for duplicate code in an individual file or in a directory. Here is our first check, run on the admin subdirectory of a project, along with its results.

c:\localhost\project> phpcpd ./admin
phpcpd 1.1.1 by Sebastian Bergmann.
 
Found 3 exact clones with 50 duplicated lines in 6 files:
 
  - .messages.php:95-105
    .messagesgroup.php:112-122
 
  - .viewschedules.php:14-23
    .tutorbookings.php:14-23
 
  - .ampieexport.php:4-35
    .amcolumnexport.php:4-35
 
0.35% duplicated lines out of 14456 total lines of code.
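
The percentage on the last line appears to be simply the duplicated lines divided by the total lines of code. That is an inference from the figures in the report, not a documented formula, and `duplicated_percentage` is a hypothetical helper used only to check the arithmetic:

```python
def duplicated_percentage(duplicated_lines, total_lines):
    """Share of duplicated lines as a percentage, rounded to two decimals.

    Assumption: phpcpd divides duplicated lines by total lines of code.
    """
    return round(100 * duplicated_lines / total_lines, 2)


if __name__ == "__main__":
    # Figures taken from the report above: 50 duplicated lines out of 14456.
    print(duplicated_percentage(50, 14456))
```

Note also that the 50 duplicated lines match the sum of (end line minus start line) over the three clones: 10 + 9 + 31.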

3. Options
By default, phpcpd reports a clone only when it spans at least 5 identical lines and 70 identical tokens, so duplicates with fewer than 5 lines or fewer than 70 tokens are ignored. To override these thresholds, use the --min-lines and --min-tokens switches as below.

c:\localhost\project> phpcpd --min-lines 4 --min-tokens 40 ./admin
phpcpd 1.1.1 by Sebastian Bergmann.
 
Found 9 exact clones with 187 duplicated lines in 14 files:
 
  - .actionaction.updatestudent.php:15-27
    .actionaction.updatetutor.php:15-27
 
  - .adminFunctions.php:91-98
    .adminFunctions.php:124-131
 
  - .messages.php:95-118
    .messagesgroup.php:112-135
 
  - .viewschedules.php:14-84
    .tutorbookings.php:14-84
 
  - .viewschedules.php:167-185
    .tutorbookings.php:145-163
 
  - .tutors.php:14-20
    .dimdim.php:14-20
 
  - .ampieexport.php:4-44
    .amcolumnexport.php:4-44
 
  - .geoipgeoip.inc.php:236-241
    .geoipgeoip.inc.php:272-277
 
  - .studentschedule.php:14-20
    .Copy of onetomanyschedule.php:14-20
 
1.29% duplicated lines out of 14457 total lines of code.
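
To see why lowering the minimum-lines threshold surfaces more clones, here is a minimal Python sketch of line-based clone detection between two files. It is not phpcpd's actual algorithm (phpcpd compares PHP tokens, which this sketch does not attempt), and `find_clones` is a hypothetical name chosen for illustration:

```python
from collections import defaultdict


def find_clones(lines_a, lines_b, min_lines=5):
    """Find maximal runs of identical consecutive lines shared by two files.

    Returns (start_a, start_b, length) tuples with 1-based line numbers,
    keeping only runs of at least min_lines lines. Leading and trailing
    whitespace is ignored; blank lines are compared like any other line.
    """
    a = [line.strip() for line in lines_a]
    b = [line.strip() for line in lines_b]

    # Index every line of b by its content for quick lookup.
    positions = defaultdict(list)
    for j, text in enumerate(b):
        positions[text].append(j)

    clones = []
    for i, text in enumerate(a):
        for j in positions.get(text, []):
            # Skip pairs that merely continue a run that started earlier.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the run for as long as the lines keep matching.
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length >= min_lines:
                clones.append((i + 1, j + 1, length))
    return clones
```

With this sketch, a shared run of six lines is reported at `min_lines=5` but dropped at `min_lines=7`, which mirrors how the second phpcpd run above finds more clones than the first.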

The report generated by phpcpd can also be exported in PMD-CPD XML format. The following command scans the admin directory and saves the report in the projectPhpcpd.xml file.

c:\localhost\project> phpcpd --log-pmd projectPhpcpd.xml ./admin
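
Once the report is saved, it can be processed by other tools. The sketch below reads a PMD-CPD style report with Python's standard library. It assumes the usual PMD-CPD layout, a `pmd-cpd` root containing `duplication` elements that each hold `file` entries with `path` and `line` attributes; check this against the file your phpcpd version actually writes. The `SAMPLE` document is hand-written for illustration and is not real phpcpd output, and `summarize` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed PMD-CPD layout (not real phpcpd output).
SAMPLE = """<pmd-cpd>
  <duplication lines="11" tokens="70">
    <file path="a.php" line="95"/>
    <file path="b.php" line="112"/>
    <codefragment>...</codefragment>
  </duplication>
</pmd-cpd>"""


def summarize(root):
    """Return one (lines, [(path, start_line), ...]) tuple per duplication."""
    summary = []
    for dup in root.iter("duplication"):
        files = [(f.get("path"), int(f.get("line")))
                 for f in dup.findall("file")]
        summary.append((int(dup.get("lines")), files))
    return summary


if __name__ == "__main__":
    # For a real report: root = ET.parse("projectPhpcpd.xml").getroot()
    for lines, files in summarize(ET.fromstring(SAMPLE)):
        where = ", ".join(f"{path}:{start}" for path, start in files)
        print(f"{lines} duplicated lines starting at {where}")
```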

Most PHP source files have the .php extension, and phpcpd scans only files with this extension by default. To add other extensions to the list, use the --suffixes option, which takes a comma-separated list of extension names.

c:\localhost\project> phpcpd --suffixes php,php5 ./admin
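
Conceptually, the suffix list acts as a filter on file names before any comparison happens. Here is a rough sketch of that idea in Python. It only illustrates the filtering, not phpcpd's own file finder, and both function names are hypothetical:

```python
def parse_suffixes(option):
    """Split a comma-separated option value such as "php,php5"."""
    return tuple(s.strip().lstrip(".") for s in option.split(",") if s.strip())


def select_files(filenames, suffixes=("php",)):
    """Keep only the file names that end in one of the given suffixes."""
    endings = tuple("." + suffix for suffix in suffixes)
    return [name for name in filenames if name.endswith(endings)]


if __name__ == "__main__":
    names = ["a.php", "b.php5", "c.inc", "d.txt"]
    print(select_files(names))                              # default: .php only
    print(select_files(names, parse_suffixes("php,php5")))  # .php and .php5
```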

Concluding thoughts
PMD, a Java program, can also detect duplicate code through its copy/paste detector (CPD). The main advantage of a PEAR package, however, is that you can integrate it into the project itself or use it with phpUnderControl.
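
As one possible way of wiring phpcpd into a project's build script, the sketch below assembles the command line from the switches covered in this tutorial and runs it. The helper names are hypothetical, only the switches shown above are used, and nothing is assumed about phpcpd's exit status:

```python
import shutil
import subprocess


def build_command(target, min_lines=None, min_tokens=None, log_pmd=None,
                  executable="phpcpd"):
    """Assemble a phpcpd command line from the switches shown in this post."""
    cmd = [executable]
    if min_lines is not None:
        cmd += ["--min-lines", str(min_lines)]
    if min_tokens is not None:
        cmd += ["--min-tokens", str(min_tokens)]
    if log_pmd is not None:
        cmd += ["--log-pmd", log_pmd]
    cmd.append(target)
    return cmd


def run_phpcpd(target, **options):
    """Run phpcpd and return its console output as text."""
    # shutil.which honours PATHEXT, so it also finds phpcpd.bat on Windows.
    executable = shutil.which("phpcpd")
    if executable is None:
        raise FileNotFoundError("phpcpd was not found on PATH")
    cmd = build_command(target, executable=executable, **options)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout
```

A build script could then call `run_phpcpd("./admin", log_pmd="projectPhpcpd.xml")` and archive the XML report alongside its other build artifacts.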





7 Responses

3

james t.

April 8th, 2009 at 10:37 pm

nice article!

4

Sameer Borate’s Blog: Detecting duplicate code in PHP files : WebNetiques, LLC : Website Developers in Minneapolis, MN

April 8th, 2009 at 10:45 pm

[...] his blog today Sameer looks at a method for finding duplicate code in your applications with the help of [...]

5

Mahbub

April 18th, 2009 at 10:36 pm

Man! You & I are using the same theme :).

Nice to see the tool but i hate pear. Will try sometime.

7

Ira Baxter

January 23rd, 2010 at 9:52 pm

Semantic Designs has been building copy/paste detectors for big systems since 1998. At the website http://www.semanticdesigns.com/Products/Clone you can read about how clone detection works, and see examples of clone analysis on several hundred thousand lines of PHP code (a BBS system).

About this blog

This site is a digital habitat of Sameer, a freelance web developer working from Pune.
