Preventing spam email harvesting

Websites, and the search engines such as Google that index them, are among the main sources from which spammers harvest email IDs. An easy way to prevent email harvesting is to not disclose email IDs on your website, or at least not in an obvious way. If your site has a few dozen pages, you can scan them manually to see whether any email ID is displayed. For a large site with hundreds of pages, however, that is not an easy process. One tool that can make it easier is ‘theHarvester’.
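
For a quick check of your own pages before reaching for a dedicated tool, the manual scan described above can be approximated with a regular expression. The following is a minimal sketch, not part of theHarvester: the function name `find_emails`, the pattern, and the sample HTML (which uses the reserved example.com domain) are all assumptions made for illustration. The pattern catches common address shapes only and will miss obfuscated forms such as "info [at] example [dot] com".

```python
import re

# Deliberately simple pattern (an assumption for this sketch): it matches
# common address shapes, not every form an email address can legally take.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_emails(page_text):
    """Return the sorted, de-duplicated, lower-cased email-like strings in page_text."""
    return sorted({match.lower() for match in EMAIL_PATTERN.findall(page_text)})


# Hypothetical page content; in practice this would be the HTML of one of your pages.
sample_html = """
<p>Contact <a href="mailto:Sales@example.com">sales@example.com</a></p>
<p>Support: support@example.com</p>
<p>Obfuscated: info [at] example [dot] com</p>
"""

print(find_emails(sample_html))
```

Any address this prints is one that a harvester working from the page source could pick up just as easily.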

theHarvester is a program that gathers emails, sub-domains, hosts, employee names, open ports and banners from public sources such as search engines. It is intended for penetration testers to use in the early stages of an engagement to understand the customer's footprint on the Internet. It is also useful for anyone who wants to know what an attacker can learn about their organization from publicly indexed sites.

theHarvester is a Python script that you run from the command line to search for email IDs belonging to a particular domain. For example, the following command scans Google for email IDs and hosts for the domain ‘’. The -l option limits the number of results scanned to 500.

C:\theHarvester-2.2>python -b google -l 500 -d

*TheHarvester Ver. 2.2              *
*Coded by Christian Martorella      *
*Edge-Security Research             *
*      *

[-] Searching in Google:
        Searching 0 results...
        Searching 100 results...
        Searching 200 results...
        Searching 300 results...
        Searching 400 results...
        Searching 500 results...

[+] Emails found:

[+] Hosts found in search engines:
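
If you run the scan regularly, it can help to turn the console report into data you can compare between runs. The sketch below is one way to do that, assuming the report keeps the "[+] Emails found:" and "[+] Hosts found in search engines:" headers shown above. The function name `parse_report` is hypothetical, and the sample report is invented: its addresses, host entry, and dashed separator lines are assumptions, not real tool output.

```python
def parse_report(output):
    """Split theHarvester-style console output into lists of emails and hosts."""
    sections = {"emails": [], "hosts": []}
    current = None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if line.startswith("[+] Emails found"):
            current = "emails"
        elif line.startswith("[+] Hosts found"):
            current = "hosts"
        elif line.startswith("["):
            # Any other bracketed header, e.g. "[-] Searching in Google:"
            current = None
        elif line and current and not set(line) <= set("-=*"):
            # Keep non-empty lines that are not pure separator or banner rules
            sections[current].append(line)
    return sections


# Invented sample report for illustration only (example.com is a reserved domain).
sample_report = """
[-] Searching in Google:
        Searching 0 results...

[+] Emails found:
------------------
info@example.com
sales@example.com

[+] Hosts found in search engines:
------------------------------------
192.0.2.10:www.example.com
"""

report = parse_report(sample_report)
print(report["emails"])
print(report["hosts"])
```

An empty "emails" list, as in the run shown above, means the search engine returned no addresses for the domain within the scanned results.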

The tool's help screen lists the other options it supports:


*TheHarvester Ver. 2.2              *
*Coded by Christian Martorella      *
*Edge-Security Research             *
*      *

Usage: theharvester options

       -d: Domain to search or company name
       -b: Data source (google,bing,bingapi,pgp,linkedin,google-profiles,people123,jigsaw,all)
       -s: Start in result number X (default 0)
       -v: Verify host name via dns resolution and search for virtual hosts
       -f: Save the results into an HTML and XML file
       -n: Perform a DNS reverse query on all ranges discovered
       -c: Perform a DNS brute force for the domain name
       -t: Perform a DNS TLD expansion discovery
       -e: Use this DNS server
       -l: Limit the number of results to work with(bing goes from 50 to 50 results,
            google 100 to 100, and pgp doesn't use this option)
       -h: use SHODAN database to query discovered hosts

Examples:./ -d -l 500 -b google
         ./ -d -b pgp
         ./ -d microsoft -l 200 -b linkedin