Preventing spam email harvesting


Websites and Google search results are among the main sources from which spammers harvest email addresses. An easy way to prevent email harvesting is to not disclose email addresses on your website, or at least not in an obvious way. If your site has a few dozen pages, then you can manually scan them to see if any email address is being displayed. However, for a large site with hundreds of pages this is not an easy process. One tool that can make the process easier is ‘theHarvester’.
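If you can access your site's pages as files (for example a local copy or a static export), a short script can do the manual scan for you. The following is a minimal Python 3 sketch and is not part of theHarvester: the names find_emails and scan_site, the file extensions checked, and the regular expression are illustrative assumptions. The pattern only catches plain-text addresses, so it will not flag obfuscated forms such as name[at]example.com, and it only sees what is in the files, not content generated at runtime.

```python
import os
import re
import sys

# A deliberately simple pattern for plain-text addresses (an assumption, not
# a full RFC 5322 matcher); it misses obfuscated forms like name[at]example.com.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return the sorted, de-duplicated email-like strings found in text."""
    return sorted(set(EMAIL_RE.findall(text)))


def scan_site(root, extensions=(".html", ".htm", ".php", ".txt")):
    """Walk a local copy of a site and map each file path to the emails it exposes."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Only look at file types likely to contain page content.
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as fh:
                found = find_emails(fh.read())
            if found:
                results[path] = found
    return results


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python scan_emails.py /path/to/site")
    else:
        for path, emails in scan_site(sys.argv[1]).items():
            print(path)
            for email in emails:
                print("    " + email)
```

Saved as scan_emails.py (a hypothetical file name), it lists every file that shows at least one address in plain text, which gives you a short list of pages to fix.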

theHarvester is a program that gathers emails, sub-domains, hosts, employee names, open ports and banners from public sources such as search engines. It is intended for penetration testers, who use it in the early stages of an engagement to understand a customer's footprint on the Internet. It is also useful for anyone who wants to see what an attacker can learn about their organization from these public sources.

theHarvester is a Python script that you run from the command line to query search engines for the email addresses associated with a particular domain. For example, the following command scans Google for email addresses and hosts for the domain ‘microsoft.com’, with the -l option limiting the number of results scanned to 500.

C:\theHarvester-2.2>python theHarvester.py -b google -l 500 -d microsoft.com
 
*************************************
*TheHarvester Ver. 2.2              *
*Coded by Christian Martorella      *
*Edge-Security Research             *
*cmartorella@edge-security.com      *
*************************************
 
[-] Searching in Google:
        Searching 0 results...
        Searching 100 results...
        Searching 200 results...
        Searching 300 results...
        Searching 400 results...
        Searching 500 results...
 
[+] Emails found:
------------------
example.Wong@microsoft.com
name-changed@microsoft.com
msdymg@microsoft.com
 
[+] Hosts found in search engines:
------------------------------------
65.55.57.27:www.microsoft.com
65.55.184.16:windowsupdate.microsoft.com
157.56.56.139:support.microsoft.com
65.55.227.140:office.microsoft.com
65.52.103.234:windows.microsoft.com
157.56.65.75:billing.microsoft.com
168.62.21.49:msdn.microsoft.com
65.54.237.200:store.microsoft.com
64.4.11.25:go.microsoft.com
157.56.56.109:answers.microsoft.com
65.55.11.238:schemas.microsoft.com
131.107.115.215:mailb.microsoft.com
168.62.21.58:technet.microsoft.com
65.52.103.78:social.technet.microsoft.com
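When checking several domains, it can help to capture this output and pull out the two lists programmatically. The sketch below is a hypothetical helper, not part of theHarvester. It assumes the layout shown above for version 2.2 (a ‘[+] Emails found:’ header, a dashed underline, one address per line, then host lines in ip:hostname form); other versions may format their output differently.

```python
def parse_harvester_output(text):
    """Split theHarvester 2.2 console output into (emails, hosts).

    hosts is a list of (ip, hostname) tuples taken from lines such as
    '65.55.57.27:www.microsoft.com'. The layout is assumed from version 2.2.
    """
    emails, hosts = [], []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[+] Emails found"):
            section = "emails"
        elif line.startswith("[+] Hosts found"):
            section = "hosts"
        elif line.startswith("["):
            section = None  # any other [-]/[+] header ends the current section
        elif not line or set(line) == {"-"}:
            continue  # skip blank lines and the dashed underlines
        elif section == "emails":
            emails.append(line)
        elif section == "hosts" and ":" in line:
            ip, _, hostname = line.partition(":")
            hosts.append((ip, hostname))
    return emails, hosts
```

For example, you could pass it the stdout of subprocess.run([...], capture_output=True, text=True), available in Python 3.7 and later.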

Run the script without any arguments to display the help screen, which lists the tool's other options.

C:\theHarvester-2.2>python theHarvester.py
 
*************************************
*TheHarvester Ver. 2.2              *
*Coded by Christian Martorella      *
*Edge-Security Research             *
*cmartorella@edge-security.com      *
*************************************
 
Usage: theharvester options
 
       -d: Domain to search or company name
       -b: Data source (google,bing,bingapi,pgp,linkedin,google-profiles,people123,jigsaw,all)
       -s: Start in result number X (default 0)
       -v: Verify host name via dns resolution and search for virtual hosts
       -f: Save the results into an HTML and XML file
       -n: Perform a DNS reverse query on all ranges discovered
       -c: Perform a DNS brute force for the domain name
       -t: Perform a DNS TLD expansion discovery
       -e: Use this DNS server
       -l: Limit the number of results to work with(bing goes from 50 to 50 results,
           google 100 to 100, and pgp doesn't use this option)
       -h: use SHODAN database to query discovered hosts
 
Examples:./theharvester.py -d microsoft.com -l 500 -b google
         ./theharvester.py -d microsoft.com -b pgp
         ./theharvester.py -d microsoft -l 200 -b linkedin
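These options can also be driven from a script, for instance to query several data sources for the same domain in one go. This is a minimal Python 3.7+ sketch that uses only the -d, -b and -l flags from the help screen above. The function names, the script path and the interpreter name (‘python’, which must point to an interpreter that can run your copy of theHarvester) are assumptions to adapt to your setup.

```python
import subprocess

# Data sources accepted by -b, as listed in the theHarvester 2.2 help screen.
SOURCES = {"google", "bing", "bingapi", "pgp", "linkedin",
           "google-profiles", "people123", "jigsaw", "all"}


def build_command(domain, source="google", limit=None,
                  script="theHarvester.py", python="python"):
    """Return the argument list for a single theHarvester run.

    script and python are assumptions: point them at your own copy of the
    tool and at an interpreter that can run it.
    """
    if source not in SOURCES:
        raise ValueError(f"unknown data source: {source}")
    cmd = [python, script, "-d", domain, "-b", source]
    if limit is not None:  # -l is optional; the help notes pgp ignores it
        cmd += ["-l", str(limit)]
    return cmd


def run_harvester(domain, source="google", limit=None, **kwargs):
    """Run theHarvester and return its console output as a string.

    Raises subprocess.CalledProcessError if the tool exits with an error.
    """
    result = subprocess.run(build_command(domain, source, limit, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_harvester('example.com', 'google', 500) mirrors the first example above, while run_harvester('example.com', 'pgp') leaves out -l, since the help screen notes that pgp does not use it.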

This site is a digital habitat of Sameer Borate, a freelance web developer working in PHP, MySQL and WordPress. I also provide web scraping services, website design and development, and integration of various Open Source APIs. Contact me at metapix[at]gmail.com for any new project requirements and price quotes.
