BackTrack 4: Information Gathering: Search Engine: The Harvester – Email, User Name, Subdomain & Hostname Finder
The next BackTrack 4 tool I am going to review is The Harvester, written by the guys over at Edge Security. The Harvester gathers e-mail accounts, user names and hostnames/subdomains from different public sources. It’s a really simple tool, but very effective.
The supported sources are:
- Google – emails, subdomains/hostnames
- Bing search – emails, subdomains/hostnames
- PGP servers – emails, subdomains/hostnames
- LinkedIn – user names
Let’s take a look at the available options:
root@666:/pentest/enumeration/google/theharvester# ./theHarvester.py
*************************************
*TheHarvester Ver. 1.6 *
*Coded by Christian Martorella *
*Edge-Security Research *
*cmartorella@edge-security.com *
*************************************
Usage: theharvester options
-d: domain to search or company name
-b: data source (google,bing,pgp,linkedin)
-s: start in result number X (default 0)
-v: verify host name via dns resolution
-l: limit the number of results to work with(bing goes from 50 to 50 results,
google 100 to 100, and pgp does'nt use this option)
Examples:./theharvester.py -d microsoft.com -l 500 -b google
./theharvester.py -d microsoft.com -b pgp
./theharvester.py -d microsoft -l 200 -b linkedin
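To make the -v option a little more concrete, here is a minimal Python sketch of the general idea of verifying host names via DNS resolution. It is not the tool's own code, and I am not claiming the tool works this way internally. The function name verify_hosts and its resolve parameter are assumptions made up for illustration.

```python
# Sketch only: keep the host names that resolve in DNS.
# verify_hosts and its "resolve" parameter are hypothetical names,
# not part of theHarvester.
import socket


def verify_hosts(hosts, resolve=socket.gethostbyname):
    """Return a dict of host name -> IP address for hosts that resolve."""
    verified = {}
    for host in hosts:
        try:
            verified[host] = resolve(host)
        except (OSError, UnicodeError):
            # Lookup failed or the name is malformed, so skip it.
            continue
    return verified
```

Passing a different resolve function makes it easy to try the logic without touching the network.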
Let’s use cnn.com as an example:
root@666:/pentest/enumeration/google/theharvester# ./theHarvester.py -d cnn.com -l 500 -b bing
*************************************
*TheHarvester Ver. 1.6 *
*Coded by Christian Martorella *
*Edge-Security Research *
*cmartorella@edge-security.com *
*************************************
Searching for cnn.com in bing :
======================================
Limit: 500
Searching results: 0
Searching results: 50
Searching results: 100
Searching results: 150
Searching results: 200
Searching results: 250
Searching results: 300
Searching results: 350
Searching results: 400
Searching results: 450
Accounts found:
====================
@cnn.com
cnnfutures@cnn.com
====================
Total results: 2
Hosts found:
====================
www.cnn.com
edition.cnn.com
money.cnn.com
sportsillustrated.cnn.com
amfix.blogs.cnn.com
live.cnn.com
news.blogs.cnn.com
politicalticker.blogs.cnn.com
marquee.blogs.cnn.com
weather.cnn.com
m.cnn.com
transcripts.cnn.com
www.cnnstudentnews.cnn.com
ac360.blogs.cnn.com
campbellbrown.blogs.cnn.com
newsource.cnn.com
cgi.cnn.com
joybehar.blogs.cnn.com
topics.edition.cnn.com
internationaldesk.blogs.cnn.com
us.cnn.com
larrykinglive.blogs.cnn.com
topics.cnn.com
weather.edition.cnn.com
cnnwire.blogs.cnn.com
scitech.blogs.cnn.com
on.cnn.com
ricksanchez.blogs.cnn.com
archives.cnn.com
community.cnn.com
sports.si.cnn.com
arabic.cnn.com
quiz.cnn.com
newsroom.blogs.cnn.com
cgi.money.cnn.com
partners.cnn.com
pagingdrgupta.blogs.cnn.com
features.blogs.fortune.cnn.com
tech.fortune.cnn.com
insession.blogs.cnn.com
business.blogs.cnn.com
behindthescenes.blogs.cnn.com
olympics.blogs.cnn.com
afghanistan.blogs.cnn.com
gdyn.cnn.com
premium.cnn.com
inthefield.blogs.cnn.com
ypwr.blogs.cnn.com
premium.edition.cnn.com
edition1.cnn.com
drgupta.cnn.com
edition2.cnn.com
wallstreet.blogs.fortune.cnn.com
tips.blogs.cnn.com
mxp.blogs.cnn.com
As you can see, this search turned up a lot of possible subdomains but very few email addresses. This is one reason it’s important to run your query against all available search engines.
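If you would rather not type each query by hand, a small wrapper can do it. The following is a minimal sketch, assuming Python 3.7 or later and that it is run from the theharvester directory. The flags (-d, -l, -b) and source names come from the usage banner; the function names and the SCRIPT path are assumptions for illustration. LinkedIn is left out because the usage examples query it by company name rather than by domain.

```python
# Sketch only: build (and optionally run) one theHarvester command per source.
# build_commands, run_all and SCRIPT are hypothetical names for illustration.
import subprocess

SCRIPT = "./theHarvester.py"  # assumed: run from the theharvester directory
SOURCES = ("google", "bing", "pgp")


def build_commands(domain, limit=500, sources=SOURCES):
    """Return a dict of source name -> argv list for that source."""
    commands = {}
    for source in sources:
        argv = [SCRIPT, "-d", domain, "-b", source]
        if source != "pgp":  # the usage text says pgp does not use -l
            argv += ["-l", str(limit)]
        commands[source] = argv
    return commands


def run_all(domain, limit=500):
    """Run each command in turn and return the raw output keyed by source."""
    output = {}
    for source, argv in build_commands(domain, limit).items():
        result = subprocess.run(argv, capture_output=True, text=True)
        output[source] = result.stdout
    return output
```

Calling run_all("cnn.com") would run the three searches one after another and hand back their raw output for comparison.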
Here is an example that turns up a few more email addresses:
root@666:/pentest/enumeration/google/theharvester# ./theHarvester.py -d 53.com -l 500 -b google
*************************************
*TheHarvester Ver. 1.6 *
*Coded by Christian Martorella *
*Edge-Security Research *
*cmartorella@edge-security.com *
*************************************
Searching for 53.com in google :
======================================
Limit: 500
Searching results: 0
Searching results: 100
Searching results: 200
Searching results: 300
Searching results: 400
Accounts found:
====================
josh.paskewicz@53.com
@53.com
info@tapioles53.com
@.53.com
rachael.smith@53.com
nan.horton@53.com
aler...@53.com
alertingservice@53.com
j.brinkman@53.com
Jerome.Gilbert@53.com
Gilbert@53.com
michelle.weddington@53.com
====================
Total results: 12
Hosts found:
====================
www.53.com
reo.53.com
direct.53.com
premierissue.53.com
retire.53.com
ir.53.com
tdsc.53.com
secure.53.com
ra.53.com
2Fwww.53.com
Www.53.com
252Fwww.53.com
espanol.53.com
employee.53.com
bnjhz.php?...53.com
express.53.com
www.ra.53.com
Ra.53.com
3Dreo.53.com
wwww.53.com
Retire.53.com
@.53.com
www.express.53.com
mxism.php?...53.com
pngyo.php?...53.com
This example returned a lot more results. For example, we now know that most email addresses at this domain likely follow the naming convention firstname.lastname@53.com. That is a very useful piece of knowledge: with the first and last name of anyone at Fifth Third Bank, we can make a good guess at their email address.
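To show how that convention could be put to use, here is a minimal Python sketch. It checks whether a harvested address fits the firstname.lastname pattern and builds a candidate address from a name. Both function names are assumptions for illustration, and any address it produces is only a guess until it has been confirmed some other way.

```python
# Sketch only: work with the firstname.lastname@domain convention.
# matches_convention and guess_email are hypothetical helper names.
import re


def matches_convention(address, domain):
    """True if address looks like firstname.lastname@domain."""
    pattern = r"[a-z]+\.[a-z]+@" + re.escape(domain)
    return re.fullmatch(pattern, address, re.IGNORECASE) is not None


def guess_email(first, last, domain):
    """Build a candidate firstname.lastname@domain address, lowercased."""
    return "{}.{}@{}".format(first.strip().lower(), last.strip().lower(), domain)
```

For instance, matches_convention("rachael.smith@53.com", "53.com") is true, while entries such as alertingservice@53.com or the truncated aler...@53.com do not fit the pattern.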
This is just one of the many tools that can aid a penetration tester in the passive reconnaissance process.