Backtrack 4: Information Gathering: Searchengine: The Harvester – Email, User Names, Subdomain & Hostnames Finder

The next tool on Backtrack 4 I am going to review is The Harvester, which was written by the guys over at Edge Security. The Harvester gathers e-mail accounts, user names and hostnames/subdomains from different public sources. It’s a really simple tool, but very effective.
The supported sources are:
  • Google – emails, subdomains/hostnames
  • Bing search – emails, subdomains/hostnames
  • PGP servers – emails, subdomains/hostnames
  • LinkedIn – user names


    Below I will go through a few examples of mining some common search engines for usernames, email addresses and subdomains. The information gained in passive reconnaissance can be an invaluable resource for the penetration tester.

    Let’s take a look at the available options:

    root@666:/pentest/enumeration/google/theharvester# ./theHarvester.py

    *************************************
    *TheHarvester Ver. 1.6             *
    *Coded by Christian Martorella      *
    *Edge-Security Research             *
    *cmartorella@edge-security.com      *
    *************************************

    Usage: theharvester options

           -d: domain to search or company name
           -b: data source (google,bing,pgp,linkedin)
           -s: start in result number X (default 0)
           -v: verify host name via dns resolution
           -l: limit the number of results to work with(bing goes from 50 to 50 results,
                google 100 to 100, and pgp does'nt use this option)

    Examples:./theharvester.py -d microsoft.com -l 500 -b google
             ./theharvester.py -d microsoft.com -b pgp
             ./theharvester.py -d microsoft -l 200 -b linkedin


    Let’s use cnn.com as an example:

    root@666:/pentest/enumeration/google/theharvester# ./theHarvester.py -d cnn.com -l 500 -b bing

    *************************************
    *TheHarvester Ver. 1.6             *
    *Coded by Christian Martorella      *
    *Edge-Security Research             *
    *cmartorella@edge-security.com      *
    *************************************

    Searching for cnn.com in bing :
    ======================================

    Limit:  500
    Searching results: 0
    Searching results: 50
    Searching results: 100
    Searching results: 150
    Searching results: 200
    Searching results: 250
    Searching results: 300
    Searching results: 350
    Searching results: 400
    Searching results: 450

    Accounts found:
    ====================

    @cnn.com
    cnnfutures@cnn.com
    ====================

    Total results:  2

    Hosts found:
    ====================

    www.cnn.com
    edition.cnn.com
    money.cnn.com
    sportsillustrated.cnn.com
    amfix.blogs.cnn.com
    live.cnn.com
    news.blogs.cnn.com
    politicalticker.blogs.cnn.com
    marquee.blogs.cnn.com
    weather.cnn.com
    m.cnn.com
    transcripts.cnn.com
    www.cnnstudentnews.cnn.com
    ac360.blogs.cnn.com
    campbellbrown.blogs.cnn.com
    newsource.cnn.com
    cgi.cnn.com
    joybehar.blogs.cnn.com
    topics.edition.cnn.com
    internationaldesk.blogs.cnn.com
    us.cnn.com
    larrykinglive.blogs.cnn.com
    topics.cnn.com
    weather.edition.cnn.com
    cnnwire.blogs.cnn.com
    scitech.blogs.cnn.com
    on.cnn.com
    ricksanchez.blogs.cnn.com
    archives.cnn.com
    community.cnn.com
    sports.si.cnn.com
    arabic.cnn.com
    quiz.cnn.com
    newsroom.blogs.cnn.com
    cgi.money.cnn.com
    partners.cnn.com
    pagingdrgupta.blogs.cnn.com
    features.blogs.fortune.cnn.com
    tech.fortune.cnn.com
    insession.blogs.cnn.com
    business.blogs.cnn.com
    behindthescenes.blogs.cnn.com
    olympics.blogs.cnn.com
    afghanistan.blogs.cnn.com
    gdyn.cnn.com
    premium.cnn.com
    inthefield.blogs.cnn.com
    ypwr.blogs.cnn.com
    premium.edition.cnn.com
    edition1.cnn.com
    drgupta.cnn.com
    edition2.cnn.com
    wallstreet.blogs.fortune.cnn.com
    tips.blogs.cnn.com
    mxp.blogs.cnn.com
     

    As you can see, this search returned a lot of possible subdomains but only one usable email address. This is one reason it’s important to run your query against all of the available data sources.
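
If you save the output of each run to a text file, combining the results from several sources is easy to script. The snippet below is only a rough sketch: it assumes the output layout shown above ("Accounts found:", "Hosts found:", "Total results:"), and the function names and the example.com sample text are made up for illustration, not part of The Harvester.

```python
def parse_harvester_output(text):
    """Split saved theHarvester output into (accounts, hosts) sets.

    Assumes the section headers seen in the version 1.6 output above.
    """
    accounts, hosts = set(), set()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Accounts found"):
            current = accounts
        elif line.startswith("Hosts found"):
            current = hosts
        elif line.startswith("Total results"):
            current = None
        elif current is not None and line and not line.startswith("="):
            # Lowercase so Www.example.com and www.example.com merge.
            current.add(line.lower())
    return accounts, hosts


def merge_runs(outputs):
    """Union the accounts and hosts found across several saved runs."""
    all_accounts, all_hosts = set(), set()
    for text in outputs:
        accounts, hosts = parse_harvester_output(text)
        all_accounts |= accounts
        all_hosts |= hosts
    return sorted(all_accounts), sorted(all_hosts)


# Hypothetical sample text standing in for two saved runs.
run_a = """
Accounts found:
====================

info@example.com
====================

Total results:  1

Hosts found:
====================

www.example.com
mail.example.com
"""

run_b = """
Accounts found:
====================

sales@example.com
info@example.com
====================

Total results:  2

Hosts found:
====================

Www.example.com
"""

merged_accounts, merged_hosts = merge_runs([run_a, run_b])
print(merged_accounts)
print(merged_hosts)
```

Using sets means an address or host reported by more than one source is only listed once.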

    Here is an example that turns up a few more email addresses:
    root@666:/pentest/enumeration/google/theharvester# ./theHarvester.py -d 53.com -l 500 -b google

    *************************************
    *TheHarvester Ver. 1.6             *
    *Coded by Christian Martorella      *
    *Edge-Security Research             *
    *cmartorella@edge-security.com      *
    *************************************

    Searching for 53.com in google :
    ======================================

    Limit:  500
    Searching results: 0
    Searching results: 100
    Searching results: 200
    Searching results: 300
    Searching results: 400

    Accounts found:
    ====================

    josh.paskewicz@53.com
    @53.com
    info@tapioles53.com
    @.53.com
    rachael.smith@53.com
    nan.horton@53.com
    aler...@53.com
    alertingservice@53.com
    j.brinkman@53.com
    Jerome.Gilbert@53.com
    Gilbert@53.com
    michelle.weddington@53.com
    ====================

    Total results:  12

    Hosts found:
    ====================

    www.53.com
    reo.53.com
    direct.53.com
    premierissue.53.com
    retire.53.com
    ir.53.com
    tdsc.53.com
    secure.53.com
    ra.53.com
    2Fwww.53.com
    Www.53.com
    252Fwww.53.com
    espanol.53.com
    employee.53.com
    bnjhz.php?...53.com
    express.53.com
    www.ra.53.com
    Ra.53.com
    3Dreo.53.com
    wwww.53.com
    Retire.53.com
    @.53.com
    www.express.53.com
    mxism.php?...53.com
    pngyo.php?...53.com
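
Some entries in the host list above are clearly noise: leftovers from URL encoding (2F, 252F and 3D are what remains of %2F, %252F and %3D), fragments of PHP query strings, and the same host in different letter case. A small filter can tidy this up before the list is used for anything else. This is just a sketch with a hypothetical clean_hosts helper, and stripping those prefixes is a heuristic that assumes no real host name starts with them.

```python
import re

# Loose hostname pattern: dot-separated labels of letters, digits and hyphens.
HOST_RE = re.compile(r"^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}$")

# Remnants of %252F, %2F and %3D once the percent sign has been lost.
ENCODING_PREFIX_RE = re.compile(r"^(?:252f|2f|3d)")


def clean_hosts(raw_hosts, domain):
    """Return a sorted, de-duplicated list of plausible hosts under domain."""
    cleaned = set()
    for host in raw_hosts:
        host = host.strip().lower()
        host = ENCODING_PREFIX_RE.sub("", host)
        if HOST_RE.match(host) and host.endswith("." + domain):
            cleaned.add(host)
    return sorted(cleaned)


# A few lines copied from the output above.
raw = [
    "www.53.com",
    "2Fwww.53.com",
    "Www.53.com",
    "252Fwww.53.com",
    "bnjhz.php?...53.com",
    "3Dreo.53.com",
    "@.53.com",
    "reo.53.com",
    "wwww.53.com",
]

print(clean_hosts(raw, "53.com"))
```

Note that wwww.53.com survives the filter. It may be a typo someone published or a real alias, so it is worth checking with a DNS lookup (the -v option) rather than discarding it.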

    This search returned a lot more results. For example, we now know that most email addresses at this organization probably follow the naming convention firstname.lastname@53.com. This can be a very useful piece of knowledge, because as long as we have the first and last name of anyone at Fifth Third Bank, we can make a good guess at their email address.
    This is just one of the many tools which can aid a penetration tester in the passive reconnaissance process.
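
The naming convention can also be checked with a quick script instead of by eye. The sketch below counts which pattern the harvested addresses follow and then builds a candidate address from a name. The helper names and the name "Jane Doe" are made up for illustration, and a generated address is only a guess until it is verified.

```python
import re
from collections import Counter


def classify_local_part(local):
    """Label the part before the @ with a rough naming pattern."""
    if re.fullmatch(r"[a-z]+\.[a-z]+", local):
        first, _last = local.split(".")
        return "f.lastname" if len(first) == 1 else "firstname.lastname"
    return "other"


def infer_convention(addresses, domain):
    """Count naming patterns among addresses that belong to domain."""
    counts = Counter()
    for addr in addresses:
        local, _, addr_domain = addr.lower().partition("@")
        # Skip other domains, empty local parts and truncated entries.
        if addr_domain != domain or not local or "..." in local:
            continue
        counts[classify_local_part(local)] += 1
    return counts


def candidate_address(first, last, domain):
    """Build a firstname.lastname guess for a person at domain."""
    return f"{first.lower()}.{last.lower()}@{domain}"


# The accounts reported by the Google run above.
harvested = [
    "josh.paskewicz@53.com",
    "@53.com",
    "info@tapioles53.com",
    "@.53.com",
    "rachael.smith@53.com",
    "nan.horton@53.com",
    "aler...@53.com",
    "alertingservice@53.com",
    "j.brinkman@53.com",
    "Jerome.Gilbert@53.com",
    "Gilbert@53.com",
    "michelle.weddington@53.com",
]

counts = infer_convention(harvested, "53.com")
print(counts.most_common())
print(candidate_address("Jane", "Doe", "53.com"))
```

On this list, firstname.lastname is the most common pattern, but j.brinkman@53.com shows that it is not the only one in use.
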