Another recipe in this module shows how to extract e-mail addresses from a website. This recipe shows you how to create a local Maltego transform, which you can then run from within Maltego itself to generate e-mail address entities. It can be used in conjunction with URL spidering transforms to pull e-mail addresses from entire websites.
The following code shows how to extract e-mails from a website through the use of regular expressions:
import urllib2
import re
import sys
tarurl = sys.argv[1]
url = urllib2.urlopen(tarurl).read()
regex = re.compile(("([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
                    "{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
                    "\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"))
# Open the Maltego transform response
print "<MaltegoMessage>"
print "<MaltegoTransformResponseMessage>"
print " <Entities>"
emails = re.findall(regex, url)
# Emit one EmailAddress entity per match
for email in emails:
    print " <Entity Type=\"maltego.EmailAddress\">"
    print " <Value>" + str(email[0]) + "</Value>"
    print " </Entity>"
# Close the tags opened above
print " </Entities>"
print "</MaltegoTransformResponseMessage>"
print "</MaltegoMessage>"

The top of the script imports the necessary modules. We then assign the URL supplied as a command-line argument to a variable and use urllib2 to open it and read the page content:
tarurl = sys.argv[1]
url = urllib2.urlopen(tarurl).read()
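The recipe targets Python 2, where urllib2 is available. As a rough sketch of the same step on Python 3, assuming urllib.request as the replacement and a hypothetical helper name fetch_page, the fetch could look like the following. The UTF-8 decoding with errors="replace" is also an assumption, made so that the regular expression can run on text rather than bytes:

```python
import urllib.request


def fetch_page(target_url, timeout=10):
    """Return the body of target_url as text (hypothetical helper, not part of the recipe)."""
    with urllib.request.urlopen(target_url, timeout=timeout) as response:
        raw = response.read()
    # Assumption: treat the page as UTF-8 and replace undecodable bytes
    # rather than failing on pages that use another encoding.
    return raw.decode("utf-8", errors="replace")
```

In the transform itself, the argument would still come from sys.argv[1], as in the original script.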
We then create a regular expression that matches the format of a standard e-mail address:
regex = re.compile(("([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
                    "{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
                    "\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"))

The preceding regular expression should match e-mail addresses in the format email@address.com or e-mail at address dot com. Because the pattern lists only lowercase letters and is compiled without re.IGNORECASE, it will not fully match addresses that contain uppercase letters.
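To see what the pattern accepts, the short check below runs it against a few sample strings. It is a sketch for Python 3: the pattern is copied from the recipe but written as raw strings, and the names EMAIL_PATTERN and samples, along with the sample addresses, are made up for illustration.

```python
import re

# Pattern copied from the recipe, written as raw strings for Python 3.
EMAIL_PATTERN = re.compile(
    r"([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
    r"{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
    r"\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"
)

# Illustrative inputs: a plain address, an obfuscated one, and a non-match.
samples = [
    "bob@example.com",
    "alice at example dot org",
    "no address here",
]

for text in samples:
    match = EMAIL_PATTERN.search(text)
    print(repr(text), "->", match.group(0) if match else None)
```

The obfuscated form is matched loosely, so an ordinary phrase such as "look at example dot com" would also be reported as an address.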
We then output the tags required for a valid Maltego transform output:
print "<MaltegoMessage>"
print "<MaltegoTransformResponseMessage>"
print " <Entities>"
Then, we find all instances of text that match our regular expression in the downloaded page content:
emails = re.findall(regex, url)
We then take each e-mail address we have found and output it in the correct format for a Maltego transform response:
for email in emails:
    print " <Entity Type=\"maltego.EmailAddress\">"
    print " <Value>" + str(email[0]) + "</Value>"
    print " </Entity>"
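The email[0] index in the loop is needed because the pattern contains capturing groups, so re.findall returns one tuple per match rather than a plain string. The following is a small Python 3 sketch of that behaviour, using a deliberately simplified stand-in pattern (an assumption, not the recipe's full expression) with the same three-group layout:

```python
import re

# Simplified stand-in with the same group layout as the recipe's pattern:
# group 1 = whole address, group 2 = separator, group 3 = last dot.
pattern = re.compile(r"([a-z0-9]+(@|\sat\s)(?:[a-z0-9]+(\.|\sdot\s))+[a-z]+)")

page = "mail bob@example.com or alice at example dot org"
matches = pattern.findall(page)

# Each element is a tuple (full_address, separator, last_dot),
# so the first item of each tuple is the address itself.
addresses = [groups[0] for groups in matches]
print(addresses)
```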
Finally, we close the tags that we opened earlier:
print " </Entities>"
print "</MaltegoTransformResponseMessage>"
print "</MaltegoMessage>"
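The print statements above can also be gathered into one function that returns the whole response as a string. The sketch below is for Python 3 and makes two assumptions beyond the recipe: the helper name build_response is hypothetical, and each value is passed through xml.sax.saxutils.escape, because characters such as & are allowed by the pattern's local part and would otherwise produce malformed XML.

```python
from xml.sax.saxutils import escape


def build_response(addresses):
    """Build a Maltego transform response for a list of e-mail address strings."""
    lines = [
        "<MaltegoMessage>",
        "<MaltegoTransformResponseMessage>",
        " <Entities>",
    ]
    for address in addresses:
        lines.append(' <Entity Type="maltego.EmailAddress">')
        # Assumption: escape &, < and > so the value is valid XML text.
        lines.append(" <Value>" + escape(address) + "</Value>")
        lines.append(" </Entity>")
    lines += [
        " </Entities>",
        "</MaltegoTransformResponseMessage>",
        "</MaltegoMessage>",
    ]
    return "\n".join(lines)


# Illustrative addresses only; the second one exercises the escaping.
print(build_response(["bob@example.com", "r&d@example.com"]))
```

Returning a string instead of printing line by line keeps the formatting in one place and makes the output easy to check before it is handed to Maltego.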