We can also look for content features that are often associated with malicious pages, such as the following:
- The number of `<html>` tags in the web page
- The number of hyperlinks (`<a href=` tags) in the web page
- The number of iframes in the web page
We can compute these features from the downloaded page source using the following code:
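```python
def web_content_features(url):
    webfeatures = {}
    total_count = 0
    try:
        # `opener` is assumed to be a urllib opener created earlier in the script,
        # e.g. opener = urllib.request.build_opener().
        # read() returns the page body; str() of the response object would not.
        source_code = opener.open(url).read().decode('utf-8', errors='ignore')
        webfeatures['src_html_cnt'] = source_code.count('<html')
        webfeatures['src_hlink_cnt'] = source_code.count('<a href=')
        webfeatures['src_iframe_cnt'] = source_code.count('<iframe')
```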
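Because `str.count` matches exact substrings, the HTML counts above are case-sensitive, and `'<a href='` only matches links whose first attribute is `href`. The following sketch, which is an alternative rather than the book's method, compares those substring counts with case-insensitive regular expressions on a small hypothetical page (`SAMPLE_HTML`, `naive_counts`, and `regex_counts` are illustrative names):

```python
import re

# Hypothetical sample page, used only for illustration.
SAMPLE_HTML = """<html><body>
<a href="/home">Home</a>
<A HREF="http://example.com">Upper-case link</A>
<a class="nav" href="/about">About</a>
<iframe src="http://example.com/ad" width="0" height="0"></iframe>
</body></html>"""


def naive_counts(source_code):
    """Exact-substring counts, as in the code above."""
    return {
        'src_html_cnt': source_code.count('<html'),
        'src_hlink_cnt': source_code.count('<a href='),
        'src_iframe_cnt': source_code.count('<iframe'),
    }


def regex_counts(source_code):
    """Case-insensitive tag counts that also allow attributes before href."""
    return {
        'src_html_cnt': len(re.findall(r'<html\b', source_code, re.IGNORECASE)),
        'src_hlink_cnt': len(re.findall(r'<a\b[^>]*\bhref\s*=', source_code, re.IGNORECASE)),
        'src_iframe_cnt': len(re.findall(r'<iframe\b', source_code, re.IGNORECASE)),
    }


print(naive_counts(SAMPLE_HTML))  # {'src_html_cnt': 1, 'src_hlink_cnt': 1, 'src_iframe_cnt': 1}
print(regex_counts(SAMPLE_HTML))  # {'src_html_cnt': 1, 'src_hlink_cnt': 3, 'src_iframe_cnt': 1}
```

The substring version finds only one of the three links; whether the stricter matching is worth the extra cost depends on how the features are used downstream.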
We can also count calls to JavaScript functions that are considered suspicious, as shown in the following list:
- The number of `eval()` calls
- The number of `escape()` calls
- The number of `link()` calls
- The number of `unescape()` calls
- The number of `exec()` calls
- The number of `search()` calls
We can count these calls using the following code:
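```python
        webfeatures['src_eval_cnt'] = source_code.count('eval(')
        # 'escape(' is also a substring of 'unescape(', so subtract those matches.
        webfeatures['src_escape_cnt'] = (source_code.count('escape(')
                                         - source_code.count('unescape('))
        webfeatures['src_link_cnt'] = source_code.count('link(')
        # The JavaScript function is unescape(); the feature key name is kept as-is.
        webfeatures['src_underescape_cnt'] = source_code.count('unescape(')
        webfeatures['src_exec_cnt'] = source_code.count('exec(')
        webfeatures['src_search_cnt'] = source_code.count('search(')
```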
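Plain substring counting has a pitfall here: every `unescape(` also contains `escape(`, and `link(` would also match inside a longer name such as `unlink(`. A minimal sketch, assuming a hypothetical script string `SAMPLE_JS` and illustrative helper names, shows how a regular-expression word boundary (`\b`) avoids counting one function name inside another:

```python
import re

# Hypothetical obfuscated-looking script, for illustration only.
SAMPLE_JS = 'var s = unescape("%61%6c%65%72%74"); eval(s); var t = escape(s); /x/.exec(t);'

SUSPICIOUS_FUNCTIONS = ['eval', 'escape', 'link', 'unescape', 'exec', 'search']


def substring_counts(source_code):
    """Plain substring counting, as in the code above."""
    return {name: source_code.count(name + '(') for name in SUSPICIOUS_FUNCTIONS}


def word_boundary_counts(source_code):
    """Require a non-word character (or start of text) before the name,
    so 'escape(' no longer matches inside 'unescape('."""
    return {name: len(re.findall(r'\b' + re.escape(name) + r'\(', source_code))
            for name in SUSPICIOUS_FUNCTIONS}


print(substring_counts(SAMPLE_JS))
# {'eval': 1, 'escape': 2, 'link': 0, 'unescape': 1, 'exec': 1, 'search': 0}
print(word_boundary_counts(SAMPLE_JS))
# {'eval': 1, 'escape': 1, 'link': 0, 'unescape': 1, 'exec': 1, 'search': 0}
```

Method calls such as `.exec(` are still counted, because `.` is not a word character.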
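Next, we add up all of the JavaScript function counts, skipping the three HTML tag counts (`src_html_cnt`, `src_hlink_cnt`, and `src_iframe_cnt`), and store the total under `src_total_jfun_cnt`, as shown in the following code: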
```python
        # Sum every JavaScript function count, skipping the three HTML tag counts.
        for key in webfeatures:
            if key not in ('src_html_cnt', 'src_hlink_cnt', 'src_iframe_cnt'):
                total_count += webfeatures[key]
        webfeatures['src_total_jfun_cnt'] = total_count
```
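The summing step can also be written as a small standalone function. This sketch uses hypothetical feature values and a hypothetical helper name, `total_js_function_count`; it also excludes the total key itself, so calling it on a dictionary that already contains `src_total_jfun_cnt` gives the same result:

```python
# Keys that are not JavaScript function counts, including the total itself.
NON_JS_KEYS = {'src_html_cnt', 'src_hlink_cnt', 'src_iframe_cnt', 'src_total_jfun_cnt'}


def total_js_function_count(webfeatures):
    """Sum every feature value except the HTML tag counts and the total."""
    return sum(value for key, value in webfeatures.items() if key not in NON_JS_KEYS)


# Hypothetical feature values for a single page.
features = {
    'src_html_cnt': 1, 'src_hlink_cnt': 12, 'src_iframe_cnt': 2,
    'src_eval_cnt': 3, 'src_escape_cnt': 1, 'src_link_cnt': 0,
    'src_underescape_cnt': 2, 'src_exec_cnt': 1, 'src_search_cnt': 4,
}
features['src_total_jfun_cnt'] = total_js_function_count(features)
print(features['src_total_jfun_cnt'])  # 11, i.e. 3 + 1 + 0 + 2 + 1 + 4
```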
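If the page cannot be downloaded, the `except` block prints an error and sets every feature to a default value, `nf`, so that the function always returns the same set of keys, as shown in the following code: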
```python
    except Exception as e:
        print("Error " + str(e) + " in downloading page " + url)
        # `nf` is assumed to be a "not found" placeholder defined earlier in the script.
        default_value = nf
        # Overwrite every feature, including any set before the error occurred.
        for key in ('src_html_cnt', 'src_hlink_cnt', 'src_iframe_cnt',
                    'src_eval_cnt', 'src_escape_cnt', 'src_link_cnt',
                    'src_underescape_cnt', 'src_exec_cnt', 'src_search_cnt',
                    'src_total_jfun_cnt'):
            webfeatures[key] = default_value
    return webfeatures
```
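Putting the pieces together, the following Python 3 sketch separates downloading from counting so that the counting logic can be tested on an HTML string without network access. The names `fetch_page`, `content_features`, and `NOT_FOUND`, and the placeholder value `-1`, are assumptions standing in for `opener` and `nf` from the original script; `urllib.request.urlopen` is the standard-library download call, and the JavaScript counts use word-boundary regular expressions so that `escape(` is not counted inside `unescape(`:

```python
import re
import urllib.request

# Hypothetical "feature not available" placeholder (the original uses `nf`).
NOT_FOUND = -1

HTML_PATTERNS = {
    'src_html_cnt': '<html',
    'src_hlink_cnt': '<a href=',
    'src_iframe_cnt': '<iframe',
}
JS_PATTERNS = {
    'src_eval_cnt': 'eval(',
    'src_escape_cnt': 'escape(',
    'src_link_cnt': 'link(',
    'src_underescape_cnt': 'unescape(',
    'src_exec_cnt': 'exec(',
    'src_search_cnt': 'search(',
}
ALL_KEYS = list(HTML_PATTERNS) + list(JS_PATTERNS) + ['src_total_jfun_cnt']


def fetch_page(url, timeout=10):
    """Download a page and decode it to text, replacing undecodable bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode('utf-8', errors='replace')


def content_features(source_code):
    """Count HTML tags by substring and JavaScript calls by word boundary."""
    features = {key: source_code.count(pattern) for key, pattern in HTML_PATTERNS.items()}
    js_counts = {key: len(re.findall(r'\b' + re.escape(pattern), source_code))
                 for key, pattern in JS_PATTERNS.items()}
    features.update(js_counts)
    features['src_total_jfun_cnt'] = sum(js_counts.values())
    return features


def web_content_features(url, fetch=fetch_page, default_value=NOT_FOUND):
    """Return content features for `url`, or placeholders if the fetch fails."""
    try:
        return content_features(fetch(url))
    except Exception as e:
        print("Error " + str(e) + " in downloading page " + url)
        return {key: default_value for key in ALL_KEYS}
```

For example, `web_content_features('http://example.com/')` downloads the page and returns ten features, while passing a different `fetch` callable makes it easy to exercise the error path in tests.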