We will use Alexa's website ranking system to help us discern which URLs are malicious and which are benign. Alexa ranks websites by popularity, based on the number of individuals who visit each site. The basic idea is that highly popular sites are usually non-malicious, so a site's Alexa rank can serve as one signal of its legitimacy.
The top 10 most popular websites on Alexa are as follows:

The following Python function is used to detect the popularity:
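import urllib2  # Python 2; in Python 3 use urllib.request.urlopen instead
from xml.dom import minidom

def site_popularity_index(host_name):
    # find_ele_with_attribute and nf (the "not found" value) are assumed to be defined elsewhere
    xmlpath = 'http://data.alexa.com/data?cli=10&dat=snbamz&url=' + host_name
    try:
        get_xml = urllib2.urlopen(xmlpath)  # fetch the XML response
        get_dom = minidom.parse(get_xml)  # parse the response into a DOM
        get_rank_host = find_ele_with_attribute(get_dom, 'REACH', 'RANK')
        ranked_country = find_ele_with_attribute(get_dom, 'COUNTRY', 'RANK')
        return [get_rank_host, ranked_country]
    except Exception:
        # Network or parsing failure: report both ranks as not found
        return [nf, nf]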
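The helper find_ele_with_attribute is not shown in this section. A minimal sketch of what such a helper might look like follows, assuming it returns the named attribute of the first matching element and a sentinel value otherwise. The sentinel value of -1 for nf and the sample XML string are illustrative assumptions, not values confirmed by this text or by Alexa's documentation.

```python
from xml.dom import minidom

nf = -1  # assumed "not found" sentinel; the real value is not shown in this section


def find_ele_with_attribute(dom, ele, attribute):
    """Return `attribute` of the first <ele> element that carries it, else nf."""
    for node in dom.getElementsByTagName(ele):
        if node.hasAttribute(attribute):
            return node.getAttribute(attribute)
    return nf


# Illustrative XML only, shaped like the response the function above expects.
SAMPLE_XML = (
    '<ALEXA><SD>'
    '<REACH RANK="25"/>'
    '<COUNTRY CODE="US" NAME="United States" RANK="12"/>'
    '</SD></ALEXA>'
)

dom = minidom.parseString(SAMPLE_XML)
host_rank = find_ele_with_attribute(dom, 'REACH', 'RANK')
country_rank = find_ele_with_attribute(dom, 'COUNTRY', 'RANK')
missing = find_ele_with_attribute(dom, 'POPULARITY', 'TEXT')  # element absent
```

Note that minidom returns attribute values as strings, so a rank obtained this way should be converted with int() before any numeric comparison.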
We will use the preceding parameters to segregate a lousy URL from a legitimate URL. A legitimate URL will have a proper ASN and a high site-popularity index. However, these are just heuristic measures for detecting the lousiness of a URL.
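As a rough illustration of how the two signals might be combined, consider the sketch below. The function name looks_legitimate, the rank threshold of 100,000, the NOT_FOUND sentinel of -1, and the choice to treat missing data as suspicious are all assumptions made for this example, not values taken from this text.

```python
NOT_FOUND = -1  # assumed sentinel for "no data", playing the role of nf


def looks_legitimate(site_rank, asn, rank_threshold=100000):
    """Heuristic only: True when the host has a known ASN and a good rank.

    A smaller rank number means a more popular site.
    """
    if asn == NOT_FOUND or site_rank == NOT_FOUND:
        return False  # missing ASN or rank is treated as suspicious
    try:
        rank = int(site_rank)  # ranks parsed from XML arrive as strings
    except (TypeError, ValueError):
        return False
    return rank <= rank_threshold
```

For instance, looks_legitimate('25', 15169) passes the check, while a host with no rank or a rank in the millions does not. A hard threshold like this is crude; in practice the rank and ASN would more likely be fed to a classifier as features alongside other signals.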