Python script for finding a string in a web page

The following Python script scans a web page line by line to find a given string. Web scraping solutions usually revolve around the [BeautifulSoup] and [requests] modules. This solution instead gets its results with the [urllib2] and [html2text] modules, simply reading the HTML document as plain text. Note that urllib2 is part of the Python 2 standard library, so the script targets Python 2. The original idea belongs to Randy Olson and can be found here: https://gist.github.com/rhiever/8411589 . I like this idea because it requires very little code.

# ---------------------------------------------------------
# Simple web scraping using urllib2 and html2text modules
# ---------------------------------------------------------

import urllib2
from html2text import html2text


def main():
   # main begin

   # set web-page address
   url = 'http://www.sqlexamples.info/tsql.htm'
   # set search string
   search_str = 'Table'

   # print title
   print
   print 'web page address: {0}'.format(url)
   print 'looking for string: {0}'.format(search_str)
   print

   print '---- begin ----'
   print

   # urllib2.urlopen(url).read() returns the raw page data
   # html2text converts the raw HTML to plain text
   # the text is then split into lines using .split("\n")

   # print each line that includes search_str
   for line in html2text(urllib2.urlopen(url).read()).split("\n"):
     if search_str in line:
       print line

   print
   print '----- end -----'

   # main end


if __name__ == "__main__":
    main()
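
The script above runs only on Python 2, since urllib2 and the print statement do not exist in Python 3. For readers on Python 3, the same idea might be sketched roughly as follows. This is not part of the original gist: the function names matching_lines and fetch_as_text are hypothetical, the UTF-8 decoding is an assumption about the page encoding, and html2text is still a third-party package that has to be installed separately.

```python
from urllib.request import urlopen


def matching_lines(text, search_str):
    """Return the lines of text that contain search_str."""
    return [line for line in text.split("\n") if search_str in line]


def fetch_as_text(url):
    """Download a page and convert its HTML to plain text."""
    # html2text is a third-party package; it is imported here so the
    # line-matching helper above can be used without it.
    from html2text import html2text

    raw = urlopen(url).read()
    # Assumption: the page is UTF-8 encoded; bytes that cannot be
    # decoded are replaced instead of raising an error.
    return html2text(raw.decode("utf-8", errors="replace"))
```

With these helpers, the loop from the original script would become something like: for line in matching_lines(fetch_as_text(url), search_str): print(line). Keeping the matching step separate from the download makes it possible to try the search on any text without touching the network.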



