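However, although the urllib2 module is serviceable on its own, it still falls well short of pycurl in functionality. For anything slightly more complex, pycurl is also much more convenient to use than urllib2. pycurl is one of the modules you really need to master, but because getting started with it can be a bit of a headache, I wrote a small wrapper module of my own that makes it easy to use:

from pycurl import *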
import StringIO, time, random

def curl(url, retry=False, delay=1, **kwargs):
    '''Basic usage: curl('http://www.xxx.com/') will download the url.
    If `retry` is set to True, the download is retried automatically on network errors.
    `delay` sets the number of seconds to wait between retries.
    **kwargs can be curl params. For example:
    '''
    # Pool of browser User-Agent strings to choose from
    useragent_list = [
        'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6',
        'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
        'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)',
        'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)',
        'Opera/9.20 (Windows NT 6.0; U; en)',
        'Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 5.1; .NET CLR 1.1.4322)',
        'Opera/9.00 (Windows NT 5.1; U; en)',
        'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50',
        'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.0',
        'Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.1) Opera 7.02 [en]',
        'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20060127 Netscape/8.1',
    ]
    # Pick one User-Agent at random
    size = len(useragent_list)
    useragent = useragent_list[random.randint(0, size-1)]
    # In-memory buffer
    s = StringIO.StringIO()