SCRAPING USING URLLIB

[root@rhel7 html]# python
Python 2.7.5

>>> import urllib

>>> f=urllib.urlopen("https://www.microsoft.com")
>>> print f
<addinfourl at 140286118192624 whose fp = <socket._fileobject object at 0x7f96f39ed7d0>>
>>> f
<addinfourl at 140286118192624 whose fp = <socket._fileobject object at 0x7f96f39ed7d0>>
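`urllib.urlopen` exists only in Python 2. In Python 3 the same call is `urllib.request.urlopen`, and a `Request` object lets you set headers. Setting a header matters here because the page returned later in this session says the User-Agent looks like an automated process. This is a minimal sketch, assuming Python 3. The helper names `build_request` and `fetch` and the browser-like User-Agent string are hypothetical choices, not part of the original session.

```python
from urllib.request import Request, urlopen

# Hypothetical browser-like User-Agent. urllib's default is "Python-urllib/x.y",
# which some sites treat as an automated client.
DEFAULT_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent=DEFAULT_UA):
    """Build a GET request that carries a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Open url (network access required) and return (status, final URL, body bytes)."""
    # Using the response as a context manager closes the socket afterwards.
    with urlopen(build_request(url), timeout=timeout) as resp:
        return resp.status, resp.geturl(), resp.read()
```

Usage, with network access: `status, final_url, body = fetch("https://www.microsoft.com")`. Note that `body` is `bytes` in Python 3, not `str` as in Python 2.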

Important: f is a file-like response object. Through f we can get a lot of information about the URL, as listed below.

@@@ To get the response headers, including the name of the server hosting the page

>>> print f.info()
Server: Apache
ETag: "6082151bd56ea922e1357f5896a90d0a:1425454794"
Last-Modified: Wed, 04 Mar 2015 07:39:54 GMT
Accept-Ranges: bytes
Content-Length: 1020
Content-Type: text/html
Date: Mon, 10 Aug 2015 02:33:46 GMT
Connection: close
X-N: S
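In Python 3, `f.info()` returns an `http.client.HTTPMessage`, which is a subclass of the standard `email.message.Message`. That means a header block like the one above can be explored offline with the `email` package. This is a minimal sketch, assuming Python 3. The header text is the block captured in this session, and `RAW_HEADERS` is a hypothetical name.

```python
import email

# Header block as printed by f.info() in this session.
RAW_HEADERS = (
    "Server: Apache\n"
    'ETag: "6082151bd56ea922e1357f5896a90d0a:1425454794"\n'
    "Last-Modified: Wed, 04 Mar 2015 07:39:54 GMT\n"
    "Accept-Ranges: bytes\n"
    "Content-Length: 1020\n"
    "Content-Type: text/html\n"
    "Date: Mon, 10 Aug 2015 02:33:46 GMT\n"
    "Connection: close\n"
    "X-N: S\n"
)

# email.message_from_string builds an email.message.Message, the base class of
# the http.client.HTTPMessage that Python 3's f.info() returns.
headers = email.message_from_string(RAW_HEADERS)

# items() keeps the headers in the order they appear in the text.
for name, value in headers.items():
    print(f"{name}: {value}")

# Content-Type can be split into its media type and optional charset.
print(headers.get_content_type())     # text/html
print(headers.get_content_charset())  # None (no charset parameter was sent)
```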

@@@ To get the URL from the f object

>>> print f.geturl()
https://www.microsoft.com
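`geturl()` returns the URL of the resource that was actually retrieved, which is useful for spotting redirects. Because it is a plain string, the standard URL parser can break it into parts. This is a small sketch, assuming Python 3's `urllib.parse`. The URL is the value printed above.

```python
from urllib.parse import urlsplit

final_url = "https://www.microsoft.com"  # value printed by f.geturl() above
parts = urlsplit(final_url)

print(parts.scheme)       # https
print(parts.hostname)     # www.microsoft.com
print(repr(parts.path))   # '' (no path component)
```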

@@@ To read the HTML source of the page

>>> print f.read()
<html><head><title>Microsoft Corporation</title><meta http-equiv="X-UA-Compatible" content="IE=EmulateIE7"></meta><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></meta><meta name="SearchTitle" content="Microsoft.com" scheme=""></meta><meta name="Description" content="Get product information, support, and news from Microsoft." scheme=""></meta><meta name="Title" content="Microsoft.com Home Page" scheme=""></meta><meta name="Keywords" content="Microsoft, product, support, help, training, Office, Windows, software, download, trial, preview, demo, business, security, update, free, computer, PC, server, search, download, install, news" scheme=""></meta><meta name="SearchDescription" content="Microsoft.com Homepage" scheme=""></meta></head><body><p>Your current User-Agent string appears to be from an automated process, if this is incorrect, please click this link:<a href="http://www.microsoft.com/en/us/default.aspx?redir=true">United States English Microsoft Homepage</a></p></body></html>
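`read()` only returns the raw markup. The scraping step is to pull data out of it. Here is a minimal sketch, assuming Python 3 and the standard `html.parser.HTMLParser`, that extracts the `<title>` text and the link hrefs from a shortened copy of the page above. The class name `TitleParser` is hypothetical. In Python 3, `read()` returns bytes, so the sketch decodes them with the utf-8 charset that the page declares in its meta tag.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside <title> and the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


# Shortened copy of the bytes returned by f.read() above.
page = (b'<html><head><title>Microsoft Corporation</title></head><body>'
        b'<p>Your current User-Agent string appears to be from an automated process, '
        b'if this is incorrect, please click this link:'
        b'<a href="http://www.microsoft.com/en/us/default.aspx?redir=true">'
        b'United States English Microsoft Homepage</a></p></body></html>')

parser = TitleParser()
parser.feed(page.decode("utf-8"))  # the page declares charset=utf-8
parser.close()
print(parser.title)  # Microsoft Corporation
print(parser.links)  # ['http://www.microsoft.com/en/us/default.aspx?redir=true']
```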

@@@ You can also read a single header value from f.info()

>>> print f.info()['Server']
Apache
>>> print f.info()['Date']
Mon, 10 Aug 2015 02:33:46 GMT
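Two details are worth knowing when looking up single headers. Names match case-insensitively. In Python 2's `mimetools.Message`, indexing a missing header raises `KeyError`, so `.get()` is the safer lookup. Date headers are also plain strings, so they need to be parsed before you can compute anything with them. This is a minimal sketch, assuming Python 3. It uses the header values from this session and the standard `email.utils.parsedate_to_datetime`. `X-Powered-By` is only an example of an absent header.

```python
import email
from email.utils import parsedate_to_datetime

# Headers captured in this session, parsed offline.
headers = email.message_from_string(
    "Server: Apache\n"
    "Last-Modified: Wed, 04 Mar 2015 07:39:54 GMT\n"
    "Date: Mon, 10 Aug 2015 02:33:46 GMT\n"
)

# Lookup ignores case; get() returns the given default for an absent header.
print(headers["server"])                      # Apache
print(headers.get("X-Powered-By", "absent"))  # absent

# Date and Last-Modified are RFC 2822 date strings; convert them to aware datetimes.
served = parsedate_to_datetime(headers["Date"])
modified = parsedate_to_datetime(headers["Last-Modified"])
print(served.isoformat())        # 2015-08-10T02:33:46+00:00
print((served - modified).days)  # 158
```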

@@@ To get the HTTP response status code

>>> print f.code
200
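The status code on its own is just a number. Python 3's standard `http.HTTPStatus` enum maps it to its reason phrase. Python 3's `urlopen` also differs from the Python 2 call above: it raises `urllib.error.HTTPError` for 4xx/5xx responses, and the code is then available on the exception. This is a minimal sketch, assuming Python 3. The helper names `describe` and `status_of` are hypothetical, and `status_of` needs network access.

```python
from http import HTTPStatus
from urllib.error import HTTPError
from urllib.request import urlopen


def describe(code):
    """Return ('<code> <reason phrase>', is_success) for an HTTP status code."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}", 200 <= status.value < 300


def status_of(url, timeout=10):
    """Return the HTTP status code for url (network access required)."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # Raised for 4xx/5xx responses; the status code is stored on the exception.
        return err.code


print(describe(200))  # ('200 OK', True)
print(describe(404))  # ('404 Not Found', False)
```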

@@@ You can URL-encode key/value pairs stored in a dictionary

>>> d={1:"this",2:"tow j "}

>>> print urllib.urlencode(d)
1=this&2=tow+j+
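In Python 3, `urlencode` lives in `urllib.parse`, alongside `parse_qs`, which reverses it. This is a minimal sketch, assuming Python 3. It reuses the dictionary from the session. The `https://www.example.com/search` endpoint and its `q`/`page` parameters are hypothetical and only show where an encoded query string usually goes.

```python
from urllib.parse import parse_qs, urlencode

d = {1: "this", 2: "tow j "}
query = urlencode(d)  # keys and values are str()-ed, then quote_plus-encoded
print(query)          # 1=this&2=tow+j+

# Decoding reverses it; parse_qs gives a list per key because keys may repeat.
print(parse_qs(query))  # {'1': ['this'], '2': ['tow j ']}

# Hypothetical endpoint, showing the encoded string appended as a GET query.
url = "https://www.example.com/search?" + urlencode({"q": "python urllib", "page": 2})
print(url)  # https://www.example.com/search?q=python+urllib&page=2
```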

TECHTED
