[Python] The urllib Module [Standard Library]

January 31, 2025 hiroshi 0 comments

What Is the urllib Module?
HTTP Requests with urllib.request
URL Parsing with urllib.parse
Error Handling with urllib.error
Parsing robots.txt with urllib.robotparser
Summary

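What Is the urllib Module?

urllib, part of Python's standard library, is a package for communicating with the Web and working with URLs. It consists of four submodules, each with its own purpose.

urllib.request – sending HTTP requests
urllib.parse – parsing and building URLs
urllib.error – exceptions raised when a request fails
urllib.robotparser – parsing robots.txt

HTTP Requests with urllib.request

urllib.request is the submodule for sending Web requests over HTTP or HTTPS.

Basic Usage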

import urllib.request

url = "https://www.example.com"
with urllib.request.urlopen(url) as response:  # the with block closes the connection
    html = response.read().decode("utf-8")

print(html)  # Print the fetched HTML
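The basic call decodes every page as UTF-8 and waits indefinitely if the server never answers. The sketch below is one way to address both points, assuming a hypothetical helper named fetch_text: it passes a timeout to urlopen and reads the charset from the Content-Type header, falling back to UTF-8 when none is declared.

```python
import urllib.request


def fetch_text(url, timeout=10.0, default_charset="utf-8"):
    """Fetch a URL and decode the body with the charset the server declares.

    fetch_text is a hypothetical helper for illustration, not part of urllib.
    """
    # The context manager closes the connection even if read() raises.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # get_content_charset() returns None when Content-Type names no charset.
        charset = response.headers.get_content_charset() or default_charset
        return response.read().decode(charset, errors="replace")
```

A call such as fetch_text("https://www.example.com") would return the page as a str. The timeout is in seconds; the value of 10 here is only an example.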

Adding Request Headers

import urllib.request

url = "https://www.example.com"
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})

with urllib.request.urlopen(req) as response:
    html = response.read().decode("utf-8")

print(html)
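It can help to inspect a Request object before sending it. The following sketch builds a request without contacting any server; the header values are made up for illustration. One detail worth knowing is that Request stores header names via str.capitalize(), so "User-Agent" is looked up as "User-agent".

```python
import urllib.request

# Hypothetical header values, for illustration only.
headers = {
    "User-Agent": "my-crawler/1.0",
    "Accept": "text/html",
}
req = urllib.request.Request("https://www.example.com", headers=headers)

# Headers can also be added after construction.
req.add_header("Accept-Language", "en")

# Names are stored capitalized: "User-agent", not "User-Agent".
print(req.get_header("User-agent"))       # my-crawler/1.0
print(req.has_header("Accept-language"))  # True
print(sorted(req.header_items()))
```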

Sending a POST Request

import urllib.request
import urllib.parse

url = "https://www.example.com/login"
data = {"username": "user", "password": "pass"}
encoded_data = urllib.parse.urlencode(data).encode("utf-8")

req = urllib.request.Request(url, data=encoded_data, method="POST")
with urllib.request.urlopen(req) as response:
    print(response.read().decode("utf-8"))
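Many APIs expect a JSON body rather than form-encoded data. As a minimal sketch, assuming a hypothetical endpoint and a helper named build_json_post, the request can be prepared like this. When data is given without a Content-Type header, urlopen sends application/x-www-form-urlencoded, so the header is set explicitly here.

```python
import json
import urllib.request


def build_json_post(url, payload):
    """Build (but do not send) a POST request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and payload.
req = build_json_post("https://www.example.com/api/items", {"name": "urllib"})
print(req.get_method())                # POST
print(req.get_header("Content-type"))  # application/json
print(req.data)                        # b'{"name": "urllib"}'
```

Passing req to urllib.request.urlopen would send it, exactly as in the form example above.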

URL Parsing with urllib.parse

urllib.parse is the submodule for splitting URLs into components and assembling them again.

Splitting a URL

import urllib.parse

url = "https://www.example.com/search?q=python&sort=asc"
parsed_url = urllib.parse.urlparse(url)

print(parsed_url.scheme)  # https
print(parsed_url.netloc)  # www.example.com
print(parsed_url.path)    # /search
print(parsed_url.query)   # q=python&sort=asc
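The parse result exposes more than the four fields shown above. The sketch below uses a made-up URL containing a user name, port, and fragment to show the extra attributes, and uses _replace() (available because the result is a named tuple) to produce a modified URL.

```python
import urllib.parse

# Hypothetical URL with credentials, port, and fragment.
url = "https://user@www.example.com:8443/search?q=python#results"
parsed = urllib.parse.urlparse(url)

print(parsed.netloc)    # user@www.example.com:8443
print(parsed.hostname)  # www.example.com
print(parsed.port)      # 8443
print(parsed.username)  # user
print(parsed.fragment)  # results

# _replace() returns a modified copy; geturl() reassembles the URL.
updated = parsed._replace(query="q=urllib", fragment="")
print(updated.geturl())  # https://user@www.example.com:8443/search?q=urllib
```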

Parsing and Building Query Parameters

import urllib.parse

query = "q=python&sort=asc"
params = urllib.parse.parse_qs(query)

print(params)  # {'q': ['python'], 'sort': ['asc']}

new_query = urllib.parse.urlencode({"q": "urllib", "sort": "desc"})
print(new_query)  # q=urllib&sort=desc
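Query strings often contain repeated keys, spaces, or non-ASCII text. The following sketch, using made-up parameter values, shows how parse_qs, parse_qsl, urlencode with doseq=True, and quote/unquote handle those cases.

```python
import urllib.parse

# parse_qs groups repeated keys into one list per key.
print(urllib.parse.parse_qs("tag=web&tag=http&page=2"))
# {'tag': ['web', 'http'], 'page': ['2']}

# parse_qsl keeps the original order as (key, value) pairs.
print(urllib.parse.parse_qsl("tag=web&tag=http&page=2"))
# [('tag', 'web'), ('tag', 'http'), ('page', '2')]

# doseq=True expands a list value into repeated keys.
print(urllib.parse.urlencode({"tag": ["web", "http"], "page": 2}, doseq=True))
# tag=web&tag=http&page=2

# urlencode writes spaces as "+"; quote percent-escapes UTF-8 bytes.
print(urllib.parse.urlencode({"q": "python urllib"}))       # q=python+urllib
print(urllib.parse.quote("日本語"))                          # %E6%97%A5%E6%9C%AC%E8%AA%9E
print(urllib.parse.unquote("%E6%97%A5%E6%9C%AC%E8%AA%9E"))  # 日本語
```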

Joining URLs

import urllib.parse

base = "https://www.example.com/search"
params = {"q": "python", "sort": "asc"}

full_url = base + "?" + urllib.parse.urlencode(params)
print(full_url)  # https://www.example.com/search?q=python&sort=asc
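Concatenating with "?" works when the base URL has no query string yet. For resolving relative links, urllib.parse also provides urljoin. The sketch below shows urljoin with a made-up base URL, plus a hypothetical helper named add_query_params that appends parameters to a URL that may already carry a query string.

```python
import urllib.parse

base = "https://www.example.com/docs/guide/"

print(urllib.parse.urljoin(base, "intro.html"))
# https://www.example.com/docs/guide/intro.html
print(urllib.parse.urljoin(base, "../api/"))
# https://www.example.com/docs/api/
print(urllib.parse.urljoin(base, "/search"))
# https://www.example.com/search


def add_query_params(url, extra):
    """Return url with the extra parameters appended to its query string.

    add_query_params is a hypothetical helper, not part of urllib.
    """
    parts = urllib.parse.urlparse(url)
    pairs = urllib.parse.parse_qsl(parts.query, keep_blank_values=True)
    pairs.extend(extra.items())
    return parts._replace(query=urllib.parse.urlencode(pairs)).geturl()


print(add_query_params("https://www.example.com/search?q=python", {"sort": "asc"}))
# https://www.example.com/search?q=python&sort=asc
```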

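Error Handling with urllib.error

urllib.error defines the exceptions that urllib.request raises when a request fails: HTTPError for error status codes and URLError for failures such as an unreachable host.

Catching HTTP Errors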

import urllib.request
import urllib.error

url = "https://www.example.com/nonexistent"

try:
    response = urllib.request.urlopen(url)
except urllib.error.HTTPError as e:
    print("HTTPError:", e.code, e.reason)
except urllib.error.URLError as e:
    print("URLError:", e.reason)
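HTTPError is a subclass of URLError, which is why it has to be caught first in the example above. An HTTPError also behaves like a response object, so the error page body can be read from it. The sketch below wraps both ideas in two hypothetical helpers, describe_error and fetch_or_describe.

```python
import urllib.error
import urllib.request


def describe_error(exc):
    """Summarize a urllib exception as a single line of text."""
    # HTTPError subclasses URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        # The error response body is readable, like a normal response.
        body = exc.read().decode("utf-8", errors="replace")
        return f"HTTP {exc.code} {exc.reason}: {body[:80]}"
    if isinstance(exc, urllib.error.URLError):
        return f"Connection problem: {exc.reason}"
    return f"Unexpected error: {exc!r}"


def fetch_or_describe(url, timeout=10.0):
    """Return the response body, or a description of what went wrong."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.URLError as exc:  # also catches HTTPError
        return describe_error(exc)
```

Truncating the body to 80 characters is an arbitrary choice made to keep the message short.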

Retrying Requests

import urllib.request
import urllib.error
import time

url = "https://www.example.com"

for _ in range(3):  # Try up to 3 times
    try:
        with urllib.request.urlopen(url) as response:
            print(response.read().decode("utf-8"))
        break  # Success, so stop retrying
    except urllib.error.URLError as e:
        print("Request failed:", e.reason)
        time.sleep(2)  # Wait 2 seconds before the next attempt
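A fixed two-second wait retries every failure the same way, including errors such as 404 that will not succeed on a second try. The following is a minimal sketch of a retry with exponential backoff. The function name fetch_with_retry, the injectable opener and sleep parameters, and the rule "do not retry 4xx responses" are all assumptions of this sketch, not something urllib provides.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=3, base_delay=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, retrying transient failures with exponential backoff.

    opener and sleep are injectable so the function can be exercised
    without network access or real waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as exc:
            # Assumption: 4xx responses will not improve on retry.
            if exc.code < 500:
                raise
            last_error = exc
        except urllib.error.URLError as exc:
            last_error = exc
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise last_error
```

Calling fetch_with_retry("https://www.example.com") returns the body as bytes, or re-raises the last error once all attempts are used up.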

Parsing robots.txt with urllib.robotparser

urllib.robotparser parses the rules in a robots.txt file and checks whether a given URL may be crawled.

Parsing robots.txt

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

print(rp.can_fetch("*", "https://www.example.com/page"))
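The parser can also be fed rules directly with parse(), which is handy for trying out rules without fetching anything. The robots.txt content below is made up for illustration.

```python
import urllib.robotparser

# Hypothetical robots.txt content, parsed without any network access.
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "https://www.example.com/page"))          # True
print(rp.can_fetch("*", "https://www.example.com/private/data"))  # False
print(rp.crawl_delay("*"))                                        # 5
```

The first argument to can_fetch is the user agent name to check the rules against; "*" matches the default rule group.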

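Summary

The urllib package provides convenient tools for communicating with the Web and parsing URLs. In current Web development, however, the third-party requests library is the more common choice, so pick whichever fits the situation.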
