Python: How do I get href links from HTML using Python?

import urllib2

website = "WEBSITE"  # placeholder for the URL to fetch
openwebsite = urllib2.urlopen(website)
html = openwebsite.read()  # read from the response opened above

print html

So far, so good.

But I only want the href links from the plain-text HTML. How can I do that?

Solution

from BeautifulSoup import BeautifulSoup
import urllib2
import re

html_page = urllib2.urlopen("http://www.yourwebsite.com")
soup = BeautifulSoup(html_page)
# Print the href attribute of every <a> tag (None if the tag has no href).
for link in soup.findAll('a'):
    print link.get('href')

If you only want links that start with http://, use the following:

soup.findAll('a', attrs={'href': re.compile("^http://")})
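
Note that the pattern above matches only http:// and skips https:// links. The following is a minimal sketch of the same kind of filter extended to both schemes, applied to a plain list of href strings so it runs without BeautifulSoup. The name keep_absolute and the sample values are hypothetical and used only for illustration.

```python
import re

# Matches absolute links that use either the http or the https scheme.
ABSOLUTE_LINK = re.compile(r"^https?://")


def keep_absolute(hrefs):
    """Return only the hrefs that start with http:// or https://."""
    # "h and ..." skips None and empty strings, which appear when a tag has no href.
    return [h for h in hrefs if h and ABSOLUTE_LINK.match(h)]


# Hypothetical sample values, including a relative link and a missing href (None).
sample = [
    "http://example.com/a",
    "https://example.com/b",
    "/relative",
    "mailto:someone@example.com",
    None,
]
print(keep_absolute(sample))
```

The same compiled pattern should also work as the href filter in the findAll call above, in place of re.compile("^http://").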

In Python 3 with BS4, it should be:

from bs4 import BeautifulSoup
import urllib.request

html_page = urllib.request.urlopen("http://www.yourwebsite.com")
soup = BeautifulSoup(html_page, "html.parser")
# find_all is the bs4 name; findAll is kept only as a legacy alias.
for link in soup.find_all('a'):
    print(link.get('href'))
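
If installing BeautifulSoup is not an option, roughly the same result can be sketched with the Python 3 standard library alone. This is an alternative technique, not the method from the answer above. It uses html.parser.HTMLParser to collect the links and urllib.parse.urljoin to resolve relative ones. The names LinkCollector and extract_links, and the sample HTML, are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip anchors with no href or an empty one
            # Resolve relative links against the base URL.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url=""):
    """Return the list of links found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical sample markup; the second anchor has no href and is skipped.
sample_html = (
    '<a href="/about">About</a> '
    '<a name="top">Top</a> '
    '<a href="http://example.com/x">X</a>'
)
print(extract_links(sample_html, "http://www.yourwebsite.com"))
```

Unlike the loops above, which print None for anchors that have no href, this sketch drops those anchors.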

Reference: https://stackoverflow.com/questions/3075550
