Python: Grabbing Visible Web Page Text with BeautifulSoup

How can I extract all of the visible text from a web page, excluding scripts, comments, CSS, and so on?

Solution

Try this:

from bs4 import BeautifulSoup
from bs4.element import Comment
import urllib.request


def tag_visible(element):
    # Skip text nodes whose parent tag is never rendered as page content.
    if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
        return False
    # Skip HTML comments.
    if isinstance(element, Comment):
        return False
    return True


def text_from_html(body):
    soup = BeautifulSoup(body, 'html.parser')
    # string=True matches every text node (the older spelling is findAll(text=True)).
    texts = soup.find_all(string=True)
    visible_texts = filter(tag_visible, texts)
    return " ".join(t.strip() for t in visible_texts)


html = urllib.request.urlopen('http://www.nytimes.com/2009/12/21/us/21storm.html').read()
print(text_from_html(html))
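
To see what the filter keeps and drops without making a network request, the same approach can be run on a small inline document. This is only a sketch: SAMPLE_HTML, is_visible, and visible_text are illustrative names that do not appear in the original answer, and this variant additionally drops whitespace-only strings so the joined result has no stray gaps.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Hypothetical sample page, used only for illustration.
SAMPLE_HTML = (
    "<html><head><title>Storm report</title>"
    "<style>p { color: red; }</style></head>"
    "<body><!-- hidden note --><h1>Heavy snow</h1>"
    "<p>Roads are <b>closed</b></p>"
    "<script>var x = 1;</script></body></html>"
)

# Parent tags whose text is not shown as page content.
HIDDEN_PARENTS = {"style", "script", "head", "title", "meta", "[document]"}


def is_visible(element):
    """Return True for text nodes that are neither comments nor inside hidden tags."""
    if isinstance(element, Comment):
        return False
    return element.parent.name not in HIDDEN_PARENTS


def visible_text(html):
    """Join the visible text nodes of an HTML string with single spaces."""
    soup = BeautifulSoup(html, "html.parser")
    stripped = (t.strip() for t in soup.find_all(string=True) if is_visible(t))
    # Drop strings that were whitespace only (for example, newlines between tags).
    return " ".join(s for s in stripped if s)


print(visible_text(SAMPLE_HTML))  # Heavy snow Roads are closed
```

The title, the stylesheet, the comment, and the script body are all filtered out. Note that "visible" here only means "not inside one of the listed tags and not a comment"; elements hidden through CSS rules such as display: none are still included, because BeautifulSoup does not evaluate stylesheets.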

Reference: https://stackoverflow.com/questions/1936466
