Python: utf8 codec can't decode byte 0x96 in python

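I am trying to check whether a certain phrase appears on the pages of many sites. The script runs fine for, say, 15 sites and then stops with this error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15344: invalid start byte

I searched Stack Overflow and found many similar questions, but I can't work out what is wrong in my case.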

import re
import urllib

filetocheck = open("bloglistforcommenting", "r")
resultfile = open("finalfile", "w")

for countofsites in filetocheck.readlines():
    sitename = countofsites.strip()
    htmlfile = urllib.urlopen(sitename)
    # This line raises UnicodeDecodeError when the page is not valid UTF-8
    page = htmlfile.read().decode('utf8')
    match = re.search("Enter your name", page)
    if match:
        print "match found  : " + sitename
        resultfile.write(sitename + "\n")
    else:
        print "sorry did not find the pattern " + sitename

print "Finished Operations"
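
The failure can be reproduced without any network access. The sketch below is written for Python 3 (the question itself uses Python 2) and assumes, as is common but not confirmed anywhere in the question, that the offending page was written in Windows-1252, where byte 0x96 is an en dash. In UTF-8 that byte value can only appear as a continuation byte, which is why the message says "invalid start byte". The sample bytes are made up for illustration.

```python
# Hypothetical page fragment: Windows-1252 text containing an en dash (0x96).
raw = b"Enter your name \x96 required"

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc  # keep a reference; "exc" is cleared after the block

# Decoding with the encoding the bytes were (assumed to be) written in works.
as_cp1252 = raw.decode("cp1252")

# A lossy alternative when the real encoding is unknown:
# undecodable bytes are replaced with U+FFFD instead of raising.
lossy = raw.decode("utf-8", errors="replace")

# ascii() keeps the output printable on any console.
print(utf8_error.reason)
print(ascii(as_cp1252))
print(ascii(lossy))
```

Since the search phrase here is plain ASCII, the lossy decode would be enough for this particular check; it only loses the characters that could not be decoded.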

Following Mark's comment, I changed the code to use BeautifulSoup:

htmlfile = urllib.urlopen("http://www.homestead.com")
page = BeautifulSoup((''.join(htmlfile)))
print page.prettify()

Now I get this error:

page = BeautifulSoup((''.join(htmlfile)))
TypeError: 'module' object is not callable
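
This TypeError usually means that the name BeautifulSoup is bound to the module rather than to the class inside it, which is what a plain `import BeautifulSoup` does. The failing snippet does not show its import line, so that cause is an assumption. The same mistake can be reproduced in Python 3 with a standard-library module whose main class shares the module's name; `datetime` is used here purely as a stand-in.

```python
import datetime  # binds the name "datetime" to the module

try:
    datetime(2020, 9, 27)  # calls the module, not the class inside it
    message = None
except TypeError as exc:
    message = str(exc)

print(message)

# Importing the class from the module makes the same call valid.
from datetime import datetime

stamp = datetime(2020, 9, 27)
print(stamp.year)
```

By analogy, importing the class by name (`from BeautifulSoup import BeautifulSoup` for the Beautiful Soup 3 package used in this question) makes `BeautifulSoup(...)` callable.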

I finally got it working. Thanks for the help. Here is the final code:

import urllib
import re
from BeautifulSoup import BeautifulSoup

filetocheck = open("listfile", "r")

resultfile = open("finalfile", "w")
error = "for errors"

for countofsites in filetocheck.readlines():
    sitename = countofsites.strip()
    htmlfile = urllib.urlopen(sitename)
    page = BeautifulSoup(''.join(htmlfile))
    pagetwo = str(page)
    match = re.search("Enter YourName", pagetwo)
    if match:
        print "match found  : " + sitename
        resultfile.write(sitename + "\n")
    else:
        print "sorry did not find the pattern " + sitename

print "Finished Operations"
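
For readers on Python 3, the same loop might look roughly like the sketch below. It uses only the standard library and decodes leniently instead of relying on Beautiful Soup, so it is a different technique from the code above, not a port of it. The names `fetch_bytes` and `find_sites_with_phrase`, the `fetch` parameter, and the example URLs are hypothetical. Fetch errors are collected per site so that one bad page does not stop the whole run.

```python
import urllib.request


def fetch_bytes(url, timeout=10):
    """Download a page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def find_sites_with_phrase(urls, phrase, fetch=fetch_bytes):
    """Return (matches, failures) for the given URLs.

    matches  -- URLs whose page text contains `phrase`
    failures -- (url, error message) pairs for pages that could not be fetched
    """
    matches, failures = [], []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines in the input list
        try:
            raw = fetch(url)
        except (OSError, ValueError) as exc:  # URLError is an OSError subclass
            failures.append((url, str(exc)))
            continue
        # Undecodable bytes such as 0x96 become U+FFFD instead of raising.
        page = raw.decode("utf-8", errors="replace")
        if phrase in page:
            matches.append(url)
    return matches, failures


# Offline demonstration with a fake fetcher (made-up URLs and content).
fake_pages = {
    "http://a.example": b"<p>Enter your name \x96 required</p>",
    "http://b.example": b"<p>No form here</p>",
}


def fake_fetch(url):
    if url not in fake_pages:
        raise OSError("unreachable")
    return fake_pages[url]


matches, failures = find_sites_with_phrase(
    ["http://a.example\n", "http://b.example\n", "http://c.example\n"],
    "Enter your name",
    fetch=fake_fetch,
)
print(matches)
print(failures)
```

Passing the fetcher in as a parameter keeps the matching logic testable without network access; in real use the default `fetch_bytes` would be left in place and the URL list read from the input file.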

Solution

Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping. Three features make it powerful:

Beautiful Soup won't choke if you give it bad markup. It yields a parse tree that makes approximately as much sense as your original document. This is usually good enough to collect the data you need and run away.

Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. You don't have to create a custom parser for each application.

Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don't have to think about encodings, unless the document doesn't specify an encoding and Beautiful Soup can't autodetect one. Then you just have to specify the original encoding.

Emphasis mine.
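
The idea of "try to detect the encoding, otherwise let the caller specify it" can be sketched with the standard library alone. The Python 3 function below is a simplified stand-in and not Beautiful Soup's actual detection logic; `decode_html`, its parameters, and the candidate list are assumptions chosen for illustration.

```python
def decode_html(raw, declared=None, candidates=("utf-8", "cp1252")):
    """Return (text, encoding_used) for raw page bytes.

    `declared` is an encoding the caller already knows (for example from an
    HTTP header); it is tried first, followed by each candidate in order.
    """
    order = ([declared] if declared else []) + list(candidates)
    for name in order:
        try:
            return raw.decode(name), name
        except (UnicodeDecodeError, LookupError):
            continue  # wrong guess or unknown codec name: try the next one
    # Last resort: never fail, mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


# The same made-up text stored in two different encodings.
utf8_page = "Enter your name \u2013 required".encode("utf-8")
legacy_page = "Enter your name \u2013 required".encode("cp1252")

text_a, used_a = decode_html(utf8_page)
text_b, used_b = decode_html(legacy_page)
text_c, used_c = decode_html(legacy_page, declared="cp1252")

print(used_a, used_b, used_c)
```

UTF-8 is tried before cp1252 because cp1252 accepts almost any byte sequence and would silently mis-decode genuine UTF-8 pages. With Beautiful Soup 4, the original encoding can instead be passed to the constructor through its `from_encoding` argument.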

Reference: https://stackoverflow.com/questions/7873556
