Python PySpark: Convert a Standard List to a DataFrame

The case is really simple: I need to convert a Python list into a DataFrame with the following code:

from pyspark.sql.types import StructType
from pyspark.sql.types import StructField
from pyspark.sql.types import StringType, IntegerType

schema = StructType([StructField("value", IntegerType(), True)])
my_list = [1, 2, 3, 4]
# sc (SparkContext) and sqlContext (SQLContext) are predefined in the pyspark shell
rdd = sc.parallelize(my_list)
df = sqlContext.createDataFrame(rdd, schema)

df.show()

It fails with the following error:

    raise TypeError("StructType can not accept object %r in type %s" % (obj, type(obj)))
TypeError: StructType can not accept object 1 in type <class 'int'>

Solution

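This approach uses less code, avoids building an RDD by hand, and is easier to understand. The original code fails because a StructType schema expects each element to be a row (for example a tuple), not a bare int. Passing an atomic type such as IntegerType() instead makes PySpark treat each element as the single value of a column named value.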

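from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()  # already defined in the pyspark shell

# notice the variable name (more below)
mylist = [1, 2, 3, 4]

# notice the parens after the type name: pass an IntegerType instance, not the class
spark.createDataFrame(mylist, IntegerType()).show()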

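Note on naming a variable list: list is a Python built-in, so it is best not to use built-in names as your own names/labels, because doing so shadows things like the list() function. When prototyping something quick and dirty, many people use a name like mylist instead.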
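A small plain-Python illustration of the shadowing problem (no Spark needed); the function name `demo_shadowing` is hypothetical and exists only for this example.

```python
def demo_shadowing():
    """Return the error raised when the built-in list has been shadowed."""
    list = [1, 2, 3, 4]  # local name now hides the built-in list type
    try:
        list("abc")  # calls the list object, not the built-in constructor
    except TypeError as exc:
        return str(exc)
    return None

message = demo_shadowing()
print(message)  # 'list' object is not callable

mylist = [1, 2, 3, 4]  # a non-conflicting name leaves the built-in intact
print(list("abc"))  # ['a', 'b', 'c']
```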

Source page: https://stackoverflow.com/questions/48448473
