๋ฐ์ดํ„ฐ ๋ถ„์„

[๋ฐ์ดํ„ฐ ๋ถ„์„] pandas ๊ธฐ์ดˆ

์ฃผ์˜ ๐Ÿฑ 2021. 3. 22. 11:22
728x90
๋ฐ˜์‘ํ˜•

pandas

ํ…Œ์ด๋ธ” ํ˜•ํƒœ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์‰ฝ๊ฒŒ ๋‹ค๋ฃฐ ์ˆ˜ ์žˆ๋Š” ํŒŒ์ด์ฌ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ

 

pandas ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ

import pandas as pd

pd.ํ•จ์ˆ˜๋ช…()

 

 

20-11arrived.xlsx

row1 = ์ œ๋ชฉ --๋ฐ์ดํ„ฐ ๋ถ„์„์—๋Š” ๋ถˆํ•„์š”

row2 = 4๊ฐœ์˜ ์นผ๋Ÿผ๋ช…

row3~8 = ๋ถ„์„ํ•  ๋ฐ์ดํ„ฐ -- ๋ฐ์ดํ„ฐ ๋ถ„์„์— ํ•„์š”

row 9~10 = ์š”์•ฝ ์ •๋ณด -- ๋ฐ์ดํ„ฐ ๋ถ„์„์— ๋ถˆํ•„์š”

 

 

arrived_1= pd.read_excel('./Downloads/20-11arrived.xlsx',header=1, skipfooter =2,usecols = 'A:C')

arrived_1.head(3)

pd.read_excel() -์—‘์…€ ํŒŒ์ผ ๋ถˆ๋Ÿฌ์˜ค๋Š” ํ•จ์ˆ˜ ์•ˆ์— Path๋„ฃ์Œ

header =1  - ์นผ๋Ÿผ๋ช… ์žˆ๋Š” ์œ„์น˜. 0๋ถ€ํ„ฐ ์‹œ์ž‘์ด๋ฏ€๋กœ row2 = 1

skipfooter =2 - ๋งˆ์ง€๋ง‰ row์—์„œ ๋‘ ์ค„์„ ์ƒ๋žต

usecols ='A:C' A๋ถ€ํ„ฐ C์นผ๋Ÿผ๊นŒ์ง€ ๋ถˆ๋Ÿฌ์˜จ๋‹ค

 

.head(3) - ์ฒ˜์Œ๋ถ€ํ„ฐ 3๋ฒˆ์งธrow๊นŒ์ง€ ๋ณด์—ฌ์ค€๋‹ค,

 

.tail(3)

 

arrived_1.info() ๋ฐ์ดํ„ฐ ์š”์•ฝ ์ •๋ณด ํ™•์ธ

 

<class 'pandas.core.frame.DataFrame'>
์ด ๋ฐ์ดํ„ฐ๋Š” pandas์˜ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ ํด๋ž˜์Šค๋‹ค

RangeIndex: 6 entries, 0 to 5

0~5๊นŒ์ง€ ์ด 6 rows ๋กœ ๊ตฌ์„ฑ

Data columns (total 3 columns):

์ด 3๊ฐœ์˜ ์นผ๋Ÿผ

 

0 ๊ตญ์ ์ฝ”๋“œ 6 non-null object

๊ตญ์ ์ฝ”๋“œ ์นผ๋Ÿผ์€ ๋นˆ์นธ ์—†์ด(non-null) 6๊ฐœ์˜ row๋กœ ๊ตฌ์„ฑ๋œ object(๋ฌธ์ž ์†์„ฑ)๋ณ€์ˆ˜๋‹ค

 

dtypes: int64(1), object(2)

 

memory usage: 272.0+ bytes

 

 

๊ธฐ์ดˆํ†ต๊ณ„๋Ÿ‰ ํ™•์ธ

arrived_1.describe()

์ˆซ์žํ˜• ๋ณ€์ˆ˜์— ๋Œ€ํ•œ ์—ฌ๋Ÿฌ๊ฐ€์ง€ ํ†ต๊ณ„๋Ÿ‰์„ ์ถœ๋ ฅํ•˜๋Š” ํ•จ์ˆ˜(์—ฌ๊ธฐ์„œ๋Š” ์ž…๊ตญ๊ฐ์ˆ˜๊ฐ€ ์ˆซ์žํ˜•๋ณ€์ˆ˜)

์ˆœ์„œ๋Œ€๋กœ ๊ฐœ์ˆ˜,ํ‰๊ท ,ํ‘œ์ค€ํŽธ์ฐจ,์ตœ์†Ÿ๊ฐ’,1๋ถ„์œ„์ˆ˜,2๋ถ„์œ„์ˆ˜(์ค‘์œ„์ˆ˜),3๋ถ„์œ„์ˆ˜,์ตœ๋Œ“๊ฐ’

 

 


  ๋ฐ์ดํ„ฐ ์„ ํƒ - ์นผ๋Ÿผ ๊ธฐ์ค€   

 

arrived_1['์ž…๊ตญ๊ฐ์ˆ˜']

์—ฌ๋Ÿฌ ์นผ๋Ÿผ์„ ์„ ํƒํ•˜๋ ค๋ฉด ๋ฆฌ์ŠคํŠธ๋กœ ๋ฌถ์–ด์ฃผ์–ด์•ผ ํ•œ๋‹ค

arrived_1[['๊ตญ์ ์ฝ”๋“œ','์ž…๊ตญ๊ฐ์ˆ˜']]

 

 

์นผ๋Ÿผ ์ƒ์„ฑ

๊ธฐ์กด ๋ฐ์ดํ„ฐ์— ์กด์žฌํ•˜์ง€ ์•Š๋˜ ๊ธฐ์ค€๋…„์›”์ด๋ผ๋Š” ์นผ๋Ÿผ์— '2020-11'์„ ๋ถ€์—ฌํ•œ ๊ฒฐ๊ณผ,๊ธฐ์ค€๋…„์›”์ด๋ผ๋Š” ์‹ ๊ทœ ์นผ๋Ÿผ์ด ์ƒ์„ฑ๋จ

arrived_1['๊ธฐ์ค€๋…„์›”']= '2020-11'
arrived_1

 


 

 

  ๋ฐ์ดํ„ฐ ์„ ํƒ -  ๋กœ์šฐ ๊ธฐ์ค€  : ํŠน์ •์กฐ๊ฑด์— ๋งž๋Š” ๋ฐ์ดํ„ฐ ํ•„ํ„ฐ๋ง ํ•  ๋•Œ

 

์„ฑ๋ณ„์ด ๋‚จ์„ฑ์ธ ๊ฒฝ์šฐ๋งŒ ์ฐพ๊ธฐ

condition = (arrived_1['์„ฑ๋ณ„'] == '๋‚จ์„ฑ')
condition

๋‚จ์„ฑ์˜ index ๋ฒˆํ˜ธ๋Š” 0,2,4

arrived_1[condition]

 

 

์„ฑ๋ณ„์ด ๋‚จ์„ฑ ์ด๋ฉด์„œ ์ž…๊ตญ๊ฐ์ˆ˜๊ฐ€ 150,000๋ช… ์ด์ƒ

&์œผ๋กœ ์—ฐ๊ฒฐ

condition = (arrived_1['์„ฑ๋ณ„']== '๋‚จ์„ฑ') & (arrived_1['์ž…๊ตญ๊ฐ์ˆ˜']>=150000)
arrived_1[condition]

AND &

OR |

 


condition = (arrived_1['๊ตญ์ ์ฝ”๋“œ'].isin(['A01','A08']))
arrived_1[condition==False]

.isin() ์•ˆ์— ์ฐพ๊ณ  ์‹ถ์€ ๊ฐ’์„ ๋ฆฌ์ŠคํŠธ ํ˜•ํƒœ๋กœ ๋„ฃ์œผ๋ฉด True False ๋ฐ˜ํ™˜

condition == False ๋กœ ์„ค์ •ํ•˜๋ฉด ํ•ด๋‹นํ•˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ๋งŒ ๋ณผ ์ˆ˜ ์žˆ์Œ

 

๋ฐ˜์‘ํ˜•