

 

1.  What is XML?

🍋  XML (eXtensible Markup Language) is a general-purpose markup language designed for storing and transmitting data
      📌  A markup language structures documents or data using tags, which are set apart from ordinary text

      📌  Like JSON, it is a structured text format intended for exchanging data
      📌  The best-known markup language is HTML (HyperText Markup Language)

🍋  HTML tags are predefined, whereas XML lets you define your own tags
       ➡️ The document must still follow the rules of XML

🍋  An element starts with a tag of the form '<name>' and ends with '</name>'
      📌  The names in the start and end tags must match, and they are case-sensitive

      📌  Tags can be nested, but they must be closed in the correct order

             ex.  In '<abc><def> ~ </def></abc>', the end tag for the most recently opened start tag must come first

     📌  In an XML document, everything from a start tag to its matching end tag is called an element

🍋  An XML document must have a single top-level (root) element

      📌  The root element's start and end tags must enclose every other element

🍋  Comments are written by wrapping text in '<!--' and '-->', for example '<!-- This is a comment -->'
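
The rules above can be checked with a small sketch that uses the built-in xml.etree.ElementTree parser. The helper name is_well_formed and the sample strings are made up for illustration and are not part of the original lecture.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample: one root element, correctly nested children, and a comment.
good = "<abc><!-- This is a comment --><def>text</def></abc>"

# Each of these breaks one of the rules listed above.
wrong_order = "<abc><def>text</abc></def>"  # end tags closed in the wrong order
wrong_case = "<abc>text</ABC>"              # start and end tag differ in case
two_roots = "<abc></abc><def></def>"        # more than one top-level element

for label, doc in [("good", good), ("wrong_order", wrong_order),
                   ("wrong_case", wrong_case), ("two_roots", two_roots)]:
    print(label, is_well_formed(doc))
```

Only the first string should be accepted; the parser raises ParseError for the other three.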

🍋  To work with XML in Python, there is the built-in xml package and, as an external module, xmltodict
     📌  xmltodict converts XML data directly into a Python dictionary
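
For comparison, here is a minimal sketch of reading XML with the built-in xml package (xml.etree.ElementTree) instead of xmltodict. The sample document and its tag names are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from the original post.
xml_text = """
<library>
    <book id="1"><title>First</title></book>
    <book id="2"><title>Second</title></book>
</library>
""".strip()

root = ET.fromstring(xml_text)

# With ElementTree you walk Element objects instead of indexing a dictionary:
# get() reads an attribute, findtext() reads the text of a child element.
books = [
    {"id": book.get("id"), "title": book.findtext("title")}
    for book in root.findall("book")
]

print(root.tag)
print(books)
```

The built-in parser needs explicit calls for each level, which is why a direct dictionary conversion can be more convenient for simple data extraction.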
 

🍋  The xmltodict library provides a function that converts XML data into a Python dictionary
     📌  xmltodict.parse(xml_input, [ xml_attribs=True or False ])
     📌  xml_input is the XML data to parse, for example a string
     📌  xml_attribs defaults to True
            ➡️  If False, attributes are ignored when the XML data is converted to a dictionary
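
The effect of xml_attribs can be seen in a short sketch. It assumes xmltodict is installed (pip install xmltodict), and the sample XML string is made up for illustration.

```python
import xmltodict

# Hypothetical sample data, not taken from the original post.
xml_text = '<item id="7"><title>Carmen</title></item>'

with_attrs = xmltodict.parse(xml_text)                        # xml_attribs=True by default
without_attrs = xmltodict.parse(xml_text, xml_attribs=False)  # attributes are dropped

print(with_attrs)
print(without_attrs)
```

With the default setting, the id attribute appears in the result under the key '@id'; with xml_attribs=False only the title remains.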

 

🍋   pprint.pprint() prints the contents with indentation, which makes nested data easier to read

import xmltodict
import pprint

with open('../input_2/data.xml', 'r', encoding='utf-8') as xml_file:
    dict_data = xmltodict.parse(xml_file.read(), xml_attribs=True)  # convert the XML data to a dictionary
    #print(dict_data)
    #print(dict_data['response']['body']['items'])
    datas = dict_data['response']['body']['items']
    pprint.pprint(datas)

 

Contents of data.xml

 

Output

 

 


 

2. Extracting and saving opera performance titles and links

 

import csv
from datetime import datetime
import xmltodict
import pprint
import requests

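# 1. Fetch the data
url = 'https://www.daeguoperahouse.org/rss.php'
response = requests.get(url, timeout=10)  # do not wait forever on a slow server
response.raise_for_status()  # stop early if the request failed
dict_data = xmltodict.parse(response.text)
pprint.pprint(dict_data)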

Output


 

# 2. Extract only the data we need

data_list = dict_data['rss']['channel']['item']

new_datas: list[dict] = []  # list that will hold only the fields we need

for k in data_list:
    new_data = dict()
    new_data['title'] = k['title']
    new_data['link'] = k['link']
    new_datas.append(new_data)
pprint.pprint(new_datas)

 

Output
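
One thing to watch in step 2: xmltodict returns a list for a repeated child element, but a single dictionary when the element appears only once, so the loop would iterate over keys if the feed contained just one item. The sketch below shows one way to guard against that. The helper names as_list and extract and the sample dictionaries are hypothetical, shaped like parsed RSS output.

```python
def as_list(value):
    """Wrap a single parsed element in a list; leave lists unchanged."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def extract(parsed):
    """Collect title and link from every item, whether there is one or many."""
    items = as_list(parsed['rss']['channel'].get('item'))
    return [{'title': item['title'], 'link': item['link']} for item in items]


# Hypothetical parsed results, shaped like the output of xmltodict.parse.
one_item = {'rss': {'channel': {'item': {'title': 'A', 'link': 'http://example.com/a'}}}}
two_items = {'rss': {'channel': {'item': [
    {'title': 'A', 'link': 'http://example.com/a'},
    {'title': 'B', 'link': 'http://example.com/b'},
]}}}

print(extract(one_item))
print(extract(two_items))
```

Both calls return a list of dictionaries, so the later CSV step does not need to care how many items the feed had.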


# 3. Save to a file named after the current date

today = datetime.now().strftime('%y%m%d')  # current date as YYMMDD
print(today)

with open(f'../output_02/daeguoperahouse_{today}.csv', 'wt', newline='', encoding='utf-8') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=['title', 'link'])
    writer.writeheader()
    writer.writerows(new_datas)  # one row per performance

print('File created successfully.')

File created
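
Note that open() raises FileNotFoundError if the output folder does not exist yet. The following sketch wraps step 3 in a function that creates the folder first. The function name save_rows and the sample rows are assumptions, and a temporary directory stands in for '../output_02' so the example leaves nothing behind.

```python
import csv
import tempfile
from datetime import datetime
from pathlib import Path

# Hypothetical rows in the same shape as new_datas.
rows = [
    {'title': 'A', 'link': 'http://example.com/a'},
    {'title': 'B', 'link': 'http://example.com/b'},
]


def save_rows(rows, out_dir: Path) -> Path:
    """Write rows to <out_dir>/daeguoperahouse_<YYMMDD>.csv and return the path."""
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    today = datetime.now().strftime('%y%m%d')
    path = out_dir / f'daeguoperahouse_{today}.csv'
    with open(path, 'w', newline='', encoding='utf-8') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=['title', 'link'])
        writer.writeheader()
        writer.writerows(rows)
    return path


# A temporary directory is used here instead of the real output folder.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_rows(rows, Path(tmp) / 'output_02')
    content = saved.read_text(encoding='utf-8')
    print(saved.name)
    print(content)
```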

 

 

 

 

 

 

 

[ Content reference: IT academy lecture ]
