Portfolio & Toy-Project

Project: Data Analysis for a Busan Port Authority Service Proposal, Part 4

by Mr.DonyStark 2024. 3. 11.

ㅁ Project deliverable: https://busanportservice.streamlit.app/

 


 

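ㅁ Word cloud built from crawled text on the liquor category, part of selecting ship-supply items

  ○ Selenium → crawling

  ○ BeautifulSoup → crawling

  ○ re → preprocessing

  ○ wordcloud → word cloud generation

  ○ STOPWORDS → stop word dictionary

  ○ pandas → preprocessing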
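The script in this post does not show the re preprocessing step itself. As a minimal sketch of what such a step might look like, assuming the goal is to strip citation markers, digits, and punctuation from crawled English text before counting words: `clean_text` is a hypothetical helper, not part of the original project.

```python
import re


def clean_text(raw: str) -> str:
    """Hypothetical helper: normalize crawled English text before word counting."""
    text = raw.replace("\u2019", "'")            # curly apostrophe -> straight apostrophe
    text = re.sub(r"\[\d+\]", " ", text)         # drop citation markers such as [2]
    text = re.sub(r"[^A-Za-z\s'-]", " ", text)   # keep letters, whitespace, apostrophes, hyphens
    text = re.sub(r"\s+", " ", text)             # collapse runs of whitespace
    return text.strip().lower()


if __name__ == "__main__":
    print(clean_text("Soju [2] is 15% to 50%!\nTry  somaek."))
```

Normalizing the curly apostrophe first keeps contractions such as "it's" in one piece, which reduces the number of stray single-letter tokens that would otherwise need to go into the stop word list.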

from selenium import webdriver  # Selenium web driver module
from selenium.webdriver.common.by import By  # By class, used to locate web elements
from selenium.webdriver.common.keys import Keys  # Keys class, used to send keyboard input
from selenium.webdriver.chrome.service import Service  # Chrome driver service
from selenium.webdriver.chrome.options import Options  # Chrome driver options
from webdriver_manager.chrome import ChromeDriverManager  # installs and manages the Chrome driver automatically
from bs4 import BeautifulSoup  # Beautiful Soup, for crawling
from wordcloud import WordCloud, STOPWORDS
from PIL import Image
import pandas as pd
import time
import numpy as np
import matplotlib.pyplot as plt

myOption = Options()  # create the options object
myOption.add_argument("--start-maximized")  # maximize the Chrome window
myOption.add_argument("--incognito")  # run Chrome in incognito mode
# Pass both switches in one call: a second "excludeSwitches" call would overwrite the first.
myOption.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])  # hide the automation banner and the terminal log noise
myOption.add_experimental_option("detach", True)  # keep the browser from closing automatically

txtList = list()
txtList.append('''Korean Alcoholic Drinks: A Beginner's Guide
Like other east Asian cultures, the consumption of alcohol is a practice dating back thousands of years in Korea. However, it was not until Koryo Dynasty (10th century CE) that Korean alcohol began to form its own, unique identity. 
Today, let's take a tour of the best that Korean liquor, wine, and beer have to offer! Whether you’re looking for a sweet Korean red wine, a shot of soju, or another Korean alcoholic drink to pair with Korean cuisine, this list has you covered.
Makgeolli 
Makgeolli can be thought of as a “raw” or “original” rice wine. If you've heard of takju before, it was likely in reference to makgeolli. The alcohol does not typically go through distillation, meaning it retains a low ABV, often around that of strong beer: 8%. In Korea, makgeolli is typically not pasteurized, meaning it retains many of its original, earthy, tangy flavors. Most forms will contain a suspension of chalky sediments, making makgeolli appear cloudy.
soju
Soju
Soju is likely the most famous alcoholic beverage produced in Korea, strongly associated with drinking culture on the peninsula. Technically similar to vodka (and the Japanese beverage, shochu), soju is clear; it is distilled from various starches including wheat, glutinous rice, barley, or sweet potato. Traditionally, the majority of the beverage is produced in the Andong region, but smaller producers have began to crop up in the last decade or so. Interestingly, unlike vodka, soju is often produced and sold at a variety of different strengths, ranging from about 15% to 50%! Soju is often consumed in small glasses, slightly larger than the size of a shot glass. For bonus points, try some somaek - it’s a combination of soju and beer that typically incorporates a lager-style of beer and is delicious!
chrysanthemums
Gukhwaju 
Rice wines hold a special place in South Korean drinking culture. Gukhwaju - also known as flower wine - is a traditional rice wine, but with a twist: it is flavored with dry chrysanthemum flowers. These flowers are crushed and added to the rice wine as it is fermenting. This addition adds both a slight tint and herbal flavor to the finished product.
[2], CC BY-SA 2.0, via Wikimedia Commons
Baekseju
Baekseju can be thought of as herbal rice alcohol. It’s not quite as astringent (or alcoholic) as traditional rice wine, but it retains much of the earthy and chalky flavor. Baekseju is known for its additives. Depending on the producer, it typically contains up to 12 different spices and herbs, notably featuring ginseng, licorice, ginger, and cinnamon.
Dansul
Dansul is another form of Korean rice wine, but it is very different from makgeolli or baekseju. Dansul features the use of “nuruk” - a traditional Korean fermentation starter used since the period of the Three Kingdoms nearly 2,000 years ago. The beverage is unique in that the rice undergoes incomplete fermentation - leaving dansul at an ABV of only 2-3%. Because of this low alcohol content, it is typically sweeter and cloudier than other rice wines. 
korean beer
Beer
While beer (maekju, 맥주; 麥酒) became familiar in Korea somewhat later than in America, it seems as if beer is now a staple of Korean drinking culture. In fact, Seoul has had a brewery for over 100 years! American consumers may be familiar with some of the most popular Korean beer brands, including Hite Jinro and Oriental Brewery. Regulatory changes by the Korean government in 2011 and 2014 have allowed the craft beer industry to expand in Korea, meaning we will hopefully see an influx of high-quality Korean beer in the near future!
black raspberry korean wine
pepelady, CC BY 4.0, via Wikimedia Commons
Bokbunja-ju 
South Korea has a strong wine culture. Interestingly, many of the most distinctive Korean wines are made from fermentables other than grapes. Bokbunja-ju - often called bokbunja wine - is distilled from Korean black raspberries. It is somewhat stronger than a typical American wine, clocking in around 15-19% ABV. The berries add a moderate amount of acidity - allowing bokjunja wine to pair well with many types of seafood.
Maesil-ju
Another famous fruit-inspired Korean alcoholic beverage, maesil-ju is known as a type of plum wine or plum liqueur. Its sweet taste comes from an infusion of yellow or green plums. The spirit is produced by starting with soju, a distilled spirit, and soaking plums in the liquor. The mixture is typically left to age for nearly 100 days, after which the fruit is removed. At this point, sugar is added and the fruit wine is left to age for 3-6 months, although it could be consumed immediately. While it’s not quite as thick as traditional European dessert wines, maesil-ju is definitely sweet enough to fit the bill.
Keywords: South Korea, Korean food, Korean drinks, alcohol, Korean wine, vodka, Korean alcoholic beverages''')

# Driver setup 1
# https://www.90daykorean.com/korean-alcohol/
myService1 = Service(ChromeDriverManager().install())  # install the Chrome driver
myDriver = webdriver.Chrome(service=myService1, options=myOption)  # apply the driver service and options

targetURL = "https://www.90daykorean.com/korean-alcohol/"
myDriver.get(targetURL)
print(f"{targetURL}\tconnected")
# Several classes on one element form a compound selector, so use By.CSS_SELECTOR rather than By.CLASS_NAME.
getText = myDriver.find_element(By.CSS_SELECTOR, ".ast-post-format-.single-layout-1.ast-no-date-box").text
utf8GetText1 = getText.encode('utf-8')
txtList.append(utf8GetText1)
myDriver.quit()

# Driver setup 2
# https://theculturetrip.com/asia/south-korea/articles/a-guide-to-korean-best-spirits
myService2 = Service(ChromeDriverManager().install())  # install the Chrome driver
myDriver = webdriver.Chrome(service=myService2, options=myOption)  # apply the driver service and options
targetURL = "https://theculturetrip.com/asia/south-korea/articles/a-guide-to-korean-best-spirits"
myDriver.get(targetURL)
print(f"{targetURL}\tconnected")
getText = myDriver.find_element(By.CLASS_NAME, value='''width-container''').text
utf8GetText2 = getText.encode('utf-8')
txtList.append(utf8GetText2)
myDriver.quit()

# Driver setup 3
# https://www.gourmetpro.co/blog/south-korea-spirits-market-2023
myService3 = Service(ChromeDriverManager().install())  # install the Chrome driver
myDriver = webdriver.Chrome(service=myService3, options=myOption)  # apply the driver service and options
targetURL = "https://www.gourmetpro.co/blog/south-korea-spirits-market-2023"
myDriver.get(targetURL)
print(f"{targetURL}\tconnected")
# Two classes on one element: written as a compound CSS selector.
getText = myDriver.find_element(By.CSS_SELECTOR, ".blog-body-rich-text.w-richtext").text
utf8GetText3 = getText.encode('utf-8')
txtList.append(utf8GetText3)
myDriver.quit()

# Driver setup 4
# https://www.tasteatlas.com/best-rated-alcoholic-beverages-in-korea
myService4 = Service(ChromeDriverManager().install())  # install the Chrome driver
myDriver = webdriver.Chrome(service=myService4, options=myOption)  # apply the driver service and options
targetURL = "https://www.tasteatlas.com/best-rated-alcoholic-beverages-in-korea"
myDriver.get(targetURL)
print(f"{targetURL}\tconnected")
getText = myDriver.find_element(By.CLASS_NAME, value="top-list-article").text
utf8GetText4 = getText.encode('utf-8')
txtList.append(utf8GetText4)
myDriver.quit()

# Driver setup 5
# https://www.korea.net/NewsFocus/Business/view?articleId=128865
myService5 = Service(ChromeDriverManager().install())  # install the Chrome driver
myDriver = webdriver.Chrome(service=myService5, options=myOption)  # apply the driver service and options
targetURL = "https://www.korea.net/NewsFocus/Business/view?articleId=128865"
myDriver.get(targetURL)
print(f"{targetURL}\tconnected")
getText = myDriver.find_element(By.CLASS_NAME, value="post-main-cont").text  # trailing space removed from the class name
utf8GetText5 = getText.encode('utf-8')
txtList.append(utf8GetText5)
myDriver.quit()
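The five driver blocks above repeat the same steps with a different URL and selector. One possible way to condense them, shown only as a sketch: `TARGETS` and `collect_texts` are hypothetical names, the class names are rewritten as CSS selectors, and a single browser session is assumed to be reused for all pages instead of being restarted each time.

```python
# (URL, CSS selector) pairs for the five pages crawled above.
TARGETS = [
    ("https://www.90daykorean.com/korean-alcohol/",
     ".ast-post-format-.single-layout-1.ast-no-date-box"),
    ("https://theculturetrip.com/asia/south-korea/articles/a-guide-to-korean-best-spirits",
     ".width-container"),
    ("https://www.gourmetpro.co/blog/south-korea-spirits-market-2023",
     ".blog-body-rich-text.w-richtext"),
    ("https://www.tasteatlas.com/best-rated-alcoholic-beverages-in-korea",
     ".top-list-article"),
    ("https://www.korea.net/NewsFocus/Business/view?articleId=128865",
     ".post-main-cont"),
]

CSS_SELECTOR = "css selector"  # the same string value as selenium's By.CSS_SELECTOR


def collect_texts(driver, targets):
    """Hypothetical helper: visit each URL and return the visible text of the selected element."""
    texts = []
    for url, selector in targets:
        driver.get(url)
        texts.append(driver.find_element(CSS_SELECTOR, selector).text)
    return texts


# Usage with the objects defined earlier in this post (not executed here):
# myDriver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=myOption)
# try:
#     txtList.extend(collect_texts(myDriver, TARGETS))
# finally:
#     myDriver.quit()
```

Because the helper returns plain `str` values, the per-page `encode('utf-8')` step is no longer needed before the texts are written to a file opened with a UTF-8 encoding.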

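# Write every collected text to one file, opening it once instead of once per item.
# "ab" appends, so rerunning the script adds the same texts again.
with open("주류영문크롤링.txt", "ab") as rawFile:
    for txt in txtList:
        if isinstance(txt, str):
            txt = txt.encode('utf-8')  # convert str to bytes
        rawFile.write(txt)
        rawFile.write(b"\n")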
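Since the list mixes `str` and `bytes` items and the file is opened in append mode, a rerun duplicates the corpus and inflates every word count. A possible alternative, sketched under the assumption that overwriting the file on each run is acceptable: `save_texts` is a hypothetical helper that decodes everything to `str` and writes the file in one step.

```python
from pathlib import Path


def save_texts(texts, path):
    """Hypothetical helper: write str/bytes items to one UTF-8 file, one item per block."""
    decoded = [t.decode("utf-8") if isinstance(t, bytes) else t for t in texts]
    target = Path(path)
    # write_text replaces any existing content, so reruns do not duplicate the corpus.
    target.write_text("\n".join(decoded) + "\n", encoding="utf-8")
    return target


# Usage with the list built earlier in this post (not executed here):
# save_texts(txtList, "주류영문크롤링.txt")
```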
 
# Load the text and the mask image (assumes the script runs from C:/myPython, where the file above was written)
with open("C:/myPython/주류영문크롤링.txt", "r", encoding="utf-8") as file:
    ma_text = file.read()
ma_img = np.array(Image.open("C:/myPython/liqourBottle.png"))

# Add custom stop words
notUseText = ['S','re','korean','different','alcohol']  # extra stop words
stopDict = set(STOPWORDS)
stopDict.update(notUseText)
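Before drawing the cloud, it can help to check which words would dominate once the stop words are removed. The following is a rough sketch using only the standard library; `top_words` is a hypothetical helper, and WordCloud applies its own tokenization (for example, it also groups common two-word phrases by default), so these counts are only an approximation of what the image will emphasize.

```python
import re
from collections import Counter


def top_words(text, stopwords, n=10):
    """Hypothetical helper: most frequent words after case-insensitive stop word removal."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())  # simple English tokenizer
    blocked = {s.lower() for s in stopwords}
    counts = Counter(w for w in words if w not in blocked)
    return counts.most_common(n)


# Usage with the names defined in this post (not executed here):
# print(top_words(ma_text, stopDict, n=20))
```

If an uninformative word still ranks near the top, adding it to the custom stop word list is usually enough.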


# Word cloud
myWordCloud = WordCloud(background_color="black",  # background color
                        relative_scaling=0.3,  # how strongly word frequency affects font size
                        max_words = 2500,  # maximum number of words to draw
                        mask = ma_img,  # mask image that shapes the cloud
                        stopwords = stopDict,  # stop words to exclude
                        colormap='Set2',  # matplotlib colormap
                        font_path='C:/myPython/myFont/Jalnan2TTF.ttf'  # font file
                        )
myWordCloud = myWordCloud.generate(ma_text)
myWordCloud

# Create the figure and render the image
plt.figure(figsize=(6,6))
plt.imshow(myWordCloud, interpolation="bilinear")
plt.axis("off")
plt.show()