Causes of and fixes for the stale element reference error


Notes on the stale element reference error that can occur in Selenium when a for loop repeatedly navigates to another page and then comes back, and on how to prevent it.

🔧 Problem: stale element reference: stale element not found

💬 What the error means:

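A stale element reference error (StaleElementReferenceException in Python) is raised when you use a WebElement whose DOM node is no longer attached to the current page, for example because the browser navigated away from the page or reloaded it.
Example: on an article list page you click an element (articles[i]) to open its detail page, then return to the list with driver.back(). The list page now has a new DOM, so the articles[i] you collected earlier no longer refers to a valid element.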

📌 When it happens (summary):

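articles = driver.find_elements(By.CSS_SELECTOR, "ul.search-result-list li a")  # elements collected from DOM 1
for i in range(len(articles)):
    articles[i].click()  # navigate to the detail page → DOM 2
    driver.back()        # back to the list page → DOM 3 (a new DOM)
    # ❌ On the next iteration articles[i] still points into DOM 1, so it is stale
    #    and .click() raises StaleElementReferenceException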

✅ Fix: re-fetch the elements on every loop iteration

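from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Assumes `driver` is already on the search results page.
# Count the articles once up front; only the number is kept, not the elements.
total_articles = len(driver.find_elements(By.CSS_SELECTOR, "ul.search-result-list li a"))

for i in range(total_articles):
    # Collect the article list again on every iteration
    articles = driver.find_elements(By.CSS_SELECTOR, "ul.search-result-list li a")

    # Guard: stop if the list now has fewer elements than the loop index
    if i >= len(articles):
        break

    # Click a fresh element taken from the current DOM
    articles[i].click()

    # ... process the detail page ...

    # Go back
    driver.back()

    # Wait until the list page has loaded again
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "ul.search-result-list li"))
    )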

📎 Summary

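❗ Problem	Navigating to another page makes previously collected elements stale
📌 Cause	Elements returned by driver.find_elements() are only valid for the DOM they were found in
✅ Fix	Call find_elements() on every loop iteration so you always use elements from the current DOM
⏳ Extra	After driver.back(), use WebDriverWait to confirm the list page has reloaded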

💡 Extra tips

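Instead of for i in range(...), a while loop works as well. A for article in articles: loop does not help here, though: it walks the list collected before the first navigation, and those elements go stale. When articles is re-fetched on every iteration, index-based access is the safe choice.
With a lot of data, you can also store the link URLs you have already collected and compare against them to avoid collecting the same article twice.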

💡 Example code refactored by ChatGPT

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
from datetime import datetime
import time

# Selenium driver setup (headless Chrome)
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
wait = WebDriverWait(driver, 10)

# Open the search results page
url = "https://www.etoday.co.kr/search/main?fldSort=1&keyword=%EC%98%A4%EB%8A%98%EC%9D%98+%EC%A6%9D%EC%8B%9C+%EB%A6%AC%ED%8F%AC%ED%8A%B8"
driver.get(url)
time.sleep(2)

# Rows collected by the crawler
data = []

# Limit on how many articles to process (e.g. only the top 5)
article_limit = 5
article_index = 0

while article_index < article_limit:
    # Re-fetch the article list from the current DOM on every iteration
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "ul.search-result-list li a")))
    articles = driver.find_elements(By.CSS_SELECTOR, "ul.search-result-list li a")

    # Guard: stop if there are fewer articles than the index
    if article_index >= len(articles):
        print("Not enough articles")
        break

    try:
        # Read the URL as a plain string (it cannot go stale), then open it directly
        link = articles[article_index].get_attribute("href")
        driver.get(link)

        # Wait for the article body to load
        wait.until(EC.presence_of_element_located((By.CLASS_NAME, "articleView")))

        # Extract the date: the second space-separated token of the first line (depends on the site's markup)
        date_element = driver.find_element(By.CLASS_NAME, "newsinfo")
        article_date = date_element.text.strip().split("\n")[0].split(" ")[1]

        # Extract the body and split it into per-stock sections on ◇
        article_element = driver.find_element(By.CLASS_NAME, "articleView")
        parts = article_element.text.strip().split("◇")

        for part in parts:
            lines = part.strip().split("\n")
            if len(lines) > 1:
                jongmok_name = lines[0].strip()
                jongmok_article = "\n".join(lines[1:])
                data.append({
                    "Date": article_date,
                    "Stock name": jongmok_name,
                    "Stock article": jongmok_article
                })

        # Done with this article: go back to the search results
        driver.back()
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "ul.search-result-list li")))

        article_index += 1

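    except Exception as e:
        print(f"[Error] Exception while processing article #{article_index}: {e}")
        # Reload the search page instead of calling driver.back(): if the error happened
        # before driver.get(link), back() would leave the search results page entirely
        driver.get(url)
        article_index += 1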

# Build a DataFrame and save it to Excel
df = pd.DataFrame(data)
today_str = datetime.today().strftime("%Y-%m-%d")
file_name = f"crawled_data_{today_str}.xlsx"
df.to_excel(file_name, index=False)  # writing .xlsx needs openpyxl (or xlsxwriter) installed
print(f"Saved to {file_name}.")

driver.quit()
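The ◇-splitting step in the script can be pulled out into a plain function, so the parsing logic can be checked without launching a browser. This is a sketch; `split_stock_sections` is a hypothetical name. It follows the same rules as the loop in the script: the first line of each section is the stock name, the remaining lines are that stock's text, and single-line sections are skipped.

```python
def split_stock_sections(body_text, date):
    """Split an article body on '◇' into rows with date / stock name / article columns."""
    rows = []
    for part in body_text.strip().split("◇"):
        lines = part.strip().split("\n")
        if len(lines) > 1:  # a section needs a name line plus at least one text line
            rows.append({
                "Date": date,
                "Stock name": lines[0].strip(),
                "Stock article": "\n".join(lines[1:]),
            })
    return rows
```

Inside the crawler loop, `data.extend(split_stock_sections(article_element.text, article_date))` would then take the place of the inner for loop.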

