Jetson & Machine Learning

Collecting Google Images

아크리엑터 2020. 6. 8. 23:43

 

The steps below install and run the google-images-download tool on OS X (macOS).

 

1. Install chromedriver

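$ brew cask install chromedriver

Note: current Homebrew releases have removed the brew cask subcommand; there the equivalent command is brew install --cask chromedriver. The output below is from the original command.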

Updating Homebrew...
==> Downloading https://chromedriver.storage.googleapis.com/83.0.4103.39/chromedriver_mac64.zip
Already downloaded: /Users/igi/Library/Caches/Homebrew/downloads/2d64dd6160a21ca7c931e2018ba8fea06a98b5d57417434ad4b21b125d731d1d--chromedriver_mac64.zip
==> Verifying SHA-256 checksum for Cask 'chromedriver'.
==> Installing Cask chromedriver
==> Linking Binary 'chromedriver' to '/usr/local/bin/chromedriver'.
🍺  chromedriver was successfully installed!
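
Before moving on, it can help to confirm that the binary Homebrew linked is actually reachable. The following is a minimal sketch using only the standard library. The helper name find_chromedriver is made up for this post, and the fallback path is simply the one shown in the install log above.

```python
import shutil
from pathlib import Path


def find_chromedriver(fallback="/usr/local/bin/chromedriver"):
    """Return the chromedriver path, or None if it cannot be found.

    find_chromedriver is a hypothetical helper for this post, not part of any tool.
    """
    # Look on PATH first, the same way the shell would.
    found = shutil.which("chromedriver")
    if found:
        return found
    # Otherwise fall back to the location Homebrew reported in the log above.
    return fallback if Path(fallback).is_file() else None


if __name__ == "__main__":
    path = find_chromedriver()
    print(path or "chromedriver not found; check the Homebrew install")
```

Whatever path this prints is the value to pass to the --chromedriver option in step 3.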

 

2. Install google-images-download

$ git clone https://github.com/ultralytics/google-images-download

Cloning into 'google-images-download'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 710 (delta 0), reused 1 (delta 0), pack-reused 707
Receiving objects: 100% (710/710), 301.27 KiB | 526.00 KiB/s, done.
Resolving deltas: 100% (398/398), done.

$ cd google-images-download 

 

3. Download image files

$ python3 bing_scraper.py --search 'honeybees on flowers' --limit 10 --download --chromedriver /usr/local/bin/chromedriver

Traceback (most recent call last):
  File "bing_scraper.py", line 28, in <module>
    from tqdm import tqdm
ImportError: No module named 'tqdm'

 

If you get a module-not-found error like the one above, install the missing module with the command below.

$ pip install tqdm

Collecting tqdm
  Downloading tqdm-4.46.1-py2.py3-none-any.whl (63 kB)
     |████████████████████████████████| 63 kB 151 kB/s 
Installing collected packages: tqdm
Successfully installed tqdm-4.46.1
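
Rather than discovering missing modules one traceback at a time, the imports can be checked up front. This is a small sketch, not part of the tool: missing_modules is a hypothetical helper, and only tqdm is confirmed as a dependency by the traceback above, so extend the list as other ImportErrors show up.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Only tqdm is known from the traceback above; add other names as needed.
    missing = missing_modules(["tqdm"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable")
```

If the list is not empty, running python3 -m pip install followed by the module name ties the install to the same interpreter that runs bing_scraper.py, which avoids installing into a different Python by accident.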

 

4. This is the output of a successful run.

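Partway through, though, some images fail with "URLError on an image", accompanied by a TLSV1_ALERT_PROTOCOL_VERSION message. This alert means the server refused the TLS protocol version that the client offered. The likely cause is that the Python client here can only speak TLS 1.0 (an old SSL library) while those servers require TLS 1.2 or later. The reverse is also possible: a server that only supports a version older than the client will accept. This is only a guess and has not been verified.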
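
One way to narrow the guess down is to ask Python which OpenSSL build it is linked against. The sketch below uses only the standard ssl module; tls_support_summary is a hypothetical helper name. It does not prove what the remote servers require, it only shows what the client side is capable of.

```python
import ssl


def tls_support_summary():
    """Describe the OpenSSL build this interpreter uses and its TLS 1.2 support."""
    # ssl.HAS_TLSv1_2 exists on Python 3.7+. On older versions, the constant
    # PROTOCOL_TLSv1_2 is only defined when the linked OpenSSL supports TLS 1.2.
    has_tls12 = getattr(ssl, "HAS_TLSv1_2", hasattr(ssl, "PROTOCOL_TLSv1_2"))
    return {"openssl": ssl.OPENSSL_VERSION, "tls1_2": bool(has_tls12)}


if __name__ == "__main__":
    info = tls_support_summary()
    print("Linked against:", info["openssl"])
    print("TLS 1.2 available:", info["tls1_2"])
```

If TLS 1.2 turns out to be unavailable, running the scraper with a Python build linked against a newer OpenSSL would be the thing to try.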

These errors can simply be ignored: the summary at the end shows that 10 files were requested and all 10 were collected.
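
In the log, the counter (2/10, 7/10, ...) only advances when an image is actually saved, and a failed URL is skipped in favor of the next candidate. The following is a minimal sketch of that skip-and-continue pattern. It is not the tool's actual code; collect and its fetch argument are names invented for illustration.

```python
def collect(urls, limit, fetch):
    """Fetch URLs in order until `limit` succeed; return (saved, error_count).

    `fetch` is any callable that returns data or raises OSError on failure
    (urllib.error.URLError and ConnectionResetError are OSError subclasses).
    """
    saved, errors = [], 0
    for url in urls:
        if len(saved) >= limit:
            break
        try:
            saved.append(fetch(url))
        except OSError as exc:
            # Same idea as "URLError on an image...trying next one" in the log:
            # count the failure and move on without advancing the counter.
            errors += 1
            print("{}/{} error, trying next one: {}".format(len(saved) + 1, limit, exc))
    return saved, errors
```

With this pattern the requested number of images is reached as long as the search returned enough spare candidates to cover the failures.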

python3 bing_scraper.py --search 'honeybees on flowers' --limit 10 --download --chromedriver /usr/local/bin/chromedriver


Searching for https://www.bing.com/images/search?q=honeybees%20on%20flowers
Downloading HTML... 2504656 elements: 100%|██████████████████████| 30/30 [00:21<00:00,  1.40it/s]
Downloading images...
1/10 https://berkshirefarmsapiary.files.wordpress.com/2013/07/imgp8415.jpg 
2/10 URLError on an image...trying next one... Error: <urlopen error [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:719)>
2/10 URLError on an image...trying next one... Error: <urlopen error [Errno 54] Connection reset by peer>
2/10 URLError on an image...trying next one... Error: <urlopen error [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:719)>
2/10 http://sites.psu.edu/alouise/wp-content/uploads/sites/38740/2016/04/save-the-bees-2.jpg 
3/10 http://static.independent.co.uk/s3fs-public/thumbnails/image/2013/06/05/18/web-bees-epa.jpg 
4/10 http://www.realfarmacy.com/wp-content/uploads/2013/05/ahoney_bee_bta_092708_074.jpg 
5/10 http://heavenawaits.files.wordpress.com/2008/11/honey_bee_extracts_nectar.jpg 
6/10 http://beesweetnaturals.com/wp-content/uploads/2013/08/Honey-Bee-Flowers-Widescreen.jpg 
7/10 URLError on an image...trying next one... Error: <urlopen error [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:719)>
7/10 URLError on an image...trying next one... Error: <urlopen error [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:719)>
7/10 https://media.mnn.com/assets/images/2015/07/HoneyBeeOnAsterFlower.jpg.838x0_q80.jpg 
8/10 https://cdn.agdaily.com/wp-content/uploads/2018/07/bg-honeybees_pollinator-003.jpg 
9/10 URLError on an image...trying next one... Error: <urlopen error [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:719)>
9/10 URLError on an image...trying next one... Error: <urlopen error [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:719)>
9/10 URLError on an image...trying next one... Error: <urlopen error [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:719)>
9/10 http://media.treehugger.com/assets/images/2014/09/honeybee-flower.jpg.662x0_q100_crop-scale.jpg 
10/10 URLError on an image...trying next one... Error: <urlopen error [Errno 54] Connection reset by peer>
10/10 https://www.mybeeline.co/media/cache/full/posts/Honeybees01.jpg 
Done with 9 errors in 88.7s. All images saved to /Volumes/DATA/Documents/yolo/google-images-download/images
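
To see what kinds of errors occurred, the log can be tallied by reason. In the run above, 7 of the 9 errors are TLS protocol-version alerts and 2 are connection resets. The sketch below assumes the log lines keep the "Error: <urlopen error ...>" shape shown above; tally_errors is a hypothetical helper.

```python
import re
from collections import Counter

# Captures the reason inside "Error: <urlopen error ...>" as printed in the log.
ERROR_RE = re.compile(r"Error: <urlopen error (.+)>\s*$")


def tally_errors(log_lines):
    """Count download errors in the log, grouped by the reason text."""
    counts = Counter()
    for line in log_lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "1/10 https://berkshirefarmsapiary.files.wordpress.com/2013/07/imgp8415.jpg ",
        "2/10 URLError on an image...trying next one... Error: "
        "<urlopen error [Errno 54] Connection reset by peer>",
    ]
    for reason, count in tally_errors(sample).items():
        print(count, reason)
```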

The downloaded files, 10 in total, are saved in the images folder.
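
The file count can also be checked from Python. This is a small sketch that assumes it is run from the google-images-download directory, so that images is a relative path; count_files is a hypothetical helper. It searches recursively in case the tool groups downloads into subfolders.

```python
from pathlib import Path


def count_files(folder="images"):
    """Count regular files under `folder`, including any subfolders."""
    root = Path(folder)
    if not root.is_dir():
        return 0
    return sum(1 for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    print("Files under images:", count_files("images"))
```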
