
kmc-jp/heineken-crawler


heineken-crawler

Hayai Kensaku ("fast search") PukiWiki Crawler & Client

Requirements

Python 3.x
Poetry

Usage

You can also use the Docker image. Its entrypoint is poetry run python3.
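With an entrypoint of poetry run python3, the arguments given after the image name are normally appended to it, so the commands in the sections below map directly onto docker run. The sketch below builds such a command; the image tag heineken-crawler:latest is a hypothetical placeholder, not a published name.

```python
"""Sketch: build a docker run command for the crawler image.

Assumption: IMAGE is a hypothetical tag; substitute the one you built or
pulled. The container then runs: poetry run python3 <script> <args...>
"""
from __future__ import annotations

import shlex

IMAGE = "heineken-crawler:latest"  # hypothetical image tag


def docker_command(script: str, *args: str, image: str = IMAGE) -> list[str]:
    """Return argv that appends the script and its arguments to the entrypoint."""
    return ["docker", "run", "--rm", image, script, *args]


if __name__ == "__main__":
    # Print a copy-pasteable shell command instead of executing it.
    print(shlex.join(docker_command("pukiwiki-crawler.py", "crawl")))
```

This README does not state how the image expects to find the config/*.py files; mounting them with -v is one option, but the container path depends on the image's working directory, which is not documented here.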

PukiWiki

  • Setup
$ poetry install
$ edit config/pukiwiki.py
  • Crawl
$ poetry run python3 pukiwiki-crawler.py crawl
  • Create index
$ poetry run python3 pukiwiki-crawler.py add-index
  • Clients for dev
# show help
$ poetry run python3 dev-client.py -h
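Each crawler in this README exposes the same two subcommands, crawl and add-index, run in that order. A minimal sketch of a wrapper that chains them for any of the three scripts; the helper names are hypothetical, and it assumes you run it from the repository root after poetry install.

```python
"""Sketch: run a crawler's crawl step, then its add-index step.

Script names and subcommands come from this README; the helpers themselves
are hypothetical. Run from the repository root after `poetry install`.
"""
from __future__ import annotations

import subprocess
import sys

CRAWLERS = ("pukiwiki-crawler.py", "paragate-crawler.py", "scrapbox-crawler.py")
STEPS = ("crawl", "add-index")  # same order as the README: crawl, then index


def pipeline_commands(script: str) -> list[list[str]]:
    """Return the poetry commands for one crawler, in execution order."""
    if script not in CRAWLERS:
        raise ValueError(f"unknown crawler script: {script}")
    return [["poetry", "run", "python3", script, step] for step in STEPS]


def run_pipeline(script: str) -> None:
    """Execute each step; check=True raises on the first non-zero exit code."""
    for cmd in pipeline_commands(script):
        print("$", " ".join(cmd), file=sys.stderr)
        subprocess.run(cmd, check=True)
```

For example, run_pipeline("scrapbox-crawler.py") performs the Scrapbox crawl and then creates its index, stopping early if the crawl fails.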

Mail (Paragate)

  • Setup
$ edit config/paragate.py
  • Crawl
$ poetry run python3 paragate-crawler.py crawl
  • Create index
$ poetry run python3 paragate-crawler.py add-index

Scrapbox

  • Setup
$ edit config/scrapbox.py

For SCRAPBOX_CONNECT_SID, copy the value of the connect.sid cookie from your browser's developer tools.
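The connect.sid cookie is the browser's Scrapbox session, so sending it with an API request lets a script read projects your account can see. The sketch below only illustrates that idea against the public /api/pages/{project} endpoint; the project name is a placeholder, and whether scrapbox-crawler.py builds its requests this way is an assumption.

```python
"""Sketch: attach the connect.sid session cookie to a Scrapbox API request.

Assumptions: the project name is a placeholder, and this request shape is
illustrative; scrapbox-crawler.py may build its requests differently.
"""
import urllib.parse
import urllib.request

SCRAPBOX_CONNECT_SID = "paste-connect.sid-value-here"  # placeholder value


def scrapbox_request(project: str, connect_sid: str) -> urllib.request.Request:
    """Build (but do not send) a request for a project's page list."""
    url = "https://scrapbox.io/api/pages/" + urllib.parse.quote(project)
    # The session cookie authenticates the request the same way the browser does.
    return urllib.request.Request(url, headers={"Cookie": f"connect.sid={connect_sid}"})
```

Pass the result to urllib.request.urlopen to actually send it. Treat the cookie like a password: anyone holding it can act as your account until the session expires.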

  • Crawl
$ poetry run python3 scrapbox-crawler.py crawl
  • Create index
$ poetry run python3 scrapbox-crawler.py add-index

Tips

To access the dev app running in Kubernetes, forward its service port to your local machine:

$ kubectl port-forward service/{svc name} 9200:9200
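Port 9200 is Elasticsearch's default HTTP port, so once the port-forward is running you can check that the cluster answers on localhost. A minimal sketch, assuming the forwarded service is Elasticsearch (the "els" below) and accepts plain unauthenticated HTTP; add credentials or TLS if your cluster requires them.

```python
"""Sketch: confirm a port-forwarded Elasticsearch answers on localhost:9200.

Assumptions: the forwarded service is Elasticsearch and needs no auth/TLS.
"""
import json
import urllib.request

BASE_URL = "http://localhost:9200"


def parse_health(body: str) -> str:
    """Extract the status field (green, yellow, or red) from a _cluster/health reply."""
    return json.loads(body)["status"]


def cluster_status(base_url: str = BASE_URL, timeout: float = 5.0) -> str:
    """Query GET /_cluster/health through the port-forward and return its status."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return parse_health(resp.read().decode("utf-8"))
```

A connection error from cluster_status() usually means the kubectl port-forward is not running; a "red" status means the cluster is reachable but some primary shards are unassigned.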

Words

  • els => Elasticsearch

License

See LICENSE for license and DOCKER_NOTICE for Docker image notices.