mirror of
https://gitlab.dit.htwk-leipzig.de/fsr-im/tools/flatscraper.git
synced 2025-07-15 19:18:49 +02:00
77 lines
2.5 KiB
Markdown
# Flatscraper - A Simple Web Scraper for Flat Listings 🔍🏠
**Flatscraper** is a lightweight web scraper that extracts flat listings from a specified URL. It leverages the [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/) library to parse HTML and capture essential details about flats—**title**, **price**, and **location**.
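
The sketch below illustrates the extraction step under stated assumptions. It is not the project's code. It uses Python's standard-library `html.parser` instead of BeautifulSoup so it runs without extra packages. The markup it expects is a hypothetical stand-in: a `div` with class `flat` containing elements with the classes `title`, `price` and `location`. Each housing provider's real HTML differs. With BeautifulSoup, the same lookup would typically be written with `find_all(..., class_=...)`.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Hypothetical markup: each listing is a <div class="flat"> holding elements
# with the field classes below. Real provider pages use their own structure.
LISTING_CLASS = "flat"
FIELD_CLASSES = ("title", "price", "location")


class ListingParser(HTMLParser):
    """Collects title, price and location text for every listing container."""

    def __init__(self) -> None:
        super().__init__()
        self.listings = []  # type: List[Dict[str, str]]
        self._current = None  # type: Optional[Dict[str, str]]
        self._field = None  # type: Optional[str]

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if LISTING_CLASS in classes:
            # A new listing starts; the fields that follow are attached to it.
            self._current = {}
            self.listings.append(self._current)
        elif self._current is not None:
            # Remember which field (if any) the following text belongs to.
            self._field = next((c for c in classes if c in FIELD_CLASSES), None)

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current field, so this only
        # handles fields whose text is not wrapped in further nested tags.
        self._field = None

    def handle_data(self, data):
        text = data.strip()
        if self._current is not None and self._field and text:
            previous = self._current.get(self._field, "")
            self._current[self._field] = (previous + " " + text).strip()


# Placeholder page with two hypothetical listings.
SAMPLE_HTML = """
<div class="flat">
  <h2 class="title">2-Zimmer-Wohnung</h2>
  <span class="price">450 €</span>
  <span class="location">Leipzig</span>
</div>
<div class="flat">
  <h2 class="title">3-Zimmer-Wohnung</h2>
  <span class="price">620 €</span>
  <span class="location">Leipzig</span>
</div>
"""

parser = ListingParser()
parser.feed(SAMPLE_HTML)
parser.close()
for listing in parser.listings:
    print(listing)
```

Keeping the parser as a small state machine (current listing, current field) keeps each provider's selectors in one place. Adapting the sketch to another site would mean changing the class names.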
## 🚀 Features
- 🏠 **Scrapes flat listings** from a specified URL
- 🔍 **Extracts key details:** title, price, and location
- 💾 **Saves data** to a CSV file
- ⚙️ **Supports command-line arguments** for customization
- 🛠️ **Easy to use and modify** for different websites
- 🧪 **Includes a simple test case** for demonstration
- 📚 **Utilizes Python’s built-in libraries** along with BeautifulSoup for HTML parsing
- 🔔 **Discord Webhook integration** for notifications
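
Here is a minimal sketch of how CSV output and command-line options could fit together. It assumes hypothetical `--url` and `--output` flags and the three fields listed above, and it is not the project's actual interface. It uses the standard-library `csv` module. Since the project lists Pandas as a requirement, it may write its CSV with `DataFrame.to_csv` instead.

```python
import argparse
import csv
import io
from typing import Dict, Iterable, List, Optional, TextIO

FIELDNAMES = ["title", "price", "location"]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical flags for illustration; the real CLI may differ.
    parser = argparse.ArgumentParser(description="Scrape flat listings to CSV")
    parser.add_argument("--url", required=True, help="listing page to scrape")
    parser.add_argument("--output", default="flats.csv", help="CSV file to write")
    return parser.parse_args(argv)


def write_csv(listings: Iterable[Dict[str, str]], out: TextIO) -> None:
    """Write one row per listing; extra keys are ignored, missing ones left empty."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    for listing in listings:
        writer.writerow(listing)


# Demo with an explicit argument list and an in-memory file instead of disk.
args = parse_args(["--url", "https://example.org/flats"])
buffer = io.StringIO()
write_csv([{"title": "2-Zimmer-Wohnung", "price": "450 €", "location": "Leipzig"}], buffer)
print(args.output)
print(buffer.getvalue())
```

For a real run, open the target file with `open(args.output, "w", newline="", encoding="utf-8")` and pass the handle to `write_csv`. The `csv` module documentation recommends `newline=""` for files it writes.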
## 🏢 Housing Providers
- **LWB**
- **Lipsia**
- **BGL**
- **VLW**
- **Wogetra**
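
Supporting several providers implies one scraper per site. One way to organize this is a registry that maps each provider name to its scraping function, so adding a provider means registering one more function. This is a sketch with hypothetical function names and placeholder scrapers, not the project's actual structure.

```python
from typing import Callable, Dict, List, Optional

Listing = Dict[str, str]
Scraper = Callable[[], List[Listing]]

# Hypothetical registry: provider name -> function that scrapes that provider.
SCRAPERS = {}  # type: Dict[str, Scraper]


def register(name: str) -> Callable[[Scraper], Scraper]:
    """Decorator that adds a scraper function to the registry under `name`."""
    def decorator(func: Scraper) -> Scraper:
        SCRAPERS[name] = func
        return func
    return decorator


@register("LWB")
def scrape_lwb() -> List[Listing]:
    # Placeholder: a real scraper would fetch and parse the provider's pages.
    return [{"title": "Example flat", "price": "450 €", "location": "Leipzig", "provider": "LWB"}]


@register("Lipsia")
def scrape_lipsia() -> List[Listing]:
    # Placeholder returning no listings.
    return []


def scrape_all(providers: Optional[List[str]] = None) -> List[Listing]:
    """Run the selected providers (default: all) and keep going if one fails."""
    results = []  # type: List[Listing]
    for name in providers or sorted(SCRAPERS):
        try:
            results.extend(SCRAPERS[name]())
        except Exception as exc:  # one broken site should not stop the others
            print("{} failed: {}".format(name, exc))
    return results


print(len(scrape_all()))
```

Catching errors per provider is a deliberate choice in this sketch. A layout change on one housing site then only drops that site's results for the run.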
## 📦 Requirements
You can run the bot natively on your machine or use a Docker image. The requirements include:

- **Python 3.6 or higher**
- **BeautifulSoup4**
- **Requests**
- **Pandas**
- **Discord Webhook** (optional, for notifications)
- **Docker** (optional, for containerization)
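
Discord webhooks accept a JSON `POST` request whose `content` field holds the message text, limited to 2000 characters. The sketch below builds such a request using only the standard library. The message layout, function names and `User-Agent` string are hypothetical. The webhook URL would typically come from the `.env` file, but this README does not name that variable.

```python
import json
import urllib.request
from typing import Dict, List

# Discord limits a webhook message's "content" to 2000 characters.
MAX_CONTENT_LENGTH = 2000


def format_message(listings: List[Dict[str, str]]) -> str:
    """Render one line per flat (hypothetical layout), truncated to the limit."""
    lines = ["{title} | {price} | {location}".format(**flat) for flat in listings]
    return ("New flats:\n" + "\n".join(lines))[:MAX_CONTENT_LENGTH]


def build_webhook_request(webhook_url: str, content: str) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST to a Discord webhook URL."""
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Hypothetical UA string; urllib's default one may be rejected.
            "User-Agent": "flatscraper-sketch/0.1",
        },
        method="POST",
    )


def notify(webhook_url: str, listings: List[Dict[str, str]]) -> None:
    """Send a notification only when there is something to report."""
    if not listings:
        return
    request = build_webhook_request(webhook_url, format_message(listings))
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()


# Build a request against a placeholder URL without sending it.
demo = build_webhook_request(
    "https://discord.com/api/webhooks/0/placeholder-token",
    format_message([{"title": "2-Zimmer-Wohnung", "price": "450 €", "location": "Leipzig"}]),
)
print(demo.get_method(), demo.data.decode("utf-8"))
```

Separating request construction from sending keeps the message format testable without network access. `notify` is the only part that actually contacts Discord.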
## 🛠️ Installation
### 1. Environment Setup
Make sure the `.env` file is configured correctly: copy the example in `sample.env` to `.env` and fill in the required values.
The `SAP_SESSIONID` and `COOKIE_SESSSION` values are obtained after performing a search on the LWB website; use your browser's developer tools to find them in local storage.
*Future versions will include automatic form processing to obtain a valid session ID.*
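
Below is a hedged sketch of a configuration check. It parses simple `KEY=VALUE` lines from a `.env`-style text and reports which of the two session values named above are missing or empty. The project may use a library such as python-dotenv instead. This hand-rolled parser does not handle `export` prefixes, multi-line values or escape sequences, and the sample values are placeholders.

```python
import os
from typing import Dict, Iterable, List

# Variable names as given in this README.
REQUIRED_KEYS = ("SAP_SESSIONID", "COOKIE_SESSSION")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments, stripping quotes."""
    values = {}  # type: Dict[str, str]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env(path: str = ".env") -> Dict[str, str]:
    """Read a .env file and export its values without overriding existing ones."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env_text(handle.read())
    for key, value in values.items():
        # Like python-dotenv's default (override=False): existing env vars win.
        os.environ.setdefault(key, value)
    return values


def missing_keys(env: Dict[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


# Placeholder .env content: one session value set, the other left empty.
sample = 'SAP_SESSIONID="abc123"\n# comment line\nCOOKIE_SESSSION=\n'
env = parse_env_text(sample)
print(missing_keys(env))
```

Checking for missing values at startup makes an expired or forgotten session ID fail immediately rather than producing empty scrape results.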
### 2. Python Environment
You can use a virtual environment to install the necessary packages:
```bash
# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS and Linux:
source venv/bin/activate

# Install the required packages
pip install -r requirements.txt
```
### 3. Docker Environment
Alternatively, use the Docker image provided in the repository:
```bash
# Build the Docker image
docker build -t flatscraper .

# Run the Docker container
docker run -it --rm flatscraper
```
## 🎉 Have Fun and Happy Scraping!
Wishing you a great time and speedy flat searching with the bot. If you have any questions or suggestions, feel free to [open an issue on GitLab](https://gitlab.com). I'll respond as soon as possible.

*Happy scraping! 🚀*