This program does the following:
- Imports the necessary modules: requests to send HTTP requests, urljoin from urllib.parse to join URLs, BeautifulSoup to parse HTML, time to delay execution, csv to write data to a CSV file, os to read environment variables, and dotenv to load settings from a .env file.
- Loads settings from the .env file with load_dotenv().
- Sets the last_page variable to 91, the number of listing pages to process.
- Sets the base URL and creates an empty firms_data list to store the firm data.
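Based on those steps, the setup might look like the sketch below. The base URL is a placeholder, since the real listing URL isn't given in the description.

```python
import csv
import os
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv

load_dotenv()  # pull settings from the .env file into the environment

last_page = 91  # number of listing pages to process
base_url = "https://example.com/companies"  # placeholder; the real URL isn't shown
firms_data = []  # accumulates one dict per firm
```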
- Gets the proxy server credentials from environment variables.
- Specifies the proxy server settings, including host, port, and credentials.
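A sketch of that proxy configuration; the environment variable names are assumptions, since the actual .env keys aren't shown.

```python
import os

# Hypothetical .env keys; substitute whatever names the real script uses.
proxy_user = os.getenv("PROXY_USER")
proxy_pass = os.getenv("PROXY_PASS")
proxy_host = os.getenv("PROXY_HOST")
proxy_port = os.getenv("PROXY_PORT")

# Standard requests proxy mapping with the credentials embedded in the URL.
proxies = {
    "http": f"http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}",
    "https": f"http://{proxy_user}:{proxy_pass}@{proxy_host}:{proxy_port}",
}
```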
- Creates a requests.Session(), configures it to route requests through the proxy server, and disables SSL certificate verification.
- Disables warnings about insecure requests with requests.packages.urllib3.disable_warnings().
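The session setup could look like this; session.proxies and session.verify are the standard requests attributes for these two settings.

```python
import requests

session = requests.Session()
session.proxies.update(proxies)  # route every session request through the proxy
session.verify = False           # skip SSL certificate verification

# Suppress the InsecureRequestWarning that verify=False would otherwise emit.
requests.packages.urllib3.disable_warnings()
```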
- Creates a global counter variable to track the number of firms processed.
- Starts a loop that iterates through the pages up to last_page.
- Builds the URL of the current page and sends a GET request for its HTML using requests.get().
- Pauses for 5 seconds between requests; requests returns the full response before continuing, so the pause acts as a rate limit rather than a wait for the page to load.
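Continuing the setup sketch above, the paging loop might look like the following, assuming pages are selected with a ?page= query parameter (the real URL scheme isn't shown):

```python
import time

import requests

counter = 0  # firms processed so far

for page in range(1, last_page + 1):
    page_url = f"{base_url}?page={page}"  # assumed pagination scheme
    response = requests.get(page_url)     # per the description, the listing
                                          # pages use requests.get(); the firm
                                          # pages below go through the session
    time.sleep(5)                         # throttle between requests
    ...
```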
- Uses BeautifulSoup to parse the HTML and extract the links to the firms.
- Starts an inner loop that iterates over the firm links.
- Loads each firm's page with session.get() and pauses for another 5 seconds.
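Link extraction depends entirely on the page's markup; the selector below is a guess that shows the shape of the code, not the real class name.

```python
import time
from urllib.parse import urljoin

from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, "html.parser")

# "a.company-link" is a hypothetical selector; the actual markup isn't shown.
firm_links = [urljoin(base_url, a["href"]) for a in soup.select("a.company-link")]

for firm_url in firm_links:
    firm_response = session.get(firm_url)  # firm pages go through the proxy session
    time.sleep(5)                          # throttle between requests
    ...
```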
- Uses BeautifulSoup to extract the firm's data, such as its name, description, website, and LinkedIn link.
- Appends the firm's data to the firms_data list.
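The extraction itself is selector-specific; every selector here is an assumption standing in for whatever the real script uses.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(firm_response.text, "html.parser")

def first_text(selector):
    """Return the stripped text of the first matching element, or None."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node else None

website = soup.select_one("a.website-link")            # hypothetical selector
linkedin = soup.select_one("a[href*='linkedin.com']")  # any link to linkedin.com

firms_data.append({
    "name": first_text("h1"),                   # assumed to hold the firm name
    "description": first_text(".description"),  # hypothetical selector
    "website": website["href"] if website else None,
    "linkedin": linkedin["href"] if linkedin else None,
})
```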
- Displays progress information.
- Increments the counter.
- Checks the loop's exit condition.
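One plausible reading of these three steps, with the exit check hedged because the description doesn't spell it out:

```python
counter += 1
print(f"Processed firm {counter}")  # simple progress output

# The exact exit condition isn't described; a natural one is stopping once
# every page up to last_page has been handled, which the for loop already does.
```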
- Opens the dealroom_data.csv file to write the results in CSV format.
- Creates a csv.DictWriter object to write the data.
- Writes the header row (the column names) with writer.writeheader().
- Writes each firm's data to the CSV file in a for loop.
- Closes the file.
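The CSV stage might look like the following; the column names are assumptions matching the fields sketched earlier.

```python
import csv

fieldnames = ["name", "description", "website", "linkedin"]  # assumed columns

f = open("dealroom_data.csv", "w", newline="", encoding="utf-8")
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()       # header row with the column names
for firm in firms_data:
    writer.writerow(firm)  # one row per firm
f.close()                  # explicit close, matching the final step
```

A with-statement (`with open(...) as f:`) would close the file automatically and is the more idiomatic choice.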