scraper module
-
scraper.add_repository(name=None, url=None, print_result=True)
Stores the metadata of a repository into the MongoDB database.
- Parameters
name – (opt) The screen name of the repository.
url – (opt) The URL to the repository.
print_result – (opt) Whether to print the result of the add operation (boolean).
- Returns
The new repo UUID.
-
scraper.delete_repository(uuid)
Deletes a repository.
- Parameters
uuid – The UUID of the repository to be deleted.
- Returns
Whether or not the deletion was successful (boolean).
-
scraper.download_files(uuid)
Downloads every single file from a given repository.
- Parameters
uuid – The UUID of the repository to download files from.
- Returns
The duration in seconds (float).
-
scraper.download_repository(uuid=None)
Starts downloading data of a specific repository.
- Parameters
uuid – (opt) The UUID of the repository. If no UUID is given, you must choose one of the available repositories.
- Returns
The time that the process took in seconds (float).
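
As a usage illustration, the documented signatures above suggest a register-then-download flow. The following is a minimal sketch under assumptions: `add_and_download` is a hypothetical helper, and `fake_scraper` is a stand-in object (not the real module) so that the example runs without MongoDB or network access.

```python
from types import SimpleNamespace


def add_and_download(scraper_module, name, url):
    """Register a repository, then download it; return (uuid, seconds)."""
    # print_result=False keeps the add operation quiet, per the documented flag.
    repo_uuid = scraper_module.add_repository(name=name, url=url, print_result=False)
    seconds = scraper_module.download_repository(uuid=repo_uuid)
    return repo_uuid, seconds


# Hypothetical stand-in that mimics only the documented signatures and return types.
fake_scraper = SimpleNamespace(
    add_repository=lambda name=None, url=None, print_result=True: "hypothetical-uuid",
    download_repository=lambda uuid=None: 0.0,
)

repo_uuid, seconds = add_and_download(fake_scraper, "demo", "https://example.org")
```

Passing the real `scraper` module instead of the stand-in would presumably behave the same way, provided its functions match the signatures documented here.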
-
scraper.get_package_count(url)
Gets the number of packages available in a CKAN repository.
- Parameters
url – The URL of the CKAN repository.
- Returns
The number of packages.
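
For illustration, a package count can be read from CKAN's `package_search` action, which reports the total number of matching datasets in `result.count`. The sketch below is an assumption about how such a lookup might work, not necessarily what this module does; `build_count_url` and `parse_package_count` are hypothetical helpers, and `sample_body` is made-up data.

```python
import json
from urllib.parse import urlencode


def build_count_url(base_url):
    """Build a package_search URL that requests zero rows (count only)."""
    query = urlencode({"rows": 0})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{query}"


def parse_package_count(body):
    """Extract the total package count from a package_search JSON response."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return int(payload["result"]["count"])


# Hypothetical response body, shaped like a CKAN package_search reply.
sample_body = '{"success": true, "result": {"count": 1234, "results": []}}'
package_count = parse_package_count(sample_body)
```

Requesting zero rows keeps the response small, since only the `count` field is needed.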
-
scraper.scrape_chunk(repository_uuid, rows, start)
Fetches a whole chunk of documents.
- Parameters
repository_uuid – The UUID of the CKAN repository.
rows – The chunk size.
start – The offset.
- Returns
The fetched links, as a list.
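
The `rows` and `start` parameters mirror the paging parameters of CKAN's `package_search` action. As a rough sketch, assuming the links are the `url` fields of each package's resources, a chunk request and its link extraction might look like the following. `build_chunk_url` and `extract_links` are hypothetical helpers that work on a base URL directly rather than on a repository UUID, and `sample_payload` is made-up data.

```python
from urllib.parse import urlencode


def build_chunk_url(base_url, rows, start):
    """Build a package_search URL for `rows` packages beginning at offset `start`."""
    query = urlencode({"rows": rows, "start": start})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{query}"


def extract_links(payload):
    """Collect every non-empty resource URL from a decoded package_search response."""
    links = []
    for package in payload["result"]["results"]:
        for resource in package.get("resources", []):
            url = resource.get("url")
            if url:
                links.append(url)
    return links


# Hypothetical decoded response containing two packages.
sample_payload = {
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"resources": [{"url": "https://example.org/a.csv"}, {"url": ""}]},
            {"resources": [{"url": "https://example.org/b.json"}]},
        ],
    },
}
links = extract_links(sample_payload)
```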
-
scraper.scrape_urls(repository_uuid)
Fetches the downloadable resource URLs of a repository.
- Parameters
repository_uuid – The UUID of the repository to scrape.
- Returns
The number of downloadable resources.
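
One plausible way to scrape a whole repository is to walk it in fixed-size chunks and total the links found. The sketch below assumes that approach; `chunk_offsets` and `count_resources` are hypothetical helpers, and `fake_fetch` stands in for a real chunk fetcher such as `scrape_chunk`.

```python
def chunk_offsets(total, rows):
    """Return the start offset of every chunk needed to cover `total` packages."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return range(0, total, rows)


def count_resources(total, rows, fetch_chunk):
    """Sum the number of links returned by fetch_chunk(rows, start) over all chunks."""
    found = 0
    for start in chunk_offsets(total, rows):
        found += len(fetch_chunk(rows, start))
    return found


def fake_fetch(rows, start):
    """Hypothetical fetcher that pretends every chunk yields exactly two links."""
    return [f"https://example.org/{start}/a", f"https://example.org/{start}/b"]


resource_total = count_resources(250, 100, fake_fetch)
```

With 250 packages and a chunk size of 100, the loop requests offsets 0, 100, and 200, so the last chunk is allowed to be shorter than `rows`.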