scraper module

scraper.add_repository(name=None, url=None, print_result=True)

Stores the metadata of a repository in the MongoDB database.

Parameters

name – (opt) The screen name of the repository.
url – (opt) The URL to the repository.
print_result – (opt) Boolean; whether to print the result of the add operation.

Returns

The UUID of the newly added repository.

scraper.delete_repository(uuid)

Deletes the repository with the given UUID.

scraper.download_files(uuid)

Downloads every file from the given repository.

scraper.download_repository(uuid=None)

Starts downloading the data of a specific repository.

Parameters

uuid – (opt) The UUID of the repository. If no UUID is given, one of the available repositories has to be chosen.

Returns

The time the process took, in seconds (float).
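A return value of this kind, elapsed seconds as a float, is commonly produced by wrapping the work in a monotonic timer. The sketch below shows that pattern only; the helper name `timed_call` is hypothetical, and whether the module measures time this way is an assumption.

```python
import time


def timed_call(func, *args, **kwargs):
    """Run func and return the elapsed wall-clock time in seconds (float)."""
    started = time.perf_counter()
    func(*args, **kwargs)
    return time.perf_counter() - started


# Stand-in workload; a real caller would pass the download routine instead.
elapsed = timed_call(sum, range(1000))
```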

scraper.get_package_count(url)

Gets the number of packages available in a CKAN repository.
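One way to obtain this number is through the CKAN Action API, whose `package_search` action reports the total number of matching datasets in `result.count`. The sketch below assumes that approach; the helper names `package_search_url` and `count_from_response` are hypothetical, the sample response is hand-written rather than taken from a real server, and the module itself may query CKAN differently.

```python
import json


def package_search_url(base_url, rows=0, start=0):
    """Build a CKAN package_search URL; rows=0 asks for the count only."""
    return (
        f"{base_url.rstrip('/')}/api/3/action/package_search"
        f"?rows={rows}&start={start}"
    )


def count_from_response(body):
    """Extract the total package count from a package_search JSON response."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return payload["result"]["count"]


# Hand-written sample in the shape of a package_search response (assumption).
sample_body = '{"success": true, "result": {"count": 1234, "results": []}}'
package_count = count_from_response(sample_body)
```

Keeping the URL construction and the response parsing separate from the HTTP request makes both steps easy to check without network access.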

scraper.scrape_chunk(repository_uuid, rows, start)

Fetches a whole chunk of documents.

Parameters

Returns

The fetched links, as a list.
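The `rows` and `start` arguments suggest offset-based paging, in which a caller walks through a repository in fixed-size chunks. The sketch below shows how such offsets could be derived from a total package count. The helper `chunk_offsets` is hypothetical, and the interpretation of `rows` as chunk size and `start` as the index of the first document is an assumption based on the parameter names.

```python
def chunk_offsets(total, rows):
    """Return (start, rows) pairs that cover `total` documents in chunks."""
    if rows <= 0:
        raise ValueError("rows must be a positive integer")
    # The last chunk is shortened so that it does not run past the total.
    return [(start, min(rows, total - start)) for start in range(0, total, rows)]


offsets = chunk_offsets(total=2500, rows=1000)
# Each pair could then drive one call, for example:
#   scrape_chunk(repository_uuid, rows=chunk_rows, start=chunk_start)
```

With 2500 documents and a chunk size of 1000 this produces three chunks, the last of which holds the remaining 500 documents.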

scraper.scrape_urls(repository_uuid)

Fetches a repository.