helpers.classes package

Submodules

helpers.classes.arrayproperties module

class helpers.classes.arrayproperties.ArrayProperties

Bases: object

add_object()
add_simple()
analysis()
end_of_object()

Needed to count only the elements that sit directly at the array's own level.

get_amount_empty()
get_amount_mixed()
get_amount_nested()
get_amount_object()
get_amount_only_array()
get_amount_simple()
get_results()
is_in_array()
nest()
un_nest()
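The `end_of_object()` docstring implies that the class tracks nesting, so that only elements directly at an array's own level are counted. The following is a minimal, hypothetical sketch of that idea using a stack of open containers. The class name `LevelCounter`, its counter keys and the exact behaviour of each method are assumptions. The method names only mirror those listed above; this is not the actual `ArrayProperties` implementation.

```python
from typing import Dict, List, Optional


class LevelCounter:
    """Hypothetical sketch of level-aware counting; not the real ArrayProperties."""

    def __init__(self) -> None:
        # Stack of open containers: a counter dict for an array, None for an object.
        self._frames: List[Optional[Dict[str, int]]] = []
        # Counters of every array that has been closed, in the order they were closed.
        self.closed_arrays: List[Dict[str, int]] = []

    def _current_array(self) -> Optional[Dict[str, int]]:
        # Counters of the innermost container, or None if it is an object or nothing is open.
        return self._frames[-1] if self._frames else None

    def is_in_array(self) -> bool:
        # True only when the innermost open container is an array.
        return self._current_array() is not None

    def nest(self) -> None:
        # Opening an array: it is a nested element of the enclosing array, if there is one.
        parent = self._current_array()
        if parent is not None:
            parent["nested"] += 1
        self._frames.append({"simple": 0, "object": 0, "nested": 0})

    def un_nest(self) -> None:
        # Closing an array: keep its counters for later reporting.
        self.closed_arrays.append(self._frames.pop())

    def add_simple(self) -> None:
        # A scalar counts only if it sits directly inside an array.
        parent = self._current_array()
        if parent is not None:
            parent["simple"] += 1

    def add_object(self) -> None:
        # Opening an object: count it at the array level, then shield its contents.
        parent = self._current_array()
        if parent is not None:
            parent["object"] += 1
        self._frames.append(None)

    def end_of_object(self) -> None:
        # Closing an object: values seen from now on belong to the enclosing level again.
        self._frames.pop()


# Walk the document [1, {"a": 2, "b": [3, 4]}, [5]] event by event.
counter = LevelCounter()
counter.nest()
counter.add_simple()
counter.add_object()
counter.add_simple()
counter.nest()
counter.add_simple()
counter.add_simple()
counter.un_nest()
counter.end_of_object()
counter.nest()
counter.add_simple()
counter.un_nest()
counter.un_nest()
```

With the object marker on the stack, the value `2` and the array under `"b"` are not attributed to the outer array. The outer array therefore reports one simple value, one object and one nested array.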

helpers.classes.config_keys module

class helpers.classes.config_keys.ConfigKeys(value)

Bases: enum.Enum

An enumeration.

COLLECTION_NAME = 'collection_name'
DELAY_SECONDS = 'delay_seconds'
MASTER_ADDRESS = 'master_address'
MASTER_PORT = 'master_port'
MAX_FILESIZE = 'max_filesize'
MAX_THREADS = 'max_threads'
SKIP_RESOURCES = 'skip_resources'
STORAGE_DIRECTORY = 'storage_directory'
TIMEOUT = 'timeout'
UPDATE_NODE_INTERVAL = 'update_node_interval'
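Each member's value is the literal configuration key, so a raw mapping (for example, one parsed from a configuration file) can be converted by looking members up by value. Looking up an unknown key raises `ValueError`. This is a minimal sketch: `load_config` is a hypothetical helper, and the enum is re-declared so the snippet runs on its own.

```python
from enum import Enum
from typing import Any, Dict


class ConfigKeys(Enum):
    # Re-declared from helpers.classes.config_keys so the example is self-contained.
    COLLECTION_NAME = 'collection_name'
    DELAY_SECONDS = 'delay_seconds'
    MASTER_ADDRESS = 'master_address'
    MASTER_PORT = 'master_port'
    MAX_FILESIZE = 'max_filesize'
    MAX_THREADS = 'max_threads'
    SKIP_RESOURCES = 'skip_resources'
    STORAGE_DIRECTORY = 'storage_directory'
    TIMEOUT = 'timeout'
    UPDATE_NODE_INTERVAL = 'update_node_interval'


def load_config(raw: Dict[str, Any]) -> Dict[ConfigKeys, Any]:
    """Hypothetical helper: key a raw config mapping by ConfigKeys members."""
    # ConfigKeys(key) looks a member up by its value and raises ValueError for unknown keys.
    return {ConfigKeys(key): value for key, value in raw.items()}


config = load_config({"max_threads": 4, "timeout": 30})
```

Keying by enum members rather than bare strings means a misspelled key fails at load time instead of silently returning nothing later.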

helpers.classes.errors module

class helpers.classes.errors.AnalysisErrors(value)

Bases: helpers.classes.errors.Errors

Errors that might occur during the data analysis.

UNKNOWN = 'Unknown error'
class helpers.classes.errors.Errors(value)

Bases: enum.Enum

A pseudo-abstract base class for the error enumerations.

class helpers.classes.errors.ScrapingErrors(value)

Bases: helpers.classes.errors.Errors

Errors that might occur while scraping or downloading a repository.

CKAN = 'CKAN error'
CONNECTION = 'Connection error'
CONTENTTYPE = 'Content-Type'
FILETOOLARGE = 'File too large'
MIMETYPE = 'MimeType'
PARSING = 'Parsing after download'
PROTOCOL = 'Unsupported protocol'
SSL = 'SSL'
STATUSCODE = 'Bad status code'
TIMEOUT = 'Timeout'
UNKNOWN = 'Unknown error'
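`Errors` can serve as a shared base because Python only allows subclassing an `Enum` that defines no members. The sketch below re-declares a trimmed version of the hierarchy to show two things. Members of both subclasses are instances of `Errors`, so callers can handle either kind uniformly. Also, the two `UNKNOWN` members remain distinct even though they share the value `'Unknown error'`. `describe` is a hypothetical helper.

```python
from enum import Enum


class Errors(Enum):
    """Pseudo-abstract base: it has no members, so it can be subclassed."""


class AnalysisErrors(Errors):
    UNKNOWN = 'Unknown error'


class ScrapingErrors(Errors):
    TIMEOUT = 'Timeout'
    UNKNOWN = 'Unknown error'


def describe(error: Errors) -> str:
    # Accepts a member of any Errors subclass.
    return f"{type(error).__name__}.{error.name}: {error.value}"
```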

helpers.classes.exceptions module

exception helpers.classes.exceptions.InvalidConfigurationError

Bases: Exception
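No attributes beyond those of `Exception` are documented for this exception. The following is a minimal sketch of how it might be raised during start-up validation. The `check_required` helper and the choice of required keys are hypothetical, and the exception is re-declared so the snippet runs standalone.

```python
from typing import Any, Dict, Iterable


class InvalidConfigurationError(Exception):
    # Re-declared from helpers.classes.exceptions so the example runs on its own.
    pass


def check_required(config: Dict[str, Any], required: Iterable[str]) -> None:
    """Hypothetical check: raise if any required key is missing or set to None."""
    missing = [key for key in required if config.get(key) is None]
    if missing:
        raise InvalidConfigurationError(f"missing configuration keys: {', '.join(missing)}")


check_required({"master_address": "localhost", "master_port": 5000},
               ["master_address", "master_port"])
```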

helpers.classes.filestatus module

class helpers.classes.filestatus.FileStatus(value)

Bases: enum.Enum

An enumeration.

FINISHED = 'finished'
NULL = 'null'
WIP = 'wip'

helpers.classes.job module

class helpers.classes.job.AnalysisJob(descriptor, pid=None)

Bases: helpers.classes.job.Job

class helpers.classes.job.DownloadJob(descriptor, pid=None)

Bases: helpers.classes.job.Job

class helpers.classes.job.Job(descriptor, pid=None)

Bases: object

get_descriptor()
get_timestamp()
passed_time()
class helpers.classes.job.JobType(value)

Bases: enum.Enum

An enumeration.

ANALYSIS = 'analysis'
DOWNLOAD = 'download'
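`Job` exposes `get_timestamp()` and `passed_time()`, which suggests that each job records when it was created. The following is a minimal sketch under that assumption. The use of `time.monotonic()`, the result being in seconds and the `job_type` class attribute tying subclasses to `JobType` are all hypothetical.

```python
import time
from enum import Enum
from typing import Any, Optional


class JobType(Enum):
    ANALYSIS = 'analysis'
    DOWNLOAD = 'download'


class Job:
    """Hypothetical sketch: a job remembers when it was created."""

    job_type: Optional[JobType] = None  # assumed link between subclasses and JobType

    def __init__(self, descriptor: Any, pid: Optional[int] = None) -> None:
        self._descriptor = descriptor
        self.pid = pid
        self._timestamp = time.monotonic()  # assumed: monotonic clock, in seconds

    def get_descriptor(self) -> Any:
        return self._descriptor

    def get_timestamp(self) -> float:
        return self._timestamp

    def passed_time(self) -> float:
        # Seconds elapsed since the job was created.
        return time.monotonic() - self._timestamp


class DownloadJob(Job):
    job_type = JobType.DOWNLOAD


class AnalysisJob(Job):
    job_type = JobType.ANALYSIS
```

A monotonic clock is used here because elapsed-time measurements should not jump when the system clock is adjusted. A wall-clock timestamp would be the choice if the value must be shown to users.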

helpers.classes.node module

class helpers.classes.node.Node(address, port, uuid='39079a69-32b8-42e8-9d5b-7b0051507d42', enabled=True, storage_directory=None)

Bases: object

add_file(file)

Sets the UUID of the file the node currently works on.

get_address()

Returns the address of the node.

get_connection_tuple()

Returns a tuple containing the address and the port.

get_files()

Returns the UUID of the file the node currently works on.

get_port()

Returns the port number of the node.

get_semaphore()

Returns the semaphore value of the node.

get_storage_directory()
get_uuid()

Returns the UUID of the node.

is_enabled()
knows_uuid()
register_sent_uuid()
remove_file(file)

Removes a file from the file list.

Returns

True if everything went fine, else False.

toggle()
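The docstrings above describe a node that tracks the UUIDs of the files it works on, returns `(address, port)` from `get_connection_tuple()`, and reports success from `remove_file()` as a boolean. A minimal, hypothetical sketch of that bookkeeping follows. The internal list and the per-instance `uuid4()` default are assumptions.

```python
from typing import List, Optional, Tuple
from uuid import uuid4


class Node:
    """Hypothetical sketch of the bookkeeping documented for helpers.classes.node.Node."""

    def __init__(self, address: str, port: int, uuid: Optional[str] = None,
                 enabled: bool = True, storage_directory: Optional[str] = None) -> None:
        self._address = address
        self._port = port
        # Generate a fresh UUID per instance when none is given.
        self._uuid = uuid if uuid is not None else str(uuid4())
        self._enabled = enabled
        self._storage_directory = storage_directory
        self._files: List[str] = []

    def get_uuid(self) -> str:
        return self._uuid

    def get_connection_tuple(self) -> Tuple[str, int]:
        return (self._address, self._port)

    def add_file(self, file: str) -> None:
        self._files.append(file)

    def get_files(self) -> List[str]:
        return list(self._files)

    def remove_file(self, file: str) -> bool:
        # True if the file was in the list and has been removed, else False.
        try:
            self._files.remove(file)
            return True
        except ValueError:
            return False

    def is_enabled(self) -> bool:
        return self._enabled

    def toggle(self) -> None:
        self._enabled = not self._enabled
```

The rendered signature above shows a concrete UUID as the default. This may indicate that the real default is computed once when the module is imported and then shared by every `Node` created without an explicit `uuid`. The sketch avoids that by generating the UUID inside `__init__`.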

helpers.classes.nodetype module

class helpers.classes.nodetype.NodeType(value)

Bases: enum.IntEnum

An enumeration.

ARRAY = 5
ARRAY_ITEM = 11
BOOLEAN = 4
EMPTY_STRING = 12
INTEGER = 2
NON_EMPTY_STRING = 13
NULL = 8
NUMBER = 3
OBJECT = 6
PROPERTY = 10
STRING = 1
UNNAMED_OBJECT = 7
VOID = 9
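Because `NodeType` derives from `enum.IntEnum`, its members compare equal to plain integers and sort by value rather than by declaration order. The following short demonstration re-declares the enum so that it runs standalone.

```python
from enum import IntEnum


class NodeType(IntEnum):
    STRING = 1
    INTEGER = 2
    NUMBER = 3
    BOOLEAN = 4
    ARRAY = 5
    OBJECT = 6
    UNNAMED_OBJECT = 7
    NULL = 8
    VOID = 9
    PROPERTY = 10
    ARRAY_ITEM = 11
    EMPTY_STRING = 12
    NON_EMPTY_STRING = 13


# Members can be rebuilt from stored integers...
restored = NodeType(6)
# ...and sorting uses the integer values.
ordered = sorted([NodeType.NULL, NodeType.STRING, NodeType.OBJECT])
```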

helpers.classes.repositorystatus module

class helpers.classes.repositorystatus.RepositoryStatus(value)

Bases: enum.Enum

An enumeration.

ADDED = 'added'
ANALYZED = 'analyzed'
ANALYZING = 'analyzing'
DOWNLOADED = 'downloaded'
DOWNLOADED_IR = 'downloaded_ir'
DOWNLOADING = 'downloading'
SCRAPED = 'scraped'
SCRAPING = 'scraping'

helpers.classes.statisticsbuilder module

class helpers.classes.statisticsbuilder.StatisticsBuilder

Bases: object

class MultResProperties

Bases: object

Contains the variables used to count the occurrences of multiplicity keywords in schemas. For reasons explained in the paper, these variables have no meaning for JSON documents.

compact()

Returns a tuple ready for use elsewhere in the scripts.

get_allof()
get_anyof()
get_oneof()
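In JSON Schema, `allOf`, `anyOf` and `oneOf` are the combination keywords that the getters above refer to. The following is a minimal recursive sketch that counts them in a schema parsed into Python dicts and lists. The name `count_combinators` and the rule of counting every occurrence at any depth are assumptions. As a simplification, a property that happens to be *named* `allOf` inside `properties` would also be counted.

```python
from collections import Counter
from typing import Any

KEYWORDS = ("allOf", "anyOf", "oneOf")


def count_combinators(schema: Any) -> Counter:
    """Hypothetical: count allOf/anyOf/oneOf keys anywhere in a parsed schema."""
    counts: Counter = Counter()
    if isinstance(schema, dict):
        for key, value in schema.items():
            if key in KEYWORDS:
                counts[key] += 1
            counts += count_combinators(value)  # descend into every sub-schema
    elif isinstance(schema, list):
        for item in schema:
            counts += count_combinators(item)
    return counts


schema = {
    "type": "object",
    "properties": {"a": {"anyOf": [{"type": "string"},
                                   {"oneOf": [{"type": "integer"}, {"type": "null"}]}]}},
    "allOf": [{"required": ["a"]}],
}
counts = count_combinators(schema)
# A compact()-style tuple in a fixed order (assumed ordering).
compact = (counts["allOf"], counts["anyOf"], counts["oneOf"])
```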
NODE_TYPES = [<NodeType.STRING: 1>, <NodeType.NUMBER: 3>, <NodeType.INTEGER: 2>, <NodeType.BOOLEAN: 4>, <NodeType.ARRAY: 5>, <NodeType.OBJECT: 6>, <NodeType.NULL: 8>]
class RequiredProperties

Bases: object

Contains the variables used to count the number of required properties within a document.

compact()

Returns a tuple ready for use elsewhere in the scripts.

class TypeCountProperties

Bases: object

Contains the variables used to count the occurrences of types.

compact()

Returns a tuple ready for use elsewhere in the scripts.

get_amount_of_abusive_booleans()
get_amount_of_abusive_numbers()
get_amount_of_arrays()
get_amount_of_booleans()
get_amount_of_empty_strings()
get_amount_of_integers()
get_amount_of_non_empty_strings()
get_amount_of_nulls()
get_amount_of_numbers()
get_amount_of_objects()
get_amount_of_strings()
get_amounts()
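For a JSON document loaded with the standard `json` module, type occurrences can be tallied with a recursive walk. The following sketch separates empty from non-empty strings, as `NodeType` does. The function name, the counter keys and the choice of counting only floats as "numbers" are assumptions. The "abusive" booleans and numbers are not modelled because their definition is not given here.

```python
import json
from collections import Counter
from typing import Any, Optional


def count_types(value: Any, counts: Optional[Counter] = None) -> Counter:
    """Hypothetical: tally every value in a parsed JSON document by type."""
    if counts is None:
        counts = Counter()
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        counts["boolean"] += 1
    elif isinstance(value, int):
        counts["integer"] += 1
    elif isinstance(value, float):
        counts["number"] += 1
    elif isinstance(value, str):
        counts["empty_string" if value == "" else "non_empty_string"] += 1
    elif value is None:
        counts["null"] += 1
    elif isinstance(value, list):
        counts["array"] += 1
        for item in value:
            count_types(item, counts)
    elif isinstance(value, dict):
        counts["object"] += 1
        for item in value.values():
            count_types(item, counts)
    return counts


doc = json.loads('{"a": 1, "b": 2.5, "c": "", "d": "x", "e": [true, null]}')
type_counts = count_types(doc)
```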
full_analysis(custom_objects)
req_analyze_co(root_custom_objects)

Computes the statistics on required and optional properties.
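In JSON Schema, an object schema's `required` array lists which of its `properties` are mandatory, and the remaining properties are optional. The following is a minimal sketch of that split for a single object schema. `required_optional_counts` is a hypothetical helper, and it ignores `required` entries that have no matching property definition.

```python
from typing import Any, Dict, Tuple


def required_optional_counts(object_schema: Dict[str, Any]) -> Tuple[int, int]:
    """Hypothetical: return (required, optional) property counts for one object schema."""
    properties = object_schema.get("properties", {})
    required = set(object_schema.get("required", []))
    n_required = sum(1 for name in properties if name in required)
    return n_required, len(properties) - n_required


schema = {
    "type": "object",
    "properties": {"id": {}, "name": {}, "tags": {}},
    "required": ["id", "name"],
}
split = required_optional_counts(schema)
```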

Module contents