################################
Using the Command-Line Interface
################################

To use the jHound web app, simply execute ``make start`` in the jHound
directory. The Flask-based web app will then launch locally and listen on
port 5000. Alternatively, you can execute the ``run_webapp.py`` script:

.. code-block::

   python3 run_webapp.py runserver --host= --port=

The jHound project consists of three phases: scraping, downloading, and
analyzing data.

Scrape data
===========

Every CKAN repository consists of a collection of files. "Scraping" means
retrieving a list of links to all of these files.

To scrape a repository, you have to add it to the database first. You can
simply use the web app for that or, if you prefer a classical experience,
the jHound CLI (``python3 .`` within the jHound directory):

.. code-block::

   Welcome to the jHound shell! Type ? to list all commands.
   jhound> add
   Enter a name for the repository: Open Data Rostock
   Enter the URL for the repository: opendata-hro.de
   > OK, added opendata-hro.de as Open Data Rostock.
     UUID: 16da5cd0-a416-42ec-8524-481955e5d5cc

The information, along with a UUID, will be stored in the database. The
UUID serves as an alias for the repository.

To scrape the repository, simply use the web app again (by clicking the
"Scrape" button in the "Data Sources" overview) or execute:

``python3 scraper.py scrape <UUID>``

If you don't provide a UUID, a list of available repositories will be
displayed, and you can choose between them. The repository will then be
scraped.

Download data
=============

After scraping a repository, you can download its files. This can be done
via the web app as well, or by using the ``scraper.py`` script:

``python3 scraper.py download <UUID>``

Again, if you don't provide a UUID, a list of repositories to select from
will be shown.

Depending on the size of the repository, the download process will probably
take some time. Consider setting ``MAX_DOWNLOAD_LINKS`` to a small(er)
value (see above) for testing purposes.
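Conceptually, a ``MAX_DOWNLOAD_LINKS``-style cap is just a slice over the
scraped link list before the downloader starts. The sketch below is
illustrative only: the function name ``select_links`` and its signature are
assumptions for this example, not jHound's actual code:

.. code-block:: python

   def select_links(links: list[str], max_download_links: int | None = None) -> list[str]:
       """Limit how many scraped links are handed to the downloader.

       A cap of ``None`` means "download everything"; a small cap is what
       you would use for a quick test run.
       """
       if max_download_links is None:
           return list(links)
       return links[:max_download_links]

   # Example: 100 scraped links, but only 5 should be fetched for testing.
   scraped = [f"https://opendata-hro.de/file_{i}.csv" for i in range(100)]
   print(len(select_links(scraped, max_download_links=5)))  # 5
   print(len(select_links(scraped)))                        # 100

This keeps the cap out of the download loop itself, so the same downloader
code runs unchanged whether you fetch a test sample or the full repository.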
(It is planned to parallelize the download process in the future.) (This
has since been done.)

Analyze data
============

Due to a bug in the current release, it is not possible to analyze data
locally, so the web app can't be used for that either. In order to analyze
the data, you have to start a master server and at least one node server.
The reason is that jHound is supposed to be fully parallelized in the
future (hence the "Large-Scale" name), starting with its main feature, the
data analysis.

So, if you want to analyze a downloaded repository, you have to:

1) Start at least one node server: ``python3 node_server.py -p <PORT>``
   (if you don't provide a port, you will be prompted to enter one).

2) Add the information for the nodes to your config file (see above).

3) Start the master server:
   ``python3 master_server.py -r <REPOSITORY_UUID> [-q]``, with
   ``REPOSITORY_UUID`` being the UUID of an existing repository. If you
   also provide ``-q`` or ``--quiet``, you will see less output.

At the moment, it is recommended to run everything on the same machine
because the nodes look for the ``STORAGE_DIRECTORY`` of your
configuration. It is still possible to distribute the node servers if you
use a mounted directory, though. This will also be improved in the future.

If everything is set up correctly, the analysis will start and the results
will be stored in the database.

Show results
============

After a repository has been analyzed, you can take a look at the results
in the web app. In the "Data Sources" overview, you should see a green
button saying "Show results". Simply click it to view the results.
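The master/node split described above boils down to the master partitioning
a repository's downloaded files across the registered node servers. How
jHound actually schedules work is not shown here; the following is a minimal
round-robin sketch, where the function name ``assign_files`` and the node
address strings are illustrative assumptions:

.. code-block:: python

   from itertools import cycle


   def assign_files(files: list[str], nodes: list[str]) -> dict[str, list[str]]:
       """Distribute files over node servers in round-robin order.

       Illustrative only: each node gets every len(nodes)-th file, which
       keeps the per-node workload roughly balanced.
       """
       assignment: dict[str, list[str]] = {node: [] for node in nodes}
       for node, path in zip(cycle(nodes), files):
           assignment[node].append(path)
       return assignment

   # Example: five downloaded files spread over two local node servers.
   nodes = ["127.0.0.1:5001", "127.0.0.1:5002"]
   files = [f"file_{i}.json" for i in range(5)]
   print(assign_files(files, nodes))

With a scheme like this, adding another node server to the config simply
adds another bucket to the assignment, which is why running the nodes on
one machine (sharing ``STORAGE_DIRECTORY``) or on several (via a mounted
directory) makes no difference to the partitioning itself.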