Home
This wiki describes how to run the link checker component.
The LOM_linkchecker, part of the ODS project, checks whether the URL in the technical.location element of IEEE LOM metadata exists and stores the results in a MySQL database. The program reads a folder containing LOM metadata files in XML format and automatically moves the files that include invalid URLs to another folder (the "broken" folder). In our case, a file is invalid if the HTTP response status for its URL is not 200. A file that contains invalid IEEE LOM metadata (e.g., one that lacks the technical.location element) is also moved to the broken folder; we call such a file 'ill-formed'.
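The check described above depends on locating the technical.location element in each file. The following Python sketch illustrates that step only; it is not the tool's own (Java) code. It assumes the files use the IEEE LOM namespace `http://ltsc.ieee.org/xsd/LOM`, and the function name `extract_locations` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the metadata uses the standard IEEE LOM XML namespace.
LOM_NS = {"lom": "http://ltsc.ieee.org/xsd/LOM"}


def extract_locations(xml_text):
    """Return the technical.location values of a LOM record as a list.

    Returns None when the XML cannot be parsed or holds no non-empty
    technical.location element, i.e. when the file is 'ill-formed'.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    # Look for <technical>/<location> anywhere below the root element.
    nodes = root.findall(".//lom:technical/lom:location", LOM_NS)
    urls = [n.text.strip() for n in nodes if n.text and n.text.strip()]
    return urls or None


sample = (
    '<lom xmlns="http://ltsc.ieee.org/xsd/LOM">'
    "<technical><location>http://example.org/a</location></technical>"
    "</lom>"
)
print(extract_locations(sample))  # ['http://example.org/a']
```

A file for which this returns None would be treated as ill-formed and moved to the broken folder without any HTTP request being made.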
1- The program is a JAR file, so the Java JDK must be installed on your machine and JAVA_HOME must be set on your system.
2- MySQL (any version) must also be installed on your system.
Step 1: Restore the backup available here (linkchecker.sql) into MySQL. After restoring, you should have:
- A database entitled linkchecker
- A table entitled log
The log table has four fields: logID, URL, FileName, TimeStamp
- logID: the identifier of the log entry
- URL: the URL from the metadata (it is set to "Error" when the URL doesn't exist)
- FileName: the full path of the checked file
- TimeStamp: the date and time of the check
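To make the table layout concrete, the Python sketch below creates a table with the same four fields and records one check. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. The column types and the `create_log_table` and `log_check` helpers are assumptions; the real schema is whatever linkchecker.sql defines.

```python
import sqlite3
from datetime import datetime


def create_log_table(conn):
    # Column types are guesses; the real schema comes from linkchecker.sql.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS log (
            logID     INTEGER PRIMARY KEY AUTOINCREMENT,
            URL       TEXT,
            FileName  TEXT,
            TimeStamp TEXT
        )
        """
    )


def log_check(conn, url, file_name):
    """Insert one row and return its logID (assigned by the database)."""
    cur = conn.execute(
        "INSERT INTO log (URL, FileName, TimeStamp) VALUES (?, ?, ?)",
        (url, file_name, datetime.now().isoformat(timespec="seconds")),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
create_log_table(conn)
log_check(conn, "http://example.org/a", "/usr/lomfiles/a.xml")
rows = conn.execute("SELECT logID, URL, FileName FROM log").fetchall()
print(rows)  # [(1, 'http://example.org/a', '/usr/lomfiles/a.xml')]
```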
Step 2: The JAR file of the link checker is available here. Run the link checker with the following parameters, in this order:
a- metadata folder (e.g., "C:\lomfiles")
b- username (the MySQL user that has access to the linkchecker database, e.g., root)
c- password (the password of the MySQL user)
d- broken folder (the folder to which broken files are moved, e.g., "C:\brokenfiles")
This is an example of running the jar file:
For Windows:
lom_linkchecker C:\lomfiles root password C:\brokenfiles
For Linux:
lom_linkchecker /usr/lomfiles root password /usr/lomfiles/brokenfiles
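For orientation, the folder handling implied by the metadata-folder and broken-folder parameters could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the tool's code: `move_broken_files` and the `is_broken` callback are hypothetical, and scanning only top-level `*.xml` files is an assumption.

```python
import shutil
from pathlib import Path


def move_broken_files(metadata_dir, broken_dir, is_broken):
    """Move every XML file for which is_broken(path) is true.

    is_broken is a caller-supplied function standing in for the URL and
    metadata checks. Returns the names of the files that were moved.
    """
    metadata_dir = Path(metadata_dir)
    broken_dir = Path(broken_dir)
    broken_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materializes the listing before any file is moved.
    # Only top-level *.xml files are scanned (an assumption).
    for path in sorted(metadata_dir.glob("*.xml")):
        if is_broken(path):
            shutil.move(str(path), str(broken_dir / path.name))
            moved.append(path.name)
    return moved
```

Because the scan is not recursive, a broken folder nested inside the metadata folder, as in the Linux example, is not rescanned in this sketch.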
Results in the database:
When the result is 200: the URL is live
When the result is 403 or 404: the URL is broken
When the result is 500: the URL is unreachable due to a server error
When the result is -2: the URL is not an HTTP request (it may be an FTP or HTTPS request)
When the result is -3: the request timed out
When the result is -4: the connection was refused by the server
When the result is -5: unrecognized error
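The result codes above can be summarized as a lookup table. The Python sketch below is illustrative only: the helpers `describe_result`, `is_invalid`, and `precheck_scheme` are hypothetical, and the "not documented" fallback for codes that are not listed is an assumption.

```python
from urllib.parse import urlparse

# Meanings copied from the list of results above.
RESULT_MEANINGS = {
    200: "live",
    403: "broken",
    404: "broken",
    500: "unreachable (server error)",
    -2: "not an HTTP request",
    -3: "timeout",
    -4: "connection refused",
    -5: "unrecognized error",
}


def describe_result(code):
    """Return the documented meaning of a result code."""
    return RESULT_MEANINGS.get(code, "not documented")


def is_invalid(code):
    """A file counts as invalid when the result is anything other than 200."""
    return code != 200


def precheck_scheme(url):
    """Return -2 for URLs whose scheme is not http, else None.

    None means a real HTTP request would still be needed to get a result.
    """
    scheme = urlparse(url).scheme.lower()
    return None if scheme == "http" else -2


print(describe_result(404))                    # broken
print(precheck_scheme("ftp://example.org/f"))  # -2
```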