Creating (and editing) a Job

To move documents from a filesystem into an Alfresco repository, a job must be created. The user can create a new job by clicking the "Create new job" link on the dashboard (homepage). This opens a form in which the user must specify the parameters relevant to the job. The form is divided into the following categories:

  • General
  • Source-Sink
  • Schedule
  • Customizable options
  • Additional options

These categories will now be discussed in further detail.

General parameters

In this section the user must specify the name and description of the job. Both fields are mandatory, so the form will not submit if either is empty. The name must also be unique: no existing job may have the same name. Note that the Name field is not case-sensitive, so a job "move" and a job "Move" cannot both exist.
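The uniqueness rule can be illustrated with a minimal Python sketch. It is not Move2alf's own code; the function name `name_is_available` and the sample job names are assumptions made for illustration only.

```python
def name_is_available(new_name, existing_names):
    """Return True if new_name is non-empty and matches no existing name, ignoring case."""
    wanted = new_name.casefold()
    if not wanted:
        return False  # the name is mandatory
    return all(wanted != name.casefold() for name in existing_names)


existing = ["move", "Archive invoices"]  # hypothetical existing jobs
print(name_is_available("Move", existing))        # False: clashes with "move"
print(name_is_available("Scan inbox", existing))  # True
```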

Source-Sink parameters

In the source-sink section the user must define the file-source and the file-sink. The file-source, or "import from", block has two fields: path and extension. Path is the absolute path of the folder from which documents are to be moved. It is mandatory and must therefore contain a value. Once the value is entered, click "Ok" to register it. An incorrect path can be removed by clicking the "remove" button next to the value; a new value can then be entered by pressing "Add input path". The extension field determines which type of files are read. For example, to load only PDF files, the user can enter ".pdf" or simply "pdf" in the extension field.
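The fact that ".pdf" and "pdf" are treated alike can be sketched as follows. This is an illustrative Python sketch, not Move2alf's implementation; the helper names are hypothetical, a non-empty extension is assumed, and the case-insensitive comparison is an assumption as well.

```python
from pathlib import PurePath


def normalize_extension(ext):
    """Turn 'pdf' or '.pdf' into '.pdf' (assumes a non-empty extension)."""
    ext = ext.strip().lower()
    return ext if ext.startswith(".") else "." + ext


def matches(filename, ext):
    """Return True if the file's extension equals the requested one."""
    return PurePath(filename).suffix.lower() == normalize_extension(ext)


print(matches("scans/report.pdf", "pdf"))   # True
print(matches("scans/report.pdf", ".pdf"))  # True
print(matches("scans/notes.txt", "pdf"))    # False
```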

The file-sink, or "destination", block consists of a repository destination and a path within that repository. If a destination already exists, the user can select it by clicking the radio button next to it. If no destinations exist, one can be created by clicking the "create new destination" button (see the next subsection). The destination path denotes the root folder that the files will be written to; this path is always a subfolder of the Company Home folder.

Create new destination

If there are no destinations, the user must create one by pressing the create new destination button. There are 6 elements to be specified:

  1. Name
  2. Type
  3. URL
  4. Username
  5. Password
  6. Number of threads

A name for the destination is mandatory. It may not be the same as that of an existing destination and, like the job name, it is not case-sensitive. Currently there is only one type of destination, Alfresco; more types may be added in the future. The URL field must contain the URL of the destination. For example, for an Alfresco instance running on localhost on port 8080 it would be "http://localhost:8080/alfresco". Username and password are the credentials of a user within Alfresco. Lastly, the number of threads denotes how many threads are used to upload the documents. The default is 5 threads, which means that, in principle, 5 documents can be uploaded at a time.
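What "5 documents at a time" means can be illustrated with a small thread-pool sketch in Python. It is not how Move2alf itself is written; the `upload` function is a placeholder and the file names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def upload(document):
    """Placeholder for a real upload; it only returns a status string."""
    return f"uploaded {document}"


def upload_all(documents, number_of_threads=5):
    """Upload documents using at most number_of_threads uploads in parallel."""
    with ThreadPoolExecutor(max_workers=number_of_threads) as pool:
        # map() returns the results in the same order as the input.
        return list(pool.map(upload, documents))


print(upload_all(["a.pdf", "b.pdf", "c.pdf"]))
# ['uploaded a.pdf', 'uploaded b.pdf', 'uploaded c.pdf']
```

With more threads, more uploads can be in flight at once, but the actual throughput also depends on the network and on the repository.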

Schedule parameters

A job requires a schedule so that it can run at set times, reducing the need for manual initiation. Jobs can be set to run once (single run), hourly, daily, weekly, or according to an advanced schedule.

For a single run, a date and time have to be entered. Clicking the date field displays a calendar, and clicking the time field displays a dropdown of preset time values. More specific times may also be entered manually. To make a job run hourly, the user must choose the minute of the hour at which the job runs. Similarly, a time must be selected to run the job daily. To run the job weekly, both a day of the week and a time must be chosen. Creating an advanced schedule requires prior knowledge of cron jobs; this is briefly discussed in the next subsection.

When a desired schedule has been assembled, press "Ok" to confirm it. To then add another schedule, click "Add schedule."
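For readers who want to relate the simple schedule types to the Quartz cron format described under Advanced Scheduling, the following Python sketch builds an equivalent expression for each type. It is an illustration under the assumption that such an equivalence is useful; it does not claim to show how Move2alf stores schedules internally, and the function names are hypothetical.

```python
DAYS = ("SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT")


def _check(hour, minute):
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")


def hourly(minute):
    """Every hour, at the given minute."""
    _check(0, minute)
    return f"0 {minute} * * * ?"


def daily(hour, minute):
    """Every day, at hour:minute."""
    _check(hour, minute)
    return f"0 {minute} {hour} * * ?"


def weekly(day, hour, minute):
    """Every week on the given day (SUN..SAT), at hour:minute."""
    _check(hour, minute)
    if day not in DAYS:
        raise ValueError("day must be one of " + ", ".join(DAYS))
    return f"0 {minute} {hour} ? * {day}"


print(hourly(15))             # 0 15 * * * ?
print(daily(12, 30))          # 0 30 12 * * ?
print(weekly("MON", 8, 0))    # 0 0 8 ? * MON
```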

Advanced Scheduling

Move2alf uses the Quartz scheduler, which requires schedules in the following format:

seconds | minutes | hours | day of month | month | day of week | year

For example, to run a job every day at 12:30 in the month of July, the following can be entered:

0 30 12 * 7 ? *

The "*" sign indicates that the job runs for every value of that field, in this case every day of the month and every year. The question mark must be entered for either day of month or day of week, because the scheduler cannot handle values in both fields. Lastly, the year field is optional. Taking the last two points into account, the schedule can also be written as:

0 30 12 ? 7 *

For more information on this option, please consult the Quartz Scheduler website:
http://www.quartz-scheduler.org.
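To make the field order concrete, the following Python sketch splits an expression into the named fields listed above and applies the two rules just described (optional year, a question mark in either day of month or day of week). It is only an illustration and is not a replacement for Quartz's own validation; the function name `describe` is hypothetical.

```python
FIELDS = ["seconds", "minutes", "hours", "day of month", "month", "day of week", "year"]


def describe(expression):
    """Split a Quartz-style cron expression into a dict of named fields."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError("expected 6 or 7 fields")
    if len(parts) == 6:
        parts.append("*")  # year omitted: every year
    fields = dict(zip(FIELDS, parts))
    # Rule from the text: '?' goes in day of month or day of week, not both.
    if (fields["day of month"] == "?") == (fields["day of week"] == "?"):
        raise ValueError("use '?' in either day of month or day of week")
    return fields


for name, value in describe("0 30 12 * 7 ? *").items():
    print(f"{name}: {value}")
```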

Customizable options

There are a few options that allow, to an extent, user customization:

The Execute command before job and Execute command after job options have essentially the same function; the former runs before the job starts and the latter when the job finishes. Any command-line command can be entered and will be executed at the respective time. This can be useful, for example, for processing files before they are uploaded and/or cleaning up files once the uploads have finished. A convenient way of doing this is to call batch files (Windows) or bash scripts (Unix) that perform these tasks. Test the commands on the command line before entering them in the job-creation form, to avoid problems with Move2alf at runtime.
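One way to try out a command beforehand is a small Python helper that runs it through the system shell and reports the exit code and output. This is a sketch for testing purposes only; it makes no claim about how Move2alf itself launches the commands, and `try_command` is a hypothetical name.

```python
import subprocess


def try_command(command):
    """Run a command line in the system shell; return (exit code, stdout, stderr)."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.returncode, result.stdout.strip(), result.stderr.strip()


# Replace the command below with the script that the job should call.
code, out, err = try_command("echo hello")
print("exit code:", code)
print("output:", out)
```

A non-zero exit code or unexpected output on stderr is a sign that the command needs fixing before it is entered in the job form.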

The metadata handlers are custom-written Java classes that give the correct set of metadata values to an uploaded document of a particular content type. Usually this is done by parsing a file that contains the document name and metadata. The default metadata handler is the "Filesystem metadata - Read metadata from filesystem" parser, which uses Alfresco's standard metadata extractor. The custom handlers allow the user to specify additional variables whose values are passed to the Java parsers. These values are not hardcoded in the parsers in order to reduce the number of parsers and increase their flexibility. For example, if two jobs are exactly the same but belong to different departments, the parser may require an additional parameter specifying which department the job applies to.
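The idea of a parser that combines per-document metadata with job-level parameters can be sketched in Python. The real handlers are Java classes whose interface is not described here, so everything below (the semicolon-separated index format, the column names, the `department` parameter, the function name) is a hypothetical illustration of the concept.

```python
import csv
import io


def read_metadata(index_text, parameters):
    """Map each document name to its metadata, adding the job-level parameters."""
    metadata = {}
    for row in csv.DictReader(io.StringIO(index_text), delimiter=";"):
        name = row.pop("name")
        # Job-level parameters are merged into every document's metadata.
        metadata[name] = {**row, **parameters}
    return metadata


index = "name;title;author\ninvoice-001.pdf;Invoice 1;J. Smith\n"
print(read_metadata(index, {"department": "Finance"}))
# {'invoice-001.pdf': {'title': 'Invoice 1', 'author': 'J. Smith', 'department': 'Finance'}}
```

The same parser can then serve a second job for another department simply by passing a different parameter value.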

Similarly, the transformation handlers are custom Java classes used for converting documents from one type to another. The default transform option is "No transformation"; a built-in alternative is the "Convert tiff files to PDF" option. Additional parameters may be passed to these custom Java transform classes as well.

Additional options

There are a number of additional options related to existing documents, moving documents, and notification e-mails.

Existing documents

Move2alf can handle existing documents in 5 different ways:

  1. Leave the existing document, but give an error in the logs
  2. Leave the existing document and do not log anything
  3. Overwrite the document
  4. Delete the existing document
  5. Do not upload the document, but simply list the presence of the document in the log if it exists

If documents must be unique and two documents with the same name should never occur, the first option should be chosen: any error in the log then points to a real problem. Option 2 can be selected if, for some reason, a document is not meant to be updated. If a document may be updated with a later version, option 3 should be chosen. Option 4 can be selected to easily remove the documents of a recently loaded test batch. Finally, option 5 can be selected to check whether a certain batch has already been loaded.
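The five modes can be summarised in a short Python sketch that uses a dictionary as a stand-in for the repository. It reflects one reading of the descriptions above and is not Move2alf's code; in particular, the behaviour of modes 4 and 5 when the document does not yet exist (nothing is uploaded) is an assumption, and all names are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    SKIP_AND_LOG_ERROR = 1
    SKIP_SILENTLY = 2
    OVERWRITE = 3
    DELETE = 4
    LIST_PRESENCE = 5


def process(mode, repository, name, content, log):
    """Apply one document to the repository dict according to the chosen mode."""
    exists = name in repository
    if mode is Mode.LIST_PRESENCE:
        if exists:
            log.append(f"present: {name}")
        return  # never uploads
    if mode is Mode.DELETE:
        repository.pop(name, None)  # assumption: nothing is uploaded
        return
    if not exists or mode is Mode.OVERWRITE:
        repository[name] = content
    elif mode is Mode.SKIP_AND_LOG_ERROR:
        log.append(f"error: {name} already exists")
    # Mode.SKIP_SILENTLY: the existing document is left as-is, nothing is logged


repo, log = {"a.pdf": "old"}, []
process(Mode.SKIP_AND_LOG_ERROR, repo, "a.pdf", "new", log)
print(repo, log)  # {'a.pdf': 'old'} ['error: a.pdf already exists']
```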

Move documents

There are 3 scenarios in which documents can be moved to a different folder:

  1. Move all documents to a specified directory before loading into Alfresco begins
  2. Move all successfully loaded files to a specified directory
  3. Move all files that failed to load to a specified directory

The first option can be useful to separate "to be processed" files from "already processed" files. It also prevents existing files from being uploaded again if old files are not removed from the input folder. Options 2 and 3 can be used to separate successfully processed files from failed files, in order to determine which files have a problem loading and must be adjusted and/or reloaded. For each option, the path to which the documents are to be moved must be entered.
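Scenarios 2 and 3 amount to choosing a target folder based on the outcome of the load. The Python sketch below illustrates that decision; it is not Move2alf's implementation, and the function name and example paths are hypothetical.

```python
import shutil
from pathlib import Path


def move_after_load(file_path, loaded_ok, loaded_dir=None, failed_dir=None):
    """Move a file to loaded_dir or failed_dir, depending on the load result.

    If no directory is configured for the outcome, the file stays where it is.
    Returns the file's final location.
    """
    file_path = Path(file_path)
    target_dir = loaded_dir if loaded_ok else failed_dir
    if target_dir is None:
        return file_path
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(file_path), str(target_dir / file_path.name)))


# Example call with hypothetical paths:
# move_after_load("/data/in/a.pdf", True, "/data/loaded", "/data/failed")
```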

Notification e-mails

There are two options for sending notification e-mails:

  1. Send an e-mail for each error that occurs
  2. Send one e-mail at the end of the cycle indicating that errors occurred during the cycle

Option one can be used to determine exactly what the errors were without having to scan the log file manually. Option two is useful if a user simply wants to know whether an error occurred: an e-mail means errors, no e-mail means no errors. The user then knows whether action must be taken.
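The difference between the two options can be shown with a small Python sketch that lists the e-mails a cycle would produce. It does not send anything and is not Move2alf's code; the function name, the message texts, and the assumption that both options can be enabled independently are illustrative only.

```python
def notifications(errors, per_error=False, summary=False):
    """Return the e-mail subjects that a cycle with the given errors would produce."""
    messages = []
    if per_error:
        # Option 1: one e-mail for each error.
        messages += [f"Error: {error}" for error in errors]
    if summary and errors:
        # Option 2: a single e-mail at the end, only if something went wrong.
        messages.append(f"{len(errors)} error(s) occurred during the cycle")
    return messages


errors = ["a.pdf: upload failed", "b.pdf: already exists"]
print(notifications(errors, per_error=True))
# ['Error: a.pdf: upload failed', 'Error: b.pdf: already exists']
print(notifications(errors, summary=True))
# ['2 error(s) occurred during the cycle']
print(notifications([], summary=True))
# []
```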