Alfresco Cloud Cache (by Sander Bylemans & XeniT)

ECM systems like Alfresco have very particular usage patterns: only a small portion of the data and metadata is consulted in any given window of time. That makes them a good fit for smart caching techniques. As for content, pushing documents closer to the users decreases network latency considerably. Combining both capabilities yields a much faster Alfresco setup. When an application depends on a network, the network is most often the slowest link in its execution chain. At XeniT we have implemented a proxy that sits between the business side and the application side.

Description

The basic idea behind the local proxy is to intercept the calls to the remote Alfresco system and handle these requests locally whenever possible. The user cannot distinguish the proxy from the real Alfresco system. The proxy runs locally and intercepts requests for documents sent to Alfresco. When the requested document is locally available, the proxy sends it directly to the user without a request to the Alfresco system, which greatly reduces the time spent retrieving the document. When the document is not locally available, the proxy forwards the request to the Alfresco system. After receiving the document from the Alfresco system, the proxy stores it locally if its size permits. A document that is too large to be stored in the proxy is forwarded to the user but not stored locally. If the document would fit but other documents take up too much space, the proxy deletes the least recently used documents until the requested document fits.
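The caching rules described above can be sketched in a few lines of Node.js. This is only an illustration, not the XeniT implementation: the class name `LruDocumentCache`, the limits `maxBytes` and `maxDocBytes`, and the `fetchFromAlfresco` function are hypothetical names introduced for this example, and documents are assumed to be held as `Buffer` objects in memory.

```javascript
// Minimal least-recently-used (LRU) document cache.
// maxBytes: total cache capacity; maxDocBytes: largest document worth caching.
// Both limits are hypothetical parameters, not values from the article.
class LruDocumentCache {
  constructor(maxBytes, maxDocBytes) {
    this.maxBytes = maxBytes;
    this.maxDocBytes = maxDocBytes;
    this.usedBytes = 0;
    // A Map iterates in insertion order, so the first key is the
    // least recently used one as long as we re-insert on every access.
    this.docs = new Map();
  }

  // Returns the cached document, or undefined on a cache miss.
  get(key) {
    if (!this.docs.has(key)) return undefined;
    const doc = this.docs.get(key);
    this.docs.delete(key); // move the entry to the "most recent" end
    this.docs.set(key, doc);
    return doc;
  }

  // Stores a document; returns false when it is too large to be cached.
  put(key, doc) {
    if (doc.length > this.maxDocBytes || doc.length > this.maxBytes) return false;
    if (this.docs.has(key)) {
      this.usedBytes -= this.docs.get(key).length;
      this.docs.delete(key);
    }
    // Evict least recently used documents until the new one fits.
    while (this.usedBytes + doc.length > this.maxBytes) {
      const [oldestKey, oldestDoc] = this.docs.entries().next().value;
      this.docs.delete(oldestKey);
      this.usedBytes -= oldestDoc.length;
    }
    this.docs.set(key, doc);
    this.usedBytes += doc.length;
    return true;
  }
}

// Serves a document locally when possible, otherwise fetches it remotely.
// fetchFromAlfresco is a hypothetical async function standing in for the
// real request to the Alfresco system.
async function getDocument(cache, key, fetchFromAlfresco) {
  const cached = cache.get(key);
  if (cached !== undefined) return { doc: cached, local: true };
  const doc = await fetchFromAlfresco(key);
  cache.put(key, doc); // silently skipped when the document is too large
  return { doc, local: false };
}
```

With a cache created as `new LruDocumentCache(10, 6)`, storing three 4-byte documents evicts whichever of the first two was used least recently, and a 7-byte document is passed through without being stored.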

This system is already deployed at XeniT as a testing instance on an Intel NUC, a device small enough to fit on any desk, in small or large companies. It acts as a cloud cache inside the corporate walls.

Visualization

To record the improvements and other characteristics of the proxy, the proxy uploads them to Elasticsearch using Logstash. We can then use Kibana to visualize these properties. Figure 1 shows an example with the following information: whether a document was available locally or had to be fetched from the Alfresco system, the time needed to fetch documents locally and remotely (Alfresco server side), which IP address requested each document, and which documents were requested the most. Documents are identified by their nodeRef, along with their different content URLs, if any, and whether they are available locally. When content can be retrieved locally from the cloud cache, it is sent to the requesting IP address in milliseconds. When it is shipped from our Alfresco server in Germany, it typically takes 200 ms or more, even for tiny files. Any other characteristic can be added to the visualization. The only requirement is that the properties needed to calculate it are available in Elasticsearch, or that the calculation is done before the data is added to Elasticsearch.
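As an illustration of the kind of record that could carry these properties, the sketch below builds one JSON event per retrieval. The field names (`nodeRef`, `contentUrl`, `clientIp`, `local`, `durationMs`) and the sample values are assumptions made for this example, not the schema the proxy actually uses, and how the line is handed to Logstash (file, TCP, or otherwise) is left open.

```javascript
// Builds one event describing a single document retrieval.
// All field names are hypothetical; only '@timestamp' follows the
// convention Logstash and Elasticsearch commonly use for event time.
function buildRetrievalEvent({ nodeRef, contentUrl, clientIp, local, startedAt, finishedAt }) {
  return {
    '@timestamp': new Date(finishedAt).toISOString(),
    nodeRef,                             // identifies the requested document
    contentUrl,                          // content URL, if any
    clientIp,                            // IP address that requested the document
    local,                               // true when served from the local cache
    durationMs: finishedAt - startedAt,  // retrieval time in milliseconds
  };
}

// Serializes an event as one JSON line, a format that a log shipper
// can be configured to parse.
function toLogLine(event) {
  return JSON.stringify(event) + '\n';
}

// Example with made-up values: a cache hit that took 3 ms.
const exampleEvent = buildRetrievalEvent({
  nodeRef: 'workspace://SpacesStore/example-node-id',
  contentUrl: null,
  clientIp: '192.0.2.10',
  local: true,
  startedAt: 1000,
  finishedAt: 1003,
});
process.stdout.write(toLogLine(exampleEvent));
```

Computing `durationMs` before the event is shipped is one way to satisfy the requirement that the values needed for a visualization are already present in the stored data.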

 

Technical Details

The application is written in Node.js, a JavaScript runtime. Node.js is primarily asynchronous, which means that I/O calls do not block. Instead of waiting for a function to return its result, you pass a callback that is invoked once the result is available, and you use the result inside that callback. This differs from the synchronous style common in many other languages, where a call blocks until the called method returns its result. The asynchronous style is beneficial here because operations that do not depend on the result can be executed while waiting for it.
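A small example may make the callback style more concrete. The function `fetchDocument` below is a hypothetical stand-in that simulates a slow remote call with `setTimeout`; it is not part of the proxy or of any Alfresco API.

```javascript
// Records the order in which things happen.
const order = [];

// Simulates a slow, non-blocking request. The callback follows the usual
// Node.js convention: first argument is an error (or null), second the result.
function fetchDocument(nodeRef, callback) {
  setTimeout(() => callback(null, `content of ${nodeRef}`), 10);
}

fetchDocument('doc-1', (err, content) => {
  if (err) throw err;
  // The result is only usable here, inside the callback.
  order.push(`callback: ${content}`);
  console.log(order);
});

// This line runs immediately, before the callback fires,
// because fetchDocument returned without waiting for the result.
order.push('other work');
```

When the callback finally runs, `order` already contains `'other work'` as its first entry, which shows that work independent of the result was done during the wait.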
