Post-Processing and Preview Generation of Payload
After a client updates the payload of a file, the payload undergoes post-processing. Currently, this consists of the following steps:
- Remove old thumbnails if any exist
- Scan the payload to determine the mime-type
- Attempt to create a BlurHash value
- Attempt to calculate an aspect ratio value
- Attempt to create thumbnail images
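The order of these steps, and the fact that everything after mime-type detection is only an attempt, can be sketched as follows. This is a minimal illustration, not the Spaces implementation: `post_process`, `PostProcessingResult` and all injected callables (`delete_thumbnails`, `detect_mime_type`, `find_previewer`, ...) are hypothetical names. It also assumes that the aspect ratio, like the BlurHash and the thumbnails, is derived from the image extracted by a previewer.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PostProcessingResult:
    mime_type: str
    blurhash: Optional[str] = None
    aspect_ratio: Optional[float] = None
    thumbnails: dict = field(default_factory=dict)  # thumbnail size -> JPEG bytes


def post_process(
    payload: bytes,
    delete_thumbnails: Callable[[], None],
    detect_mime_type: Callable[[bytes], str],
    find_previewer: Callable[[str], Optional[Callable[[bytes], Optional[object]]]],
    compute_blurhash: Callable[[object], str],
    compute_aspect_ratio: Callable[[object], float],
    make_thumbnail: Callable[[object, int], bytes],
    sizes=(200, 100, 50),
) -> PostProcessingResult:
    # 1. Remove thumbnails left over from a previous payload, if any.
    delete_thumbnails()
    # 2. Determine the actual mime-type from the payload bytes.
    result = PostProcessingResult(mime_type=detect_mime_type(payload))
    # 3-5. Best-effort steps: they need a previewer for this mime-type
    # and an image that the previewer managed to extract.
    previewer = find_previewer(result.mime_type)
    image = previewer(payload) if previewer is not None else None
    if image is None:
        return result  # blurhash and aspect_ratio stay None, no thumbnails
    result.blurhash = compute_blurhash(image)
    # Assumption: the aspect ratio is computed from the extracted image.
    result.aspect_ratio = compute_aspect_ratio(image)
    for size in sizes:
        result.thumbnails[size] = make_thumbnail(image, size)
    return result
```

Injecting the steps as callables keeps the sketch focused on control flow: a missing previewer or a failed extraction is not an error, it simply leaves the optional fields empty.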
Post-processing is triggered after the payload has been uploaded successfully by the client. Spaces considers the following scenarios a successful upload:
- The upload ended successfully and the size of the payload matches intendedSize in the metadata.
- Deprecated: The upload ended successfully and intendedSize == 0. This logic exists to support legacy clients that do not use intendedSize.
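The two rules above amount to a small predicate. The function and parameter names below are hypothetical; only the conditions themselves come from this document.

```python
def is_successful_upload(upload_completed: bool, payload_size: int, intended_size: int) -> bool:
    """Return True if post-processing should be triggered for this upload."""
    if not upload_completed:
        return False
    # Regular case: the stored payload has exactly the size announced in the metadata.
    if payload_size == intended_size:
        return True
    # Deprecated legacy case: clients that do not use intendedSize leave it at 0.
    return intended_size == 0
```

A completed upload whose size differs from a non-zero intendedSize therefore does not trigger post-processing.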
Post-processing of the payload is executed asynchronously in the background. Its completion is indicated by a new FILE_UPDATED change event, which is announced as a SPACE_ACTIVITY event over the event feed.
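On the client side, this means waiting for the matching event rather than polling. The sketch below assumes each event is a mapping with hypothetical keys `type`, `change` and `fileId`; the real event payload may be shaped differently. It treats the first matching FILE_UPDATED event as the completion signal.

```python
from typing import Iterable, Mapping


def post_processing_finished(events: Iterable[Mapping[str, str]], file_id: str) -> bool:
    """Consume events until post-processing of `file_id` is reported as done.

    Returns False if the feed ends before a matching event arrives.
    """
    for event in events:
        # Hypothetical field names; adapt them to the actual event payload.
        if (
            event.get("type") == "SPACE_ACTIVITY"
            and event.get("change") == "FILE_UPDATED"
            and event.get("fileId") == file_id
        ):
            return True
    return False
```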
In addition to post-processing, Spaces Backend supports on-demand preview generation, which produces a JPEG image for almost any type of file. See Preview generation.
Mime-type detection
Although the client provides a mime-type when creating the metadata, this value is verified after the payload has been updated, and the database is then updated with the detected mime-type.
Mime-type detection is based on Apache Tika.
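Apache Tika inspects the payload itself instead of trusting the declared type. The stand-in below is far cruder than Tika: it only checks a handful of well-known magic numbers at the start of the payload and is shown purely to illustrate the "detect, then overwrite the declared value" flow. Falling back to `application/octet-stream` mirrors Tika's generic default; how Spaces handles unrecognised payloads is not stated here. `detect_mime_type` and `resolve_mime_type` are hypothetical names.

```python
from typing import Tuple

# A few file signatures ("magic numbers"); Tika knows far more formats.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def detect_mime_type(payload: bytes) -> str:
    for signature, mime_type in MAGIC_NUMBERS:
        if payload.startswith(signature):
            return mime_type
    return "application/octet-stream"  # nothing recognised


def resolve_mime_type(declared: str, payload: bytes) -> Tuple[str, bool]:
    """Return the mime-type to persist and whether it differs from the declared one."""
    detected = detect_mime_type(payload)
    # The detected type wins; the client-provided value is only a hint.
    return detected, detected != declared
```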
Blurhash
Once the mime-type is detected, post-processing attempts to find a Previewer service that supports that specific mime-type. If one is found, the Previewer service attempts to extract a scaled-down image of the payload. Depending on the type of payload, this may involve the following:
- Extract a frame from a video,
- Create an image of the first page of a PDF,
- Render plain text into an image,
- Decode and scale down an image file, etc.
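Finding "a Previewer service that supports the specific mime-type" is essentially a registry lookup. In this sketch, `PreviewerRegistry` is hypothetical, and so is the support for wildcard patterns such as `image/*`; the document only says that a previewer either supports a mime-type or does not.

```python
from typing import Callable, Dict, Optional

# A previewer takes the payload and returns an extracted image, or None on failure.
Previewer = Callable[[bytes], Optional[object]]


class PreviewerRegistry:
    """Maps mime-type patterns such as "application/pdf" or "image/*" to previewers."""

    def __init__(self) -> None:
        self._previewers: Dict[str, Previewer] = {}

    def register(self, pattern: str, previewer: Previewer) -> None:
        self._previewers[pattern.lower()] = previewer

    def find(self, mime_type: str) -> Optional[Previewer]:
        # Ignore parameters such as "; charset=utf-8" and compare case-insensitively.
        mime_type = mime_type.lower().split(";")[0].strip()
        # Prefer an exact match, then fall back to a "type/*" wildcard.
        exact = self._previewers.get(mime_type)
        if exact is not None:
            return exact
        major = mime_type.split("/")[0]
        return self._previewers.get(major + "/*")
```

Returning None instead of raising matches the text: a missing previewer simply means no BlurHash and no thumbnails.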
If an image can be extracted, a BlurHash value is calculated and persisted in the object's metadata.
If no previewer is available, or an image could not be extracted, the BlurHash value is null.
Clients can use BlurHashes to render placeholder images before the actual image has loaded. Such a placeholder is a strongly blurred version of the actual image.
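To show what "calculating a BlurHash value" involves, here is a condensed Python port of the public reference BlurHash encoder published by Wolt. It is not necessarily the code Spaces runs: the input format (a row-major grid of sRGB pixels) and the 4x3 component count, a common choice, are assumptions. The image is projected onto a few cosine basis functions; the average colour (DC) and the quantised remaining components (AC) are packed into a base-83 string.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # sRGB, 0..255 per channel

BASE83 = (
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz#$%*+,-.:;=?@[]^_{|}~"
)


def encode_base83(value: int, length: int) -> str:
    return "".join(BASE83[(value // 83 ** (length - i)) % 83] for i in range(1, length + 1))


def srgb_to_linear(value: int) -> float:
    v = value / 255
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4


def linear_to_srgb(value: float) -> int:
    v = max(0.0, min(1.0, value))
    if v <= 0.0031308:
        return int(v * 12.92 * 255 + 0.5)
    return int((1.055 * v ** (1 / 2.4) - 0.055) * 255 + 0.5)


def sign_pow(value: float, exp: float) -> float:
    return math.copysign(abs(value) ** exp, value)


def blurhash_encode(pixels: Sequence[Sequence[Pixel]], cx: int = 4, cy: int = 3) -> str:
    """Encode a row-major grid of sRGB pixels as a BlurHash string."""
    if not (1 <= cx <= 9 and 1 <= cy <= 9):
        raise ValueError("BlurHash supports 1 to 9 components per axis")
    height, width = len(pixels), len(pixels[0])
    linear = [[tuple(srgb_to_linear(c) for c in px) for px in row] for row in pixels]

    # Project the image onto cx * cy cosine basis functions.
    factors: List[Tuple[float, float, float]] = []
    for j in range(cy):
        for i in range(cx):
            norm = 1 if i == 0 and j == 0 else 2
            acc = [0.0, 0.0, 0.0]
            for y in range(height):
                for x in range(width):
                    basis = (norm * math.cos(math.pi * i * x / width)
                             * math.cos(math.pi * j * y / height))
                    for k in range(3):
                        acc[k] += basis * linear[y][x][k]
            scale = 1 / (width * height)
            factors.append((acc[0] * scale, acc[1] * scale, acc[2] * scale))

    dc, ac = factors[0], factors[1:]
    result = encode_base83((cx - 1) + (cy - 1) * 9, 1)  # size flag

    # One character for the largest AC magnitude, used to normalise the AC values.
    if ac:
        actual_max = max(abs(v) for f in ac for v in f)
        quantised_max = int(max(0, min(82, math.floor(actual_max * 166 - 0.5))))
        max_value = (quantised_max + 1) / 166
        result += encode_base83(quantised_max, 1)
    else:
        max_value = 1.0
        result += encode_base83(0, 1)

    # DC component: the average colour, converted back to sRGB.
    r, g, b = (linear_to_srgb(c) for c in dc)
    result += encode_base83((r << 16) + (g << 8) + b, 4)

    # AC components: 19 quantisation levels per channel, two characters each.
    for f in ac:
        q = [int(max(0, min(18, math.floor(sign_pow(v / max_value, 0.5) * 9 + 9.5)))) for v in f]
        result += encode_base83(q[0] * 19 * 19 + q[1] * 19 + q[2], 2)
    return result
```

With 4x3 components the result is always 28 characters long, which is why storing it in the file metadata is cheap. Encoding a small, already scaled-down image (as the Previewer service produces) keeps the nested loops fast.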
Aspect ratio
The aspect ratio of an image can be used to scale the preview of a selected object (image, text document, etc.) up or down. In this context, it is defined as width divided by height and is stored as a decimal number for maximum precision.
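As a worked example, a 1920x1080 image has an aspect ratio of 1920 / 1080 = 16 / 9 ≈ 1.778, so a preview rendered 480 pixels wide should be 480 / (16 / 9) = 270 pixels high. The sketch uses Python's `Decimal` to mirror "stored as a decimal number"; which numeric type Spaces actually uses is not stated, and both function names are hypothetical.

```python
from decimal import Decimal


def aspect_ratio(width: int, height: int) -> Decimal:
    """Width divided by height, kept as a Decimal rather than a binary float."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return Decimal(width) / Decimal(height)


def scaled_height(target_width: int, ratio: Decimal) -> int:
    """Height (rounded to whole pixels) that keeps `ratio` at `target_width` pixels wide."""
    return int((Decimal(target_width) / ratio).to_integral_value())
```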
Thumbnail generation
If a scaled-down image was extracted from the payload, it is then scaled down further to the following sizes:
- 200x200 pixels
- 100x100 pixels
- 50x50 pixels
The thumbnails are encoded as JPEG and persisted in the database, from which they can be retrieved via the API.
If no previewer is available, or an image could not be extracted, no thumbnails exist.
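This document does not specify how an image is fitted into the 200x200, 100x100 and 50x50 sizes or which resampling filter is used. The sketch below assumes the sizes are bounding boxes (aspect ratio preserved, no upscaling) and uses nearest-neighbour resampling only for brevity; a production implementation would likely use a higher-quality filter. JPEG encoding is omitted. `make_thumbnails` and `resize_nearest` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]
THUMBNAIL_SIZES = (200, 100, 50)  # bounding boxes from the list above


def resize_nearest(pixels: Sequence[Sequence[Pixel]], new_w: int, new_h: int) -> List[List[Pixel]]:
    """Nearest-neighbour resampling of a row-major pixel grid."""
    old_h, old_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def make_thumbnails(pixels: Sequence[Sequence[Pixel]]) -> Dict[int, List[List[Pixel]]]:
    old_h, old_w = len(pixels), len(pixels[0])
    thumbnails = {}
    for box in THUMBNAIL_SIZES:
        # Assumption: fit inside a box x box square, keep the aspect ratio, never upscale.
        scale = min(box / old_w, box / old_h, 1.0)
        new_w, new_h = max(1, round(old_w * scale)), max(1, round(old_h * scale))
        thumbnails[box] = resize_nearest(pixels, new_w, new_h)
    return thumbnails
```

Under these assumptions, a 400x200 preview yields thumbnails of 200x100, 100x50 and 50x25 pixels.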
Preview generation
On-demand preview generation depends on an external service that we call the preview service: a small Python HTTP web service designed solely to serve preview requests and thereby offload this work from Spaces Backend. It uses operating-system tools to generate previews for different types of files, which is an important requirement for performance during preview generation.
The preview service supports a wide range of mime-types and offers features such as generating a preview of a specific page of a multi-page document and specifying the maximum dimensions of the generated preview. The latter is useful for clients that want previews tailored to particular device dimensions.
NOTE: The service preserves the aspect ratio of the generated preview images, so the requested maximum dimensions act as an upper bound: a preview can be smaller than requested along one axis.
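Because the aspect ratio is preserved, a client can predict the size of a preview from the original dimensions and the limits it requests. The sketch below assumes that either limit may be omitted, that the service never upscales and that it rounds down; none of these details is stated in this document, and `fit_within` is a hypothetical helper, not part of the preview service's API. Exact fractions avoid floating-point results such as 49 * (1 / 49) < 1 that would otherwise round a dimension down by one pixel.

```python
import math
from fractions import Fraction
from typing import Optional, Tuple


def fit_within(width: int, height: int,
               max_width: Optional[int] = None,
               max_height: Optional[int] = None) -> Tuple[int, int]:
    """Largest size within the limits that keeps width / height unchanged."""
    scale = Fraction(1)  # assumption: never upscale
    if max_width is not None:
        scale = min(scale, Fraction(max_width, width))
    if max_height is not None:
        scale = min(scale, Fraction(max_height, height))
    # Round down so no limit is exceeded; keep at least one pixel per axis.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))
```

For example, a 4000x3000 photo requested with a maximum of 800x800 comes back as 800x600, not 800x800.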
Limitations
- Fetching a thumbnail that does not exist results in an HTTP 404. From this response alone, it is not possible to tell whether post-processing has not finished yet or whether the payload is unsupported and no thumbnail will ever exist.
- Generating a preview for a specific frame of a video is currently not supported. Previews of videos always show the first frame.
- Generating a preview for a very large file can time out, because preview generation takes place in a separate service. Future improvements could make preview generation event-based instead of blocking, or store a large thumbnail during the post-processing step and scale it down to the requested dimensions later.
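Since a 404 alone is ambiguous, a client that cannot wait for the FILE_UPDATED event can only bound how long it keeps asking. One option is a retry loop with exponential backoff, sketched below. `fetch_thumbnail_with_retry` is hypothetical, and it assumes the injected `fetch` callable returns the thumbnail bytes or None when the API answers HTTP 404. The retry count and delays are arbitrary.

```python
import time
from typing import Callable, Optional


def fetch_thumbnail_with_retry(
    fetch: Callable[[], Optional[bytes]],
    attempts: int = 5,
    initial_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Retry a thumbnail fetch that may 404 while post-processing is still running.

    Returns None after `attempts` misses: the payload is then probably
    unsupported, but a 404 alone cannot prove it.
    """
    delay = initial_delay
    for attempt in range(attempts):
        data = fetch()
        if data is not None:
            return data
        if attempt < attempts - 1:  # no need to sleep after the last attempt
            sleep(delay)
            delay *= 2  # exponential backoff
    return None
```

Passing `sleep` as a parameter keeps the function testable; a client that holds a BlurHash can show it as a placeholder while these retries run.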