Unmanaged Files - Part 1

Tutorial

This is the first in a six-part series about using unmanaged files in the Drupal environment to your benefit.

If you've been working with Drupal for a while, you're likely familiar with "managed files." If so, feel free to jump ahead to "What is an *un*managed file?"

Managed files are files that Drupal tracks in its database. They most often find their way to Drupal via its UI, being uploaded during content creation or the submission of a form. They are typically media files, such as documents, spreadsheets, PDFs or images.

What Is a Managed File?
Data related to managed files are tracked by Drupal inside its database in the file_managed table. The structure of the table is:

    +----------+---------------------+
    | Field    | Type                |
    +----------+---------------------+
    | fid      | int(10) unsigned    |
    | uuid     | varchar(128)        |
    | langcode | varchar(12)         |
    | uid      | int(10) unsigned    |
    | filename | varchar(255)        |
    | uri      | varchar(255)        |
    | filemime | varchar(255)        |
    | filesize | bigint(20) unsigned |
    | status   | tinyint(4)          |
    | created  | int(11)             |
    | changed  | int(11)             |
    +----------+---------------------+

These fields provide all the meta information you would want available in order to select and process files based on various criteria.

As you can see, this metadata allows granular control of file assets, such as selecting all files of the same MIME type or language. You might wonder how this ties into the use of media: what if an image is defined as a media item rather than as an image file? An image uploaded as a Media entity is still stored as a managed file. The Media entity is a wrapper that adds reusability and metadata and points back to the underlying managed file.

What Is an Unmanaged File?
Simply put, an unmanaged file is a file that is not managed by Drupal, which has no record of its existence. Of course, no file stored on a server is completely unmanaged...the server's file system is aware of it, but for the purposes of a discussion in the context of Drupal, the file is not managed. 

About This Tutorial

With all the advantages that come from using a managed file, why would we ever want Drupal to ignore a file? 
This tutorial will present one answer by way of a use case that lays out why you might want to use unmanaged files in certain circumstances, along with some ways to do that.

Here is what you can expect to learn throughout this series:

Part 1 (this part) – the specification for an example use case, an abstracted architecture to fulfill it, and putting some scaffolding in place

Part 2 – the file handler that allows the random selection of an “in the wild” unmanaged file

The three following parts (3–5) show different ways to render the output from the handler:

Part 3 – a Twig-friendly variable in a block preprocess function

Part 4 – a reusable block plugin via a custom module

Part 5 – a Twig extension

Then, in the final part, we complete the file handling.

Part 6 – additional file-handling logic to implement the selection rule (no more than one image per region)

The Spec

We will ultimately create a block for a homepage; populating that block is the focus of this tutorial. Here are the specifications of the block:

  • Displays images of the map outline of three countries
  • The images will be selected randomly
  • The images will be stored segregated by regions, continental or oceanic
  • When making the random selection no more than one image may be from the same region
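
The selection rule in the last bullet can be sketched in a few lines of shell. This is only an illustration of the rule, not the handler built later in the series. It assumes one folder per region, that GNU `shuf` is installed, and that no file name contains a newline; `pick_maps` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: print COUNT image paths, each taken from a different region folder.
# pick_maps is a hypothetical name and is not part of the tutorial's module.
# Usage: pick_maps path/to/segregated_maps 3

pick_maps() {
  local maps_dir="$1"
  local count="${2:-3}"
  # Step 1: draw COUNT distinct region folders (shuf samples without replacement).
  find "$maps_dir" -mindepth 1 -maxdepth 1 -type d | shuf -n "$count" |
  while IFS= read -r region; do
    # Step 2: draw one file from inside each chosen region.
    find "$region" -maxdepth 1 -type f | shuf -n 1
  done
}
```

Because the regions are drawn first and without replacement, no two results can share a region. An empty region folder would yield fewer than COUNT paths, so a real handler would need to account for that case.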

The Architecture

The detailed architecture will vary somewhat between the three versions of the solution that we will cover. What differs between the three approaches is how the output from the custom module ends up being rendered on the page. 

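    +---------------+---------------------+---------------+-----------------+-----------------------+-------+
    | TUTORIAL PART | APPROACH            | FILE HANDLING | THEME CODE      | CONTAINER             | TWIG? |
    +---------------+---------------------+---------------+-----------------+-----------------------+-------+
    | 3             | Preprocess variable | Custom module | Preprocess hook | Custom block          | Y     |
    | 4             | Block plugin        | Custom module | n/a             | Block plugin instance | Y     |
    | 5             | Twig extension      | Custom module | n/a             | Custom block          | Y     |
    +---------------+---------------------+---------------+-----------------+-----------------------+-------+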

The Files

The files are outline maps of nations. There are 228 of them. This is a good time to pause and reflect on the premise of this tutorial: sometimes unmanaged files are the best answer. If this is one of those times, then why?

To answer that, let’s consider how one would handle adding 228 images to Drupal, where they will then be managed files. We need to remember that in doing so we must also identify the category (continent, etc.) for the image. There are some options:

  • Create a content type and use the upload widget on the node form, then set the category using a text list field or a taxonomy term
  • Create a hierarchical taxonomy vocabulary and use the upload widget on the term form and an add-on field to select the category from another vocabulary
  • Create a media item for each image and use a contributed module that enables identifying a category folder for each image
  • Use a contributed module for bulk upload (though I don’t know that doing so would accommodate the category requirement)

The common denominator here is effort. Whether uploading these files individually or setting their category individually, or both, there is quite a lot of effort required. Now, if there is a requirement for the files to be part of node content, or even a benefit to be gained that justifies the effort, then so be it…go with your judgement. In our case, let’s revisit the information that would be available given a managed file and look at the value of it in the context of our specification.

❌ fid – the files will be selected randomly, so their IDs aren’t needed if there is a way to select them without using them
❌ uuid – there is no need to ensure that the image has a unique identifier since its name will be unique
❌ langcode – the site will not be multilingual, so there is no need to have versions of the image that present the country’s name in another language
❌ filename – the filename is available regardless of whether the file is managed or not
❌ uri – the path of the images, if unmanaged, will be static, all in a subfolder of the same parent, either hardcoded or persisted in a config page, so the uri is unimportant
❌ filemime – all files are JPEG images, so the MIME type is unnecessary information
❌ filesize – the size of the file does not need to be known
❌ status – the status would not be used since all of the images are permanent
❌ created – there is no need to know on which date the file was created
❌ changed – there is no need to know on which date the file was last changed

Clearly, the file information that becomes available when a file is managed is unimportant for our purposes. Were any single item of information important for our needs, the calculus would change (and since that situation wouldn’t contribute to this tutorial, I wouldn’t have chosen it). Agreed? Good.

You might be wondering how the requirement will be met if using Drupal and files of which it is not aware. It’s a fair question that will be answered in part 5 of the tutorial. For now, let’s get the files into place on the server for use later.

Following is the tree on my local machine containing all the images:

    ├── segregated_maps
    │   ├── africa
    │   ├── antarctica
    │   ├── asia
    │   ├── australia
    │   ├── caribbean
    │   ├── central america
    │   ├── europe
    │   ├── mideast
    │   ├── north america
    │   ├── pacific islands
    │   └── south america
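
If you are following along and need to build the same skeleton before sorting your own images into it, a minimal sketch is below. It assumes it is run from the directory that should contain segregated_maps; the region names are copied from the tree, and the quoting matters because several of them contain spaces.

```shell
# Sketch: create the eleven region folders from the tree above.
# Run from the directory that should contain segregated_maps.
for region in africa antarctica asia australia caribbean "central america" \
              europe mideast "north america" "pacific islands" "south america"; do
  # -p creates segregated_maps on the first pass and is harmless on reruns.
  mkdir -p "segregated_maps/$region"
done
```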

As with managed files, there are options for putting this structure in place on the server. Here are some:

  1. Use ssh to log into the server and create the folder structure, and then scp to move the files
  2. Use an application like Filezilla to sftp the folder structure and files
  3. Use rsync to move the files, creating the folder structure as it does

I’m going to use option 3 and have it compress the data in transit, which speeds up the transfer (less so for formats that are already compressed, such as JPEG). I’ll place the segregated_maps folder under public://.

    rsync -avz segregated_maps myuser@myserver:/path/to/target/

I’ll break the command down in case you’ve never used it before.

-a – archive mode: recurses into the folders below the one specified as the source, and preserves most attributes, such as permissions and modification times

-v – verbose, increasing the amount of information output from the command

-z – compresses the file data during transfer

segregated_maps – the source folder or file. If a folder, providing it without a trailing slash causes the folder itself to be created at the destination along with its contents, whereas a trailing slash would transfer the folder’s contents without creating the folder

myuser@myserver – the destination host, in this case an SSH address (‘myserver’ is an alias…normally it would be an IP address or domain name)

: – marks the start of the destination path

/path/to/target/ – the path to the target destination on the server. Because it starts with a slash, it is an absolute path; a path without a leading slash would be relative to the login folder

The result of the command is that the tree shown above and all the image files contained in it now exist at the target path.
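
One way to confirm that claim is to count the files per region on both machines and compare the two listings. The sketch below assumes a POSIX-style shell with `find` on each side; `count_by_region` is a hypothetical helper name, not something rsync or Drupal provides.

```shell
# Sketch: print "<count> <region>" for every region folder under the given path.
# count_by_region is a hypothetical helper name.
# Usage: count_by_region segregated_maps
count_by_region() {
  local maps_dir="$1"
  find "$maps_dir" -mindepth 1 -maxdepth 1 -type d | sort |
  while IFS= read -r region; do
    # tr strips the padding some wc implementations add around the number.
    printf '%s %s\n' "$(find "$region" -type f | wc -l | tr -d ' ')" "$(basename "$region")"
  done
}
```

Run it against the local folder and against the copy at the target path on the server. The two outputs should be identical, and the counts should add up to the 228 images.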

Next Up

In Part 2, we’ll start coding and build a functional prototype of the unmanaged files handler, which we’ll complete in Part 6.

Note: If you’d like to follow along with actual code, I’ll be publishing the example module and supporting notes in my public GitHub repo, with the link included in Part 2.
