Writing the tranc module

We hit “Content and interface translation don’t clearly separate”. I set out to fix it for ourselves and then released the result back. It’s possible the solution is not correct or not useful for anyone else; nonetheless, some of the coding challenges are worth talking about.

You need the tranc module at hand to make any sense of this post.

Of decorators and testing

It’d be tempting to write TrancTranslationManager extends TranslationManager, override getStringTranslation to change the langcode and be done with it. This would, however, introduce a strong coupling between our module and TranslationManager, and a future core upgrade might just break it, perhaps in some subtle fashion. Instead we use the decorator pattern: we implement the relevant interfaces, with each method changing the langcode as necessary and delegating to the original string_translation service. Another advantage besides avoiding possible future bugs: the webprofiler replaces the class on the string_translation service. If it were decorating instead, tranc and webprofiler could run at the same time without a problem. Good thing we do not use the webprofiler. Someone should file a patch against it to decorate…
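To make the shape concrete, here is a minimal sketch of the decorator idea. It is not the tranc code: the interface is a simplified stand-in, every class name is made up, and the rule “always use the content langcode” is an assumption for illustration only.

```php
<?php
// Simplified stand-in for a translator interface; not the real core interface.
interface TranslatorSketchInterface {
  public function getStringTranslation(string $langcode, string $string, string $context): string;
}

// Stand-in for the original string_translation service.
final class InnerTranslatorSketch implements TranslatorSketchInterface {
  public function getStringTranslation(string $langcode, string $string, string $context): string {
    return "[$langcode] $string";
  }
}

// The decorator implements the same interface, adjusts the langcode and
// delegates everything else to the wrapped service.
final class LangcodeSwappingTranslatorSketch implements TranslatorSketchInterface {
  private TranslatorSketchInterface $inner;
  private string $contentLangcode;

  public function __construct(TranslatorSketchInterface $inner, string $content_langcode) {
    $this->inner = $inner;
    $this->contentLangcode = $content_langcode;
  }

  public function getStringTranslation(string $langcode, string $string, string $context): string {
    // Assumption for illustration: always translate into the content language.
    return $this->inner->getStringTranslation($this->contentLangcode, $string, $context);
  }
}

$translator = new LangcodeSwappingTranslatorSketch(new InnerTranslatorSketch(), 'de');
echo $translator->getStringTranslation('en', 'Hello', ''), "\n";
```

The decorator only knows the interface, so nothing in it depends on how the wrapped class works internally.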

Yet another advantage of this decorator class is its closedness. We know every nook and cranny of it. We can reason about it. Even without a test, we can confidently say this is doing what it’s supposed to do. It is very easy to see the external dependencies: there is exactly one call to anything that is not the translationManager. Of course, bugs might still happen: maybe some of the arguments of the proxy calls are in the wrong order, maybe we left out a return. On the other hand, the IDE would not let us leave out a return or introduce one where one is not needed. I would actually argue against writing a unit test for a class like this: it would be the expression of the same logic in a different format and just an unnecessary maintenance headache. It would definitely not find bugs. In fact, the first version of this class had a very unexpected bug, one that neither a unit nor a kernel test would find!

The simplest test is to enable the module and visit a page. This blows up. W.T.F. As the doxygen notes, the LanguageRequestSubscriber class calls a public method on the TranslationManager class which is not on the interface. This happens to be a core bug so that’s great: we discovered a core bug which should be easy to fix, since adding methods to interfaces is not considered a BC break. This is the fundamental problem with much of testing and indeed object oriented programming itself: you imagine a world and fit your test or class to it. But what happens when the world does not adhere to the mental model of a puny programmer? Sucks to be you, that’s what happens.
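The failure mode is easy to reproduce outside Drupal. The following sketch uses made-up names throughout (none of them are the real core classes or methods): a caller is handed an interface but calls a public method that only the concrete class has, and a perfectly faithful decorator breaks it.

```php
<?php
// All names here are hypothetical stand-ins, not the real core classes.
interface TranslationSketchInterface {
  public function translate(string $string): string;
}

// The concrete manager has a public method that is not on the interface.
class TranslationManagerSketch implements TranslationSketchInterface {
  private string $langcode = 'en';

  public function translate(string $string): string {
    return "[$this->langcode] $string";
  }

  public function setLangcode(string $langcode): void {
    $this->langcode = $langcode;
  }
}

// A decorator that implements the interface faithfully, and nothing more.
class TranslationDecoratorSketch implements TranslationSketchInterface {
  private TranslationSketchInterface $inner;

  public function __construct(TranslationSketchInterface $inner) {
    $this->inner = $inner;
  }

  public function translate(string $string): string {
    return $this->inner->translate($string);
  }
}

// A caller that type hints the interface but relies on the concrete class.
function subscriber_sketch(TranslationSketchInterface $translation): string {
  try {
    $translation->setLangcode('de');
    return 'ok';
  }
  catch (\Error $e) {
    // PHP throws an Error for a call to an undefined method.
    return 'blew up';
  }
}

echo subscriber_sketch(new TranslationManagerSketch()), "\n";
echo subscriber_sketch(new TranslationDecoratorSketch(new TranslationManagerSketch())), "\n";
```

A unit test of the decorator alone would pass here, because the decorator does everything the interface asks of it; only running the real caller shows the problem.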

Speaking of doxygen, that doxygen is absolutely necessary and useful. Putting phpcs enforced doxygen on protected $languageManager saying “The language manager”, however, is just clutter. Unless forced, don’t do this either.

Of Twig and documentation

Another part of the module is changing the default theme to print in the content language. I know enough of Twig to know that changing a template from code requires a visitor, but it’s been a very, very long time since I wrote one. So before I wrote a single line of code, I read two pages: https://twig.symfony.com/doc/2.x/api.html and https://twig.symfony.com/doc/2.x/internals.html (well, of the API page I only read the Basics section; then came Rendering and I stopped there because it didn’t look relevant). The internals page looks much more relevant and it’s short enough. I also explored the core Twig integration: TwigEnvironment, TwigExtension (only down to getName; the rest is very clearly not relevant to us, it’s implementations of various Drupal specific Twig functionality) and TwigNodeVisitor. TwigNodeVisitor makes us very happy because it changes one filter to another, which is exactly what we need to change the t filter to tc. But how will we know we are in the default theme? Well, on the Drupal half we can fish out the default theme from somewhere and on the Twig half, I dunno, surely a Twig node carries its filename. Well, Node::getFilename has this most helpful message:

@trigger_error('The '.__METHOD__.' method is deprecated since version 1.27 and will be removed in 2.0. Use getTemplateName() instead.', E_USER_DEPRECATED);

This really is very helpful because I would have never guessed getTemplateName is the filename! It is certainly not documented anywhere I can find. Once you have it, of course it’s easy to verify, for example Compiler:

$this->filename = $node->getTemplateName();

As for finding the default theme, I Googled drupal 8 get default theme; the first result that is not on Drupal StackExchange is ThemeHandler::getDefault. This returns a string, but there’s also a getTheme method on the theme handler which returns an Extension object, and that has the getPath method we need. So that’s done. (While none of the Drupal SE answers is a direct answer, this answer can be used to deduce the correct method despite only mentioning the deprecated setDefault method: surely there’s a getDefault.)

It’s worth implementing the visitor this far.
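The check the visitor needs at this point boils down to a path prefix comparison between the node’s template name and the default theme’s path. A minimal sketch, assuming the template name is a path relative to the Drupal root (the helper name and the example paths are made up):

```php
<?php
/**
 * Hypothetical helper: is this template inside the given theme directory?
 */
function template_is_in_theme_sketch(string $template_name, string $theme_path): bool {
  // The trailing slash keeps "themes/custom/mytheme2" from matching "mytheme".
  $prefix = rtrim($theme_path, '/') . '/';
  return strncmp($template_name, $prefix, strlen($prefix)) === 0;
}

var_dump(template_is_in_theme_sketch('themes/custom/mytheme/templates/page.html.twig', 'themes/custom/mytheme'));
var_dump(template_is_in_theme_sketch('core/modules/node/templates/node.html.twig', 'themes/custom/mytheme'));
```

In the module, the two arguments would come from getTemplateName() on the Twig node and getPath() on the default theme’s Extension object.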

For the trans tag, I decided I wanted to change the langcode in its options as that seemed much easier than introducing a transc tag. First, I wanted to write a little exploratory script to see what {% trans %} parses into. If you look at the internals page it shows how to get to the nodes. The whole page has three lines of code, let’s try to make them work. The first line of code uses three variables: $twig, $source, $identifier. The explanation mentions $twig is an environment and while it’s not crosslinked, the API page mentions environments and also our core read tells us that Drupal::service('twig') returns just that. That was our first variable; the second, $source, is just the Twig template we want to parse. Now what’s $identifier? Mystery! Neither the API nor this page ever mentions it. I left it empty, and the tokenizer and the parser ran fine but the compiler complained it couldn’t find the template. Ah ha! Where did we read about defining templates on the fly? Right, we just read the core TwigEnvironment class, which in renderInline reminded us Drupal has inline templates. I tried putting {# inline_template_start #} in front of my little Twig template; that didn’t work. I searched the Drupal codebase for this curious string and there are not many results. StringLoader::exists looks interesting and highly relevant: it looks at the template name and if it starts with this string, it declares it exists. How do we set the template name…? Well, our chain started with Source, and peeking into the Twig Source class confirms our suspicions: what the internals page calls $identifier is just the template name (which above already turned out to be the filename normally… what a mess). So:

$twig = \Drupal::service('twig');
$string = '{% trans %}x{% endtrans %}';
$stream = $twig->tokenize(new \Twig\Source($string, '{# inline_template_start #}'));
$nodes = $twig->parse($stream);
$twig->compile($nodes);

drush scr test.php works. We can print $nodes to see the nodes and print the return value of compile to see the compiled code. Phew! We can go bolder and do the same for:

$string = "{% trans with {'context': 'foo', 'langcode': 'bar'} %}x{% endtrans %}";

And print $nodes now tells us everything we needed: the node is of class TwigNodeTrans, the options are an ArrayExpression, the strings are wrapped in ConstantExpression. Our visitor pretty much writes itself from this point.

Now we want to test this… If you followed the aforementioned renderInline call chain you would have seen

$loader = new ChainLoader([
    new ArrayLoader([$name => $template]),
    $current = $this->getLoader(),
]);

which tells us the way this template gets registered is via new ArrayLoader([$name => $template]). We learned on the API page that an environment needs a loader and we have one. So, using this info, the TrancNodeVisitorTest::testTrancNodeVisitor method almost writes itself; it’s just a little bit more than the exploratory script above. It needs the core twig extension so that the trans tag can get registered, and the tranc twig extension as well, but since those don’t depend on the actual test case, they are created in setUp. Making a core extension is stolen from the core TwigExtensionTest, just modernized slightly. Our extension needs a theme handler mock, not too hard either.

We can summarize our journey by saying Twig is extremely powerful and even worse documented than Drupal. The source code, however, is very well structured, the classes are small and almost all method names are self explanatory. Once you know how to get to the Twig nodes (which now you do! especially the test case is very generic), simply printing them out tells you everything. Who needs documentation when you have such wonderful debug features? Imagine if printing a Drupal content entity similarly printed the name of the fields, the field item list classes, the field item classes, the properties and their values. Sci-fi. On the other hand, I love spending each weekend on some interesting project. Hmmmmm…

June 21, 2020

Understanding Drupal cache contexts via history and code

The Drupal cache system is just a key-value store. Say, your key is “left sidebar”, the value is the HTML of the left sidebar. Very simple. But if the left sidebar contains the login block, then you have a different HTML string for anonymous and authenticated users. So now you have a cache ID “left sidebar for anonymous users” pointing to one piece of HTML and “left sidebar for logged in users” pointing to another. Of course Drupal is multilingual, so you will have “left sidebar for anonymous users in English” and “left sidebar for logged in users in French” and so on. Maybe you have customizable sidebars per user, so now you have a different cache entry per user. Quite obviously, many cached pieces of content change on every page. And this was what Drupal 7 offered: the cache ID had parts set by the caller (say, “block” and “left_sidebar”) and then drupal_render_cid_parts added parts creating different cache IDs per role, per user, per language, per page. So you’d have, say, “block:left_sidebar:bartik:en:fr:u1234”.

In Drupal 8, there is much more flexibility. Here’s an actual cache ID (truncated):

entity_view:block:alexandria_breadcrumbs:[languages:language_content]=de:[languages:language_interface]=de:[languages:language_url]=de:[route]=entity.node.canonical65a0811

The first few parts are just the same: we have a block. But then we see a very big difference: the [languages:language_content]=de part has an identifier and a value. This is a big advantage compared to the previous system where you’d only have de and basically hoped no one would manually introduce such a part, causing massive confusion. And this is not all hardwired. There’s a service called cache_context.languages tagged with cache.context which implements CalculatedCacheContextInterface (the variant of CacheContextInterface whose methods take a parameter), and its getContext() method returns the language depending on the type: all three languages present in the cache ID are calculated by the same method. Finally we see a route context. As you can guess, there’s a cache_context.route service, again tagged with cache.context, and its getContext method returns the hashed route parameters appended to the route name. So if you are on a different page, the system will end up with a different cache ID and so the cached content will vary per page.
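The mechanics can be sketched in a few lines of plain PHP. This is a simplified illustration, not the core implementation: the providers are hypothetical closures with hard-coded values standing in for the tagged services, and the only part taken from the description above is the output shape, [token]=value, with everything after the first colon of a token treated as the parameter.

```php
<?php
// Hypothetical context providers keyed by context ID; each takes an optional parameter.
$contexts = [
  'languages' => function (?string $type): string {
    // Hard-coded stand-in for the negotiated languages.
    $current = ['language_content' => 'de', 'language_interface' => 'de'];
    return $current[$type] ?? '';
  },
  'route' => function (?string $parameter): string {
    // Hard-coded stand-in for the current route name and raw parameters.
    $route_name = 'entity.node.canonical';
    $raw_parameters = ['node' => '1'];
    return $route_name . hash('sha256', serialize($raw_parameters));
  },
];

function tokens_to_keys_sketch(array $tokens, array $contexts): array {
  $keys = [];
  foreach ($tokens as $token) {
    // "languages:language_content" splits into ID "languages" and
    // parameter "language_content"; "route" has no parameter.
    $parts = explode(':', $token, 2);
    $id = $parts[0];
    $parameter = $parts[1] ?? NULL;
    $keys[] = '[' . $token . ']=' . $contexts[$id]($parameter);
  }
  return $keys;
}

$cid = implode(':', array_merge(
  ['entity_view', 'block', 'alexandria_breadcrumbs'],
  tokens_to_keys_sketch(['languages:language_content', 'route'], $contexts)
));
echo $cid, "\n";
```

Because every part carries its own label, a stray de supplied by a caller can no longer be mistaken for a language.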

Say, you have a block which is different per node type. It would be much more beneficial to solve this problem a bit more generically: let’s write a cache context which allows different blocks per the value of a field. The getContext() method does nothing more than retrieve the entity from the route match and convert the value of a field to a string:

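public function getContext($parameter = NULL) {
  // A calculated cache context gets everything after the first colon as a
  // single string, for example "node:type".
  [$entity_type, $field_name] = array_pad(explode(':', (string) $parameter, 2), 2, NULL);
  // The upcasted entity is a route parameter named after its entity type.
  $entity = $this->routeMatch->getParameter($entity_type);
  if ($entity instanceof FieldableEntityInterface && $entity->hasField($field_name)) {
    // Hash the field value so the cache ID part stays short.
    return hash('sha256', serialize($entity->get($field_name)->getValue()));
  }
  return '';
}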

This could be used as the entity:node:type cache context, for example.
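On the consuming side, a render array just lists the context among its cache contexts. A small sketch, assuming the context service above is registered under the ID entity (that registration and the markup are made up for illustration):

```php
<?php
// A render array for a hypothetical block build; only '#cache' matters here.
$build = [
  '#markup' => 'Something that differs per node type',
  '#cache' => [
    // Assumes the context above is registered under the ID "entity".
    'contexts' => ['entity:node:type'],
  ],
];
print_r($build['#cache']['contexts']);
```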

June 5, 2020

Let’s learn recursive CTE SQL via paragraphs

We are going through a partial relaunch and twice we needed to find the nodes which have a certain paragraph somewhere. The first one was just a request to return a list of nodes; the second, however, required writing an update function to do something with the nodes. I implemented the first using MySQL 8.0 (MariaDB 10.2 works too) installed locally, but the second one required Drupal code written for a lower database version. This provides us with the opportunity to share code with you that does the same in PHP and then in a recursive CTE. Without further ado:

<?php
$db = Drupal::database();
// Start from every paragraph that has the field in question.
$ids = $db->query('SELECT entity_id FROM {paragraph__field_lc_video_ref}')->fetchCol();
$all_ids = $ids;
// Walk up one nesting level per iteration until no paragraph parents remain.
do {
  $parents = $db->query('SELECT parent_id FROM {paragraphs_item_field_data} WHERE id IN (:ids[]) AND parent_type = :paragraph', [':ids[]' => $ids, ':paragraph' => 'paragraph'])->fetchCol();
  $all_ids = array_merge($all_ids, $parents);
  $ids = $parents;
} while ($parents);
// Of all the collected paragraphs, keep those attached directly to a published node.
$nids = $db->query('
  SELECT DISTINCT parent_id
  FROM {paragraphs_item_field_data}
  INNER JOIN {node_field_data} n ON parent_id = nid AND n.status = 1
  WHERE id IN (:ids[]) AND parent_type = :node', [':ids[]' => $all_ids, ':node' => 'node'])->fetchCol();

And then the CTE:

CREATE TEMPORARY TABLE x 
SELECT entity_id FROM paragraph__field_article_section_body 
UNION
SELECT entity_id FROM paragraph__field_article_section_title;
WITH RECURSIVE r AS (
  SELECT id, parent_id, parent_type
  FROM paragraphs_item_field_data
  WHERE id IN (SELECT entity_id FROM x)
  UNION ALL
  SELECT r.id, parent.parent_id, parent.parent_type
  FROM r
  INNER JOIN paragraphs_item_field_data parent ON parent.id = r.parent_id
  WHERE r.parent_type = 'paragraph'
)
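SELECT DISTINCT nid
FROM r
INNER JOIN node_field_data n ON nid = r.parent_id
-- Only rows whose parent is a node; a paragraph parent id could collide with a nid.
WHERE r.parent_type = 'node';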
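If you want to convince yourself that both versions compute the same thing, the upward walk can be tried on an in-memory array without any database. The data and the function name below are made up; the rows are shaped like paragraphs_item_field_data, reduced to id, parent type and parent id.

```php
<?php
// Hypothetical rows: paragraph id => [parent_type, parent_id].
$paragraphs = [
  10 => ['paragraph', 20],
  20 => ['paragraph', 30],
  30 => ['node', 1],
  40 => ['node', 2],
  50 => ['node', 3],
];

function find_node_ids_sketch(array $paragraphs, array $start_ids): array {
  $all_ids = $start_ids;
  $ids = $start_ids;
  // Same idea as the loop above: collect paragraph parents level by level.
  while ($ids) {
    $parents = [];
    foreach ($ids as $id) {
      if (isset($paragraphs[$id]) && $paragraphs[$id][0] === 'paragraph') {
        $parents[] = $paragraphs[$id][1];
      }
    }
    $all_ids = array_merge($all_ids, $parents);
    $ids = $parents;
  }
  // Keep the parents of those collected paragraphs that sit directly on a node.
  $nids = [];
  foreach (array_unique($all_ids) as $id) {
    if (isset($paragraphs[$id]) && $paragraphs[$id][0] === 'node') {
      $nids[] = $paragraphs[$id][1];
    }
  }
  $nids = array_values(array_unique($nids));
  sort($nids);
  return $nids;
}

print_r(find_node_ids_sketch($paragraphs, [10, 40]));
```

Paragraph 10 is nested two levels deep under node 1 and paragraph 40 sits directly on node 2, so the walk finds nodes 1 and 2 and leaves node 3 alone. The sketch assumes the parent chain has no cycles, which holds for paragraphs.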

May 15, 2020
