Free Software, Not Free Support: My Reply Template

Over the last weeks I have seen a few people tweet or write about the burden of open-source maintainership or being a public person in a programming community.

The topic of open source maintenance responsibilites has become bigger over the last years and having previously burned out on OSS work myself, I find it important that the stress of demands on people contributing their free time is something that needs to be talked about.

As an open source maintainer myself I get my fair share of e-mails and Twitter direct messages asking for support with specific problems.

For many years I usually reply along the same lines that I provide only free software and not free support. Maybe this is helpful to others and an encouragement to stand your ground:

Thank you for your message, unfortunately my limited time doesn’t allow me to help with support requests for my open-source projects, especially when they are done in private and my answer doesn’t benefit others as well.

Please ask your question in a public forum such as Stackoverflow, Doctrine or Symfony Slack or Mailinglists where community members can help you out and both question and answer will help others.

This is intentionally formulated to avoid talking the questioner down for asking an open-source maintainer for help in private, because they might not be aware of my time constraints or about open-source and support. That is ok, no one can be all knowing.

Instead my main argument is to move the support to a public forum, so that the community of free users of my software can help each other and provide something “free” in return.

Additionally this “nice and defensive” phrasing is robust for almost any kind of message and I rarely need to adjust it after copying it into an e-mail or a Twitter DM.

When to use empty in PHP? I’d say never

I realized that I am very strict about the use of PHPs empty function in code review. There isn’t really any reason to use it in my opinion:

  • it tests both for the existance and the non-falsy value and as a reader the writers intent is not clear.
  • it does not communicate what type a variable has, seeing empty($value) does not narrow down what value is enough.
  • it hides errors caused by typos in the variable name passed to empty.

This is a list of alternatives that I recommend to use instead of empty.

testing that the string has length 0

// Replace
if (empty($string)) {
}

// With
if (strlen($string) === 0) {
}

Using strlen communicates that the variable is a string.

testing that array has no elements

// Replace
if (empty($array)) {
}

// With
if (count($array) === 0) {
}

Using count communicates that the variable is an array.

testing if array has key

// Replace
if (empty($array[$key])) {
}

// With
if (array_key_exists($array, $key)) {
}

// or if you know that existance also means a non-empty value
if (isset($array[$key])) {
}

Both array_key_exists and isset more clearly communicate that the code cares about the existance of a key and not that it assumes existance and tests only for a non-empty value.

testing if a variable was declared in the scope

// Replace
if (empty($var)) {
}

// With
if (isset($var)) {
}

In this case you can also use isset, but in general if you need to check this there is something wrong with the code anyways. A variable should always be or not be declared in the current scope, there should not be uncertainty about it.

testing if a variable is null

// Replace
if (empty($var)) {
}

// With
if ($var === null) {
}

Code is more cleary a test for the variable being null and not accidently also true when 0, false or empty string.

testing if a variable is 0

// Replace
if (empty($var)) {
}

// With
if ($var === 0) {
}

Communicates that variable is integer.

testing if a variable is false

// Replace
if (empty($var)) {
}

// With
if ($var === false) {
}
if (!$var) {
}

Communicates that variable is boolean.

What if?

Now what if I want to test for existance and non emptiness for an array keys? Using two conditions makes this more clear.

Now what if I care about performance of empty being a language construct vs calling an internal function such as strlen or count? No worries, both strlen and count are optimized by Opcache if you prefix them with a backslash to directly call them in the global namespace.

Automatic Enforcement

Armed with PHP Code Sniffer and Psalm I am happy to enforce rules with my own custom plugins for either one tool.

We have a custom Psalm plugin in our Tideways code base and I have added an automatic tests for empty including a suggest for the right replacement based on the type. Its not perfect and does not yet catch all different kinds of empty usages, but its a good enough start. See this Gist for the code and how to include it in your psalm.xml.

What to look out for testing PHP8 JIT

PHP 8 is coming with a Just In Time Compiler (JIT) and people are starting to test it in more detail. Here are a few things you should be careful about when testing the JIT:

  • Make sure sure test with different trigger modes. Trigger is the 3rd number in the line: 1205 (JIT everything), 1235 (JIT hot code based on relative usage), 1255 (trace hot code for JITability).

    Best results seem to be with “opcache.jit=1255” at the moment.

  • Do not run your examples in the CLI. Use “php-cgi” with the -T flag for reptitions. This allows Opcache and JIT to optimize on the first request, and then you can see in the following repetitions how the optimized performance is. Just for experimenting it makes sense to have “php-cgi -T 2” for running one warmup and one hot request against the same script.

    Clarification: JIT works with the CLI, but I find its harder to make the exact effects visible and comparable and requires more configuration.

  • Make sure the JIT is actually running by looking at opcache_get_status(false). It has a key “jit” and it should have a key “on” set to true, and the “buffer_free” should be lower than “buffer_size”, meaning the JIT actually uses some of the allocated memory for something.

  • To test if code gets faster with JIT, don’t run it under stress/concurrency. JIT optimizes CPU time, so the improvement must already be visible when running one jitted request against one unjitted one. Repeat those runs multiple times to find out if the result is “stable” or fluctuates a lot.

  • If you want to test if JIT increases throughput, that means more requests can be handled at the same time than without JIT, then you should ramp-up the number of concurrent requests and not start with a high number. Start with 10, and go up to 100, 1000, … - At some point “regular” PHP will fall over and you can then find how much more JIT takes before it also falls over.

Testing JIT performance for Doctrine ORM I setup this Github repository with more detailed examples, especially the php-cgi call, which I want to repeat here:

/opt/php/php-src/bin/php-cgi -T 10 -dopcache.enable=1 -dopcache.jit=1255 -dopcache.jit_buffer_size=64M index.php

The -T 10 here means 10 reptitions of the index.php script. This triggers OPCache and JIT optimizations on the first request and allows testing performance of already optimized code in subsequent requests.

The ini settings explained:

  • opcache.enable=1 enables OPCache.
  • opcache.jit=1255 sets various flags, including the triggering mode, of the JIT. Defaults to 1205, which is probably inefficient JITing everything.
  • opcache.jit_buffer_size=64M must be set, because otherwise JIT is not enabled. By default this is zero. At the moment setting the buffer size is the switch to enable/disable JIT. It is planned to change INI settings slightly before the PHP 8.0 GA to simplify this step.

Looking forward to read more about the JIT experience from all of you!

Updates:

  • July 5th, clarified JIT works with CLI, added detailed explanation about JIT INI settings

Clean Code and Object Calisthenics Rules I try to Follow

I do a lot of Code Reviews, and without proper automation of most of the low level items that you are usually “remarking” to colleagues it is a frustrating experience for everyone involved and takes more time that needed.

My natural tendency is towards action over planning, so I tend to find arguments over coding style are a waste of time if sufficient standards exist that you can just adopt. In my opinion this is is something team leads and managers need to rule on to avoid that their teams become unproductive in little coding-standard and style discussions.

Choosing a set of strict rules is very important and enforcing them helps reducing time spent on unimportant details during code review and allows more junior developers to prepare their changes in a way that most feedback can be automated.

Coding Styles alone are not enough in most cases. There should also be rules with respect to what type of language constructs and building blocks are preferred over others.

Equipped with a fixed ruleset, you can automate the low-level parts of code review in such a way that every developer can perform it themselves and fix violations before you even get to see the code in the first place.

In this post I give an (probably incomplete) overview of rules and patterns we follow for Tideways code base.

A strict coding style

The first thing I put in place when Tideways hired its first engineer was PHP Code Sniffer with a slightly adopted Doctrine Standard (PSR-2 + a lot more), coupled with static analysis using Psalm. Combined with Github Checks Run API and Scotts diff filter tool all the violations are shown directly within Github commits and Pull Requests.

../../../_images/github-check1.png

Using the phpcbf command and a lot of manual work we are down to zero violations. If you start using a standard from the beginning this won’t happen, but strict enforcement of coding standard is not something I was used to before.

Violation check runs on Github include a message with a command to copy paste and auto-fix violations if available for PHP Code Sniffer, ESLint and TSLint.

We also have PHPStorm configurations to integrate each tool into the IDE.

In my opinion PSR-2 is a good first step, but it leaves a lot of details open that the Doctrine Coding Standard locks further down. A lot those rules are very strict and we had to disable them on our code base. If you are interested our phpcs rule definitions are in this Gist.

Besides “simple” code style checks, we have some clean code guidelines that we don’t check with phpcs, because they are not black and white but require some deliberation.

For Javascript we enforce the strict defaults of ESLint/TSLint and for Go the default coding standard that go fmt produces.

A static code analyzer, but not at highest level

We use Psalm for static code analysis. It automatically tries to find out all types of all variables in your application and then checks hundrets of different rules that might indicate bugs or improve the design of your application.

While type safety is awesome, without generics forcing Psalm or PHP-Stan at the highest level leads to a lot of clutter and boilerplate code that is only necessary to make the tool happy.

This is why I recommend to only fail the build on the obvious rules and leave the other ones as Warnings for the developer visible from the IDE.

We use a lenient psalm.xml and the Baseline feature to get down to zero errors. Every error fails the build.

Take very good care of API Information Hiding and Encapsulation

A lot of accidental complexity in code comes from exposing implementation details to the outer layers of your application. This is mostly due to function arguments or return values that expose the implementation details.

This concept is called Information Hiding.

When adding arguments or return values think about the client calling the function:

  • Do they only get the information required to work, or does it return additional information (extra array keys, objects) that are unncessary?
  • Does a function argument expose the internal implementation of a datastructure? Can you turn it into a simpler input that is converted to the desired variable internally?

This is extremely difficult skill to master and takes a lot of practice.

To New or Not to New?

One of the most influential blog posts on me has been To new or not to new by Misko Hevery.

It clearly defines which types of objects can be constructed with new in place (data objects, entities, value objects) and which ones should also be injected and centrally constructed by a factory object or a dependency injection container.

Separate Statements and Declartions by empty lines

Blocks of statements and blocks of declarations usually get separated by an empty line, to make a very visual distinction. Also statements with a different context tend to be split by empty lines as well. See this example from one our controllers, where each statement and declaration is in its own paragraph.

<?php

 public function deleteAction(PageContext $context, $serverName)
 {
     $server = $this->findServer($serverName, $context);

     $this->serverRepository->remove($server);

     $this->jobQueue->queue(new BalanceServerTask(['applicationId' => $context->application->getId()]));

     return $context->redirectRoute('XhprofProfilerBundle.Server.environment');
 }

I would define paragraphs are statements or declarations that belong to each other. This concept closely follows paragraphs in natural languages and is easy to grasp and follow.

Use early exits and don’t use else

Code almost never needs the else keyword. It can be written in a way to use early exits with return or continue instead. This following code could have been written with an else. This rule keeps the main/happy path of the code on the main level of the method.

<?php

 public function formatQueryString(string $query): string
 {
     if (strlen($query) === 0) {
         return '';
     }

     parse_str($query, $queryArgs);

     $queryArgs = $this->truncateArrayValues($queryArgs);

     return json_encode($queryArgs, JSON_PRETTY_PRINT);
 }

For loops we continue to the next iteration for early exits:

<?php

 private function parseErrorTrace(string $trace, $removeArguments = false)
 {
     $traceResult = [];

     $parts = explode("\n", $trace);
     foreach ($parts as $line) {
         if (strpos($line, '{main}') !== false) {
             continue;
         }

         if (strpos($line, '#') !== 0) {
             continue;
         }

         // more
     }
 }

Only one level of indentation per method and a second for early exits

Another rule I try to follow is “only one level of indentation per method” from the Object Calisthenics rulebook.

Guilherme from the Doctrine team introduced me to Object Calisthenics many years ago and I found it a very good guidestick to improve my code without having to think too much.

The deleteAction in the controller example above inhibits this rule, having no additional level of indentation. This second example shows a controller with one level of indentation, using the “Guard clause” pattern instead of nested conditions (See the refactoring)

<?php

 public function process(Task $task)
 {
     $application = $this->organizationRepository->findApplication($task->applicationId);

     if (!$application) {
         return;
     }

     $retentionDays = $application->getRetentionDays();
     $daysDiff = (new \DateTime('now'))->diff($task->date)->days;

     if ($daysDiff > $retentionDays) {
         return;
     }

     $this->historicalDataAction->compute($task->applicationId, $task->date, $task->overwrite);
 }

When methods use loops two levels of indention are ok, if the second level is used only for guard clauses.

<?php
 $lastProductionCommunication = null;

 foreach ($servers as $server) {
     if ($server->getEnvironment() !== 'production') {
         continue;
     }

     if (!$server->getLastCommunication()) {
         continue;
     }

     if ($lastProductionCommunication !== null && $lastProductionCommunication > $date)
         continue;
     }

     $lastProductionCommunication = $date;
 }

It is not easy to follow this rule to the dot, and we do have a lot of code in Tideways that doesn’t follow this rule as we don’t enforce it. But I view the violation of the one level rule as an automtaic, immediate signal of both complexity and potential for refactoring.

Don’t abbreviate names (too much)

This rule is easier in PHP than in Go (by design). As we use both languages, we force ourselves to write out what a variable, class, constant or function is doing, and avoid abbreviations.

func (previous PreviousSpans) GetLastIndex(category, summary string) int {
    if entry, ok := previous[category]; ok {
        if entry.Summary == summary {
            return entry.Index
        }
    }
    return -1
}

Certain names still get abbreviated though, for example i, idx, j and other common ones.

Prefer Null Coalesce Operator over if isset

We still work with a lot of arrays, because PHP makes it much easier to use them than complex types (objects). The risk of using arrays is not knowing if a key exists, and code being littered with if (isset($data['key'])) { statements all over the place. It is much better to use the null coalesce operator ?? to define default values for keys that you don’t know exist on an array:

<?php
 $hostName = $serverNames[$currentOccurrance['sId']] ?? 'unknown';

Don’t allow a variable to have two types

If something doesn’t exist it is NULL or false right? Wrong, if you care about simplicity of your code.

  • If a method is supposed to return an object but can’t, it may be better to throw an exception.
  • If an array is empty, then it is empty, not null or false. Iterating over an empty array can correctly do nothing.

I prefer methods written in a way, where they either only handle the default (happy path) by throwing exceptions otherwise, or by defining all variables in a way where even the “unhappy/error” paths can run successfully through the happy code path and yield the exactly right result.

Don’t re-use a variable with a different type

PHP allows this, but it can lead to very confusing code if you re-use the same variable with different types in a method or function. Avoid at all costs.

<?php
 public function deleteAction(PageContext $context, string $serverName)
 {
     $server = $this->findServer($serverName, $context);

In the previous block the argument $serverName could easily have also be called $server and therefore be both a string and an object of type Server within different parts of the method.

Typehint Collection Items with assert

PHP has no way to typehint the items of collections (for example arrays, iterators). You either typehint array, \Doctrine\Common\Collections\Collection or any other class in your methods, with no way to specify type of the key or value once you start iterating with foreach.

There are various docblock supported styles, to define these types, which I recommend using on class properties and return values such as:

<?php
 class XhprofReport extends Struct
 {
     /** @var array<string> \*/
     public $namespaces = [];
 }

But to help both PHPStorm and Psalm (our static code analyser) to understand items of a collection, we use assert() with instanceof, if we can’t provide the types ourselves on return values where we cannot define the typehint, for example when using Doctrine repositories:

<?php
 $organizations = $organizationRepository->findAll();

 foreach ($organizations as $organization) {
     assert($organization instanceof Organization);

     // ..
 }

Use structs to wrap arrays (especially from SQL results)

To avoids typical complex array problems such as not knowing what keys exist or what types values have, arrays should always be wrapped in objects extending from a Struct base class.

The struct base class maps an array to properties of the class in the constructor:

<?php
 abstract class Struct
 {
     public function __construct(array $data = [])
     {
         foreach ($data as $property => $value) {
             if (!property_exists($this, $property)) {
                 throw new \RuntimeException();
             }

             $this->$property = $value;
         }
     }
 }

Often data in Tideways is best represented in domain objects that doesn’t fit the entity or database model 100%. In those cases we manually map the database to the desired objects using this Struct pattern for simplicity.

Take this ServerConfiguration struct which is a combination of data from three Doctrine entities and is assembled in a repository from custom SQL.

<?php
class ServerConfiguration extends Struct
{
    /** @var int */
    public $id;
    /** @var string */
    public $environment;
    /** @var int */
    public $tracesPerMinute;
    /** @var bool */
    public $disabled;
    /** @var bool */
    public $newlyCreated = false;
    /*\*/
}

I am really looking forward when we deploy PHP 7.4 with typed properties where this pattern will become much more powerful and will allow us to write a “mini orm” for mapping SQL results to typed objects.

You can read more about Struct objects on the Qafoo blog and look into Kore’s DataObject package that implements this pattern in a reusable Composer package.

Firegento Hackathon Mainz 2020

This weekend I attended the Firegento Hackathon at Netz98 in Mainz. Due to the storm I had to leave early and skip presentation of my project, so I want to post a short summary of my work here on the blog.

I want to thank everyone from the Firegento organizer team, the hosts and sponsors. The event was really great and I was happy to get to know new people from the community.

Magento 2 Performance and Profiling

I proposed a project to work on Magento 2 Performance and Profiling and we started working on it on and off with a group of 6-7 people, which is quite an awesome crowd!

The biggest achievement was that we could introduce PHP profiling to a few people that have never used Profilers before. After a general overview on how different profiling approaches work we looked at Tideways and Xdebug as examples. It was really awesome to see the reaction of what is actually possible with PHP and that its possible to gather this data.

We looked at some data from production shops with performance problems on the category page with a lot of N+1 Product loads and Andreas explained how they refactored this bottleneck away.

With input from several people we poked into Magento 2 (demo shop) and found two things that seem worth investigating more:

  1. The Redis cache implementation could use a more efficient implementation for the app/config cache, where items are quite static over time. We saw 80-100ms of Redis calls for these caches across different shops (not only demo) that could be cut down by a more specialized cache implementation.

    I attempted a prototype working on this but relaized it needs probably 2-3 days of concentrated work to get a prototype working.

    One thing that Symfony + Doctrine benefit from is caches that write and execute PHP files so that Opcache can make use of interned strings, packed and immutable arrays internally.

  2. There is a ProductViewCounter UI block that renders JSON into the output of each page. This widget is used for the feature to show the last viewed/compared products and also is responsible for the view counter in the backend. This features are often not used, but the output of this UI block is still rendered into every page anyways.

    The serialization takes about 60-80ms, so there is potential here for a performance gain if there was a setting to disable this feature completly.

While working with Magneto 2 shops in Tideways, we realized there are a lot of small optimization that we can do to the callgraph profiler. There is room to add more specialized instrumentation and improve Magento 2 results. With the way Magento 2 uses interceptors (AOP) the profiling call stacks are sometimes really hard to understand, so I have written down around 10 improvement ideas we can add to the Profiler. Expect the results to go into Tideways over the next weeks and months.

This affects things like how Closures are rendered, or skipping them in the callgraph. And for a lot of calls in Magento, the actual function name should be augmented with more information, or changed to a different name to make it easier to grasp without knowing Magneto2 interceptor magic details.

More about: Magento