Reviewing a Peer's Optimization

A Code Review Summary for SPO600

For the final project of the Software Portability Course, the class was tasked with reviewing the code of a peer who’d set up a pull request for Chris’ GLIBC repository. For my code review, I decided my good friend John would be a worthy candidate for my review musings, his pull request being found here for those interested in viewing.

There were a few stylistic issues that I brought up, all of which a simple fix would remedy.

The Code

/* Copyright (C) 1991-2017 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, see
   <http://www.gnu.org/licenses/>.  */

#include <string.h>
#include <stdint.h>
#include <libc-pointer-arith.h>

#undef strcspn

#ifndef STRCSPN
# define STRCSPN strcspn
#endif

/* Return the length of the maximum initial segment of S
   which contains no characters from REJECT.  */
size_t STRCSPN (const char *str, const char *reject)
{
  if (__glibc_unlikely (reject[0] == '\\0') ||
      __glibc_unlikely (reject[1] == '\\0'))
    return __strchrnul (str, reject [0]) - str;

  /* Use multiple small memsets to enable inlining on most targets.  */
  unsigned char table[256];
  unsigned char *p = memset(table, 0, 64);
  memset(p + 64, 0, 64);
  memset(p + 128, 0, 64);
  memset(p + 192, 0, 64);

  unsigned char *s = (unsigned char*)reject;
  unsigned char tmp;
  do
    p[tmp = *s++] = 1;
  while (tmp);

  s = (unsigned char*)str;
  if (p[s[0]]) return 0;
  if (p[s[1]]) return 1;
  if (p[s[2]]) return 2;
  if (p[s[3]]) return 3;
  if (p[s[4]]) return 4;
  if (p[s[5]]) return 5;
  if (p[s[6]]) return 6;
  if (p[s[7]]) return 7;

  unsigned int c0, c1, c2, c3, c4, c5, c6, c7;
  do
  {
    s += 8;
    c0 = p[s[0]];
    c1 = p[s[1]];
    c2 = p[s[2]];
    c3 = p[s[3]];
    c4 = p[s[4]];
    c5 = p[s[5]];
    c6 = p[s[6]];
    c7 = p[s[7]];
  } while ((c0 | c1 | c2 | c3 | c4 | c5 | c6 | c7) == 0);

  size_t count = s - (unsigned char *)str;

  if (c0 | c1 != 0) {
    return count - c0 + 1;
  }
  else if ((c2 | c3) != 0) {
    return count - c2 + 3;
  }
  else if ((c4 | c5) != 0) {
    return count - c4 + 5;
  }
  else {
    return count - c6 + 5;
  }

}
libc_hidden_builtin_def (strcspn)

The Stylistic Issues In Question

Throughout the GLIBC, a common coding convention can be found throughout the familiar and obscure files. In such convention, is the inclusion of a space between the function name and arguments. John’s editor perhaps did not detect this, and instead replaced all of his code with the common function sans-space argument arrangement.

As you can see below, the issue is a minor one which is easily overlooked.

Coding Convention Issue #1: Correct

memset (p + 64, 0, 64);

Coding Convention Issue #1: Incorrect

memset(p + 64, 0, 64);

Another convention issue, was the rearranging of declaration syntax for variables and functions. I won’t deny, the GLIBC’s coding style is unfamiliar to someone who’s experience with C derives from few courses, and I did question why that style of C syntax was deemed essential for the time. Perhaps to reduce line length? This idea does make sense on the low-resolution terminals utilized in the days of old, but does look rather odd to the modern programmer.

Coding Convention Issue #2: Correct

size_t
STRCSPN (const char *str, const char *reject)

Coding Convention Issue #2: Incorrect

size_t STRCSPN (const char *str, const char *reject)

Conclusion

John’s optimizations enable support for 64-bit processing, which is a big improvement to modern servers & programs alike. Having gone through his code modifications for the review, I did not see any issues regarding logic or operations, which in the end will always be the make-it / break-it factor. He did damn good work with the optimization, and with the stylistic change I’ve mentioned above, I think the upstream developers will accept his code contribution without hesitation.