A procedure to deal with big amounts of email

The problem

You probably know the problem already: A day with fewer than 500 emails feels like your internet connection might be lost. The number of emails you receive can accurately predict the time of day. In my case, I’ll always receive my 300th email of the day right before lunch. If I spent just one minute on each message, I would have done nothing but read emails by then. And by the time I return from lunch, more emails have found their way into my inbox. My job description is not “email reader”. It is actually one of the lower-priority activities of my job. But I keep most response times low and always know the content of my inbox. You’ll seldom hear “sorry, I haven’t seen your email yet” from me. How I keep the email flood in check is the topic of this blog entry. It’s my personal procedure, so nothing fancy with a big name, but you might recognize some influence from well-known approaches like “Getting Things Done” by David Allen.

The disclaimer

Disclaimer: You might entirely disagree with my approach. That’s totally acceptable. But keep in mind that it works for at least one person for a long time now, even if it doesn’t fit your style. Email processing seems to be a delicate topic, please keep your comments constructive. By the way, I’d love to hear about your approach. I’m always eager to learn and improve.

The analogy

Let me start with a common analogy: Your email inbox is like your mailbox. All letters you receive go through your mailbox. All emails you receive go through your inbox. That’s where this analogy ends, and it was never useful to begin with. Your postman won’t show up every five minutes to stuff more commercial mailings, letters, postcards and post-it notes into your mailbox (raising that little flag again that indicates the presence of mail). He also won’t announce himself by ringing your door bell (every five minutes, mind you) and proclaiming the first line of three arbitrarily chosen letters. Also, I’ve rarely seen mailboxes that contain hundreds if not thousands of letters, some read, some years old, in different states of decay. That is a common sight for inboxes whose owners gave up on keeping up. I’ve seen high stacks of unanswered correspondence, but never in the mailbox. And this brings me to my new analogy for your email inbox: Your email inbox is like your desk. The stacks of decaying letters and magazines? Always on desks (and around them in extreme cases). The letters you answer directly? You bring them to your desk first. Your desk is usually cleared of everything but pending work documents, and this should be the case for your inbox, too.

(c) Fotolia, file #87397590 | Author: thodonal

The rules

My procedure to deal with the continuous flood of emails is based on three rules:

  • The inbox is the only queue of emails that needs attention. It is only filled with new emails (which require activity from my side) or emails that require my attention in the foreseeable future. The inbox is therefore only filled with pending work.
  • Email processing is done manually. I look at each email once and hopefully only once. There are no automatic filters that sort emails into different queues before I’ve seen them.
  • Every interaction with an email results in an activity or a decision on my side. No email gets “left there”.

Let me explain the context of the rules in a bit more detail:

My email account has lots and lots of folders to store all emails until eternity. The folders are organized in a hierarchy, but that doesn’t really matter, because every folder can hold emails. The hierarchy of folders isn’t pre-planned, it emerges from the urge to group emails together. It’s possible that I move specific emails from one folder to another because the hierarchy has changed. I will use automatic filters to process emails I’ve already read. But I will mark every email as read myself and not move unread emails around automatically. This narrows the search for new emails down to one place: the inbox. Every other folder is only for archiving, not for processing.

The sweeps

The number of unread emails in my inbox is the amount of work I need to do to return to the “only pending work” state. Let’s say that I opened my email reader and it shows 50 new messages in the inbox. Now I’ll have to process and archive these 50 emails to be in the same state as before I opened it. I usually do three separate processing steps:

  • The first sweep is to filter out any spam messages by immediate tell-tale signs. This is the only automatic filter that I’ll allow: the junk filter. To train it, I mark any remaining junk mail as spam and let the filter deal with it. I’m still not sure if the junk filter really makes it easier for me, because I need to scan through the junk regularly to “rescue” false positives (legitimate mails that were wrongfully sorted out), but the junk filter in combination with my fast spam sweep lowers the message count significantly. In our example, we now have 30 mails left.
  • The second sweep picks every email that is for information purposes only. Usually those mails are sent by software tools like issue trackers, wikis, code review tools or others. Machines don’t feel the effort of writing an email, so they’ll write a lot. Most of the time, the message content is only a few lines of text. I grab each of these mails and drag them into their corresponding folder. While I’m dragging, I read (and memorize) the content of the email. The problem with this kind of information is that it’s a lot of very small chunks of data for a lot of different contexts. In order to understand those messages, you have to switch your mental context in a matter of seconds. You can do it, but only if you aren’t interrupted by different mental states. So ignore any email that requires more than a few seconds of focused attention from you. Let them sit in your inbox along with the emails that require an answer. The only activity for mails included in your second sweep is “drag to folder & memorize”. Because machines write often, we now have 10 mails left in our example.
  • The third sweep now attends to each remaining mail independently. Here, the three-minutes rule applies: If it takes less than three minutes to reply to the email right now, then do it right now and archive the mail in a suitable folder (you might even create a new folder for it). Remember: if you’ve processed an email, it leaves the inbox. If it takes more than three minutes, you need to schedule an attention slot for this particular message on your todo-list. This is the only time an email remains in the inbox, because it’s a signal of pending work. In our example, 7 emails could be answered with short replies, but 3 require deep concentration or a longer text for the answer.

After the three sweeps, only emails that indicate pending work remain in the inbox. They only leave the inbox after I’ve dealt with them. I need to schedule a timebox to work on them, but after that, they’ll find themselves out of the inbox and in a folder. As soon as an email is in a folder, I forget about its existence. I need to remember the information that was in it, but not the message itself.

The effects

While dealing with each email manually sounds painfully slow at first, it becomes routine after only a short while. The three sweeps usually take less than five minutes for 50 emails, excluding the three major correspondence tasks that end up as individual items on my work schedule. Depending on your ratio of spam to information messages to real correspondence, your results may vary.

The big advantage of dealing with each email manually is that I’ve seen every message with my own eyes. Every email that an automatic filter grabs and hides before you can see it should not have been sent in the first place – it simulates a pull notification scheme (you decide when to receive it by opening the folder) rather than leveraging the push notification scheme (you need to deal with it right now, not later) that email inherently is. Things like timelines, activity streams or message boards are pull-oriented presentations of presumably the same information; perhaps that’s what you should replace your automatically hidden emails with.

You’ll have an ever-growing archive with lots of folders for different things (think of a shelf full of document files in real life), but you’ll never look into them unless you desperately need to find “that one mail from three months ago”. You’ll also have a clean inbox, preferably in “blank slate” condition or at least containing only emails that require action from you. So you’ll have a clear overview of your pending work (the things in your inbox) and the work already done (the things in your folders).

The epilogue

That’s when you discover that part of your work can be described as “look at each email and move it to a folder”, as if you were an official in charge of virtual paper. We have virtualized our paperwork, letters, desks, shelves and document files. But the procedures to deal with them are still the same.

Your turn now

How do you process your emails? What are your rules and habits? What are your experiences with folders vs. tags? I would love to hear from you – in a comment, not an email.

C++ Coroutines on Windows with the Fiber API

Last week, I had the chance to try out coroutines as a way to cooperatively interleave long tasks with event processing. Unlike threads, where interaction between threads can happen at any time, coroutines need to yield control explicitly, which arguably makes synchronisation a little simpler, especially in (legacy) systems that are not designed for concurrency. Of course, since coroutines do not run at the same time, you do not get the perks of concurrency either.

If you don’t know coroutines, think of them as functions that can be paused and resumed.

Unlike many other languages, C++ does not have built-in support for coroutines just yet. There are, however, several alternatives. On Windows, you can use the Fiber API to implement coroutines easily.

Here’s some example code of how that works:

auto coroutine = std::make_shared<FiberCoroutine>();
coroutine->setup([](Coroutine::Yield yield)
{
  for (int i = 0; i < 3; ++i)
  {
    std::cout << "Coroutine "
              << i << std::endl;
    yield();
  }
});

int stepCount = 0;
while (coroutine->step())
{
  std::cout << "Main "
            << stepCount++ << std::endl;
}

Somewhat surprisingly, at least if you have never seen coroutines before, this will print the two outputs interleaved:

Coroutine 0
Main 0
Coroutine 1
Main 1
Coroutine 2
Main 2

Interface

Since fibers are not the only way to implement coroutines and since we want to keep our client code nicely insulated from the Windows API, there’s a pure-virtual base class as an interface:

#include <functional>

class Coroutine
{
public:
  using Yield = std::function<void()>;
  using Run = std::function<void(Yield)>;

  virtual ~Coroutine() = default;
  virtual void setup(Run f) = 0;
  virtual bool step() = 0;
};

Typically, creation of a Coroutine type allocates all the resources it needs, while setup “primes” it with an inner “Run” function that can use an implementation-specific “Yield” function to pass control back to the caller, which is whoever calls step.

Implementation

The implementation using fibers is fairly straightforward:

#include <windows.h>   // Fiber API: CreateFiber, SwitchToFiber, ...

class FiberCoroutine
  : public Coroutine
{
public:
  FiberCoroutine()
  : mCurrent(nullptr), mRunning(false)
  {
  }

  ~FiberCoroutine()
  {
    if (mCurrent)
      DeleteFiber(mCurrent);
  }

  void setup(Run f) override
  {
    // The calling thread has to be a fiber itself before it can
    // switch to other fibers, so convert it once and remember it.
    if (!mMain)
    {
      mMain = ConvertThreadToFiber(NULL);
    }
    mRunning = true;
    mFunction = std::move(f);

    // Create the coroutine's fiber lazily; it only starts running
    // once step() switches to it.
    if (!mCurrent)
    {
      mCurrent = CreateFiber(0,
        &FiberCoroutine::proc, this);
    }
  }

  bool step() override
  {
    SwitchToFiber(mCurrent);
    return mRunning;
  }

  void yield()
  {
    SwitchToFiber(mMain);
  }

private:
  void run()
  {
    // Loop so the fiber can be reused with a new function from
    // setup() instead of returning from the fiber procedure.
    while (true)
    {
      mFunction([this]
                { yield(); });
      mRunning = false;
      yield();
    }
  }

  static VOID WINAPI proc(LPVOID data)
  {
    reinterpret_cast<FiberCoroutine*>(data)->run();
  }

  static LPVOID mMain;
  LPVOID mCurrent;
  bool mRunning;
  Run mFunction;
};

LPVOID FiberCoroutine::mMain = nullptr;

The idea here is that the caller and the callee are both fibers: lightweight threads without concurrency. Running the coroutine switches to the callee’s fiber, i.e. the run function’s fiber. Yielding switches back to the caller. Note that it is currently assumed that all callers are from the same thread, since each thread that participates in the switching needs to be converted to a fiber initially, even the caller. The current version only keeps the fiber for the initial thread in a single static variable. However, it should be possible to support multiple calling threads by replacing the single static fiber pointer with a map that maps each thread to its associated fiber.
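
As an illustration, here is a minimal sketch of such a per-thread registry. The class name and its interface are hypothetical and not part of the code above; setup() and step() would ask it for the calling thread's main fiber instead of using the static mMain.

#include <map>
#include <mutex>
#include <thread>
#include <windows.h>

// Hypothetical helper: maps each participating thread to its main fiber.
class MainFiberRegistry
{
public:
  // Returns the main fiber of the calling thread, converting the
  // thread to a fiber on first use.
  LPVOID acquireForCurrentThread()
  {
    std::lock_guard<std::mutex> lock(mMutex);
    auto id = std::this_thread::get_id();
    auto found = mFibers.find(id);
    if (found != mFibers.end())
      return found->second;

    LPVOID mainFiber = ConvertThreadToFiber(NULL);
    mFibers[id] = mainFiber;
    return mainFiber;
  }

private:
  std::mutex mMutex;
  std::map<std::thread::id, LPVOID> mFibers;
};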

Note that you cannot return from the fiber procedure (proc above) – that will just terminate the whole thread! Instead, just yield back to the caller and either re-use or destroy the fiber.

Assessment

Fiber-based coroutines are a nice and efficient way to model non-linear control-flow explicitly, but they do not come without downsides. For example, while this example worked flawlessly when compiled with Visual Studio, it just terminates without even an error when built under Cygwin. If you’re used to working with the Visual Studio debugger, it may surprise you that the caller gets hidden completely while you’re in the run function. The run function’s stack completely replaces the caller’s stack until you call yield(). This means that you cannot find out who called step(). On the other hand, if you’re actually doing a lot of processing in the run function, this is quite nice for profiling, as the “processing” call tree seemingly gets its own root in the overall call tree.

I just wish the Visual Studio debugger had a way to view the states of the different fibers like it has for threads.

Alternatives

  • On Linux, you can use the ucontext API (see the sketch after this list).
  • Visual Studio 2015 also has another, newer, implementation.
  • Coroutines can be implemented using threads and condition-variables.
  • There’s also Boost.Coroutine, if you need an independent implementation of the concept. From what I gather, they only use Fibers optionally, and otherwise do the required “trickery” themselves. Maybe this even keeps the caller-stack visible – it is certainly worth exploring.
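
To give an impression of the first alternative, here is a minimal, self-contained sketch of the same ping-pong example using the ucontext API on Linux. It is not taken from any of the projects above, and a real implementation would wrap it behind the Coroutine interface.

#include <iostream>
#include <vector>
#include <ucontext.h>

static ucontext_t mainContext;
static ucontext_t coroContext;

static void coroutineProc()
{
  for (int i = 0; i < 3; ++i)
  {
    std::cout << "Coroutine " << i << std::endl;
    swapcontext(&coroContext, &mainContext); // yield back to main
  }
}

int main()
{
  std::vector<char> stack(64 * 1024);

  getcontext(&coroContext);
  coroContext.uc_stack.ss_sp = stack.data();
  coroContext.uc_stack.ss_size = stack.size();
  coroContext.uc_link = &mainContext; // where to go if proc returns
  makecontext(&coroContext, coroutineProc, 0);

  for (int stepCount = 0; stepCount < 3; ++stepCount)
  {
    swapcontext(&mainContext, &coroContext); // resume the coroutine
    std::cout << "Main " << stepCount << std::endl;
  }
}

This produces the same interleaved output as the fiber version above.
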
    Using passwords with Jenkins CI server

    For many of our projects, the Jenkins continuous integration (CI) server is an important cornerstone. The well-known “works on my machine” means nothing in our company. Only code in repositories and built, tested and packaged by our CI servers counts. In addition to building, testing, analyzing and packaging our projects, we use CI jobs for deployment and supervision, too. In such jobs you often need some sort of credentials like username/password or public/private keys.

    If you are using username/password credentials, they appear not only in the job configuration but also in the console build logs. In most cases this is undesirable, but luckily there is an easy way around it: using the Environment Injector Plugin.

    In the plugin you can “inject passwords to the build as environment variables” for use in your commands and scripts.

    The nice thing about this is that the passwords are not only masked in the job configuration but also in the console logs of the builds!

    Another alternative that does mostly the same is the Credentials Binding Plugin.

    There is a lot more to explore when it comes to authentication and credential management in Jenkins: you can define credentials at the global level, use public/private key pairs and ssh agents, connect to an LDAP directory and much more. Just do not sit back and leave security-related data in plaintext in your job configurations or deployment scripts!

    A zine for the modern developer

    Martin Fowler once said:

    Test for value of an article isn’t novelty of its ideas, but how many who could benefit who haven’t considered them, or are using them badly

    So I thought that I should share some things I am learning and have learned as a software developer. Maybe one day I will write a book, but that requires a huge effort. Luckily, Julia Evans (thank you!) just wrote a post about the same dilemma. She decided to create and publish a lean version of a book: a zine.
    This pushed me to create my own zine. I believe that a modern software developer is confronted with many topics besides programming, so the topics will range from software architecture and user experience (UX) to marketing.
    My first zine is about CQRS, an architecture I am excited about. Enjoy (PDF download).

    Recap of the Schneide Dev Brunch 2016-08-14

    Two weeks ago on a Sunday, we held another Schneide Dev Brunch, a regular brunch on the second Sunday of every other (even) month, only that all attendees want to talk about software development and various other topics. This brunch had its first half on the sun roof of our company, but it got so sunny that we couldn’t view a presentation that one of our attendees had prepared, so we went inside. As usual, the main theme was that if you bring a software-related topic along with your food, everyone has something to share. We were quite a lot of developers this time, so we had enough stuff to talk about. As usual, a lot of topics and chatter were exchanged. This recapitulation tries to highlight the main topics of the brunch, but cannot reiterate everything that was spoken. If you were there, you will probably find this list incomplete:

    Open-Space offices

    There are some new office buildings in town that feature the classic open-space office plan in combination with modern features like room-wide active noise cancellation. In theory, you still see your 40 to 50 colleagues, but you don’t necessarily hear them. You don’t have walls and a door around you, but are still separated by modern technology. In practice, that doesn’t work. The noise cancellation induces a faint cheeping in the background that causes headaches. The noise isn’t cancelled completely, especially those attention-grabbing one-sided telephone calls get through. Without noise cancellation, the room or hall is way too noisy and feels like working in a subway station.

    We discussed how something like this can happen in 2016, with years and years of empirical experience with work settings. The simple truth: Everybody has individual preferences, there is no golden rule. The simple conclusion would be to provide everybody with their preferred work environment. Office plans like the combi office or the flexspace office try to provide exactly that.

    Retrospective on the Git internal presentation

    One of our attendees gave a conference talk about the internals of git, and sure enough, the first question from the audience was: If git relies exclusively on SHA-1 hashes and two hashes collide in the same repository, what happens? The first answer won’t impress any analytical mind grounded in logic: it’s so incredibly improbable for two SHA-1 hashes to collide that you might rather prepare yourself for an attack of wolves and lightning at the same time, because that’s more likely. But what if it happens regardless? Well, one man went out and explored the consequences. The sad result: it depends. It depends on which two git elements collide in which order. The consequences range from invisible warnings without action over silently progressing repository decay to immediate data self-destruction. The consequences are so bitter that we already researched the savageness of the local wolf population and keep an eye on the thunderstorm app.

    Helpful and funny tools

    A part of our chatter contained information about new or noteworthy tools to make software development more fun. One tool is the elastic tabstop project by Nick Gravgaard. Another, maybe less helpful but more entertaining tool is the lolcommits app that takes a mugshot – oh sorry, we call that “aided selfie” now – every time you commit code. That smug smile when you just wrote your most clever code ever? It will haunt you during a git blame session two years later while trying to find that nasty heisenbug.

    Anonymous internet communication

    We spent a lot of time on a topic that I will only describe in broad terms. We discussed possibilities to communicate anonymously over a compromised network. It is possible to send hidden messages from A to B using encryption and steganography, but a compromised network will still be able to determine that a communication has occurred between A and B. In order to communicate anonymously, the network must not be able to determine whether a communication between A and B has happened at all, regardless of the content.

    A promising approach was presented and discussed, with lots of references to existing projects like https://github.com/cjdelisle/cjdns and https://hyperboria.net/. The usual suspects like the TOR project were examined as well, but couldn’t hold up to our requirements. Finally, we wanted to know how hard it is to found a new internet service provider (ISP). It’s surprisingly simple and well-documented.

    Web technology to single you out

    We ended our brunch with a rather grim look at the possibilities to identify and track every single user on the internet. Using completely exotic means of surfing is not helpful, as explained in this xkcd comic. When using a stock browser to surf, your best practice should be to not change the initial browser window size – but just see for yourself if you think it makes a difference. Here is everything What Web Can Do Today to identify and track you. It’s so extensive, it’s really scary, but on the other hand quite useful if you happen to develop a “good” app on the web.

    Epilogue

    As usual, the Dev Brunch contained a lot more chatter and talk than listed here. The number of attendees makes for a unique experience every time. We are looking forward to the next Dev Brunch at the Softwareschneiderei. And as always, we are open for guests and future regulars. Just drop us a notice and we’ll invite you over next time.

    For the gamers: Schneide Game Nights

    Another ongoing series of events that we established at the Softwareschneiderei are the Schneide Game Nights, which take place on an irregular schedule. Each Schneide Game Night is a Saturday night dedicated to a new or unknown computer game that is presented by a volunteer moderator. The moderator introduces the guests to the game, walks them through the initial impressions and explains the game mechanics. If suitable, the moderator plays for a while to show more advanced game concepts and gives hints and tips without spoiling too many surprises. Then it’s up to the audience to take turns trying the single-player game or to fire up the notebooks and join a multiplayer session.

    We already had Game Nights for the following games:

    • Kerbal Space Program: A simulator for everyone who thinks that space travel surely isn’t rocket science.
    • Dwarf Fortress: A simulator for everyone who is in danger of growing attached to legendary ASCII socks (if that doesn’t make much sense now, let’s try: A simulator for everyone who loves to dig his own grave).
    • Minecraft: A simulator for everyone who never grew out of the LEGO phase and is still scared in the dark. Also, the floor is lava.
    • TIS-100: A simulator (sort of) for everyone who thinks programming in Assembler is fun. Might soon be an olympic discipline.
    • Faster Than Light: A roguelike for everyone who wants more space combat action than Kerbal Space Program can provide and nearly as much text as in Dwarf Fortress.
    • Don’t Starve: A brutal survival game in a cute comic style for everyone who isn’t scared in the dark and likes to hunt Gobblers.
    • Papers, Please: A brutal survival game about a bureaucratic hero in his border guard booth. Avoid if you like to follow the rules.
    • This War of Mine: A brutal survival game about civilians in a warzone, trying not to simultaneously lose their lives and humanity.
    • Crypt of the Necrodancer: A roguelike for everyone who wants to literally play the vibes, trying to defeat hordes of monsters without skipping a beat.
    • Undertale: An 8-bit adventure for everyone who fancies silly jokes and weird storytelling. You’ll feel at home if you’ve played the NES.

    The Schneide Game Nights are scheduled over the same mailing list as the Dev Brunches and feature the traditional pizza break with nearly as much chatter as the brunches. The next Game Night will be about:

    • Factorio: A simulator that puts automation first. Massive automation. Like, don’t even think about doing something yourself, let the robots do it a million times for you.

    If you are interested in joining, let us know.

    Generating a spherified cube in C++

    In my last post, I showed how to generate an icosphere, a subdivided icosahedron, without any fancy data-structures like the half-edge data-structure. Someone in the reddit discussion on my post mentioned that a spherified cube is also nice, especially since it naturally lends itself to a relatively nice UV-map.

    The old algorithm

    The exact same algorithm from my last post can easily be adapted to generate a spherified cube, just by starting from different data.

    [figure: the initial cube mesh]
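
    For illustration, the initial data could look roughly like this. This sketch is not taken from the repository: it uses the plain VertexList/TriangleList types from the icosphere post, while the actual code uses a ColorVertexList, which would additionally carry a color per vertex (and per-face colors would require duplicating the corner vertices for each face).

    static const VertexList cube_vertices =
    {
      {-1.f, -1.f, -1.f}, { 1.f, -1.f, -1.f},
      { 1.f,  1.f, -1.f}, {-1.f,  1.f, -1.f},
      {-1.f, -1.f,  1.f}, { 1.f, -1.f,  1.f},
      { 1.f,  1.f,  1.f}, {-1.f,  1.f,  1.f},
    };

    static const TriangleList cube_triangles =
    {
      {0, 3, 2}, {0, 2, 1}, // back   (z = -1)
      {4, 5, 6}, {4, 6, 7}, // front  (z = +1)
      {0, 1, 5}, {0, 5, 4}, // bottom (y = -1)
      {2, 3, 7}, {2, 7, 6}, // top    (y = +1)
      {1, 2, 6}, {1, 6, 5}, // right  (x = +1)
      {0, 4, 7}, {0, 7, 3}, // left   (x = -1)
    };

    Like the icosahedron last time, the corner vertices still have to end up normalized onto the unit sphere, either up front or when the whole mesh is spherified.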

    After 3 steps of subdivision with the old algorithm, that cube will be transformed into this:

    [figure: the cube after three subdivision steps with the old algorithm]

    Slightly adapted

    If you look closely, you will see that the triangles in this mesh are a bit uneven. The vertical lines on the yellow side seem to curve around a bit. This is because, unlike in the icosahedron, the triangles in the initial box mesh are far from equilateral. The four-way split does not work very well with this.

    One way to improve the situation is to use an adaptive two-way split instead:
    [figure: the adaptive two-way split]

    Instead of splitting all three edges, we’ll only split one. The adaptive part here is that the edge we’ll split is always the longest edge in the triangle, thereby avoiding very long edges.

    Here’s the code for that. The only tricky part is the modulo-counting to get the indices right. The vertex_for_edge function does the same thing as last time: providing a vertex for subdivision while keeping the mesh connected in its index structure.

    TriangleList
    subdivide_2(ColorVertexList& vertices,
      TriangleList triangles)
    {
      Lookup lookup;
      TriangleList result;
    
      for (auto&& each:triangles)
      {
        auto edge=longest_edge(vertices, each);
        Index mid=vertex_for_edge(lookup, vertices,
          each.vertex[edge], each.vertex[(edge+1)%3]);
    
        result.push_back({each.vertex[edge],
          mid, each.vertex[(edge+2)%3]});
    
        result.push_back({each.vertex[(edge+2)%3],
          mid, each.vertex[(edge+1)%3]});
      }
    
      return result;
    }
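
    The longest_edge helper is not shown here; a possible sketch could look like the following. It assumes the triangle stores its indices in a vertex array as before, that each vertex exposes its position via a position member and that length2 returns the squared length of a vector (e.g. glm::length2) – all assumptions about the actual code.

    int
    longest_edge(ColorVertexList& vertices, Triangle const& triangle)
    {
      int result = 0;
      float longest = 0.f;
      for (int i = 0; i < 3; ++i)
      {
        // compare the squared lengths of the three edges
        auto start = vertices[triangle.vertex[i]].position;
        auto end = vertices[triangle.vertex[(i + 1) % 3]].position;
        float current = length2(end - start);
        if (current > longest)
        {
          longest = current;
          result = i;
        }
      }
      return result;
    }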
    

    Now the result looks a lot more even:
    [figure: the resulting spherified cube]

    Note that this algorithm only doubles the triangle count per iteration, so you might want to execute it twice as often as the four-way split.

    Alternatives

    Instead of using this generic triangle-based subdivision, it is also possible to generate the six sides as subdivided patches, as suggested in this article. This approach works naturally if you want to have seams between your six sides. However, it is more specialized towards this particular geometry and will require extra “stitching” if you don’t want seams.
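
    As a rough sketch of that idea (not taken from the article; the v3 and VertexList types and the normalize function are assumed from the previous post), one face could be generated as an n-by-n patch and projected onto the sphere, with the other five faces produced analogously from different basis vectors:

    VertexList
    make_face_patch(std::size_t n)
    {
      VertexList result;
      for (std::size_t y = 0; y <= n; ++y)
      {
        for (std::size_t x = 0; x <= n; ++x)
        {
          // map grid coordinates to [-1, 1] on the z = +1 face
          float u = 2.f * x / n - 1.f;
          float v = 2.f * y / n - 1.f;
          // project onto the unit sphere
          result.push_back(normalize(v3{u, v, 1.f}));
        }
      }
      return result;
    }

    The triangle indices for such a grid follow the usual two-triangles-per-cell pattern, and the seams mentioned above appear where neighboring faces meet.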

    Code

    The code for both the icosphere and the spherified cube is now on github: github.com/softwareschneiderei/meshing-samples.

    My favorite Unix tool

    Awk is a little language designed for the processing of lines of text. It is available on every Unix (since Version 7) or Linux system. The name is an acronym of the names of its creators: Aho, Weinberger and Kernighan.

    Since spending a couple of minutes learning awk, I have found it quite useful in my daily work. It is my favorite tool in the base set of Unix tools due to its simplicity and versatility.

    Typical use cases for awk scripts are log file analysis and the processing of character-separated value (CSV) formats. Awk allows you to easily filter, transform and aggregate lines of text.

    The idea of awk is very simple. An awk script consists of a number of patterns, each associated with a block of code that gets executed for an input line if the pattern matches:

    pattern_1 {
        # code to execute if pattern matches line
    }
    
    pattern_2 {
        # code to execute if pattern matches line
    }
    
    # ...
    
    pattern_n {
        # code to execute if pattern matches line
    }
    

    Patterns and blocks

    The patterns are usually regular expressions:

    /error|warning/ {
        # executed for each line, which contains
        # the word "error" or "warning"
    }
    
    /^Exception/ {
        # executed for each line starting
        # with "Exception"
    }
    

    There are some special patterns, namely the empty pattern, which matches every line …

    {
        # executed for every line
    }
    

    … and the BEGIN and END patterns. Their blocks are executed before and after the processing of the input, respectively:

    BEGIN {
        # executed before any input is processed,
        # often used to initialize variables
    }
    
    END {
        # executed after all input has been processed,
        # often used to output an aggregation of
        # collected values or a summary
    }
    

    Output and variables

    The most common operation within a block is the print statement. The following awk script outputs each line containing the string “error”:

    /error/ { print }
    

    This is basically the functionality of the Unix grep command, which is filtering. It gets more interesting with variables. Awk provides a couple of useful built-in variables. Here are some of them:

    • $0 represents the entire current line
    • $1 … $n represent the first to n-th field of the current line
    • NF holds the number of fields in the current line
    • NR holds the number of the current line (“record”)
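
    For example (this snippet is not from the original post), the following script prefixes each line with its line number and field count:

    { print NR ": " NF " fields: " $0 }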

    By default awk interprets whitespace sequences (spaces and tabs) as field separators. However, this can be changed by setting the FS variable (“field separator”).
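
    A hypothetical example: to process comma-separated input instead, you could set FS in a BEGIN block, which is equivalent to passing -F',' on the command line:

    # treat commas as field separators from now on
    BEGIN { FS = "," }
    
    # print the second comma-separated field of each line
    { print $2 }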

    The following script outputs the second field for each line:

    { print $2 }
    

    Input:

    John 32 male
    Jane 45 female
    Richard 73 male
    

    Output:

    32
    45
    73
    

    And this script calculates the sum and the average of the second field over all lines:

    {
        sum += $2
    }
    
    END {
        print "sum: " sum ", average: " sum/NR
    }
    

    Output:

    sum: 150, average: 50
    

    The language

    The language that can be used within a block of code is based on C syntax without types and is very similar to JavaScript. All the familiar control structures like if/else, for, while, do and operators like =, ==, >, &&, ||, ++, +=, … are there.

    Semicolons at the end of statements are optional, like in JavaScript. Comments start with a #, not with //.

    Variables do not have to be declared before usage (no ‘var’ or type). You can simply assign a value to a variable and it comes into existence.
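
    A hypothetical example: this script counts the lines containing the word “error” without ever declaring the counter (adding 0 forces numeric output even if no line matched):

    /error/ { errors++ }
    
    END { print errors + 0 " error lines" }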

    String concatenation does not have an explicit operator like “+”. Strings and variables are concatenated by placing them next to each other:

    "Hello " name ", how are you?"
    # This is wrong: "Hello" + name + ", how are you?"
    

    print is a statement, not a function. Parentheses around its parameter list are optional.

    Functions

    Awk provides a small set of built-in functions. Some of them are:

    length(string), substr(string, index, count), index(string, substring), tolower(string), toupper(string), match(string, regexp).
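
    For example (not part of the original post), substr and toupper can be combined like this:

    # print the first three characters of each line in upper case
    { print toupper(substr($0, 1, 3)) }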

    User-defined functions look like JavaScript functions:

    function min(number1, number2) {
        if (number1 < number2) {
            return number1
        }
        return number2
    }
    

    In fact, JavaScript adopted the function keyword from awk. User-defined functions can be placed outside of pattern blocks.
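
    To illustrate (this usage example is not part of the original post), the min function could be used to track the smallest value of the second field:

    NR == 1 { smallest = $2 }
    NR > 1  { smallest = min(smallest, $2) }
    END     { print "minimum: " smallest }
    
    # min as defined above
    function min(number1, number2) {
        if (number1 < number2) {
            return number1
        }
        return number2
    }

    With the sample input from above, this prints “minimum: 32”.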

    Command-line invocation

    An awk script can be either read from a script file with the -f option:

    $ awk -f myscript.awk data.txt
    

    … or it can be supplied in-line within single quotes:

    $ awk '{sum+=$2} END {print "sum: " sum " avg: " sum/NR}' data.txt
    

    Conclusion

    I hope this short introduction helped you add awk to your toolbox if you weren’t already familiar with it. Awk is a neat alternative to full-blown scripting languages like Python and Perl for simple text processing tasks.