Shell Metaprogramming
Unix epoch is a great way to encode time, since we don't need to do complicated parsing. So it's not surprising to see it used in environment variables. On the other hand, it's not great for humans to read. But I still needed to verify that environment configuration and report it to my coworkers, so I had to decode it back into a readable format. After searching the internet, I found this snippet on a Q&A forum (sorry, I can't remember the exact link).
sed 's/^/echo "/; s/\([0-9]\{10\}\)/`date -d @\1`/; s/$/"/' | bash
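To see what the snippet actually hands to bash, you can drop the final | bash and look at the script sed generates. This is a small sketch assuming GNU sed and GNU date, with a made-up variable name, BUILD_TIME, as sample input. Note that the pattern only matches exactly ten digits and, without a g flag, converts only the first match on each line.

```shell
# Hypothetical sample input: an environment line holding a Unix epoch
line='BUILD_TIME=1700000000'

# Same sed script as the snippet, minus "| bash", so we only see the generated code
printf '%s\n' "$line" | sed 's/^/echo "/; s/\([0-9]\{10\}\)/`date -d @\1`/; s/$/"/'
# prints: echo "BUILD_TIME=`date -d @1700000000`"

# Piping that generated line into bash is what finally runs date
printf '%s\n' "$line" | sed 's/^/echo "/; s/\([0-9]\{10\}\)/`date -d @\1`/; s/$/"/' | bash
```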
I was pretty busy back then, so I used it without thinking much about it. Now I have time not only to try to understand it, but also to write about it. There are four main programs in that snippet: echo, date, bash, and sed. Three of them are auxiliary:
- echo: prints to the terminal
- date: shell utility to process dates
- bash: runs the command
The last one, sed, is our main focus. It's an abbreviation of stream editor, and it relies heavily on regex to manipulate strings. One way to think about programming (especially functional programming) is as transforming something. Note that I use the word something instead of data, since you can argue about the difference between code and data.
When I asked some of my friends, they saw code as the "verb" and data as the "object" in a sentence. I'm not saying outright that this is wrong; an entire paradigm exists to organize both of these (I am looking at you, OOP). But how about we see "source code" as input data that a compiler transforms into machine code, and "configuration data" as specific code that controls the behaviour of a program?
We can transform something as data and then run it as code. Or you can forget both terms altogether and just transform something. Thinking this way makes it easier to work in the shell, since sed is pretty good at transforming things. And I don't think full-blown OOP support will be implemented in shell languages anytime soon.
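A tiny illustration of "transform it as data, then run it as code", as a sketch assuming GNU sed and bash: sed rewrites the text 2 3 into the text of a shell command, and only when that text reaches bash does it become code. The arithmetic example is my own, not from the original snippet.

```shell
# Step 1: sed turns the data "2 3" into the text of a shell command
printf '2 3\n' | sed -E 's/([0-9]+) ([0-9]+)/echo $((\1 + \2))/'
# prints: echo $((2 + 3))

# Step 2: the same text, piped into bash, is now executed as code
printf '2 3\n' | sed -E 's/([0-9]+) ([0-9]+)/echo $((\1 + \2))/' | bash
# prints: 5
```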
I know, you can set up your own scripting language to do the job. But sometimes it isn't worth it for a one-time use, not to mention when we work on a fresh server without our favourite tools installed.
Decode the Encoded
Now, back to our "decoding the encoded time" problem. One way to do it is to use the date command with -d @ (GNU date) to convert a Unix epoch into a readable format. Try running date -d @1 in your CLI; it prints a readable time one second after the epoch. You can use the TZ environment variable to set the timezone.
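A few invocations worth trying, assuming GNU date (on BSD or macOS, date -r SECONDS plays the same role as -d @SECONDS). Prefixing TZ=... sets the timezone for that one command only, and a +FORMAT argument gives output that doesn't depend on your locale.

```shell
# One second after the epoch, in the default (locale-dependent) format
date -d @1

# Set TZ for this command only, so the result doesn't depend on the machine's zone
TZ=UTC date -d @1

# A fixed output format is easier to compare and grep
TZ=UTC date -d @1 +%Y-%m-%dT%H:%M:%S
# prints: 1970-01-01T00:00:01
```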
Suppose we have a string that includes multiple Unix epoch timestamps. Extracting and converting them one by one is tedious. It's also prone to human error, since by extracting a timestamp we lose the context of where it sits in the string. So we want in-place conversion (or substitution). The end result won't be exactly the same as the snippet, but it still demonstrates my point.
We can embed a command inside a string using $(command) or `command`. I prefer the first form. For example, to embed a date, we can use echo "Date: $(date -d @1)". So the first thing to do is to replace every Unix epoch timestamp with a command substitution.
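A quick check that both substitution forms behave the same, as a sketch assuming bash and GNU date. The nested example shows one practical reason to prefer $(...): it nests without any extra escaping, whereas nested backticks need backslashes.

```shell
# Both forms run date and splice its output into the double-quoted string
a="Date: $(date -d @1)"
b="Date: `date -d @1`"
[ "$a" = "$b" ] && echo "same result"

# $(...) nests cleanly; TZ=UTC keeps the year from shifting in negative-offset zones
echo "Year: $(TZ=UTC date -d @$(printf '%s' 1) +%Y)"
# prints: Year: 1970
```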
This can be done using sed -E 's/([0-9]+)/$(date -d @\1)/g'. Here's the explanation:
- Flag -E enables extended regex, so we don't need to escape the parentheses
- Substitution has the syntax s/PATTERN/REPLACEMENT/OPTIONS
- Pattern ([0-9]+) matches one or more digits. The parentheses form a group, which can be referenced in the replacement
- Replacement $(date -d @\1) is taken literally, except for \1, which refers to the first matched group. This is called a backreference
- Option g replaces every occurrence in a line, not just the first
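To see each piece of the substitution in isolation, here is a small sketch (assuming GNU sed) that swaps the date command for a harmless <...> wrapper. It compares the g option with its absence, and shows the same pattern written in basic regex, where the group parentheses must be escaped.

```shell
# Without g: only the first match on each line is replaced
printf 'a 1 b 22\n' | sed -E 's/([0-9]+)/<\1>/'
# prints: a <1> b 22

# With g: every match on the line is replaced
printf 'a 1 b 22\n' | sed -E 's/([0-9]+)/<\1>/g'
# prints: a <1> b <22>

# Without -E (basic regex) the group parentheses need backslashes;
# [0-9][0-9]* is the portable basic-regex spelling of "one or more digits"
printf 'a 1 b 22\n' | sed 's/\([0-9][0-9]*\)/<\1>/g'
# prints: a <1> b <22>
```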
You can practice writing your own substitutions with regex101. You can use |, the pipe operator, to test it: the pipe feeds the previous command's output into the next command as input. If we try it, it prints the substituted string, but we don't have our readable date yet. And if we try to execute it, we get an error, because bash treats each whole line as a command, so it tries to run A and Another as programs.
> printf 'A Date: 123\nAnother Date: 456\n' | sed -E 's/([0-9]+)/$(date -d @\1)/g'
A Date: $(date -d @123)
Another Date: $(date -d @456)
> printf 'A Date: 123\nAnother Date: 456\n' | sed -E 's/([0-9]+)/$(date -d @\1)/g' | bash
bash: line 1: A: command not found
bash: line 2: Another: command not found
This is why we need echo: to treat the input as a string and print it back to the terminal. Make sure to wrap the string in double quotes to preserve whitespace (double rather than single quotes, so the shell still expands the $(...) inside). We just need to add echo " at the start of each line, and " at the end. One way to do it is by chaining multiple sed commands with the pipe operator.
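The pipe-chained approach might look like this, as a sketch assuming GNU sed and date: each sed stage does one job, and the final bash runs the generated echo lines.

```shell
# Stage 1 turns numbers into $(date ...), stages 2 and 3 wrap each line in echo "..."
printf 'A Date: 123\nAnother Date: 456\n' \
  | sed -E 's/([0-9]+)/$(date -d @\1)/g' \
  | sed 's/^/echo "/' \
  | sed 's/$/"/' \
  | bash
```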
There is another way, since sed supports several commands in one script: multiple substitutions can be chained by putting a ; (semicolon) between them. The start and end of a line are matched in regex by ^ and $ respectively. Implementing this, we get:
> printf 'A Date: 123\nAnother Date: 456\n' | sed -E 's/([0-9]+)/$(date -d @\1)/g; s/^/echo "/; s/$/"/' | bash
A Date: Thu Jan 1 00:02:03 UTC 1970
Another Date: Thu Jan 1 00:07:36 UTC 1970
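And that's what we want. (If the string includes " (double quotes), this command won't work, and the same goes for $, backticks, and backslashes, which are also special inside double quotes. You can try to solve this by escaping those characters, again using sed as the transformation.)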
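One possible way to do that escaping, as a sketch assuming GNU sed and bash: before inserting the $(date ...) substitutions, put a backslash in front of every character that is special inside double quotes (\, ", backtick, and $). The order matters, because escaping first leaves the $ of our own $(date ...) untouched. The sample inputs are made up.

```shell
# Escape special characters first, then convert numbers, then wrap in echo "..."
script='s/[\\"`$]/\\&/g; s/([0-9]+)/$(date -d @\1)/g; s/^/echo "/; s/$/"/'

# Double quotes in the input survive; bash sees: echo "He said \"hi\" at $(date -d @123)"
printf 'He said "hi" at 123\n' | sed -E "$script" | bash

# A literal $ in the input is printed as-is instead of being expanded
printf 'costs $HOME\n' | sed -E "$script" | bash
# prints: costs $HOME
```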
Batch Renaming Files
Transforming strings for readability isn't that exciting, so let's do something with more impact. I have a Jellyfin server in my homelab. After acquiring media, I need to follow its naming scheme. Renaming files one by one takes a while, but we know some patterns in the original filenames; for example, they include the season and episode numbers. Here is my approach:
ls -1 | grep '\.mkv$' | sed -E 's/(.*)S([0-9]+)E([0-9]+)(.*)/mv "&" "Title-S\2E\3.mkv"/' | bash
I added grep to filter the .mkv files in the folder, used multiple groups with backreferences, and & in the replacement stands for the whole matched text, which here is the full filename because the pattern spans the entire line. Remember, each line is processed separately here. This method is good enough for me, and I don't need to set up a complicated environment to do it.
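Because the sed stage only prints commands, it's easy to do a dry run before handing them to bash. The sketch below assumes GNU sed and coreutils, uses made-up filenames in a throwaway directory from mktemp -d, and adds mv -n (don't overwrite an existing file), which is my own addition rather than part of the command above. Title is the same placeholder as in the original.

```shell
# Work in a scratch directory with made-up filenames
dir=$(mktemp -d)
cd "$dir" || exit 1
touch 'Some.Show.S01E02.1080p.mkv' 'Some.Show.S01E03.1080p.mkv' 'notes.txt'

# Keep season (\2) and episode (\3); & is the whole matched filename
rule='s/(.*)S([0-9]+)E([0-9]+)(.*)/mv -n "&" "Title-S\2E\3.mkv"/'

# Dry run: only print the generated mv commands
ls -1 | grep '\.mkv$' | sed -E "$rule"

# Looks right? Run the same pipeline into bash
ls -1 | grep '\.mkv$' | sed -E "$rule" | bash
ls -1
```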
Metaprogramming?
What we just did is metaprogramming: we take some data, transform it, then run it as a program. Isn't that what C macros are? Beyond scripting, metaprogramming lets us abstract away repetition. Sometimes people overdo it and sacrifice code readability; this is what we call macro abuse. For one-time scripting this isn't really a problem, since the script isn't meant to be maintained.
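In the same spirit as a C macro that stamps out repetitive definitions, here is a sketch where sed generates several near-identical shell functions from a list of names and eval runs the result. The log_* functions and the level names are hypothetical, and eval on untrusted input is just as risky as piping it into bash.

```shell
# Data: the names we want a logging function for (hypothetical example)
levels='info
warn
error'

# sed turns each name into a function definition, e.g.
#   log_warn() { printf "[warn] %s\n" "$*"; }
# and eval runs the generated code, defining all three functions
eval "$(printf '%s\n' "$levels" | sed 's/.*/log_&() { printf "[&] %s\\n" "$*"; }/')"

log_warn "disk almost full"
# prints: [warn] disk almost full
```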