2016-01-24
I have been working with regular expressions a lot lately and thought I should
share some ideas on how to make regular expressions faster in Java. Most of this
ideas are quite simple yet may give you very good improvements in practice.
Low hanging fruits
Compile Pattern
s you use frequently.
Use small capture groups to prevent long backtracking.
Whenever possible, use non-capturing groups. (?:P)
instead of (P)
.
Reluctant, greedy, and possessive quantifiers
Usually, reluctant quantifiers will run faster than greedy quantifiers. I am
talking about +?
and *?
being faster than +
and *
. When the engine finds
a greedy quantifier, it will go for the biggest possible match and start
backtracking. In big strings this is usually extremely slow if you are trying to
match a small substring (which is often the case), so going for reluctant
quantifiers - that “grow” the match - instead of greedy ones - that “shrink” the
substring they are trying to match - is very likely to improve performance.
Possessive quantifiers improve regular expression performance a lot. Therefore,
you should use them whenver you can. P?+
, P*+
, P++
will match the pattern
as any greedy quantifier would, but will never backtrack once the pattern has
been matched, even if refusing to do so means that whole expression will not
match.
2016-01-22
This is a very short post to show one of my favorite system monitoring tools:
NetHogs. If you are on a Linux box, it should be fairly easy to get it no
matter what is your Linux distribution.
Once you have it, just run it as root and you will have a minimalistic,
readable, and very useful list of which processes are consuming your bandwidth
and how much of it each process is using.
You can also run it with the -v1
flag to get the sum of KB instead of the
KB/s rate of each process.
2016-01-21
When working on dungeon license
notes I came to notice that we are in 2016 and this means that all files that I
change right now require their license notes to be updated.
I wondered if I actually needed that, so I went to Programmers Stack
Exchange and asked about
it.
I got quite helpful answers and comments within a few hours. Essentially, it
boils down to
- Copyright expires, therefore you must provide a license date.
- The format
Copyright (C) [years] [name]
should not be changed.
Therefore, I must set up Git Hooks that will update all license notes of the
files with a list of years in which the file has been modified.
The solution will likely be open-sourced so it benefits a bigger group of
people.
2016-01-20
In this post I briefly describe how good text justification is done using
dynamic programming. In the future, I may post code for this problem.
Start by writing a ‘badness’ function that produces a value corresponding to
how bad it is to have a certain set of words forming a line.
LaTeX uses the following formula for badness
badness(i, j) = (page_width - total_width)³ if [i, j] fits
∞ otherwise
where
i is the index of the first word of the line
j is the index of the last word of the line
page_width is the width of the document page
total_width is the width occupied by the letters of the words [i, j]
Then, by using dynamic programming techniques, you need to find the line
combination that gives the minimum badness sum.
2016-01-19
This post is specific to ext3 and ext4 partitions.
Yesterday I accidentally invoked bunzip2
without the --keep
flag. This
made me lose the original, compressed file. It would be complicated to get it
back, therefore I used to extundelete
to recover it. This post gives an
example of the usage of extundelete
.
First of all, I needed to have the partition mount as read-only, but I was
using it to run my operating system, so I had to reboot and start from a live
image.
Once I had it running, getting extundelete
was easy
sudo dnf install extundelete
After that, I simply invoked
sudo extundelete --recover-file <path-to-file> <device>
In my case,
sudo extundelete --recover-file mg/wiki/dump.bz2 /dev/sda7
Finally, I copied the recovered file (found on a RECOVERED_FILES
directory)
to its final destination with cp
and was done with it.
Hopefully this will be helpful to you someday.