2016-03-19
This is my first variable post. By variable post I mean a post that might
change at every recompilation of the website.
I had this idea when thinking about getting a word count of the project.
Instead of just computing the word count for myself, I decided to make it into
an always up-to-date blog post.
Basic statistics
Word count |
36644 |
Post count |
51 |
Page count |
71 |
Mean word count per post |
244 |
Mean word count per page |
340 |
Extrapolated statistics
The average English reader would take 123
minutes to read the whole website, whose content amount is equivalent to that of
183 songs or
7 short stories.
2016-03-14
This post covers fundamental cache concepts such as locality of reference,
mappings, and why caching is important.
Caching basics
Caching is used to improve CPU performance. Main memory is far from the CPU and
also operates in a much lower frequency. Storing the data and the code that the
processor is most likely to use next close to it and in a faster memory
improves its overall performance.
It is worth noting that some designs split the cache into I-cache (instruction
cache) and D-cache (data cache).
Locality of reference
Caching is based on locality of reference. This means that usually access
patterns exhibit temporal locality (data that has already been requested is
likely to be requested again soon) and spatial locality (data physically near
data that has been accessed recently is more likely to be accessed).
Notes on locality of reference
In some situations, temporal locality works against us. For instance, when
playing media files storing already played memory blocks in the cache is usually
is a waste of space.
While temporal locality implies spatial locality (reused data always presents
some physical pattern), spatial locality does not imply temporal locality. Take
for instance different functions in a business application. The user may never
request more than two of the three hundred available operations, rendering most
of the caching that could be performed based on spacial patterns useless.
Many applications have very different memory access patterns for code and data.
Take, for instance, a media player which has an excellent temporal locality for
code (the code that renders a frame is always reused) but a terrible spatial
locality for data (the bytes that make up a frame are hardly ever useful
again). This explains why some designers split the cache into I-cache and
D-cache as mentioned above.
Tag RAM
When the CPU requests a byte from a particular block from RAM, it needs to
determine if it is in the cache or not and, if it is, where it is. For this
reason there exists a piece of memory called a tag associated with each frame
of the cache. This piece of information allows the CPU to know about the blocks
currently held in that cache frame. This memory needs to be very fast in order
to reduce the overhead of cache lookup.
Mappings
Essentially, there are three ways to use tags to map RAM blocks to block frames
in the CPU cache.
Fully Associative Mapping
This is arguably the simplest possible mapping scheme. This mapping allows for
any memory block to occupy any cache frame. This simplicity comes at a price: in
order to determine whether or not a memory block is in the cache, the CPU has to
search all of the cache frames.
Direct Mapping
In direct mapping, each cache frame can only contain a memory block from a
subset of all memory blocks. The memory is evenly divided among the available
cache frames, in such a manner that if the CPU needs to fetch a certain memory
block there is only one cache frame it needs to check. This method solves the
problem of lengthy cache scans of fully associative mapping at the expense of
not allowing memory blocks that are mapped to the same cache frame to be in the
cache at the same time, which can severely hinder performance in some scenarios.
N-Way Set Associative Mapping
The third type of cache mapping tries to lessen the conflicts introduced by
direct mapping. This method may be seen as a mixture of both methods mentioned
above. In set associative mapping, there are multiple cache blocks in each cache
frame, and each cache frame is responsible by a subset of main memory. This way
the same memory block has more than one cache block it can be assigned to and
the chance of it evicting another relevant block is reduced. One can imagine,
for the sake of visualization, set associative mapping as multiple smaller fully
associative mappings.
This is the most commonly used form of mapping.
Cache misses
Cache miss is a state where the data requested by the CPU is not found in the
cache. There are basically three types of cache miss:
Compulsory miss
This is what the miss is called because the sought after data was never loaded
into cache. This type of miss is also known as a cold miss.
Conflict miss
A conflict miss occurs when the CPU tries to find a memory block in the
corresponding cache frame but the frame is already filled with other memory
blocks.
Capacity miss
This last type of miss happens when the block containing the needed data has
already been evicted from the cache.
Writing policies
So far only reading from cache has been discussed. However, data is also being
written as most computers store the results of their computations.
There are essentially two writing policies for caches, and different
combinations of them may be used. Write-through is the simplest one and the
safest, from a multithreaded point of view. It consists of writing changes to
all levels of the cache back to main memory. The other solution is
write-back, which consists of only writing the changes to the cache level
below once the cache block is evicted. This model makes guaranteeing proper
synchronization much more complex in order to improve performance.
2016-02-16
Last weekend I was determined to solve Dungeon’s licensing issues. However, after seeing how much work
figuring out in which year each file was modified from Git history would be, I
decided to take a simpler route that I believe will make Dungeon an even better
project. I changed to the BSD 3-Clause license, which seems to accept having a
single license notice for the whole project. Thus, individual files no longer
need their own preambles.
From the GNU
website:
You should put a notice at the start of each source file, stating what
license it carries, in order to avoid risk of the code’s getting disconnected
from its license.
Which is a good point. However, it does not really apply to Dungeon. Most of
the code is very project-specific and wouldn’t be of much use elsewhere.
Also, if you want to copy a generic solution, such as our CircularList
and
never give back, I am OK with it and hope you make good use of my code.
The mentioned page also states that
This has nothing to do with the specifics of the GNU GPL. It is true for any
free license.
As I said, I don’t have enough reasons to do this as I am already using a
permissive license. One could argue that the no warranty clause should always
be present, but I don’t think it is an important issue right now, as to make me
responsible for damage caused by my software would require acknowledging that
the code came from the repository, which has a license note of the BSD 3-Clause
license.
I’ve come to like permissive licenses a lot more in the last two years. I think
that the most open and free the code is, the more value it has as more people
may benefit from it. And this may be more easily achieved by using permissive
(non copyleft) licenses, such as the BSD licenses.
In the end, I just hope this is for the best of my species and that everyone
can benefit, or, at least, not be damaged by the license change.
2016-02-08
I believe that data should never be lost. Specially data that are costly to make
and have memory value. Texts, drawings, photographs, songs, programs, and videos
should always be preserved, even if you are not interested on them anymore.
They may live in a compressed file inside that old 512 GB HDD you hardly ever
mount, but you have no reason to erase them. In fact, I think you should store
them more safely than that, as there are several reasons to preserve them. Time
flies, and as you see yourself aging you may want to remember how life was when
you were younger. Not just by one yellowed picture. But by what you created.
What you produce in your limited lifetime is your legacy. All of it. It
tells about you. It describes you in different stages of your life and should be
safely stored for remembering moments that - no matter how sad or how happy -
made up your existence. This post is to highlight the importance of writing,
drawing, photographing, composing, creating, and, ultimately, preserving.
Not content just for the sake of it, but your life, for the importance of it.
Even if you don’t care, maybe your family and close relatives one day will.
Maybe historians years after from now will. Maybe another species someday will.
Lastly, I would ask you to keep as much as you find convenient of this data
public. Share it, present it, distribute it, publish it, index it, so you
can live for much more than your organic survival and let what you created
during your lifetime contribute to the lives of others.
2016-01-31
Producing a truth table from a boolean expression is a trivial task, as you
just need to evaluate the expression for each combination of inputs.
However, the opposite is not so simple.
A way to convert a truth table into a boolean expression is writing an
expression that evaluates to true for each and every row that evaluates to true
and chain them together with the OR operator. This expression may later be
simplified by using logical identities.
Note that this does not give you the simplest possible boolean expression,
which is an NP-complete problem.
Example
Let us obtain a boolean expression from the following truth table
A |
B |
C |
= |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
0 |
1 |
0 |
0 |
0 |
1 |
1 |
1 |
1 |
0 |
0 |
0 |
1 |
0 |
1 |
1 |
1 |
1 |
0 |
1 |
1 |
1 |
1 |
1 |
Step 1 - Write an expression for each row that evaluates to true
¬A AND B AND C
A AND ¬B AND C
A AND B AND ¬C
A AND B AND C
Step 2 - Join them all with the OR operator
¬A AND B AND C OR
A AND ¬B AND C OR
A AND B AND ¬C OR
A AND B AND C
Step 3 - Simplify to the best of your ability
¬A AND B AND C OR
A AND ¬B AND C OR
A AND B AND ¬C OR
A AND B AND C
-----------------------
¬A AND B AND C OR
¬B AND A AND C OR
¬C AND A AND B OR
A AND B AND C
-----------------------
(A AND B) OR (A AND C) OR (B AND C)
Your efficiency on step 3 will depend on several factors such as your
persistence, previous practice, and pattern recognition skills.