Secret Codes And SQL Server, Part 1: Writing Our Own Encryption Algorithm and Cracking it.

ZenMate

After my last post on  The Arcane Science of Bitwise Logic and SQL Server I was asked a couple of questions on where it’s used so I thought I’d write a follow up post.  At first I was going to write something on bitmaps but then a thought of another area where bitwise logic is heavily used and that’s in cryptography.

This isn’t really a serious post about SQL Server but rather me playing around and seeing what I could do with it.  Because of that I’m not going to say that what I’ve done is the best way of doing it but it is a way.  There are plenty of encryption options in SQL Server so I’ve got no idea why you’d ever want to write your own but just for fun, that’s exactly what we’re going to look at here.

And then we’ll try to crack it.

Undercover Encryption Version 1: XOR Cypher

As this post was prompted by my post on bitwise logic, we’re going to base our algorithm around the XOR cypher.  Basically this cypher works by taking a key which for simplicity sake will be a single byte and XOR-ing that against the message (or plain text) that we want to encrypt.

Let’s look at an example of how this is going to work.

Let’s say that we want to encrypt the plain text ‘SQLUndercover’.  How are we going to do this?  Firstly we need to remember that all text characters are represented by a single byte as an ASCII code.  ‘SQLUndercover’ is represented by the following set of ASCII codes, 83 81 76 85 110 100 101 114 99 111 118 101 114.

In order to perform encryption or decryption, we’re going to need a key value.  Let’s call that 109.

We’ll now XOR our key against each character of our plain text, the result will make up our encrypted message (or cypher text).  Firstly we need to XOR S or 83 against our key, to see this in action, we’re going to need to look at these values in their binary form.

cypher1

The answer is 00111110 or >.

To decrypt we just have to do the opposite and XOR the cypher text against the key.

cypher2

01010011 just so happens to be 83 or ‘S’

Let’s see this in action in SQL, the following procedure will accept text as @PlainText and a single byte binary value as the key.  The XOR will be performed as described above on the ASCII value of each character and the output printed.


CREATE PROC EncryptData_v1
 @PlainText VARCHAR(MAX),
 @Key BINARY(1)

AS

BEGIN

DECLARE @CypherText VARCHAR(MAX)
 SET @CypherText = ''

WHILE DATALENGTH(@CypherText) < DATALENGTH(@PlainText)
 BEGIN

SET @CypherText = @CypherText + CHAR(ASCII(SUBSTRING(@PlainText,DATALENGTH(@CypherText)+1, 1)) ^ @Key)

END

PRINT @CypherText

END

So to encrypt ‘SQLUndercover’ as above we’d need to run the following code and we’ll use the key of 45

 EXEC EncryptData_v1 'SQLUndercover', 45[/code"]</pre>
The resultant cypher text is <strong><span style="color: #ff6600;">~|axCIH_NB[H_ </span></strong>

To decrypt is a simple matter of running the same proc but this time we need to plug in the cypher text instead of the plain text.
<pre>
<pre> EXEC EncryptData_v1 '~|axCIH_NB[H_', 45 

...and out pops our decrypted text, SQLUndercover

Cool!

But it seems that the clever boys at MI5 have managed to intercept and decrypt our message.

But how?

See, this algorithm has one big weakness and that's the key.  We're using a single byte key which means that there's only 256 possible key values.

Brute Force Attack

cypher3

Our algorithm is very vulnerable to what's known as a brute force attack, we can easily just work our way through all the possible key values until we get something that looks sensible.


DECLARE @Key BINARY
SET @Key = 0
WHILE @Key <= 255
BEGIN
EXEC EncryptData_v1 '~|axCIH_NB[H_', @Key
SET @Key = @Key + 1
END

It really doesn't take much to scan through the results to find what we're looking for.

...
ZXE\gml{jfl{
[YD]flmzkg~mz
XZG^eonyhd}ny
Y[F_dnoxie|ox
VTIPka`wfjs`w
WUHQj`avgkrav
TVKRicbudhqbu
UWJShbcteipct
RPMToedsbnwds
SQLUndercover
PROVmgfq`lufq
QSNWlfgpamtgp
NLQHsyxo~rkxo
OMPIrxynsjyn
LNSJq{zm|pizm
MORKpz{l}qh{l
JHULw}|kzvo|k
KITMv|}j{wn}j
HJWNu~ixtm~i
IKVOt~hyulh
...

Drat, we've been foiled!

How can we make this code harder to crack?  We could increase the length of the key, the longer the key is the more possible values a brute force attack would have to go through and the slower it would be.  Another option to make this more complex would be to pair it up with a different form of encryption.

Undercover Encryption Version 2: Caesar Cypher

There's another type of cypher that was invented by the Roman Emperor, known as the Caesar Cypher.  The Caesar cypher works by simply shifting the characters of the plain text a set number of letters up or down the alphabet.

For example, to encrypt 'SQLUndercover' we could simply shift by two letters and get 'USNWpfgteqxgt'.

Now let's be honest, that's no safer than our last attempt and is still very vulnerable to a brute force attack.  But what if we were to combine the two approaches?  What if we were to shift the character and then apply the key?  Lets look at the numbers...

There are 256 possible ASCII values so that's 256 possible shift positions.  There are 256 possible key values which means that all in all that's 256 * 256 =  65,536 possible shift and key combinations, that's makes our new algorithm far stronger than when we were just using XOR.  OK, I know that MI5 wouldn't have too much trouble brute force attacking this and I reckon even the CIA could manage it but for the sake of this demo I'm not going to go through over 65,000 possible iterations and will just say that it's strong enough to withstand a brute force attack.

So let's implement this in SQL,  this time we have to take a slightly different approach when encrypting then when decrypting because the character shift needs to be in a different direction.


CREATE PROC [dbo].[EncryptData_v2]
@PlainText VARCHAR(MAX),
@Key BINARY(1),
@Shift INT,
@Decrypt BIT = 0,
@CypherText VARCHAR(MAX) = '' OUTPUT
AS
BEGIN
  SET @CypherText = ''

  WHILE DATALENGTH(@CypherText) < DATALENGTH(@PlainText)
  BEGIN
     IF @Decrypt = 0
     SET @CypherText = @CypherText + CHAR((ASCII(SUBSTRING(@PlainText,DATALENGTH(@CypherText)+1, 1))+ @Shift) ^ @Key)
     ELSE
     SET @CypherText = @CypherText + CHAR((ASCII(SUBSTRING(@PlainText,DATALENGTH(@CypherText)+1, 1)) ^ @Key) - @Shift)
  END

PRINT @CypherText
END

Let's have a look at how this is working in practice.  We give the proc our plain text and key as usual but this time we also have to give it the shift as well as set the decrypt flag.  The flag is because encryption and decryption are handled differently as mentioned above.  Set it to 1 to encrypt and 0 to decrypt.


EXEC EncryptData_v2 'SQLUndercover', 45, 3, 0

The result, {ybu\JEXK_TEX

and now to decrypt..


EXEC EncryptData_v2 '{ybu\JEXK_TEX', 45, 3, 1

SQLUndercover

As with before, you need both the correct key and shift values to decrypt the message.  So there we have it, a secure encryption algorithm that's resilient against brute force attacks.

...hold on one cotton picking minute, while we seem to have the CIA stumped those fellas at MI5 are on to use again.  How'd they do it this time?

Statistical Analysis Attack

cypher4

We thought that our algorithm was secure but it's got one big weakness and that is that that it's a straight substitution algorithm, what that means is that every encrypted character will always correspond exactly to its unencrypted counterpart.  For example, when we encrypted SQLUndercover, the S was encrypted as {, every time S is encrypted, it'll be {.  The problem with that of course is that if we can work out what a character represents once, it'll be the same throughout the encrypted text.

How do we go about exploiting this weekness?

Through a statistical analysis attack.

There are certain letters and patterns in the English language that crop up more often than others.  By hunting out these patterns we've got a good chance of figuring out what the encrypted characters correspond to.

To see this in action we're going to need to encrypt a bigger message.  We're going to use a snippet from the BBC News.


EXEC EncryptData_v2 'The author of a government review into work practices is calling for the end of the "cash in hand economy".
Matthew Taylor, whose report is out on Tuesday, said cash jobs like window cleaning and decorating were worth up to £6bn a year, much of it untaxed.
Instead, the work should be paid through "payment platforms", Mr Taylor told BBC economics editor Kamal Ahmed.
The review, commissioned by Theresa May, also tackles low-paid work, zero hours contracts and the gig economy.
Mr Taylor, who is chief executive of the Royal Society of Arts and a former Tony Blair advisor, is set to call for cash jobs to be paid through platforms such as credit cards, contactless payments and PayPal.
This would make it harder for customers and workers to avoid paying tax.
Properly protected
The recommendations are part of a much wider review into modern working practices, including the gig economy.
Mr Taylors report recommends a new category of worker called a "dependent contractor", who should be given extra protections by firms like Uber and Deliveroo.
It also says low-paid workers should not be "stuck" at the minimum living wage or face insecurity.
Minimum wage push for gig economy workers
Deliveroo moves on benefits for riders
What is the gig economy?
Speaking at its launch, the prime minister will say the Taylor report confronts issues that "go right to the heart of this governments agenda and right to the heart of our values as a people".
Mrs May will say: "I am clear that this government will act to ensure that the interests of employees on traditional contracts, the self-employed and those people working in the gig economy are all properly protected."',
 10, 2

Which give us the cypher text...

\`m(i}|`{~({b(i(c{rm~zemz|(~mrams(az|{(s{~g(x~io|aom(a(oiddazc(b{~(|`m(mzl({b(|`m(.oi`(az(`izl(mo{z{eq.:Ei||`ms(\iqd{~$(s`{m(~mx{~|(a({}|({z(\}mliq$(ial(oi`(f{n(dagm(sazl{s(odmizazc(izl(lmo{~i|azc(sm~m(s{~|`(}x(|{(¯2nz(i(qmi~$(e}o`({b(a|(}z|ipml:Az|mil$(|`m(s{~g(`{}dl(nm(xial(|`~{}c`(.xiqemz|(xdi|b{~e.$(E~(\iqd{~(|{dl(NNO(mo{z{eao(mla|{~(Gieid(I`eml:\`m(~mrams$(o{eeaa{zml(nq(\`m~mi(Eiq$(id{(|iogdm(d{s%xial(s{~g$(vm~{(`{}~(o{z|~io|(izl(|`m(cac(mo{z{eq:E~(\iqd{~$(s`{(a(o`amb(mpmo}|arm({b(|`m(^{qid(_{oam|q({b(I~|(izl(i(b{~em~(\{zq(Ndia~(ilra{~$(a(m|(|{(oidd(b{~(oi`(f{n(|{(nm(xial(|`~{}c`(xdi|b{~e(}o`(i(o~mla|(oi~l$(o{z|io|dm(xiqemz|(izl(XiqXid:\`a(s{}dl(eigm(a|(`i~lm~(b{~(o}|{em~(izl(s{~gm~(|{(ir{al(xiqazc(|ip:X~{xm~dq(x~{|mo|ml\`m(~mo{eemzli|a{z(i~m(xi~|({b(i(e}o`(salm~(~mrams(az|{(e{lm~z(s{~gazc(x~io|aom$(azod}lazc(|`m(cac(mo{z{eq:E~(\iqd{~(~mx{~|(~mo{eemzl(i(zms(oi|mc{~q({b(s{~gm~(oiddml(i(.lmxmzlmz|(o{z|~io|{~.$(s`{(`{}dl(nm(carmz(mp|~i(x~{|mo|a{z(nq(ba~e(dagm(]nm~(izl(Lmdarm~{{:A|(id{(iq(d{s%xial(s{~gm~(`{}dl(z{|(nm(.|}og.(i|(|`m(eazae}e(darazc(sicm({~(biom(azmo}~a|q:Eazae}e(sicm(x}`(b{~(cac(mo{z{eq(s{~gm~Lmdarm~{{(e{rm({z(nmzmba|(b{~(~alm~S`i|(a(|`m(cac(mo{z{eqK_xmigazc(i|(a|(di}zo`$(|`m(x~aem(eaza|m~(sadd(iq(|`m(\iqd{~(~mx{~|(o{zb~{z|(a}m(|`i|(.c{(~ac`|(|{(|`m(`mi~|({b(|`a(c{rm~zemz|(icmzli(izl(~ac`|(|{(|`m(`mi~|({b({}~(rid}m(i(i(xm{xdm.:E~(Eiq(sadd(iq6(.A(ie(odmi~(|`i|(|`a(c{rm~zemz|(sadd(io|(|{(mz}~m(|`i|(|`m(az|m~m|({b(mexd{qmm({z(|~ila|a{zid(o{z|~io|$(|`m(mdb%mexd{qml(izl(|`{m(xm{xdm(s{~gazc(az(|`m(cac(mo{z{eq(i~m(idd(x~{xm~dq(x~{|mo|ml:.

How do we go about unscrambling that?!

First of all we need to figure out the frequency of the letters, we'll create a temp table to hold this as well holding what we think the predicted plain text value is.

CREATE TABLE #CharacterFrequency
(Character VARCHAR(10),
Frequency INT,
FrequencyRank INT,
PredictedCharacter CHAR(1))

We then need to work our way through the string counting up the number of occurrences of each character.

By loading the cypher text into the variable @text we can run the following to populate our #CharacterFrequency table.


DECLARE @frequency INT
DECLARE @Character VARCHAR(10)
DECLARE @TempText VARCHAR(MAX) = @text

--get character frequency
WHILE DATALENGTH(@TempText)
BEGIN
  SET @Character = SUBSTRING(@TempText, 1,1)
  SET @frequency = DATALENGTH(@TempText) -DATALENGTH(REPLACE(@TempText,@Character,''))

  INSERT INTO #CharacterFrequency(Character,Frequency)
VALUES(@Character,@frequency)

  SET @TempText = REPLACE(@TempText,@Character,'')
END
--find frequency ranks
UPDATE #CharacterFrequency
SET FrequencyRank = a.FrequencyRank
FROM(SELECT Character, ROW_NUMBER() OVER (ORDER BY Frequency DESC) AS FrequencyRank
FROM #CharacterFrequency) a
JOIN #CharacterFrequency b ON a.Character = b.Character

Now that we've got a list of characters and how often they appear, we can start doing some analysis.

cypher5

One of the things that we do know is that the most common ASCII character that we're likely to be seeing is the space.  Looking at the frequency table, it's fairly safe to assume that '(' represents a space in our cypher text.  Let's update the table and we're on our way to cracking this thing (I'm not going to give you the code for updating the table, I assume that you can do that and besides, this post is getting far too lang as it is...).

Now that we can identify the spaces, we can start looking for specific letter combinations.  The simplest thing to look for now is the most common single letter word by far, 'a'.

We need to count up all sequences of SPACE LETTER SPACE, the most common letter should be an 'a'


DECLARE @Counter INT = 1
SET @TempText = @text

WHILE DATALENGTH(@Text) >= @Counter
BEGIN
SET @Character = SUBSTRING(@TempText, @Counter,3)
IF @Character LIKE (@Space + '_' + @Space)
BEGIN
IF NOT EXISTS (SELECT 1 FROM #CharacterFrequency WHERE Character = @Character)
BEGIN
SET @frequency = (DATALENGTH(@TempText) - DATALENGTH(REPLACE(@TempText,@Character,''))) / 3
INSERT INTO #CharacterFrequency(Character,Frequency)
VALUES(@Character,@frequency)
END
SET @TempText = @text
END
ET @Counter = @Counter + 1
END

We can easily check for that sequence in our table...


 SELECT *
 FROM #CharacterFrequency
 WHERE DATALENGTH(Character) = 3

cypher6

From that I think it's quite clear to see that 'i' represents an 'a'.  We can update our #CharacterFrequency table with that one.

This was only supposed to be a quick introduction into writing an encryption algorithm in SQL so I'm not going to carry on demonstrating as I'm sure that you've got the idea and we'll be here all day but you can continue looking for common letter combinations and gradually piece together the puzzle.

Combinations to look for are...

To identify 'n' and 'd' - the most common three letter word beginning with 'a' is 'and'
The most common three letter word is 'the'
When you know 't' you should be able to find 'to' and therefore identify 'o'
When you know 'n' and 't' we can hunt for 'it' and 'in' to identify 'i'

Once we've got those worked out we can substitute them into our encrypted message.


DECLARE @CryptString VARCHAR(MAX) = @text
DECLARE @CryptChar CHAR(1)
SET @Counter = 1

WHILE @Counter <= DATALENGTH(@CryptString)
BEGIN
SET @CryptChar = SUBSTRING(@CryptString, @Counter, 1)
SELECT @CryptString = STUFF(@CryptString,@Counter,1,COALESCE(PredictedCharacter,'*'))
FROM #CharacterFrequency
WHERE Character = @CryptChar

SET @Counter = @Counter + 1
END

PRINT @text
PRINT @CryptString

Let's have a look at just snippet of the cypher text compared to what we now know of the plain text...

|`m(mzl({b(|`m(.oi`(az(`izl

the end o* the **a*h in hand

Even though we haven't got all that many letters yet, the message is already starting to take shape and we're at the point where you can probably start to recognise words and letters by eye.  Here are a couple of examples that I've spotted...

edito* = editor
*ode*n = modern
t*aditiona* = traditional

Once you've worked out a couple of common letters, things really start falling into place.

Drat it again, the second version of our encryption algorithm has been foiled.

Well that was a quick look at creating an encryption routine in SQL Server and then cracking it.  We need to think of something that can't be cracked easily using statistical analysis.

See part 2 at Secret Codes And SQL Server, Part 2: Pimp Your Encryption Algorithms.

5 thoughts on “Secret Codes And SQL Server, Part 1: Writing Our Own Encryption Algorithm and Cracking it.

Add yours

  1. The “One Time Pad” cypher is simply XOR of the plain text with a single-use random key of the same bit length. Implemented properly – never reusing any key sequence – there is no known way to crack it. https://en.wikipedia.org/wiki/One-time_pad

    One Time Pad is impractical mainly for logistical reasons: for security the keys must be truly random (as opposed to pseudo-random), and long key sequences must be communicated securely from the sender(s) to the receiver(s). Historically, that involved couriers carrying tapes of encoding “noise” around the world.

    Like

Leave a comment

Create a website or blog at WordPress.com

Up ↑