UCC Code Repository

Contents of /pyBB/modules/dataenc.py


Revision 1
Tue Jan 29 14:32:01 2008 UTC by svn-admin
File MIME type: text/x-python
File size: 48756 byte(s)
Re-import of repository after repository database corruption.

1 #13-09-04
2 # v1.1.5
3
4 # **build 3**
5
6 # dataenc.py
7 # password encoding and comparing module
8 # The strength of the SHA hashing module - in an ascii safe, timestamped format.
9
10 # uses a binary to ascii encoding
11 # and will timestamp the encodings as well.
12 # (Binary watermarking of data).
13
14 # used in a CGI called 'custbase' - to check logins are correct
15 # (For the CGI to function it needs the user's password (or its SHA hash) to be encoded into
16 # each page as a hidden form field. This exposes the encrypted password in the HTML source of each page.
17 # This module provides functions to interleave a timestamp *into* the hash.
18 # Even if the encoded 'timestamped hash' is extracted from the HTML source, the CGI can tell
19 # that the password has expired).
20
21
22 # Contains functions to :
23
24 # do binary to ascii encoding using a TABLE mapping (and ascii to binary)
25 # binary interleave - to disperse one set of binary data into another (e.g. as a 'watermark' or date/time stamp)
26 # to extract the watermark again
27 # convert a decimal value to base 64 digits, and base 64 digits back to 8 bit digits..
28 # Creating and retrieving a timestamp from the current time/date
29 # functions for testing and setting bits in a byte (or larger value)
30 # (including a bitwise operator object that comes from the python cookbook
31 # and is no longer used here, but included for reference).
32
33 # Wrapping all that together to return an ascii encoded, date stamped SHA hash of a string
34
35
36 # Copyright Michael Foord 2004
37 # dataenc.py
38 # Functions for encoding and interleaving data.
39
40 # http://www.voidspace.org.uk/python/modules.shtml
41
42 # Released subject to the BSD License
43 # Please see http://www.voidspace.org.uk/documents/BSD-LICENSE.txt
44
45 # For information about bugfixes, updates and support, please join the Pythonutils mailing list.
46 # http://voidspace.org.uk/mailman/listinfo/pythonutils_voidspace.org.uk
47 # Comments, suggestions and bug reports welcome.
48 # Scripts maintained at http://www.voidspace.org.uk/python/index.shtml
49 # E-mail fuzzyman@voidspace.org.uk
50
51
52 """
53 DOCS for dataenc as a module
54
55 When run it should go through a few basic tests - see the function test()
56
57 This module provides low-level functions to interleave two bits of data into each other and separate them.
58 It will also encode this binary data to and from ascii - for inclusion in HTML, cookies or email transmission.
59
60 It also provides high level functions to use these functions for time stamping passwords and password hashes,
61 and also to check that a 'time-stamped hash' is both valid and unexpired.
62
63 The check_pass function is interesting. Given an encoded and timestamped hash it compares it with the hash (using SHA) of a password.
64 If it matches *and* is unexpired (you set the time limit) it returns a new encoded time stamp of the hash with the current time.
65 I use this for secure, time limited, logins over CGI. (Could be stored in a cookie as well).
66 (On the first login you will need to compare the password with the stored hash and use that to generate a time stamped hash to include in the page returned.
67 Thereafter you can just use the check_pass function and include the time-stamped hash in a hidden form field for every action.)
68
69 The binary data is interleaved on a 'bitwise' basis - every byte is mangled.
70
71 --
72
73 CONSTANTS
74
75 The main constant defined in dataenc.py is :
76
77 TABLE = '_-0123456789' + \
78 'abcdefghijklmnopqrstuvwxyz'+ \
79 'NOPQRSTUVWXYZABCDEFGHIJKLM'
80 TABLE should be exactly 64 printable characters long... or we'll all die horribly
81 Obviously the same TABLE should be used for decoding as for encoding....
82 note - changing the order of the TABLE here can be used to change the mapping.
83 Versions 1.1.2+ of TABLE use only characters that are safe to pass in URLs
84 (e.g. using the GET method for passing FORM data)
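The TABLE requirements above can be checked mechanically. The following is a minimal Python 3 sketch (the module itself targets Python 2). The helper name is_valid_table and the URL_SAFE set are assumptions made for illustration and are not part of dataenc.py.

```python
import string

TABLE = ('_-0123456789'
         'abcdefghijklmnopqrstuvwxyz'
         'NOPQRSTUVWXYZABCDEFGHIJKLM')

# Assumption: letters, digits, '-' and '_' are treated as the URL-safe set.
URL_SAFE = set(string.ascii_letters + string.digits + '-_')


def is_valid_table(table):
    # Hypothetical helper: exactly 64 characters, no repeats, all URL safe.
    return (len(table) == 64
            and len(set(table)) == 64
            and set(table) <= URL_SAFE)


print(is_valid_table(TABLE))
```

OLD_TABLE fails this check only because of its punctuation characters, which is consistent with the note that the newer TABLE was chosen for URL safety.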
85
86 OLD_TABLE is the previous encoding map used for versions of dataenc.py previous to 1.1.2
87 See the table_dec function for how to decode data encoded with that map.
88
89 PSYCOON = 1
90 This decides if we attempt to import psyco or not (the specialising compiler). Set to 0 to not import.
91 If we attempt but fail to import psyco then this value will be set to 0.
92
93 DATEIN = 1
94 As above but for the dateutils and time module.
95 We need to import dateutils for the expired and pass_enc functions (amongst others) to work fully.
96
97
98 FUNCTIONS
99
100 Following are the docstrings extracted from the public functions :
101
102 pass_enc(instring, indict = None, **keywargs)
103 Returns an ascii version of an SHA hash or a string, with the date/time stamped into it.
104 e.g. For ascii safe storing of password hashes.
105
106 It also accepts the following keyword args (or a dictionary containing the following keys).
107 (Keywords shown - with default values).
108
109 lower = False, sha_hash = False, daynumber = None, timestamp = None, endleave = False
110
111 Setting lower to True makes instring lowercase before hashing/encoding.
112
113 If sha_hash is set to True then instead of the actual string passed in being encoded, its SHA hash
114 is encoded. (In either case the string can contain any binary data).
115
116 If a daynumber is passed in then the daynumber will be encoded into the returned string.
117 (daynumber is an integer representing the 'Julian day number' of a date - see the dateutils module).
118 This can be used as a 'datestamp' for the generated code and you can detect anyone reusing old codes this way.
119 If 'daynumber' is set to True then today's daynumber will automatically be used.
120 (dateutils module required - otherwise it will be ignored).
121
122 Max allowed value for daynumber is 16777215 (9th May 41222)
123 (so daynumber can be any integer from 1 to 16777215 that you want to 'watermark' the hash with
124 could be used as a session ID for a CGI for example).
125
126 If a timestamp is passed in it should be either timestamp = True, meaning use 'now',
127 or a tuple (HOUR, MINUTES).
128 HOUR should be an integer 0-23
129 MINUTES should be an integer 0-59
130
131 The time and date stamp is *binary* interleaved, before encoding, into the data.
132
133 If endleave is set to True then the timestamp is interleaved more securely. Shouldn't be necessary in practice
134 because the stamp is so short and we subsequently encode using table_enc.
135 If the string is long this will slow down the process - because we interleave twice.
136
137
138 pass_dec(incode)
139 Given a string encoded by pass_enc - it returns it decoded.
140 It also extracts the datestamp and returns that.
141 The return is :
142 (instring, daynumber, timestamp)
143
144
145 expired(daynumber, timestamp, validity)
146 Given the length of time a password is valid for, it checks if a daynumber/timestamp tuple is
147 still valid.
148 validity should be an integer tuple (DAYS, HOURS, MINUTES).
149 Returns True for valid or False for invalid.
150 Needs the dateutils module to get the current daynumber.
151
152 unexpired is an alias for expired - because it makes for better tests.
153 (The return results from the expired function are logically the wrong way round, expired returns True if the timestamp is *not* expired..)
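The expiry test described above can be sketched by converting both moments to minutes. This is a Python 3 illustration only, not the module's code: the name still_valid is hypothetical, and the current day and time are passed in as arguments (an assumption made so the example is deterministic) instead of being read from dateutils and the clock.

```python
def still_valid(stamp_day, stamp_time, validity, now_day, now_time):
    # Hypothetical helper mirroring the True-means-unexpired convention.
    days, hours, minutes = validity
    stamp_hour, stamp_minute = stamp_time
    now_hour, now_minute = now_time
    # Moment at which the stamp stops being valid, in minutes.
    expiry = ((stamp_day + days) * 24 * 60
              + (stamp_hour + hours) * 60
              + stamp_minute + minutes)
    now = now_day * 24 * 60 + now_hour * 60 + now_minute
    # Valid up to and including the expiry minute.
    return now <= expiry


# Stamp made two minutes ago, valid for ten minutes.
print(still_valid(100, (12, 28), (0, 0, 10), 100, (12, 30)))
```

Working in a single unit avoids the separate carry loops for minutes above 59 and hours above 23.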
154
155
156 check_pass(inhash, pswdhash, EXPIRE)
157 Given the hash (possibly from a webpage or cookie) it checks that it is still valid and matches the password it is supposed to have.
158 If so it returns a new hash - with the current time stamped into it.
159 EXPIRE is a validity tuple to test for (see expired function)
160 e.g. (0, 1, 0) means the supplied hash should be no older than 1 hour
161
162 If the hash is expired it returns -1.
163 If the pass is invalid or doesn't match the supplied pswdhash it returns False.
164 This is a high level function that can do all your password checking and 'time-stamped hash' generation after initial login.
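Because check_pass has three kinds of return value, callers need to tell them apart with care: -1 is a true value in Python, so a plain truth test would accept an expired hash. A small Python 3 sketch of one way to branch on the result follows. The function describe_result is hypothetical and only illustrates the convention documented above.

```python
def describe_result(result):
    # check_pass is documented to return False (invalid or mismatched hash),
    # -1 (expired), or a new encoded string.
    if result is False:
        return 'invalid'
    if result == -1:
        return 'expired'
    return 'ok'


for sample in (False, -1, 'x3fa_new-encoded-hash'):
    print(repr(sample), '->', describe_result(sample))
```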
165
166
167 makestamp(daynumber, timestamp)
168 Receives a Julian daynumber (integer 1 to 16777215) and an (HOUR, MINUTES) tuple timestamp.
169 Returns a 5 digit string of binary characters that represent that date/time.
170 Can receive None for either or both of these arguments.
171
172 The function 'daycount' in dateutils will turn a date into a daynumber.
173
174
175 dec_datestamp(datestamp)
176 Given a 5 character datestamp made by makestamp, it returns it as the tuple :
177 (daynumber, timestamp).
178 daynumber and timestamp can either be None *or*
179 daynumber is an integer between 1 and 16777215
180 timestamp is (HOUR, MINUTES)
181
182 The function 'counttodate' in dateutils will turn a daynumber back into a date.
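The five byte stamp layout described for makestamp and dec_datestamp (three bytes of daynumber, then hour and minute, with 255 marking a missing time) can be sketched in Python 3 as below. The names make_stamp and decode_stamp are hypothetical, and bytes objects are used where the Python 2 module uses strings.

```python
def make_stamp(daynumber=None, timestamp=None):
    # Three big-endian bytes of daynumber; zeros mean "no date".
    day_bytes = daynumber.to_bytes(3, 'big') if daynumber else bytes(3)
    # Hour and minute bytes; 255, 255 means "no time".
    time_bytes = bytes(timestamp) if timestamp else bytes([255, 255])
    return day_bytes + time_bytes


def decode_stamp(stamp):
    daynumber = int.from_bytes(stamp[:3], 'big') or None
    timestamp = None if stamp[3] == 255 else (stamp[3], stamp[4])
    return daynumber, timestamp


stamp = make_stamp(2453262, (14, 32))
print(len(stamp), decode_stamp(stamp))
```

In this sketch a daynumber above 16777215 raises OverflowError from int.to_bytes, which matches the documented maximum.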
183
184
185 sixbit(invalue)
186 Given a value in it returns a list representing the base 64 version of that number.
187 Each value in the list is an integer from 0-63...
188 The first member of the list is the most significant figure... down to the remainder.
189 Should only be used for positive values.
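As an illustration of the base 64 digit list that sixbit describes, here is a short Python 3 sketch using repeated division. The name base64_digits is hypothetical. It is written to follow the documented behaviour (most significant digit first, [0] for values below 1) and is not the module's own loop.

```python
def base64_digits(value):
    # Values below 1 are the documented special case.
    if value < 1:
        return [0]
    digits = []
    while value:
        value, remainder = divmod(value, 64)
        digits.append(remainder)
    # Remainders come out least significant first, so reverse them.
    return digits[::-1]


print(base64_digits(4095))
```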
190
191
192 sixtoeight(intuple)
193 Given four base 64 (6-bit) digits... it returns three 8 bit digits that represent
194 the same value.
195 If length of intuple != 4, or any digits are > 63, it returns None.
196
197 **NOTE**
198 Not quite the reverse of the sixbit function.
199
200
201 table_enc(instring, table=TABLE)
202 The actual function that performs TABLE encoding.
203 It takes instring in three character chunks (three 8 bit values)
204 and turns it into 4 6 bit characters.
205 Each of these 6 bit characters maps to a character in TABLE.
206 If the length of instring is not divisible by three it is padded with Null bytes.
207 The number of Null bytes to remove is then encoded as a semi-random character at the start of the string.
208 You can pass in an alternative 64 character string to do the encoding with if you want.
209
210
211 table_dec(instring, table=TABLE)
212 The function that performs TABLE decoding.
213 Given a TABLE encoded string it returns the original binary data - as a string.
214 If the data it's given is invalid (not data encoded by table_enc) it returns None
215 (definition of invalid : not consisting of characters in the TABLE or length not len(instring) % 4 = 1).
216 You can pass in an alternative 64 character string to do the decoding with if you want.
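To make the padding marker concrete, the following Python 3 sketch encodes and decodes with the same scheme as described for table_enc and table_dec: three bytes become four TABLE characters, and a leading character chosen at random from one of three index ranges records how many padding bytes to strip. The names table_encode, table_decode and MARKER_RANGES are hypothetical, and bytes are used in place of Python 2 strings.

```python
from random import randrange

TABLE = ('_-0123456789'
         'abcdefghijklmnopqrstuvwxyz'
         'NOPQRSTUVWXYZABCDEFGHIJKLM')

# padding byte count -> half-open range of TABLE indices for the marker
MARKER_RANGES = {0: (0, 21), 2: (21, 42), 1: (42, 64)}


def table_encode(data, table=TABLE):
    pad = -len(data) % 3
    padded = data + bytes(pad)
    chars = []
    for i in range(0, len(padded), 3):
        value = int.from_bytes(padded[i:i + 3], 'big')
        chars.extend(table[(value >> shift) & 63] for shift in (18, 12, 6, 0))
    low, high = MARKER_RANGES[pad]
    return table[randrange(low, high)] + ''.join(chars)


def table_decode(text, table=TABLE):
    # Valid input is one marker character plus a multiple of four characters.
    if not text or (len(text) - 1) % 4:
        return None
    marker = table.find(text[0])
    if marker == -1:
        return None
    out = bytearray()
    for i in range(1, len(text), 4):
        value = 0
        for ch in text[i:i + 4]:
            index = table.find(ch)
            if index == -1:
                return None
            value = value * 64 + index
        out += value.to_bytes(3, 'big')
    pad = 1 if marker > 41 else 2 if marker > 20 else 0
    return bytes(out[:len(out) - pad])


print(table_decode(table_encode(b'hello world')))
```

Because the marker is random within its range, encoding the same data twice can give strings that differ in the first character.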
217
218
219 return_now()
220 Returns the time now.
221 As (HOUR, MINUTES).
222
223
224 binleave(data1, data2, endleave = False)
225 Given two strings of binary data it interleaves data1 into data2 on a bitwise basis
226 and returns a single string combining both. (The bits are interleaved, not just the bytes.)
227 The returned string will be 4 bytes or so longer than the two strings passed in.
228 Use binunleave to return the two strings again.
229 Even if both strings passed in are ascii - the result will contain non-ascii characters.
230 To keep ascii-safe you must subsequently encode with table_enc.
231
232 Max length for the smallest data string (one string can be of unlimited size) is about 16meg
233 (increasing this would be easy if anyone needed it - but would be very slow anyway).
234
235 If either string is empty (or the smallest string greater than 16meg) - we return None.
236 The first 4 characters of the string returned 'define' the interleave. (actually the size of the watermark)
237 For added safety you could remove this and send separately.
238
239 Version 1.0.0 used a bf (bitfield) object from the python cookbook. Version 1.1.0 uses the binary and & and or |
240 operations and is about 2.5 times faster. On my AMD 3000, leaving and unleaving two 20k files took 1.8 seconds.
241 (instead of 4.5 previously - with Psyco enabled this improved to 0.4 seconds.....)
242
243 Interleaving a file with a watermark of pretty much any size makes it unreadable - this is because *every* byte is changed.
244 (Except perhaps a few at the end - see the endleave keyword). However it shouldn't be relied on if you need
245 a really secure method of encryption. For many purposes it will be sufficient however.
246
247 In practice any file not an exact multiple of the size of the watermark will have a chunk at the end that is untouched.
248 To get round this you can set endleave = True.. which then re-interleaves the end data back into itself.
249 (and therefore takes twice as long - it shouldn't be necessary where you have a short watermark.)
250
251 data2 ought to be the smaller string - or they will be swapped round internally.
252 This could cause you to get them back in an unexpected order from binunleave.
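The 4 byte prefix mentioned in the binleave notes can be pictured as one random flag byte followed by the watermark length in three bytes. The Python 3 sketch below is an illustration based on that description and on the 16meg limit. The names make_prefix and read_prefix are hypothetical, and the reading that a flag byte above 127 signals a second (endleave) pass is inferred from the code, so treat it as an assumption.

```python
from random import randrange

MAX_WATERMARK = 2 ** 24  # the length must fit in three bytes


def make_prefix(watermark_len, endleave=False):
    # Empty or oversized watermarks are rejected, as binleave returns None.
    if not 0 < watermark_len < MAX_WATERMARK:
        return None
    # Assumption: 0-127 marks a single pass, 128-255 marks an endleave pass.
    flag = randrange(128, 256) if endleave else randrange(0, 128)
    return bytes([flag]) + watermark_len.to_bytes(3, 'big')


def read_prefix(prefix):
    # Returns (endleave flag, watermark length).
    return prefix[0] > 127, int.from_bytes(prefix[1:4], 'big')


print(read_prefix(make_prefix(5)))
```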
253
254
255 binunleave(data)
256 Given a chunk of data woven by binleave - it returns the two separate pieces of data.
257
258
259 For the binary operations of binleave and binunleave, version 1.0.0 used a bf (bitfield) object from
260 the python cookbook.
261
262 class bf(object)
263 the bf(object) from activestate python cookbook - by Sebastien Keim - Many Thanks
264 http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/113799
265
266 Version 1.1.0 replaced these with specific binary AND & and OR | operations that are about 2.5 times faster.
267 They are 'inline' in the functions for speed (avoiding function calls) but are available separately as well.
268
269 def bittest(value, bitindex)
270 This function returns the setting of any bit from a value.
271 bitindex starts at 0.
272
273 def bitset(value, bitindex, bit)
274 Sets a bit, specified by bitindex, in 'value' to 'bit'.
275 bit should be 1 or 0
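A short Python 3 sketch of the two bit helpers, with a usage example. It follows the behaviour documented above. Python 3 integers need no L suffix, so plain literals are used.

```python
def bittest(value, bitindex):
    # Return the bit (0 or 1) at position bitindex, counting from 0.
    return (value >> bitindex) & 1


def bitset(value, bitindex, bit):
    # Clear the target bit with a mask, then OR in the new bit.
    mask = 1 << bitindex
    return (value & ~mask) | ((bit & 1) << bitindex)


value = 0b1010
print(bittest(value, 1), bin(bitset(value, 0, 1)))
```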
276
277
278
279 There are also the 'private functions' which actually contain the substance of binleave and binunleave,
280 You are welcome to 'browse' them - but you shouldn't need to use them directly.
281
282
283 Any comments, suggestions and bug reports welcome.
284
285 Regards,
286
287 Fuzzy
288
289 michael AT foord DOT me DOT uk
290
291
292 """
293
294 import sha
295 from random import random
296
297 DATEIN = 1
298 if DATEIN:
299 try: # try to import the dateutils and time module
300 from time import strftime
301 from dateutils import daycount, returndate # ,counttodate # counttodate returns a daynumber as a date
302 except:
303 DATEIN = 0
304
305 # If PSYCOON is set to 0 then we won't try and import the psyco module
306 # IF importing fails, PSYCOON is set to 0
307 PSYCOON = 1
308 if PSYCOON:
309 try:
310 import psyco
311 psyco.full()
312 from psyco.classes import *
313 try:
314 psyco.cannotcompile(re.compile) # psyco hinders rather than helps regular expression compilation
315 except NameError:
316 pass
317 except:
318 PSYCOON = 0
319
320 # note - changing the order of the TABLE here can be used to change the mapping.
321 TABLE = '_-0123456789' + \
322 'abcdefghijklmnopqrstuvwxyz'+ \
323 'NOPQRSTUVWXYZABCDEFGHIJKLM'
324 # table should be exactly 64 printable characters long... or we'll all die horribly
325 # Obviously the same TABLE should be used for decoding as for encoding....
326 # This version of TABLE (v1.1.2) uses only characters that are safe to pass in URLs
327 # (e.g. using the GET method for passing FORM data)
328
329
330 OLD_TABLE = '!$%^&*()_-+=' + \
331 'abcdefghijklmnopqrstuvwxyz'+ \
332 'NOPQRSTUVWXYZABCDEFGHIJKLM'
333 # OLD_TABLE is the old encoding. If anyone has stuff encoded with this then it can be decoded using :
334 # data = table_dec(encodedstring, OLD_TABLE)
335
336
337 def pass_enc(instring, indict=None, **keywargs):
338 """Returns an ascii version of an SHA hash or a string, with the date/time stamped into it.
339 e.g. For ascii safe storing of password hashes.
340
341 It also accepts the following keyword args (or a dictionary containing the following keys).
342 (Keywords shown - with default values).
343
344 lower = False, sha_hash = False, daynumber = None, timestamp = None, endleave = False
345
346 Setting lower to True makes instring lowercase before hashing/encoding.
347
348 If sha_hash is set to True then instead of the actual string passed in being encoded, its SHA hash
349 is encoded. (In either case the string can contain any binary data).
350
351 If a daynumber is passed in then the daynumber will be encoded into the returned string.
352 (daynumber is an integer representing the 'Julian day number' of a date - see the dateutils module).
353 This can be used as a 'datestamp' for the generated code and you can detect anyone reusing old codes this way.
354 If 'daynumber' is set to True then today's daynumber will automatically be used.
355 (dateutils module required - otherwise it will be ignored).
356
357 Max allowed value for daynumber is 16777215 (9th May 41222)
358 (so daynumber can be any integer from 1 to 16777215 that you want to 'watermark' the hash with
359 could be used as a session ID for a CGI for example).
360
361 If a timestamp is passed in it should be either timestamp = True, meaning use 'now',
362 or a tuple (HOUR, MINUTES).
363 HOUR should be an integer 0-23
364 MINUTES should be an integer 0-59
365
366 The time and date stamp is *binary* interleaved, before encoding, into the data.
367
368 If endleave is set to True then the timestamp is interleaved more securely. Shouldn't be necessary in practice
369 because the stamp is so short and we subsequently encode using table_enc.
370 If the string is long this will slow down the process - because we interleave twice.
371 """
372 if indict == None: indict = {}
373 arglist = {'lower' : False, 'sha_hash' : False, 'daynumber' : None, 'timestamp' : None, 'endleave' : False}
374
375 if not indict and keywargs: # if keyword passed in instead of a dictionary - we use that
376 indict = keywargs
377 for keyword in arglist: # any keywords not specified we use the default
378 if not indict.has_key(keyword):
379 indict[keyword] = arglist[keyword]
380
381 if indict['lower']: # keyword lower :-)
382 instring = instring.lower()
383 if indict['sha_hash']:
384 instring = sha.new(instring).digest()
385
386 if indict['daynumber'] == True:
387 if not DATEIN:
388 indict['daynumber'] = None
389 else:
390 a,b,c = returndate()
391 indict['daynumber'] = daycount(a,b,c) # set the daycount to today
392 if indict['timestamp']== True:
393 if not DATEIN:
394 indict['timestamp'] = None
395 else:
396 indict['timestamp'] = return_now() # set the time to now.
397
398 datestamp = makestamp(indict['daynumber'], indict['timestamp'])
399 if len(instring) == len(datestamp): instring = instring + '&mjf-end;' # otherwise we can't tell which is which when we unleave them later :-)
400 outdata = binleave(instring, datestamp, indict['endleave'])
401 return table_enc(outdata) # do the encoding of the actual string
402
403
404
405 def pass_dec(incode):
406 """Given a string encoded by pass_enc - it returns it decoded.
407 It also extracts the datestamp and returns that.
408 The return is :
409 (instring, daynumber, timestamp)
410 """
411 binary = table_dec(incode)
412 out1, out2 = binunleave(binary)
413 if len(out1) == 5:
414 datestamp = out1
415 if out2.endswith('&mjf-end;'):
416 out2 = out2[:-9]
417 instring = out2
418 else:
419 datestamp = out2
420 if out1.endswith('&mjf-end;'):
421 out1 = out1[:-9]
422 instring = out1
423 daynumber, timestamp = dec_datestamp(datestamp)
424 return instring, daynumber, timestamp
425
426
427
def expired(daynumber, timestamp, validity):
    """Given the length of time a password is valid for, it checks if a daynumber/timestamp tuple is
    still valid.
    validity should be an integer tuple (DAYS, HOURS, MINUTES).
    Returns True for valid or False for invalid.
    Needs the dateutils module to get the current daynumber.

    >>> a, b, c = returndate()
    >>> today = daycount(a, b, c)
    >>> h, m = return_now()
    >>> expired(today, (h, m-2), (0,0,1))
    False
    >>> expired(today, (h, m-2), (0,0,10))
    True
    >>> expired(today, (h-2, m-2), (0,1,10))
    False
    >>> expired(today-1, (h-2, m-2), (1,1,10))
    False
    >>> expired(today-1, (h-2, m-2), (2,1,10))
    True
    """
    # A missing daynumber or timestamp (None) is treated as zero so the arithmetic below still works.
    daynumber = daynumber or 0
    timestamp = timestamp or (0, 0)
    if not DATEIN:
        raise ImportError("Need the dateutils module to use the 'expired' function.")

    # h1, m1 are the hours and minutes of the timestamp
    h1, m1 = timestamp
    # validity is how long the timestamp is valid for
    d2, h2, m2 = validity

    a, b, c = returndate()
    # today is a number representing the julian day number of today
    today = daycount(a, b, c)
    # h, m are the hours and minutes of time now
    h, m = return_now()

    # Add the validity period once to get the expiry moment.
    h1 = h1 + h2
    m1 = m1 + m2
    daynumber = daynumber + d2
    # so we need to test if today, h, m are greater than daynumber, h1, m1
    # But first we need to adjust because we might currently have hours above 23 and minutes above 59
    while m1 > 59:
        h1 += 1
        m1 -= 60
    while h1 > 23:
        daynumber += 1
        h1 -= 24
    if today > daynumber:
        return False
    if today < daynumber:
        return True
    if h > h1:  # same day
        return False
    if h < h1:
        return True
    if m > m1:  # same hour
        return False
    else:
        return True
493
494 unexpired = expired # Technically unexpired is a better name since this function returns True if the timestamp is unexpired.
495
496 def makestamp(daynumber, timestamp):
497 """Receives a Julian daynumber (integer 1 to 16777215) and an (HOUR, MINUTES) tuple timestamp.
498 Returns a 5 digit string of binary characters that represent that date/time.
499 Can receive None for either or both of these arguments.
500
501 The function 'daycount' in dateutils will turn a date into a daynumber.
502 """
503 if not daynumber:
504 datestamp = chr(0)*3
505 else:
506 day1 = daynumber//65536
507 daynumber = daynumber % 65536
508 day2 = daynumber//256
509 daynumber = daynumber%256
510 datestamp = chr(day1) + chr(day2) + chr(daynumber)
511 if not timestamp:
512 datestamp = datestamp + chr(255)*2
513 else:
514 datestamp = datestamp + chr(timestamp[0]) + chr(timestamp[1])
515 return datestamp
516
517
518 def dec_datestamp(datestamp):
519 """Given a 5 character datestamp made by makestamp, it returns it as the tuple :
520 (daynumber, timestamp).
521 daynumber and timestamp can either be None *or*
522 daynumber is an integer between 1 and 16777215
523 timestamp is (HOUR, MINUTES)
524
525 The function 'counttodate' in dateutils will turn a daynumber back into a date."""
526 daynumber = datestamp[:3]
527 timechars = datestamp[3:]
528 daynumber = ord(daynumber[0])*65536 + ord(daynumber[1])*256 + ord(daynumber[2])
529 if daynumber == 0: daynumber = None
530 if ord(timechars[0]) == 255:
531 timestamp = None
532 else:
533 timestamp = (ord(timechars[0]), ord(timechars[1]))
534 return daynumber, timestamp
535
536
537
538 def sixbit(invalue):
539 """Given a value in it returns a list representing the base 64 version of that number.
540 Each value in the list is an integer from 0-63...
541 The first member of the list is the most significant figure... down to the remainder.
542 Should only be used for positive values.
543 """
544 if invalue < 1: # special case !
545 return [0]
546 power = -1
547 outlist = []
548 test = 0
549 while test <= invalue:
550 power += 1
551 test = pow(64,power)
552
553 while power:
554 power -= 1
555 outlist.append(int(invalue//pow(64,power)))
556 invalue = invalue % pow(64,power)
557 return outlist
558
559 def sixtoeight(intuple):
560 """Given four base 64 (6-bit) digits... it returns three 8 bit digits that represent
561 the same value.
562 If length of intuple != 4, or any digits are > 63, it returns None.
563
564 **NOTE**
565 Not quite the reverse of the sixbit function."""
566 if len(intuple) != 4: return None
567 for entry in intuple:
568 if entry > 63:
569 return None
570 value = intuple[3] + intuple[2]*64 + intuple[1]*4096 + intuple[0]*262144
571 val1 = value//65536
572 value = value % 65536
573 val2 = value//256
574 value = value % 256
575 return val1, val2, value
576
577
578 def table_enc(instring, table=None):
579 """The actual function that performs TABLE encoding.
580 It takes instring in three character chunks (three 8 bit values)
581 and turns it into 4 6 bit characters.
582 Each of these 6 bit characters maps to a character in TABLE.
583 If the length of instring is not divisible by three it is padded with Null bytes.
584 The number of Null bytes to remove is then encoded as a semi-random character at the start of the string.
585 You can pass in an alternative 64 character string to do the encoding with if you want.
586 """
587 if table == None: table = TABLE
588 out = []
589 test = len(instring) % 3
590 if test: instring = instring + chr(0)*(3-test) # make sure the length of instring is divisible by 3
591 # print test,' ', len(instring) % 3
592 while instring:
593 chunk = instring[:3]
594 instring = instring[3:]
595 value = 65536 * ord(chunk[0]) + 256 * ord(chunk[1]) + ord(chunk[2])
596 newdat = sixbit(value)
597 while len(newdat) < 4:
598 newdat.insert(0, 0)
599 for char in newdat:
600 out.append(table[char])
601 if not test:
602 out.insert(0, table[int(random()*21)]) # if we added 0 extra characters we add a character from 0 to 20
603 elif test == 1:
604 out.insert(0, table[int(random()*21)+21]) # if we added 2 extra characters we add a character from 21 to 41
605 elif test == 2:
606 out.insert(0, table[int(random()*22)+42]) # if we added 1 extra characters we add a character from 42 to 63
607 return ''.join(out)
608
609 def table_dec(instring, table=None):
610 """The function that performs TABLE decoding.
611 Given a TABLE encoded string it returns the original binary data - as a string.
612 If the data it's given is invalid (not data encoded by table_enc) it returns None
613 (definition of invalid : not consisting of characters in the TABLE or length not len(instring) % 4 = 1).
614 You can pass in an alternative 64 character string to do the decoding with if you want.
615 """
616 if table == None: table = TABLE
617 out = []
618 rem_test = table.find(instring[0]) # read the padding marker at the start of the string
619 if rem_test == -1: return None
620 instring = instring[1:]
621 if len(instring)%4 != 0: return None # check the length is now divisible by 4
622 while instring:
623 chunk = instring[:4]
624 instring = instring[4:]
625 newchunk = []
626 for char in chunk:
627 test = table.find(char)
628 if test == -1: return None
629 newchunk.append(test)
630 newchars = sixtoeight(newchunk)
631 if not newchars: return None
632 for char in newchars:
633 out.append(chr(char))
634 if rem_test > 41:
635 out = out[:-1]
636 elif rem_test > 20:
637 out = out[:-2]
638 return ''.join(out)
639
def return_now():
    """Returns the time now.
    As (HOUR, MINUTES), with HOUR on the 24 hour clock (0-23)."""
    return int(strftime('%H')), int(strftime('%M'))
644
645 def check_pass(inhash, pswdhash, EXPIRE):
646 """Given the hash (possibly from a webpage) it checks that it is still valid and matches the password it is supposed
647 to have.
648 If so it returns the new hash.
649 If expired it returns -1.
650 If the pass is invalid it returns False."""
651 try:
652 instring, daynumber, timestamp = pass_dec(inhash) # of course a fake or mangled password will cause an exception here
653 if not table_dec(pswdhash) == instring:
654 return False
655 if not unexpired(daynumber, timestamp, EXPIRE): # this tests if the hash is still valid and is the password hash the same as the password hash encoded in the page ?
656 return -1
657 else:
658 return pass_enc(instring, daynumber = True, timestamp = True) # generate a new hash, with the current time
659 except:
660 return False
661
662 def binleave(data1, data2, endleave = False):
663 """Given two strings of binary data it interleaves data1 into data2 on a bitwise basis
664 and returns a single string combining both. (bits interleaved not just the bytes).
665 The returned string will be 4 bytes or so longer than the two strings passed in.
666 Use binunleave to return the two strings again.
667 Even if both strings passed in are ascii - the result will contain non-ascii characters.
668 To keep ascii-safe you must subsequently encode with table_enc.
669
670 Max length for the smallest data string (one string can be of unlimited size) is about 16meg
671 (increasing this would be easy if anyone needed it - but would be very slow anyway).
672
673 If either string is empty (or the smallest string greater than 16meg) - we return None.
674 The first 4 characters of the string returned 'define' the interleave. (actually the size of the watermark)
675 For added safety you could remove this and send separately.
676
677 Version 1.0.0 used a bf (bitfield) object from the python cookbook. Version 1.1.0 uses the binary and & and or |
678 operations and is about 2.5 times faster. On my AMD 3000, leaving and unleaving two 20k files took 1.8 seconds.
679 (instead of 4.5 previously - with Psyco enabled this improved to 0.4 seconds.....)
680
681 Interleaving a file with a watermark of pretty much any size makes it unreadable - this is because *every* byte is changed.
682 (Except perhaps a few at the end - see the endleave keyword). However it shouldn't be relied on if you need
683 a really secure method of encryption. For many purposes it will be sufficient however.
684
685 In practice any file not an exact multiple of the size of the watermark will have a chunk at the end that is untouched.
686 To get round this you can set endleave = True.. which then re-interleaves the end data back into itself.
687 (and therefore takes twice as long - it shouldn't be necessary where you have a short watermark.)
688
689 data2 ought to be the smaller string - or they will be swapped round internally.
690 This could cause you to get them back in an unexpected order from binunleave.
691 """
692 header, out, data1 = internalfunc(data1,data2)
693 # print len(data1), len(out), len(header)
694 header = chr(int(random()*128)) + header # making it a 4 byte header
695 if endleave and data1 and len(data1) < 65536:
696 header, out, data1 = internalfunc(header + out, data1)
697 header = chr(int(random()*128)+ 128) + header
698 return header + out + data1
699
700 def binunleave(data):
701 """Given a chunk of data woven by binleave - it returns the two separate pieces of data."""
702 header = data[0]
703 data = data[1:]
704 data1, data2 = internalfunc2(data)
705 if ord(header) > 127:
706 # print len(data1)
707 data = data2 + data1
708 data = data[1:]
709 data1, data2 = internalfunc2(data)
710 return data1, data2
711
712 ######################
713
714 # binleave and binunleave used to make extensive use of a python object called bf() (bitfield)
715 # There are still many places this could be useful, but I now use binary operations inline.
716 # Included for reference is the bf object and the binary operations as functions.
717
718 class bf(object):
719 """the bf(object) from activestate python cookbook - by Sebastien Keim - Many Thanks
720 http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/113799"""
721 def __init__(self,value=0):
722 self._d = value
723
724 def __getitem__(self, index):
725 return (self._d >> index) & 1
726
727 def __setitem__(self,index,value):
728 value = (value&1L)<<index
729 mask = (1L)<<index
730 self._d = (self._d & ~mask) | value
731
732 def __getslice__(self, start, end):
733 mask = 2L**(end - start) -1
734 return (self._d >> start) & mask
735
736 def __setslice__(self, start, end, value):
737 mask = 2L**(end - start) -1
738 value = (value & mask) << start
739 mask = mask << start
740 self._d = (self._d & ~mask) | value
741 return (self._d >> start) & mask
742
743 def __int__(self):
744 return self._d
745
746
747 def bittest(value, bitindex):
748 """This function returns the setting of any bit from a value.
749 bitindex starts at 0.
750 """
751 return (value&(1<<bitindex))>>bitindex
752
753 def bitset(value, bitindex, bit):
754 """Sets a bit, specified by bitindex, in 'value' to 'bit'.
755 bit should be 1 or 0
756 bitindex starts at 0.
757 """
758 bit = (bit&1L)<<bitindex
759 mask = (1L)<<bitindex
760 return (value & ~mask) | bit # set that bit of value to 0 with an & operation and then or it with the 'bit'
761
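As a quick illustration of the two helper functions above, here is a minimal sketch of bittest and bitset with a few sample calls. It is restated with plain integers so that it also runs on Python 3; that is an assumption made for the example, since the listing itself targets Python 2 and uses 1L long literals.

```python
# Sketch only: Python 3 restatement of the bittest/bitset helpers above.
# The 1L long literals of the Python 2 listing become plain ints here.

def bittest(value, bitindex):
    """Return bit number bitindex (0 = least significant) of value."""
    return (value & (1 << bitindex)) >> bitindex

def bitset(value, bitindex, bit):
    """Return value with bit number bitindex forced to bit (0 or 1)."""
    bit = (bit & 1) << bitindex
    mask = 1 << bitindex
    # clear that bit of value with an & operation, then or in the new bit
    return (value & ~mask) | bit

value = 0b00000101                  # 5: bits 0 and 2 are set
print(bittest(value, 0))            # 1
print(bittest(value, 1))            # 0
print(bitset(value, 1, 1))          # 7 (bit 1 turned on)
print(bitset(value, 0, 0))          # 4 (bit 0 turned off)
```

Neither function modifies its argument; bitset returns a new integer, so the caller has to keep the result.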
762
763
764 ##################################################################
765
766 # private functions used by the above public functions
767
768 def internalfunc(data1, data2):
769 """Used by binleave.
770 This function interleaves data2 into data1 a little chunk at a time."""
771 if len(data2) > len(data1): # make sure that data1 always has the longer string, making data2 the watermark
772 dummy = data1
773 data1 = data2
774 data2 = dummy
775 if not data2 or not data1: return None # check for empty data
776 length = len(data2)
777 if length >= pow(2,24): return None # if the strings are oversized
778 multiple = len(data1)//length # this is how often we should interleave bits
779 if multiple > 65535: multiple = 65535 # in practice we'll set to max 65535
780 header1 = length//65536
781 header3 = length % 65536
782 header2 = header3//256
783 header3 = header3 % 256
784 header = chr(header1) + chr(header2) + chr(header3) # these are the 3 bytes we will put at the start of the string
785 # so - to encode one byte of data2 (the watermark) we need multiple bytes of data1
786 data1 = [ord(char) for char in list(data1)]
787 startpos = 0
788 data2 = [ord(char) for char in list(data2)]
789 BINLIST=[1,2,4,8,16,32,64,128]
790 out = []
791 bitlen = multiple*8 + 8 # the total number of bits we'll have
792 # print bitlen, multiple
793 while data2:
794 chunklist = data1[startpos:startpos + multiple]
795 startpos = startpos + multiple
796 heapobj = 0
797 mainobj = data2.pop(0)
798 charobj = chunklist.pop(0)
799 bitindex = 0
800 mainindex = 0
801 heapindex = 0
802 charindex = 0
803 while mainindex < bitlen:
804 # print mainindex, heapindex, charindex, bitindex
805 if heapindex == 8: # if we've got all 8 bits
806 out.append(chr(heapobj))
807 heapobj = 0
808 heapindex = 0
809 if not mainindex%(multiple+1): # we've got to a point where we should nick another bit from the byte
810 if mainobj&BINLIST[bitindex]: # if the bit at bitindex is set
811 heapobj = heapobj|BINLIST[heapindex] # set the bit at heapindex
812 heapindex += 1
813 bitindex += 1
814 mainindex += 1
815 continue
816 if charindex == 7 and chunklist: # we've used up the current character from the chunk
817 if charobj&BINLIST[charindex]:
818 heapobj = heapobj|BINLIST[heapindex]
819 charobj = chunklist.pop(0)
820 charindex = 0
821 heapindex += 1
822 mainindex += 1
823 continue
824 if charobj&BINLIST[charindex]:
825 heapobj = heapobj|BINLIST[heapindex]
826 heapindex += 1
827 charindex += 1
828 mainindex += 1
829
830 if heapindex == 8: # if we've got all 8 bits.. but the loop has ended...
831 out.append(chr(heapobj))
832
833 return header, ''.join(out), ''.join([chr(char) for char in data1[startpos:]])
834
835 def internalfunc2(data):
836 """Used by binunleave.
837 This function extracts data that has been interleaved using binleave."""
838 lenstr = data[:3] # extract the length of the watermark
839 data = list(data[3:])
840 length2 = ord(lenstr[0])*65536 + ord(lenstr[1])*256 + ord(lenstr[2]) # length of watermark
841 length1 = len(data) - length2 # overall length
842 multiple = length1//length2 + 1
843 if multiple > 65536: multiple = 65536 # in practice we'll set to max 65535 + 1
844 bitlen = multiple*8
845 # print len(data), length1, length2, multiple
846 out1 = []
847 out = []
848 index = 0
849 BINLIST=[1,2,4,8,16,32,64,128]
850 # print len(chunk)
851 while index < length2:
852 index += 1
853 chunk = data[:multiple]
854 data = data[multiple:]
855 chunklist = [ord(char) for char in chunk] # turn chunk into a list of its values
856 heapobj = 0
857 outbyte = 0
858 charobj = chunklist.pop(0)
859 bitindex = 0
860 mainindex = 0
861 heapindex = 0
862 charindex = 0
863 while mainindex < bitlen:
864 # print mainindex, heapindex, charindex, bitindex
865 if heapindex == 8: # if we've got all 8 bits
866 out.append(chr(heapobj))
867 heapobj = 0
868 heapindex = 0
869 if not mainindex%multiple: # we've got to a point where we should add another bit to the byte
870 if charobj&BINLIST[charindex]:
871 outbyte = outbyte|BINLIST[bitindex]
872 if not charindex == 7:
873 charindex += 1
874 else:
875 charobj = chunklist.pop(0)
876 charindex = 0
877 bitindex += 1
878 mainindex += 1
879 continue
880 if charindex == 7 and chunklist: # we've used up the current character from the chunk
881 if charobj&BINLIST[charindex]:
882 heapobj = heapobj|BINLIST[heapindex]
883 charobj = chunklist.pop(0)
884 charindex = 0
885 heapindex += 1
886 mainindex += 1
887 continue
888 if charobj&BINLIST[charindex]:
889 heapobj = heapobj|BINLIST[heapindex]
890 heapindex += 1
891 charindex += 1
892 mainindex += 1
893 if heapindex == 8: # if we've got all 8 bits.. but the loop has ended...
894 out.append(chr(heapobj))
895 out1.append(chr(outbyte))
896
897 return ''.join(out1), ''.join(out+data)
898
899 def test(): # the test suite
900 from time import clock
901 from os.path import exists
902 print 'Printing the TABLE : '
903 index = 0
904 while index < len(TABLE):
905 print TABLE[index], TABLE.find(TABLE[index])
906 index +=1
907
908 print '\nEnter test password to encode using table_enc :\n(Hit enter to continue past this)\n'
909 while True:
910 dummy = raw_input('>>...')
911 if not dummy: break
912 test = table_enc(dummy)
913 test2 = table_dec(test)
914 print test
915 print 'length : ', len(test), ' modulo 4 of length - 1 : ', (len(test)-1) % 4
916 print 'Decoded : ', test2
917 print 'Length dec : ', len(test2)
918
919 print '\nEnter password - to timestamp and then encode :\n(Hit enter to continue past this)\n'
920 while True:
921 instring = raw_input('>>...')
922 if not instring:
923 break
924 code = pass_enc(instring, sha_hash=False, daynumber=True, timestamp=True)
925 print code
926 print pass_dec(code)
927
928
929 print '\n\nTesting interleaving a 1000 byte random string with a 1500 byte random string :'
930 print
931 print 'Overall length of combined string : ',
932 a=0
933 b=''
934 c = ''
935 while a < 1000:
936 a += 1
937 b = b + chr(int(random()*256))
938 c = c + chr(int(random()*256))
939 while a < 1500:
940 a += 1
941 c = c + chr(int(random()*256))
942 d = clock()
943 test = binleave(c, b, True)
944 print len(test)
945 a1, a2 = binunleave(test)
946 print 'Time taken (including print statements ;-) ', str(clock()-d)[:6], ' seconds'
947 print 'Test for equality of extracted data against original :'
948 print a1 == b
949 print a2 == c
950
951
952 # If you give it two test files 'test1.zip' and 'test2.zip' it will interleave the two files,
953 # unleave them again and write out the first file as 'test4.zip'
954 # It prints how long it takes and you can verify that the returned file is undamaged.
955
956 if exists('test1.zip') and exists('test2.zip'):
957 print
958 print "Reading 'test1.zip' and 'test2.zip'"
959 print "Interleaving them together and writing the combined file out as 'test3.zip'"
960 print "Then unleaving them and writing 'test1.zip' back out as 'test4.zip'",
961 print " to confirm it is unchanged by the process"
962 a = file('test1.zip','rb')
963 b = a.read()
964 a.close()
965 a = file('test2.zip','rb')
966 c = a.read()
967 a.close()
968 d = clock()
969 test = binleave(c,b, True)
970
971 print len(test)
972 a = file('test3.zip','wb')
973 a.write(test)
974 a.close()
975 a1, a2 = binunleave(test)
976 print str(clock()-d)[:6]
977 a = file('test4.zip','wb')
978 a.write(a1)
979 a.close()
980 else:
981 print
982 print 'Unable to perform final test.'
983 print "We need two files to use for the test : 'test1.zip' and 'test2.zip'"
984 print "We then interleave them together, and write the combined file out as 'test3.zip'"
985 print "Then we unleave them again, and write 'test1.zip' back out as 'test4.zip'",
986 print "(So we can confirm that it's unchanged by the process.)"
987
988
989
990
991
992 if __name__ == '__main__':
993
994 # the start of making dataenc an application - but I don't think it will be used :-)
995 # just runs the test suite instead
996
997 ### this is executed if dataenc is run from the commandline
998 ##
999 ### first we get the arguments we were called with using optparse
1000 ##
1001 ### minimum arguments :
1002 ### input file
1003 ### output file
1004 ##
1005 ### default :
1006 ### if three file arguments are given the first two are interleaved and saved as the third file
1007 ### so long as the third file doesn't already exist.
1008 ##
1009 ### if two filenames are given it reads the first file and datestamps it
1010 ### saves as the second file (assuming it doesn't exist)
1011 ##
1012 ### if one filename is given it assumes it is an interleaved file to extract
1013 ##
1014 ### options :
1015 ### overwrite output file - default OFF
1016 ### encode or decode - default is encode (specifying three files forces encode)
1017 ### table_enc on or off - default is OFF
1018 ### specify a TABLE file - default is to use inbuilt
1019 ### datestamp/interleave on or off - default is ON (datestamping)
1020 ### endleaving on or off - default is OFF
1021 ### header file - default is to use the header in the file when decoding, and to leave it in the file when encoding
1022 ### (If a header file is specified the 3 byte header from binary interleaving will be saved separately).
1023 ##
1024 ### **special**
1025 ### config file - *all* values are read from the config file
1026 ##
1027 ## from optparse import OptionParser
1028 ##
1029 ## parser = OptionParser()
1030 ## parser.add_option("-q", "--quiet",
1031 ## action="store_true", dest="quiet", default = False,
1032 ## help="Set a verbosity level of 0, print no messages.")
1033 ##
1034 ## parser.add_option("--test",
1035 ## action="store_true", dest="test", default=False,
1036 ## help="Run the tests, all other options ignored. Verbosity of tests is 9.")
1037 ##
1038 ## parser.add_option("-v", "--verbose", type = 'int', dest="verbose", default=9,
1039 ## help="Set the verbosity level. Should be an integer from 0 to 9, "+\
1040 ## "9 means the most verbose and 0 means don't output any messages. Default is 9.")
1041 ##
1042 ## parser.add_option("-d", "--decode",
1043 ## action="store_true", dest="decode", default = False,
1044 ## help="Set to decode rather than encode. Default is encode.")
1045 ##
1046 ## parser.add_option("-t", "--table",
1047 ## action="store_true", dest="table", default = False,
1048 ## help="Encode or decode files using the TABLE. (ASCII to binary or binary to ASCII).")
1049 ##
1050 ## parser.add_option("-T","--TABLE", dest="table_file",
1051 ## help="Specify a 64 character file to use as the TABLE for encoding/decoding.")
1052 ##
1053 ## parser.add_option("-o", "--off",
1054 ## action="store_false", dest="datestamp", default = True,
1055 ## help="Switches datestamping OFF. Default is ON.")
1056 ##
1057 ## parser.add_option("-e", "--end",
1058 ## action="store_true", dest="end", default = False,
1059 ## help="Switches endleaving ON. default is OFF.")
1060 ##
1061 ## parser.add_option("-H","--header", dest="header_file", default = False,
1062 ## help="Specify a separate file to use as the header file when binary encoding/decoding.")
1063 ##
1064 ## parser.add_option("-c","--config", dest="config_file", default = False,
1065 ## help="Specify a config file to read *all* the other options from.")
1066 ##
1067 ##
1068 ##
1069 ## options, args = parser.parse_args()
1070 ### print args
1071 ##
1072 ##
1073 ### next import StandOut which allows us to set variable levels of verbosity
1074 ## try:
1075 ## from standout import StandOut
1076 ## stout = StandOut()
1077 ## except:
1078 ## print 'dataenc uses the standout module to handle varying levels of verbosity'
1079 ## print 'Without it, all messages will be printed.'
1080 ## class dummy: # a dummy object that we can twiddle if StandOut isn't available
1081 ## def __init__(self):
1082 ## self.priority = 0
1083 ## self.verbosity = 0
1084 ## def close(self):
1085 ## pass
1086 ## stout = dummy()
1087 ##
1088 ## defaults = { 'header_file' : False, 'datestamp' : True
1089 ## if options.config_file: # a configfile, the settings here override all the others
1090 ## try:
1091 ## from configobj import ConfigObj
1092 ## except ImportError:
1093 ## print "Without the ConfigObj module I can't import a config file."
1094 ## print 'See http://www.voidspace.org.uk/atlantibots/pythonutils.html'
1095 ## raise
1096 ## config = ConfigObj(options.config_file, fileerror=True)
1097 ##
1098 ##
1099 ## if options.verbose:
1100 ## stout.verbosity = 10 - options.verbose # a higher verbosity level here, actually means quieter
1101 ## else:
1102 ## stout.verbosity = 0 # except for 0, which means silent
1103 ## if options.quiet: # if the quiet option is explicitly set
1104 ## stout.verbosity = 0
1105 ## stout.priority = 2
1106 ## print 'Welcome to dataenc - the data encoding and interleaving program by Fuzzyman'
1107 ## print 'See http://www.voidspace.org.uk/atlantibots/pythonutils.html'
1108 ## print 'Written in Python.'
1109 ## stout.priority = 3
1110 ## if not psycoin:
1111 ## print 'Having the Psyco module installed (Python Specialising compiler) would vastly speed up dataenc.'
1112 ## if not DATEIN:
1113 ## print 'Some of the datestamping features are only available when the dateutils module is available.'
1114 ## stout.priority = 5
1115 ##
1116 ##
1117 ## if options.test:
1118 test()
1119
1120
1121
1122
1123
1124
1125 """
1126
1127 BUGS
1128 No more known bugs... yet.
1129 I'm sure they'll surface.
1130
1131 ISSUES
1132 binleave and binunleave are still quite slow.
1133 For stamping small password hashes with a date stamp it's fast enough - for weaving larger files together it's *too slow*.
1134 Also for weaving similar sized files together we may be better with a pattern of 2 bits of watermark per 3 bits of string.
1135 (or a 3 to 4 or 5 to 7 etc..)
1136 Currently it will only work with 1 bit of watermark per 1 or 2 or 3 or 4 etc bits of main string. (Exact multiples)
1137 Again, for small watermarks this works fine - and as that is all I'm using it for I'm not inclined to change it.
1138 The logic would be simple - just fiddly.
1139
1140
1141 TODO :
1142 Might make it a simple application - so it can be used from the command line for encoding, decoding
1143 timestamping and combining files.....
1144 Could replace use of the BINLIST and the if test with a single inline statement with more << >> in binleave and binunleave
1145 Could move the binleave and binunleave into C
1146
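The TODO item about replacing BINLIST and the if test could look something like the sketch below. It compares the table-and-if pattern used in the interleaving loops with one possible shift-only expression. The function names copy_bit_table and copy_bit_shift are hypothetical, and nothing here is measured for speed; it only checks that the two forms give the same result.

```python
# Sketch only: two ways of copying bit `src` of a byte into bit `dst` of an
# accumulator. copy_bit_table mirrors the BINLIST-and-if pattern used in the
# listing; copy_bit_shift is one possible shift-only replacement.
# Both function names are hypothetical.

BINLIST = [1, 2, 4, 8, 16, 32, 64, 128]

def copy_bit_table(acc, dst, byte, src):
    if byte & BINLIST[src]:           # is the source bit set?
        acc = acc | BINLIST[dst]      # then set the destination bit
    return acc

def copy_bit_shift(acc, dst, byte, src):
    # isolate the source bit, move it to the destination position, or it in
    return acc | (((byte >> src) & 1) << dst)

# Exhaustive check over every byte value and every pair of bit positions,
# starting from an empty accumulator as the listing does (heapobj = 0).
same = all(
    copy_bit_table(0, dst, byte, src) == copy_bit_shift(0, dst, byte, src)
    for byte in range(256)
    for src in range(8)
    for dst in range(8)
)
print(same)                           # True
```

Whether the shift form is actually faster would need timing on the interpreter in use.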
1147
1148 CHANGELOG
1149 13-09-04 Version 1.1.5
1150 Increased speed in table_enc and table_dec.
1151
1152 30-08-04 Version 1.1.4
1153 Slight docs improvement.
1154 Slight speed improvement in binleave and binunleave.
1155
1156 22-08-04 Version 1.1.3
1157 Added the unexpired alias and the check_pass function.
1158 Changed license text.
1159 Minor preemptive bugfix in some default values.
1160
1161 11-04-04 Version 1.1.2
1162 Added the expired function for testing validity of timestamps.
1163 Changed the TABLE to be URL safe for passing in forms using the 'GET' method.
1164 Added OLD_TABLE with the old encoding, and gave table_dec and table_enc the ability to receive an explicit TABLE.
1165
1166 07-04-04 Version 1.1.1
1167 Improved the tests a bit.
1168 Corrected a bug that affected large files or large files with small watermarks.
1169
1170 05-04-04 Version 1.1.0
1171 Replaced the bf object with much faster bitwise logical operations. It is now about 2.5 times faster.
1172 With Psyco enabled it becomes 11 times faster than the first version....
1173 Added the bit setting and testing operations as functions.
1174
1175 03-04-04 Version 1.0.0
1176 Initial testing is a success.
1177
1178
1179 """
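For reference, the 3 byte length header that internalfunc builds and internalfunc2 reads back can be sketched on its own as follows. The names pack_length and unpack_length are hypothetical, the sketch uses Python 3 bytes rather than the chr() strings of the listing, and it raises ValueError for an oversized length where internalfunc returns None; all three are assumptions made for the example.

```python
# Sketch only: the 3 byte watermark length header, high byte first.
# pack_length / unpack_length are hypothetical names; the arithmetic follows
# internalfunc (packing) and internalfunc2 (unpacking) in the listing.

def pack_length(length):
    """Split a length below 2**24 into three byte values, high byte first."""
    if not 0 <= length < 2 ** 24:
        raise ValueError('length must fit in 3 bytes')
    header1 = length // 65536
    header3 = length % 65536
    header2 = header3 // 256
    header3 = header3 % 256
    return bytes([header1, header2, header3])

def unpack_length(header):
    """Rebuild the length from the first three bytes of header."""
    return header[0] * 65536 + header[1] * 256 + header[2]

header = pack_length(70000)
print(list(header))                   # [1, 17, 112]
print(unpack_length(header))          # 70000
```

On Python 3 the packing step gives the same bytes as length.to_bytes(3, 'big'). binleave then adds one random byte in front of this header, which is why its comment speaks of a 4 byte header.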
