UCC Code Repository

Contents of /bunnyblog/modules/dataenc.py

Revision 1, Tue Jan 29 14:32:01 2008 UTC, by svn-admin
File MIME type: text/x-python
File size: 48756 byte(s)
Re-import of repository after repository database corruption.

#13-09-04
# v1.1.5

# **build 3**

# dataenc.py
# password encoding and comparing module
# The strength of the SHA hashing module - in an ascii safe, timestamped format.

# uses a binary to ascii encoding
# and will timestamp the encodings as well.
# (Binary watermarking of data).

# used in a CGI called 'custbase' - to check logins are correct
# (For the CGI to function it needs the user's password (or its SHA hash) to be encoded into
# each page as a hidden form field. This exposes the password hash in the HTML source of each page.
# This module provides functions to interleave a timestamp *into* the hash.
# Even if the encoded 'timestamped hash' is extracted from the HTML source, the CGI can tell
# whether it has expired).


# Contains functions to :

# do binary to ascii encoding using a TABLE mapping (and ascii to binary)
# interleave binary data - to disperse one set of binary data into another (e.g. as a 'watermark' or date/time stamp)
# extract the watermark again
# convert a value to base 64 digits, and base 64 digits back to 8-bit values
# create and retrieve a timestamp from the current time/date
# test and set bits in a byte (or larger value)
# (including a bitwise operator object that comes from the python cookbook
# and is no longer used here, but included for reference).

# Wrapping all that together to return an ascii encoded, date stamped SHA hash of a string


# Copyright Michael Foord 2004
# dataenc.py
# Functions for encoding and interleaving data.

# http://www.voidspace.org.uk/python/modules.shtml

# Released subject to the BSD License
# Please see http://www.voidspace.org.uk/documents/BSD-LICENSE.txt

# For information about bugfixes, updates and support, please join the Pythonutils mailing list.
# http://voidspace.org.uk/mailman/listinfo/pythonutils_voidspace.org.uk
# Comments, suggestions and bug reports welcome.
# Scripts maintained at http://www.voidspace.org.uk/python/index.shtml
# E-mail fuzzyman@voidspace.org.uk


"""
DOCS for dataenc as a module

When run it should go through a few basic tests - see the function test()

This module provides low-level functions to interleave two pieces of data into each other and separate them again.
It will also encode this binary data to and from ascii - for inclusion in HTML, cookies or email transmission.

It also provides high-level functions that use these to time stamp passwords and password hashes,
and to check that a 'time-stamped hash' is both valid and unexpired.

The check_pass function is interesting. Given an encoded and timestamped hash it compares it with the SHA hash of a password.
If it matches *and* is unexpired (you set the time limit) it returns a new encoded time stamp of the hash with the current time.
I use this for secure, time-limited logins over CGI. (It could be stored in a cookie as well.)
(On the first login you will need to compare the password with the stored hash and use that to generate a time-stamped hash to include in the page returned.
Thereafter you can just use the check_pass function and include the time-stamped hash in a hidden form field for every action.)

The binary data is interleaved on a 'bitwise' basis - every byte is mangled.

--

CONSTANTS

The main constant defined in dataenc.py is :

TABLE = '_-0123456789' + \
'abcdefghijklmnopqrstuvwxyz'+ \
'NOPQRSTUVWXYZABCDEFGHIJKLM'
TABLE should be exactly 64 printable characters long, with no repeats... or we'll all die horribly.
Obviously the same TABLE should be used for decoding as for encoding....
Note - changing the order of the TABLE here can be used to change the mapping.
From version 1.1.2, TABLE uses only characters that are safe to pass in URLs
(e.g. using the GET method for passing FORM data).
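The two requirements on TABLE above can be checked mechanically: it must contain 64 distinct characters, and every character must be URL-safe. Below is a small Python 3 sketch. `check_table` is a hypothetical helper, not part of dataenc. It also assumes that "safe to pass in URLs" means the RFC 3986 unreserved characters (letters, digits, `-`, `.`, `_`, `~`).

```python
import string

# TABLE as defined in dataenc (v1.1.2+), rebuilt here so the sketch runs standalone.
TABLE = ('_-0123456789'
         'abcdefghijklmnopqrstuvwxyz'
         'NOPQRSTUVWXYZABCDEFGHIJKLM')

# RFC 3986 "unreserved" characters: usable in a URL without percent-encoding.
UNRESERVED = set(string.ascii_letters + string.digits + '-._~')


def check_table(table):
    # Each 6-bit value (0-63) needs its own character: 64 characters, no repeats.
    if len(table) != 64 or len(set(table)) != 64:
        return False
    # Every character must pass through a GET query string unchanged.
    return set(table) <= UNRESERVED
```

Run against OLD_TABLE, the same check fails on characters such as `!` and `&`. That result is consistent with the switch of tables in 1.1.2.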

OLD_TABLE is the encoding map used by versions of dataenc.py before 1.1.2.
Data encoded with it can be decoded by passing it as the table argument: table_dec(encodedstring, OLD_TABLE).

PSYCOON = 1
This decides whether we attempt to import psyco (the specialising compiler). Set it to 0 to skip the import.
If we attempt but fail to import psyco then this value will be set to 0.

DATEIN = 1
As above but for the dateutils and time modules.
We need to import dateutils for the expired and pass_enc functions (amongst others) to work fully.


FUNCTIONS

Following are the docstrings extracted from the public functions :

pass_enc(instring, indict=None, **keywargs)
Returns an ascii version of a string (or of its SHA hash), with the date/time stamped into it.
e.g. For ascii safe storing of password hashes.

It also accepts the following keyword args (or a dictionary containing the following keys).
(Keywords shown - with default values).

lower = False, sha_hash = False, daynumber = None, timestamp = None, endleave = False

Setting lower to True makes instring lowercase before hashing/encoding.

If sha_hash is set to True then instead of the actual string passed in being encoded, its SHA hash
is encoded. (In either case the string can contain any binary data).

If a daynumber is passed in then the daynumber will be encoded into the returned string.
(daynumber is an integer representing the 'Julian day number' of a date - see the dateutils module).
This can be used as a 'datestamp' for the generated code and you can detect anyone reusing old codes this way.
If daynumber is set to True then today's daynumber will automatically be used.
(The dateutils module is required - otherwise it will be ignored).

Max allowed value for daynumber is 16777215 (9th May 41222)
(so daynumber can be any integer from 1 to 16777215 that you want to 'watermark' the hash with;
it could be used as a session ID for a CGI, for example).

If a timestamp is passed in it should either be timestamp = True, meaning use 'now',
or a tuple (HOUR, MINUTES).
HOUR should be an integer 0-23
MINUTES should be an integer 0-59

The time and date stamp is *binary* interleaved, before encoding, into the data.

If endleave is set to True then the timestamp is interleaved more securely. This shouldn't be necessary in practice
because the stamp is so short and we subsequently encode using table_enc.
If the string is long this will slow down the process - because we interleave twice.


pass_dec(incode)
Given a string encoded by pass_enc - it returns it decoded.
It also extracts the datestamp and returns that.
The return is :
(instring, daynumber, timestamp)


expired(daynumber, timestamp, validity)
Given the length of time a password is valid for, it checks if a daynumber/timestamp tuple is
still valid.
validity should be an integer tuple (DAYS, HOURS, MINUTES).
Returns True for valid or False for invalid.
Needs the dateutils module to get the current daynumber.

unexpired is an alias for expired - because it makes for better tests.
(The return values of the expired function are logically the wrong way round: expired returns True if the timestamp is *not* expired.)
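The expiry test amounts to "stamp time + validity period >= now". The Python 3 sketch below writes that out with `datetime`, which removes the hand-rolled minute and hour carrying. It makes some assumptions. `date.toordinal()` stands in for the dateutils 'Julian day number', which may use a different epoch. `now` is passed in explicitly so the result is reproducible. `unexpired_sketch` is a hypothetical name.

```python
from datetime import date, datetime, timedelta


def unexpired_sketch(daynumber, timestamp, validity, now):
    # daynumber is a proleptic Gregorian ordinal (date.toordinal()),
    # standing in for the dateutils day number.
    hour, minute = timestamp
    days, hours, minutes = validity
    stamped = datetime.combine(date.fromordinal(daynumber), datetime.min.time())
    stamped += timedelta(hours=hour, minutes=minute)
    deadline = stamped + timedelta(days=days, hours=hours, minutes=minutes)
    # Same convention as expired(): True means still valid (not yet expired).
    return now <= deadline
```

With `now` fixed at 12:00, these calls mirror the doctests in expired(). A stamp from 11:58 with validity (0, 0, 1) gives False. The same stamp with (0, 0, 10) gives True.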


check_pass(inhash, pswdhash, EXPIRE)
Given the hash (possibly from a webpage or cookie) it checks that it is still valid and matches the password it is supposed to have.
pswdhash is the stored password hash as encoded by table_enc (it is decoded with table_dec before comparison).
If so it returns a new hash - with the current time stamped into it.
EXPIRE is a validity tuple to test for (see expired function)
e.g. (0, 1, 0) means the supplied hash should be no older than 1 hour

If the hash is expired it returns -1.
If the pass is invalid or doesn't match the supplied pswdhash it returns False.
Note that -1 is true in a boolean context, so test for it explicitly (result == -1) before testing the result for truth.
This is a high level function that can do all your password checking and 'time-stamped hash' generation after initial login.
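check_pass has three possible outcomes, and they are easy to get wrong at the call site. The Python 3 sketch below isolates that decision logic. It works on values that have already been decoded, and it returns the new stamp time rather than a re-encoded hash. `check_decoded` is a hypothetical name. `hmac.compare_digest` is a constant-time comparison swapped in for the plain `!=` test that check_pass uses.

```python
import hmac
from datetime import datetime, timedelta


def check_decoded(decoded_hash, stamped_at, stored_hash, validity, now):
    # Same order of checks as check_pass:
    #   False -> the hash does not match the stored password hash
    #   -1    -> it matches, but its timestamp is too old
    #   now   -> valid; the caller would re-stamp the hash with this time
    if not hmac.compare_digest(decoded_hash, stored_hash):
        return False
    if now > stamped_at + validity:
        return -1
    return now
```

Callers should test `result is False` and `result == -1` before treating the result as success, because -1 is truthy.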


makestamp(daynumber, timestamp)
Receives a Julian daynumber (integer 1 to 16777215) and an (HOUR, MINUTES) tuple timestamp.
Returns a 5 character string of binary characters that represent that date/time:
three characters for the daynumber (most significant first), then one each for HOUR and MINUTES.
Can receive None for either or both of these arguments (stored as three null characters, or as two chr(255) characters, respectively).

The function 'daycount' in dateutils will turn a date into a daynumber.
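The 5-character layout can be written compactly in Python 3 with `int.to_bytes`. The sketch returns `bytes` instead of the Python 2 str that makestamp builds with `chr`. `makestamp_bytes` is a hypothetical name. Like makestamp, it fails on a daynumber above 16777215, raising OverflowError where the original's `chr` would raise ValueError.

```python
def makestamp_bytes(daynumber, timestamp):
    # Three bytes of daynumber, most significant first; all zero means "no date".
    day_part = (daynumber or 0).to_bytes(3, 'big')
    # One byte each for hour and minute; 255, 255 means "no time".
    if not timestamp:
        time_part = bytes([255, 255])
    else:
        time_part = bytes([timestamp[0], timestamp[1]])
    return day_part + time_part
```

For example, daynumber 2454495 splits into the bytes 37, 115, 223 (37 * 65536 + 115 * 256 + 223).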


dec_datestamp(datestamp)
Given a 5 character datestamp made by makestamp, it returns it as the tuple :
(daynumber, timestamp).
Either value can be None (if none was stamped); otherwise
daynumber is an integer between 1 and 16777215
and timestamp is (HOUR, MINUTES).

The function 'counttodate' in dateutils will turn a daynumber back into a date.
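Decoding reverses the layout. This Python 3 sketch works on a 5-byte `bytes` value. Like dec_datestamp, it treats a zero day and an hour byte of 255 as "not stamped". `dec_datestamp_bytes` is a hypothetical name.

```python
def dec_datestamp_bytes(datestamp):
    # First three bytes: big-endian daynumber, where 0 means "none stamped".
    daynumber = int.from_bytes(datestamp[:3], 'big') or None
    # Last two bytes: hour and minute; an hour byte of 255 means "none".
    hour, minute = datestamp[3], datestamp[4]
    timestamp = None if hour == 255 else (hour, minute)
    return daynumber, timestamp
```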


sixbit(invalue)
Given a value it returns a list representing the base 64 version of that number.
Each value in the list is an integer from 0-63...
The first member of the list is the most significant digit... down to the remainder.
Should only be used for positive values (any value below 1 returns [0]).
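The same digits can be produced by repeated `divmod`, which avoids searching for the highest power of 64 first. This is a Python 3 sketch of the technique under a hypothetical name. It is not the module's own loop, but it returns the same lists.

```python
def sixbit_divmod(invalue):
    # Base-64 digits, most significant first; [0] for values below 1.
    if invalue < 1:
        return [0]
    digits = []
    while invalue:
        invalue, digit = divmod(invalue, 64)
        digits.append(digit)       # least significant digit comes out first...
    digits.reverse()               # ...so reverse to put the most significant first
    return digits
```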


sixtoeight(intuple)
Given four base 64 (6-bit) digits... it returns three 8-bit values that represent
the same value.
If the length of intuple != 4, or any digit is > 63, it returns None.

**NOTE**
Not quite the reverse of the sixbit function: sixbit returns as many digits as the value needs,
whereas sixtoeight always takes exactly four digits and returns a tuple of three values.
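Four 6-bit digits hold exactly 24 bits, which is three bytes. This Python 3 sketch does the same conversion with shifts and masks instead of multiplying by 64, 4096 and 262144. `sixtoeight_shift` is a hypothetical name.

```python
def sixtoeight_shift(intuple):
    # Exactly four digits, each 0-63, or the input is rejected.
    if len(intuple) != 4 or any(d > 63 for d in intuple):
        return None
    value = 0
    for digit in intuple:
        value = (value << 6) | digit     # append 6 bits per digit
    # Split the 24-bit value into three bytes, most significant first.
    return (value >> 16) & 255, (value >> 8) & 255, value & 255
```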


table_enc(instring, table=None)
The actual function that performs TABLE encoding.
It takes instring in three character chunks (three 8-bit values)
and turns each chunk into four 6-bit values.
Each of these 6-bit values maps to a character in TABLE.
If the length of instring is not divisible by three it is padded with null bytes.
The number of null bytes to remove is then encoded as a semi-random character at the start of the string
(a character from positions 0-20 of the table means none, 21-41 means two, 42-63 means one).
You can pass in an alternative 64 character string to do the encoding with if you want (table defaults to TABLE).
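The Python 3 sketch below walks through the encoding steps on `bytes`. It pads to a multiple of 3 and maps each 3-byte chunk to four table characters. It then prefixes a random marker whose position in the table records the padding, using the same three ranges as table_enc. The name `table_enc_sketch` is hypothetical, and `random.randrange` replaces the module's `int(random()*N)`, covering the same ranges.

```python
import random

# TABLE as defined in dataenc (v1.1.2+).
TABLE = ('_-0123456789'
         'abcdefghijklmnopqrstuvwxyz'
         'NOPQRSTUVWXYZABCDEFGHIJKLM')


def table_enc_sketch(data, table=TABLE):
    # data is bytes; pad it with null bytes to a multiple of 3.
    remainder = len(data) % 3
    if remainder:
        data = data + bytes(3 - remainder)
    out = []
    for i in range(0, len(data), 3):
        # Three 8-bit values -> one 24-bit value -> four 6-bit values.
        value = int.from_bytes(data[i:i + 3], 'big')
        for shift in (18, 12, 6, 0):
            out.append(table[(value >> shift) & 63])
    # Leading marker: its position in the table records the padding added.
    if remainder == 0:
        marker = random.randrange(0, 21)    # no padding
    elif remainder == 1:
        marker = random.randrange(21, 42)   # two null bytes added
    else:
        marker = random.randrange(42, 64)   # one null byte added
    return table[marker] + ''.join(out)
```

Only the first character varies between runs. For example, `b'abc'` always encodes to one marker from `table[0:21]` followed by `mk7x`.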


table_dec(instring, table=None)
The function that performs TABLE decoding.
Given a TABLE encoded string it returns the original binary data - as a string.
If the data it's given is invalid (not data encoded by table_enc) it returns None
(invalid means: empty, containing characters not in the table, or with len(instring) % 4 != 1).
You can pass in an alternative 64 character string to do the decoding with if you want (table defaults to TABLE).
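Decoding runs the same steps backwards. The Python 3 sketch below rejects malformed input and rebuilds each 24-bit group from four table positions. It then strips the padding that the leading marker indicates. `table_dec_sketch` is a hypothetical name that returns `bytes`.

```python
# TABLE as defined in dataenc (v1.1.2+).
TABLE = ('_-0123456789'
         'abcdefghijklmnopqrstuvwxyz'
         'NOPQRSTUVWXYZABCDEFGHIJKLM')


def table_dec_sketch(text, table=TABLE):
    # Returns bytes, or None for input that table_enc could not have produced.
    if not text or len(text) % 4 != 1:
        return None
    indexes = [table.find(ch) for ch in text]
    if -1 in indexes:
        return None
    marker, digits = indexes[0], indexes[1:]
    out = bytearray()
    for i in range(0, len(digits), 4):
        value = 0
        for digit in digits[i:i + 4]:
            value = (value << 6) | digit
        out += value.to_bytes(3, 'big')
    # The marker's range says how many padding null bytes to strip.
    if marker > 41:
        del out[-1:]
    elif marker > 20:
        del out[-2:]
    return bytes(out)
```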


return_now()
Returns the time now
as (HOUR, MINUTES), using the 24-hour clock.


binleave(data1, data2, endleave = False)
Given two strings of binary data it interleaves data1 into data2 on a bitwise basis
and returns a single string combining both (the bits are interleaved, not just the bytes).
The returned string will be about 4 bytes longer than the two input strings combined.
Use binunleave to return the two strings again.
Even if both strings passed in are ascii - the result will contain non-ascii characters.
To keep it ascii-safe you must subsequently encode it with table_enc.

Max length for the smaller data string (the other can be of unlimited size) is about 16 MB
(increasing this would be easy if anyone needed it - but would be very slow anyway).

If either string is empty (or the smaller string is longer than about 16 MB) - we return None.
The first 4 characters of the string returned 'define' the interleave (actually the size of the watermark).
For added safety you could remove this and send it separately.

Version 1.0.0 used a bf (bitfield) object from the python cookbook. Version 1.1.0 uses the binary and & and or |
operations and is about 2.5 times faster. On my AMD 3000, leaving and unleaving two 20k files took 1.8 seconds
(instead of 4.5 previously - with Psyco enabled this improved to 0.4 seconds).

Interleaving a file with a watermark of pretty much any size makes it unreadable - this is because *every* byte is changed
(except perhaps a few at the end - see the endleave keyword). However it shouldn't be relied on if you need
a really secure method of encryption. For many purposes it will be sufficient, however.

In practice any file not an exact multiple of the size of the watermark will have a chunk at the end that is untouched.
To get round this you can set endleave = True, which then interleaves the end data back into itself
(and therefore takes twice as long - it shouldn't be necessary where you have a short watermark).

data2 ought to be the smaller string - or they will be swapped round internally.
This could cause you to get them back in an unexpected order from binunleave.


binunleave(data)
Given a chunk of data woven by binleave - it returns the two separate pieces of data.
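The bit-level weaving done by binleave/binunleave lives in the private functions, which are not shown here. The Python 3 sketch below therefore illustrates the *idea* with a deliberately simple, hypothetical scheme, and it is not the module's algorithm. Each watermark bit is followed by `ratio` bits of data. The leftover data bits are appended untouched, which is the same 'untouched tail' effect described above. A 4-byte header records the watermark length. All names (`leave_sketch`, `unleave_sketch`, `to_bits`, `from_bits`) are illustrative only, and the header layout differs from binleave's.

```python
def to_bits(data):
    # Most significant bit of each byte first.
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def from_bits(bits):
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def leave_sketch(data, mark):
    # Weave the shorter 'mark' into 'data' bit by bit; None if either is empty.
    if not data or not mark:
        return None
    if len(mark) > len(data):
        data, mark = mark, data          # keep the watermark the shorter one
    ratio = len(data) // len(mark)
    a, b = to_bits(data), to_bits(mark)
    woven = []
    for i, bit in enumerate(b):
        woven.append(bit)                            # one watermark bit...
        woven.extend(a[i * ratio:(i + 1) * ratio])   # ...then 'ratio' data bits
    woven.extend(a[len(b) * ratio:])                 # leftover data bits, untouched
    # Header: watermark length, so the weave can be undone.
    return len(mark).to_bytes(4, 'big') + from_bits(woven)


def unleave_sketch(woven):
    n_mark = int.from_bytes(woven[:4], 'big')
    bits = to_bits(woven[4:])
    ratio = (len(woven) - 4 - n_mark) // n_mark
    a, b = [], []
    pos = 0
    for _ in range(8 * n_mark):
        b.append(bits[pos])
        a.extend(bits[pos + 1:pos + 1 + ratio])
        pos += 1 + ratio
    a.extend(bits[pos:])
    return from_bits(a), from_bits(b)
```

As with binleave, passing the longer string second swaps them. So `unleave_sketch(leave_sketch(b'ab', b'hello'))` returns `(b'hello', b'ab')`.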


For the binary operations of binleave and binunleave, version 1.0.0 used a bf (bitfield) object from
the python cookbook.

class bf(object)
the bf(object) from activestate python cookbook - by Sebastien Keim - Many Thanks
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/113799

Version 1.1.0 replaced these with specific binary AND & and OR | operations that are about 2.5 times faster.
They are 'inline' in the functions for speed (avoiding function calls) but are available separately as well.

def bittest(value, bitindex)
This function returns the setting of any bit from a value.
bitindex starts at 0.

def bitset(value, bitindex, bit)
Sets a bit, specified by bitindex, in 'value' to 'bit'.
bit should be 1 or 0
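Here is a minimal Python 3 sketch of the two bit operations, built from the same AND & and OR | operators. The bodies of bittest and bitset are not shown in this section. The sketch assumes that bitindex 0 is the least significant bit and that bitset returns the new value, since Python integers cannot be changed in place. The names are hypothetical.

```python
def bittest_sketch(value, bitindex):
    # Assumes bitindex 0 is the least significant bit.
    return (value >> bitindex) & 1


def bitset_sketch(value, bitindex, bit):
    mask = 1 << bitindex
    if bit:
        return value | mask      # OR with the mask sets the bit
    return value & ~mask         # AND with the inverted mask clears it
```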



There are also the 'private functions' which actually contain the substance of binleave and binunleave.
You are welcome to 'browse' them - but you shouldn't need to use them directly.


Any comments, suggestions and bug reports welcome.

Regards,

Fuzzy

michael AT foord DOT me DOT uk


"""

import sha
from random import random

DATEIN = 1
if DATEIN:
    try: # try to import the dateutils and time modules
        from time import strftime
        from dateutils import daycount, returndate # ,counttodate # counttodate returns a daynumber as a date
    except ImportError:
        DATEIN = 0

# If PSYCOON is set to 0 then we won't try and import the psyco module
# If importing fails, PSYCOON is set to 0
PSYCOON = 1
if PSYCOON:
    try:
        import psyco
        psyco.full()
        from psyco.classes import *
        try:
            psyco.cannotcompile(re.compile) # psyco hinders rather than helps regular expression compilation
        except NameError:
            pass
    except Exception:
        PSYCOON = 0

# note - changing the order of the TABLE here can be used to change the mapping.
TABLE = '_-0123456789' + \
        'abcdefghijklmnopqrstuvwxyz'+ \
        'NOPQRSTUVWXYZABCDEFGHIJKLM'
# table should be exactly 64 printable characters long... or we'll all die horribly
# Obviously the same TABLE should be used for decoding as for encoding....
# This version of TABLE (v1.1.2) uses only characters that are safe to pass in URLs
# (e.g. using the GET method for passing FORM data)


OLD_TABLE = '!$%^&*()_-+=' + \
            'abcdefghijklmnopqrstuvwxyz'+ \
            'NOPQRSTUVWXYZABCDEFGHIJKLM'
# OLD_TABLE is the old encoding. If anyone has stuff encoded with this then it can be decoded using :
# data = table_dec(encodedstring, OLD_TABLE)


def pass_enc(instring, indict=None, **keywargs):
    """Returns an ascii version of a string (or of its SHA hash), with the date/time stamped into it.
    e.g. For ascii safe storing of password hashes.

    It also accepts the following keyword args (or a dictionary containing the following keys).
    (Keywords shown - with default values).

    lower = False, sha_hash = False, daynumber = None, timestamp = None, endleave = False

    Setting lower to True makes instring lowercase before hashing/encoding.

    If sha_hash is set to True then instead of the actual string passed in being encoded, its SHA hash
    is encoded. (In either case the string can contain any binary data).

    If a daynumber is passed in then the daynumber will be encoded into the returned string.
    (daynumber is an integer representing the 'Julian day number' of a date - see the dateutils module).
    This can be used as a 'datestamp' for the generated code and you can detect anyone reusing old codes this way.
    If daynumber is set to True then today's daynumber will automatically be used.
    (The dateutils module is required - otherwise it will be ignored).

    Max allowed value for daynumber is 16777215 (9th May 41222)
    (so daynumber can be any integer from 1 to 16777215 that you want to 'watermark' the hash with;
    it could be used as a session ID for a CGI, for example).

    If a timestamp is passed in it should either be timestamp = True, meaning use 'now',
    or a tuple (HOUR, MINUTES).
    HOUR should be an integer 0-23
    MINUTES should be an integer 0-59

    The time and date stamp is *binary* interleaved, before encoding, into the data.

    If endleave is set to True then the timestamp is interleaved more securely. This shouldn't be necessary in practice
    because the stamp is so short and we subsequently encode using table_enc.
    If the string is long this will slow down the process - because we interleave twice.
    """
    if indict is None: indict = {}
    arglist = {'lower' : False, 'sha_hash' : False, 'daynumber' : None, 'timestamp' : None, 'endleave' : False}

    if not indict and keywargs: # if keywords are passed in instead of a dictionary - we use them
        indict = keywargs
    indict = dict(indict) # work on a copy so the caller's dictionary isn't modified
    for keyword in arglist: # any keywords not specified we use the default
        if keyword not in indict:
            indict[keyword] = arglist[keyword]

    if indict['lower']: # keyword lower :-)
        instring = instring.lower()
    if indict['sha_hash']:
        instring = sha.new(instring).digest()

    if indict['daynumber'] is True: # 'is True' so that a daynumber of 1 isn't mistaken for True
        if not DATEIN:
            indict['daynumber'] = None
        else:
            a,b,c = returndate()
            indict['daynumber'] = daycount(a,b,c) # set the daycount to today
    if indict['timestamp'] is True:
        if not DATEIN:
            indict['timestamp'] = None
        else:
            indict['timestamp'] = return_now() # set the time to now.

    datestamp = makestamp(indict['daynumber'], indict['timestamp'])
    if len(instring) == len(datestamp): instring = instring + '&mjf-end;' # otherwise we can't tell which is which when we unleave them later :-)
    outdata = binleave(instring, datestamp, indict['endleave'])
    return table_enc(outdata) # do the encoding of the actual string



def pass_dec(incode):
    """Given a string encoded by pass_enc - it returns it decoded.
    It also extracts the datestamp and returns that.
    The return is :
    (instring, daynumber, timestamp)
    """
    binary = table_dec(incode)
    out1, out2 = binunleave(binary)
    if len(out1) == 5: # the 5 character datestamp can come back first or second
        datestamp = out1
        if len(out2) == 14 and out2.endswith('&mjf-end;'): # the marker pass_enc adds to 5 character strings
            out2 = out2[:-9]
        instring = out2
    else:
        datestamp = out2
        if len(out1) == 14 and out1.endswith('&mjf-end;'):
            out1 = out1[:-9]
        instring = out1
    daynumber, timestamp = dec_datestamp(datestamp)
    return instring, daynumber, timestamp



def expired(daynumber, timestamp, validity):
    """Given the length of time a password is valid for, it checks if a daynumber/timestamp tuple is
    still valid.
    validity should be an integer tuple (DAYS, HOURS, MINUTES).
    Returns True for valid or False for invalid.
    Needs the dateutils module to get the current daynumber.

    >>> a, b, c = returndate()
    >>> today = daycount(a, b, c)
    >>> h, m = return_now()
    >>> expired(today, (h, m-2), (0,0,1))
    False
    >>> expired(today, (h, m-2), (0,0,10))
    True
    >>> expired(today, (h-2, m-2), (0,1,10))
    False
    >>> expired(today-1, (h-2, m-2), (1,1,10))
    False
    >>> expired(today-1, (h-2, m-2), (2,1,10))
    True
    """
    # A missing daynumber or timestamp is treated as day 0 / midnight,
    # so the comparisons below still work (and such a stamp counts as long expired).
    daynumber = daynumber or 0
    timestamp = timestamp or (0,0)
    if not DATEIN:
        raise ImportError("Need the dateutils module to use the 'expired' function.")

    h1, m1 = timestamp
    # h1, m1 are the hours and minutes of the timestamp
    d2, h2, m2 = validity
    # validity is how long the timestamp is valid for

    a, b, c = returndate()
    today = daycount(a, b, c)
    # today is the Julian day number of today
    h, m = return_now()
    # h, m are the hours and minutes of the time now

    h1 = h1 + h2
    m1 = m1 + m2
    daynumber = daynumber + d2
    # so we need to test if today, h, m are greater than daynumber, h1, m1
    # But first we need to adjust because we might currently have hours above 23 and minutes above 59
    while m1 > 59:
        h1 += 1
        m1 -= 60
    while h1 > 23:
        daynumber += 1
        h1 -= 24
    if today > daynumber:
        return False
    if today < daynumber:
        return True
    if h > h1: # same day
        return False
    if h < h1:
        return True
    if m > m1: # same hour
        return False
    else:
        return True

unexpired = expired # Technically unexpired is a better name since this function returns True if the timestamp is unexpired.

def makestamp(daynumber, timestamp):
    """Receives a Julian daynumber (integer 1 to 16777215) and an (HOUR, MINUTES) tuple timestamp.
    Returns a 5 character string of binary characters that represent that date/time.
    Can receive None for either or both of these arguments.

    The function 'daycount' in dateutils will turn a date into a daynumber.
    """
    if not daynumber:
        datestamp = chr(0)*3
    else:
        day1 = daynumber//65536
        daynumber = daynumber % 65536
        day2 = daynumber//256
        daynumber = daynumber%256
        datestamp = chr(day1) + chr(day2) + chr(daynumber)
    if not timestamp:
        datestamp = datestamp + chr(255)*2
    else:
        datestamp = datestamp + chr(timestamp[0]) + chr(timestamp[1])
    return datestamp


def dec_datestamp(datestamp):
    """Given a 5 character datestamp made by makestamp, it returns it as the tuple :
    (daynumber, timestamp).
    Either value can be None (if none was stamped); otherwise
    daynumber is an integer between 1 and 16777215
    and timestamp is (HOUR, MINUTES).

    The function 'counttodate' in dateutils will turn a daynumber back into a date."""
    daynumber = datestamp[:3]
    timechars = datestamp[3:]
    daynumber = ord(daynumber[0])*65536 + ord(daynumber[1])*256 + ord(daynumber[2])
    if daynumber == 0: daynumber = None
    if ord(timechars[0]) == 255:
        timestamp = None
    else:
        timestamp = (ord(timechars[0]), ord(timechars[1]))
    return daynumber, timestamp



def sixbit(invalue):
    """Given a value it returns a list representing the base 64 version of that number.
    Each value in the list is an integer from 0-63...
    The first member of the list is the most significant digit... down to the remainder.
    Should only be used for positive values (any value below 1 returns [0]).
    """
    if invalue < 1: # special case !
        return [0]
    power = -1
    outlist = []
    test = 0
    while test <= invalue: # find the first power of 64 greater than invalue
        power += 1
        test = pow(64,power)

    while power:
        power -= 1
        outlist.append(int(invalue//pow(64,power)))
        invalue = invalue % pow(64,power)
    return outlist

def sixtoeight(intuple):
    """Given four base 64 (6-bit) digits... it returns three 8-bit values that represent
    the same value.
    If the length of intuple != 4, or any digit is > 63, it returns None.

    **NOTE**
    Not quite the reverse of the sixbit function."""
    if len(intuple) != 4: return None
    for entry in intuple:
        if entry > 63:
            return None
    value = intuple[3] + intuple[2]*64 + intuple[1]*4096 + intuple[0]*262144
    val1 = value//65536
    value = value % 65536
    val2 = value//256
    value = value % 256
    return val1, val2, value


def table_enc(instring, table=None):
    """The actual function that performs TABLE encoding.
    It takes instring in three character chunks (three 8-bit values)
    and turns each chunk into four 6-bit values.
    Each of these 6-bit values maps to a character in TABLE.
    If the length of instring is not divisible by three it is padded with null bytes.
    The number of null bytes to remove is then encoded as a semi-random character at the start of the string.
    You can pass in an alternative 64 character string to do the encoding with if you want.
    """
    if table is None: table = TABLE
    out = []
    test = len(instring) % 3
    if test: instring = instring + chr(0)*(3-test) # make sure the length of instring is divisible by 3
    # print test,' ', len(instring) % 3
    while instring:
        chunk = instring[:3]
        instring = instring[3:]
        value = 65536 * ord(chunk[0]) + 256 * ord(chunk[1]) + ord(chunk[2])
        newdat = sixbit(value)
        while len(newdat) < 4:
            newdat.insert(0, 0)
        for char in newdat:
            out.append(table[char])
    if not test:
        out.insert(0, table[int(random()*21)]) # if we added 0 extra characters we add a character from 0 to 20
    elif test == 1:
        out.insert(0, table[int(random()*21)+21]) # if we added 2 extra characters we add a character from 21 to 41
    elif test == 2:
        out.insert(0, table[int(random()*22)+42]) # if we added 1 extra character we add a character from 42 to 63
    return ''.join(out)

def table_dec(instring, table=None):
    """The function that performs TABLE decoding.
    Given a TABLE encoded string it returns the original binary data - as a string.
    If the data it's given is invalid (not data encoded by table_enc) it returns None
    (invalid means: empty, containing characters not in the table, or with len(instring) % 4 != 1).
    You can pass in an alternative 64 character string to do the decoding with if you want.
    """
    if table is None: table = TABLE
    if not instring: return None
    out = []
    rem_test = table.find(instring[0]) # the first character says how much padding to remove at the end
    if rem_test == -1: return None
    instring = instring[1:]
    if len(instring)%4 != 0: return None # check the length is now divisible by 4
    while instring:
        chunk = instring[:4]
        instring = instring[4:]
        newchunk = []
        for char in chunk:
            test = table.find(char)
            if test == -1: return None
            newchunk.append(test)
        newchars = sixtoeight(newchunk)
        if not newchars: return None
        for char in newchars:
            out.append(chr(char))
    if rem_test > 41:
        out = out[:-1]
    elif rem_test > 20:
        out = out[:-2]
    return ''.join(out)

def return_now():
    """Returns the time now.
    As (HOUR, MINUTES), using the 24-hour clock."""
    return int(strftime('%H')), int(strftime('%M'))

def check_pass(inhash, pswdhash, EXPIRE):
    """Given the hash (possibly from a webpage) it checks that it is still valid and matches the password it is supposed
    to have. pswdhash is the stored password hash as encoded by table_enc.
    If so it returns a new hash, stamped with the current time.
    If expired it returns -1 (which is true in a boolean context - so test for it explicitly).
    If the pass is invalid it returns False."""
    try:
        instring, daynumber, timestamp = pass_dec(inhash) # of course a fake or mangled password will cause an exception here
        if table_dec(pswdhash) != instring:
            return False
        if not unexpired(daynumber, timestamp, EXPIRE): # is the hash encoded in the page still within its validity period ?
            return -1
        else:
            return pass_enc(instring, daynumber = True, timestamp = True) # generate a new hash, with the current time
    except Exception:
        return False

662     def binleave(data1, data2, endleave = False):
663     """Given two strings of binary data it interleaves data1 into data2 on a bitwise basis
664     and returns a single string combining both. (bits interleaved not just the bytes).
665     The returned string will be 4 bytes or so longer than the two strings passed in.
666     Use bin_unleave to return the two strings again.
667     Even if both strings passed in are ascii - the result will contain non-ascii characters.
668     To keep ascii-safe you must subsequently encode with table_enc.
669    
670     Max length for the smallest data string (one string can be of unlimited size) is about 16meg
671     (increasing this would be easy if anyone needed it - but would be very slow anyway).
672    
673     If either string is empty (or the smallest string greater than 16meg) - we return None.
674     The first 4 characters of the string returned 'define' the interleave. (actually the size of the watermark)
675     For added safety you could remove this and send seperately.

    Version 1.0.0 used a bf (bitfield) object from the python cookbook. Version 1.1.0 uses the bitwise and (&) and or (|)
    operators and is about 2.5 times faster. On my AMD 3000, leaving and unleaving two 20k files took 1.8 seconds
    (instead of 4.5 previously - with Psyco enabled this improved to 0.4 seconds).

    Interleaving a file with a watermark of pretty much any size makes it unreadable, because *every* byte is changed
    (except perhaps a few at the end - see the endleave keyword). However, it shouldn't be relied on if you need
    a really secure method of encryption, although for many purposes it will be sufficient.

    In practice, any file that isn't an exact multiple of the size of the watermark will have a chunk at the end that is untouched.
    To get round this you can set endleave = True, which then interleaves the end data back into the rest
    (and therefore takes twice as long - it shouldn't be necessary where you have a short watermark).
    The second pass is only made if the untouched end chunk is non-empty and shorter than 65536 bytes.

    data2 ought to be the smaller string - or they will be swapped round internally.
    This could cause you to get them back in an unexpected order from binunleave.
    """
    result = internalfunc(data1, data2)
    if result is None: # empty or oversized data
        return None
    header, out, data1 = result
    # print len(data1), len(out), len(header)
    header = chr(int(random()*128)) + header # a random first byte, making it a 4 byte header
    if endleave and data1 and len(data1) < 65536:
        # weave the untouched end chunk back into everything before it (header included)
        header, out, data1 = internalfunc(header + out, data1)
        header = chr(int(random()*128) + 128) + header # a first byte above 127 flags the endleave pass
    return header + out + data1

def binunleave(data):
    """Given a chunk of data woven by binleave, it returns the two separate pieces of data."""
    header = data[0]
    data = data[1:]
    data1, data2 = internalfunc2(data)
    if ord(header) > 127: # the endleave flag - the first pass above undid the second interleave
        # print len(data1)
        data = data2 + data1
        data = data[1:]
        data1, data2 = internalfunc2(data)
    return data1, data2

######################

# binleave and binunleave used to make extensive use of a python object called bf() (bitfield).
# There are still many places this could be useful, but I now use binary operations inline.
# Included for reference are the bf object and the binary operations as functions.

class bf(object):
    """the bf(object) from activestate python cookbook - by Sebastien Keim - Many Thanks
    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/113799"""
    def __init__(self,value=0):
        self._d = value

    def __getitem__(self, index):
        return (self._d >> index) & 1

    def __setitem__(self,index,value):
        value = (value&1L)<<index
        mask = (1L)<<index
        self._d = (self._d & ~mask) | value

    def __getslice__(self, start, end):
        mask = 2L**(end - start) -1
        return (self._d >> start) & mask

    def __setslice__(self, start, end, value):
        mask = 2L**(end - start) -1
        value = (value & mask) << start
        mask = mask << start
        self._d = (self._d & ~mask) | value

    def __int__(self):
        return self._d


def bittest(value, bitindex):
    """This function returns the value (0 or 1) of the bit at bitindex in value.
    bitindex starts at 0.
    """
    return (value&(1<<bitindex))>>bitindex

def bitset(value, bitindex, bit):
    """Sets the bit at bitindex in 'value' to 'bit' and returns the result.
    bit should be 1 or 0.
    bitindex starts at 0.
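
For Python 3, the 1L long literals above are a syntax error and int is already unbounded. The following is a minimal sketch of the same two helpers for that case. It is an illustration, not a drop-in replacement tested against this module.

```python
def bittest(value, bitindex):
    # 0 or 1: the bit at position bitindex (bit 0 is the least significant)
    return (value >> bitindex) & 1


def bitset(value, bitindex, bit):
    # Clear the target bit with an AND mask, then OR in the new bit.
    mask = 1 << bitindex
    return (value & ~mask) | ((bit & 1) << bitindex)


flags = bitset(0b1010, 0, 1)
print(bin(flags), bittest(flags, 0), bittest(flags, 2))  # 0b1011 1 0
```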
    """
    bit = (bit&1L)<<bitindex
    mask = (1L)<<bitindex
    return (value & ~mask) | bit # set that bit of value to 0 with an & operation and then or it with the 'bit'


##################################################################

# private functions used by the above public functions

def internalfunc(data1, data2):
    """Used by binleave.
    This function interleaves data2 into data1 a little chunk at a time."""
    if len(data2) > len(data1): # make sure that data1 always has the longer string, making data2 the watermark
        data1, data2 = data2, data1
    if not data2 or not data1: return None # check for empty data
    length = len(data2)
    if length >= pow(2,24): return None # if the strings are oversized
    multiple = len(data1)//length # this is how often we should interleave bits
    if multiple > 65535: multiple = 65535 # in practice we'll set to max 65535
    header1 = length//65536
    header3 = length % 65536
    header2 = header3//256
    header3 = header3 % 256
    header = chr(header1) + chr(header2) + chr(header3) # these are the 3 bytes we will put at the start of the string
    # so - to encode one byte of data2 (the watermark) we need multiple bytes of data1
    data1 = [ord(char) for char in data1]
    startpos = 0
    data2 = [ord(char) for char in data2]
    BINLIST=[1,2,4,8,16,32,64,128]
    out = []
    bitlen = multiple*8 + 8 # the total number of bits we'll have
    # print bitlen, multiple
    while data2:
        chunklist = data1[startpos:startpos + multiple]
        startpos = startpos + multiple
        heapobj = 0
        mainobj = data2.pop(0)
        charobj = chunklist.pop(0)
        bitindex = 0
        mainindex = 0
        heapindex = 0
        charindex = 0
        while mainindex < bitlen:
            # print mainindex, heapindex, charindex, bitindex
            if heapindex == 8: # if we've got all 8 bits
                out.append(chr(heapobj))
                heapobj = 0
                heapindex = 0
            if not mainindex%(multiple+1): # we've got to a point where we should take another bit from the watermark byte
                if mainobj&BINLIST[bitindex]: # if the bit at bitindex is set
                    heapobj = heapobj|BINLIST[heapindex] # set the bit at heapindex
                heapindex += 1
                bitindex += 1
                mainindex += 1
                continue
            if charindex == 7 and chunklist: # we've used up the current character from the chunk
                if charobj&BINLIST[charindex]:
                    heapobj = heapobj|BINLIST[heapindex]
                charobj = chunklist.pop(0)
                charindex = 0
                heapindex += 1
                mainindex += 1
                continue
            if charobj&BINLIST[charindex]:
                heapobj = heapobj|BINLIST[heapindex]
            heapindex += 1
            charindex += 1
            mainindex += 1

        if heapindex == 8: # if we've got all 8 bits.. but the loop has ended...
            out.append(chr(heapobj))

    return header, ''.join(out), ''.join([chr(char) for char in data1[startpos:]])

def internalfunc2(data):
    """Used by binunleave.
    This function extracts data that has been interleaved using binleave."""
    lenstr = data[:3] # extract the length of the watermark
    data = list(data[3:])
    length2 = ord(lenstr[0])*65536 + ord(lenstr[1])*256 + ord(lenstr[2]) # length of watermark
    length1 = len(data) - length2 # length of the main (carrier) data
    multiple = length1//length2 + 1
    if multiple > 65536: multiple = 65536 # in practice we'll set to max 65535 + 1
    bitlen = multiple*8
    # print len(data), length1, length2, multiple
    out1 = []
    out = []
    index = 0
    BINLIST=[1,2,4,8,16,32,64,128]
    # print len(chunk)
    while index < length2:
        index += 1
        chunk = data[:multiple]
        data = data[multiple:]
        chunklist = [ord(char) for char in chunk] # turn chunk into a list of its values
        heapobj = 0
        outbyte = 0
        charobj = chunklist.pop(0)
        bitindex = 0
        mainindex = 0
        heapindex = 0
        charindex = 0
        while mainindex < bitlen:
            # print mainindex, heapindex, charindex, bitindex
            if heapindex == 8: # if we've got all 8 bits
                out.append(chr(heapobj))
                heapobj = 0
                heapindex = 0
            if not mainindex%multiple: # we've got to a point where we should add another bit to the watermark byte
                if charobj&BINLIST[charindex]:
                    outbyte = outbyte|BINLIST[bitindex]
                if charindex != 7:
                    charindex += 1
                else:
                    charobj = chunklist.pop(0)
                    charindex = 0
                bitindex += 1
                mainindex += 1
                continue
            if charindex == 7 and chunklist: # we've used up the current character from the chunk
                if charobj&BINLIST[charindex]:
                    heapobj = heapobj|BINLIST[heapindex]
                charobj = chunklist.pop(0)
                charindex = 0
                heapindex += 1
                mainindex += 1
                continue
            if charobj&BINLIST[charindex]:
                heapobj = heapobj|BINLIST[heapindex]
            heapindex += 1
            charindex += 1
            mainindex += 1
        if heapindex == 8: # if we've got all 8 bits.. but the loop has ended...
            out.append(chr(heapobj))
        out1.append(chr(outbyte))

    return ''.join(out1), ''.join(out+data)

def test(): # the test suite
    from time import clock
    from os.path import exists
    print 'Printing the TABLE : '
    index = 0
    while index < len(TABLE):
        print TABLE[index], TABLE.find(TABLE[index])
        index +=1

    print '\nEnter test password to encode using table_enc :\n(Hit enter to continue past this)\n'
    while True:
        dummy = raw_input('>>...')
        if not dummy: break
        test = table_enc(dummy)
        test2 = table_dec(test)
        print test
        print 'length : ', len(test), ' modulo 4 of length - 1 : ', (len(test)-1) % 4
        print 'Decoded : ', test2
        print 'Length dec : ', len(test2)

    print '\nEnter password - to timestamp and then encode :\n(Hit enter to continue past this)\n'
    while True:
        instring = raw_input('>>...')
        if not instring:
            break
        code = pass_enc(instring, sha_hash=False, daynumber=True, timestamp=True)
        print code
        print pass_dec(code)


    print '\n\nTesting interleaving a 1000 byte random string with a 1500 byte random string :'
    print
    print 'Overall length of combined string : ',
    a=0
    b=''
    c = ''
    while a < 1000:
        a += 1
        b = b + chr(int(random()*256))
        c = c + chr(int(random()*256))
    while a < 1500:
        a += 1
        c = c + chr(int(random()*256))
    d = clock()
    test = binleave(c, b, True)
    print len(test)
    a1, a2 = binunleave(test)
    print 'Time taken (including print statements ;-) ', str(clock()-d)[:6], ' seconds'
    print 'Test for equality of extracted data against original :'
    print a1 == b
    print a2 == c


    # If you give it two test files 'test1.zip' and 'test2.zip' it will interleave the two files,
    # unleave them again and write out the first file as 'test4.zip'.
    # (binunleave returns the smaller file first, so this assumes 'test1.zip' is the smaller of the two.)
    # It prints how long it takes and you can verify that the returned file is undamaged.

    if exists('test1.zip') and exists('test2.zip'):
        print
        print "Reading 'test1.zip' and 'test2.zip'"
        print "Interleaving them together and writing the combined file out as 'test3.zip'"
        print "Then unleaving them and writing 'test1.zip' back out as 'test4.zip'",
        print " to confirm it is unchanged by the process"
        a = file('test1.zip','rb')
        b = a.read()
        a.close()
        a = file('test2.zip','rb')
        c = a.read()
        a.close()
        d = clock()
        test = binleave(c,b, True)

        print len(test)
        a = file('test3.zip','wb')
        a.write(test)
        a.close()
        a1, a2 = binunleave(test)
        print str(clock()-d)[:6]
        a = file('test4.zip','wb')
        a.write(a1)
        a.close()
    else:
        print
        print 'Unable to perform final test.'
        print "We need two files to use for the test : 'test1.zip' and 'test2.zip'"
        print "We then interleave them together, and write the combined file out as 'test3.zip'"
        print "Then we unleave them again, and write 'test1.zip' back out as 'test4.zip'",
        print "(So we can confirm that it's unchanged by the process.)"


if __name__ == '__main__':

    # the start of making dataenc an application - but I don't think it will be used :-)
    # for now it just runs the test suite

    ### this is executed if dataenc is run from the commandline
    ##
    ### first we get the arguments we were called with using optparse
    ##
    ### minimum arguments :
    ### input file
    ### output file
    ##
    ### default :
    ### if three file arguments are given the first two are interleaved and saved as the third file
    ### so long as the third file doesn't already exist.
    ##
    ### if two filenames are given it reads the first file, datestamps it
    ### and saves it as the second file (assuming that doesn't exist)
    ##
    ### if one filename is given it assumes it is an interleaved file to extract
    ##
    ### options :
    ### overwrite output file - default OFF
    ### encode or decode - default is encode (specifying three files forces encode)
    ### table_enc on or off - default is OFF
    ### specify a TABLE file - default is to use the inbuilt one
    ### datestamp/interleave on or off - default is ON (datestamping)
    ### endleaving on or off - default is OFF
    ### header file - default is to use the header in the file when decoding, and to leave it in the file when encoding
    ### (If a header file is specified the 3 byte header from binary interleaving will be saved separately).
    ##
    ### **special**
    ### config file - *all* values are read from the config file
    ##
    ## from optparse import OptionParser
    ##
    ## parser = OptionParser()
    ## parser.add_option("-q", "--quiet",
    ##     action="store_true", dest="quiet", default = False,
    ##     help="Set a verbosity level of 0, print no messages.")
    ##
    ## parser.add_option("--test",
    ##     action="store_true", dest="test", default=False,
    ##     help="Run the tests, all other options ignored. Verbosity of tests is 9.")
    ##
    ## parser.add_option("-v", "--verbose", type = 'int', dest="verbose", default=9,
    ##     help="Set the verbosity level. Should be an integer from 0 to 9, "+\
    ##     "9 means the most verbose and 0 means don't output any messages. Default is 9.")
    ##
    ## parser.add_option("-d", "--decode",
    ##     action="store_true", dest="decode", default = False,
    ##     help="Set to decode rather than encode. Default is encode.")
    ##
    ## parser.add_option("-t", "--table",
    ##     action="store_true", dest="table", default = False,
    ##     help="Encode or decode files using the TABLE. (ASCII to binary or binary to ASCII).")
    ##
    ## parser.add_option("-T","--TABLE", dest="table_file",
    ##     help="Specify a 64 character file to use as the TABLE for encoding/decoding.")
    ##
    ## parser.add_option("-o", "--off",
    ##     action="store_false", dest="datestamp", default = True,
    ##     help="Switches datestamping OFF. Default is ON.")
    ##
    ## parser.add_option("-e", "--end",
    ##     action="store_true", dest="end", default = False,
    ##     help="Switches endleaving ON. Default is OFF.")
    ##
    ## parser.add_option("-H","--header", dest="header_file", default = False,
    ##     help="Specify a separate file to use as the header file when binary encoding/decoding.")
    ##
    ## parser.add_option("-c","--config", dest="config_file", default = False,
    ##     help="Specify a config file to read *all* the other options from.")
    ##
    ## options, args = parser.parse_args()
    ### print args
    ##
    ### next import StandOut which allows us to set variable levels of verbosity
    ## try:
    ##     from standout import StandOut
    ##     stout = StandOut()
    ## except:
    ##     print 'dataenc uses the standout module to handle varying levels of verbosity'
    ##     print 'Without it, all messages will be printed.'
    ##     class dummy: # a dummy object that we can twiddle if StandOut isn't available
    ##         def __init__(self):
    ##             self.priority = 0
    ##             self.verbosity = 0
    ##         def close(self):
    ##             pass
    ##     stout = dummy()
    ##
    ## defaults = { 'header_file' : False, 'datestamp' : True }
    ## if options.config_file: # a config file, the settings here override all the others
    ##     try:
    ##         from configobj import ConfigObj
    ##     except ImportError:
    ##         print "Without the ConfigObj module I can't import a config file."
    ##         print 'See http://www.voidspace.org.uk/atlantibots/pythonutils.html'
    ##         raise
    ##     config = ConfigObj(options.config_file, fileerror=True)
    ##
    ## if options.verbose:
    ##     stout.verbosity = 10 - options.verbose # a higher verbosity level here actually means quieter
    ## else:
    ##     stout.verbosity = 0 # except for 0, which means silent
    ## if options.quiet: # if the quiet option is explicitly set
    ##     stout.verbosity = 0
    ## stout.priority = 2
    ## print 'Welcome to dataenc - the data encoding and interleaving program by Fuzzyman'
    ## print 'See http://www.voidspace.org.uk/atlantibots/pythonutils.html'
    ## print 'Written in Python.'
    ## stout.priority = 3
    ## if not psycoin:
    ##     print 'Having the Psyco module installed (Python Specialising compiler) would vastly speed up dataenc.'
    ## if not DATEIN:
    ##     print 'Some of the datestamping features are only available when the dateutils module is available.'
    ## stout.priority = 5
    ##
    ## if options.test:
    test()


"""

BUGS
No more known bugs... yet.
I'm sure they'll surface.

ISSUES
binleave and binunleave are still quite slow.
For stamping small password hashes with a date stamp it's fast enough - for weaving larger files together it's *too slow*.
Also, for weaving similar sized files together we might be better off with a pattern of 2 bits of watermark per 3 bits of string
(or 3 to 4, or 5 to 7, etc.).
Currently it will only work with 1 bit of watermark per 1 or 2 or 3 or 4 etc. bits of main string (exact multiples).
Again, for small watermarks this works fine - and as that is all I'm using it for, I'm not inclined to change it.
The logic would be simple - just fiddly.
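
The 2-to-3 pattern suggested above could be sketched as follows. This is Python 3, and weave, unweave, to_bits and from_bits are hypothetical names; the least-significant-bit-first order is borrowed from internalfunc. The sketch alternates mark_bits watermark bits with carrier_bits carrier bits until both inputs are used up, so any excess of the longer input ends up appended in one contiguous run. It leaves out the header and the multiple-based sizing that binleave uses, so the caller must keep track of both lengths.

```python
def to_bits(data):
    # Least significant bit of each byte first, as internalfunc does.
    return [(byte >> i) & 1 for byte in data for i in range(8)]


def from_bits(bits):
    # Pack bits (LSB first) back into bytes; a partial final byte is zero-padded.
    out = bytearray((len(bits) + 7) // 8)
    for pos, bit in enumerate(bits):
        out[pos // 8] |= bit << (pos % 8)
    return bytes(out)


def weave(mark, carrier, mark_bits=2, carrier_bits=3):
    # Emit mark_bits watermark bits, then carrier_bits carrier bits, and repeat.
    if mark_bits < 1 or carrier_bits < 1:
        raise ValueError('both ratio terms must be at least 1')
    m, c = to_bits(mark), to_bits(carrier)
    out, i, j = [], 0, 0
    while i < len(m) or j < len(c):
        out.extend(m[i:i + mark_bits])
        out.extend(c[j:j + carrier_bits])
        i += mark_bits
        j += carrier_bits
    return from_bits(out)


def unweave(woven, mark_len, carrier_len, mark_bits=2, carrier_bits=3):
    # Replay the same pattern to split the bit stream back into its two inputs.
    bits = to_bits(woven)
    nm, nc = 8 * mark_len, 8 * carrier_len
    if len(bits) < nm + nc:
        raise ValueError('woven data is shorter than the stated lengths')
    m, c, pos = [], [], 0
    while len(m) < nm or len(c) < nc:
        take = min(mark_bits, nm - len(m))
        m.extend(bits[pos:pos + take])
        pos += take
        take = min(carrier_bits, nc - len(c))
        c.extend(bits[pos:pos + take])
        pos += take
    return from_bits(m), from_bits(c)


woven = weave(b'stamp', b'password hash')
print(len(woven), unweave(woven, 5, 13))  # 18 (b'stamp', b'password hash')
```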


TODO :
Might make it a simple application - so it can be used from the command line for encoding, decoding,
timestamping and combining files.....
Could replace the use of BINLIST and the if test in binleave and binunleave with a single inline statement using << and >>.
Could move binleave and binunleave into C.
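
The following is a minimal sketch of the second TODO item, written in Python 3 with hypothetical helper names. Each 'if x & BINLIST[i]: heap = heap | BINLIST[j]' step in internalfunc and internalfunc2 copies one bit from position i of one byte to position j of another, and a single shift-and-mask expression can do the same job. The check below compares the two forms over every byte value and bit position. Whether the shift form is actually faster under CPython or Psyco has not been measured here.

```python
BINLIST = [1, 2, 4, 8, 16, 32, 64, 128]


def copy_bit_binlist(heap, source, src_index, heap_index):
    # The pattern used in internalfunc: test with a lookup table, then OR the bit in.
    if source & BINLIST[src_index]:
        heap = heap | BINLIST[heap_index]
    return heap


def copy_bit_shift(heap, source, src_index, heap_index):
    # Single inline expression: extract the bit, shift it into place, OR it in.
    return heap | (((source >> src_index) & 1) << heap_index)


same = all(
    copy_bit_binlist(heap, source, i, j) == copy_bit_shift(heap, source, i, j)
    for heap in (0, 0x5A, 0xFF)
    for source in range(256)
    for i in range(8)
    for j in range(8)
)
print(same)  # True
```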


CHANGELOG
13-09-04 Version 1.1.5
Increased speed in table_enc and table_dec.

30-08-04 Version 1.1.4
Slight docs improvement.
Slight speed improvement in binleave and binunleave.

22-08-04 Version 1.1.3
Added the unexpired alias and the check_pass function.
Changed license text.
Minor preemptive bugfix in some default values.

11-04-04 Version 1.1.2
Added the expired function for testing validity of timestamps.
Changed the TABLE to be URL safe for passing in forms using the 'GET' method.
Added OLD_TABLE with the old encoding, and gave table_dec and table_enc the ability to receive an explicit TABLE.

07-04-04 Version 1.1.1
Improved the tests a bit.
Corrected a bug that affected large files or large files with small watermarks.

05-04-04 Version 1.1.0
Replaced the bf object with much faster bitwise logical operations. It is now about 2.5 times faster.
With Psyco enabled it becomes 11 times faster than the first version....
Added the bit setting and testing operations as functions.

03-04-04 Version 1.0.0
Initial testing is a success.


"""
