Check-in [5013209038]
Overview
Comment: Added a helper class to ease database Morgification and verification.
SHA1: 50132090387dffcc72f9dac5100ca60b5b115abf
User & Date: mvnathan 2014-09-19 00:33:12.619
Context
2014-09-19
00:36  Create and populate watermark table in a transaction. check-in: a23e710131 user: mvnathan tags: dev
00:33  Added a helper class to ease database Morgification and verification. check-in: 5013209038 user: mvnathan tags: dev
2014-09-18
21:13  Use between operator for numeric property range constraints instead of the relational operators. check-in: 0caa6b62ab user: mvnathan tags: dev
Changes
Changes to py/morglib/database.py.
Old version (lines 296-364):
        self._init_property_tables(properties)

    def _init_morg_table(self):
        '''Create Morg's identification table.

        This internal method creates and populates the <tt>morg</tt>
        table, which is used to identify SQLite files as "belonging" to
        Morg. The <tt>morg</tt> table contains exactly one row, which has
        the following columns:

        @li <tt>schema_version</tt>
        @li <tt>timestamp</tt>
        @li <tt>hostname</tt>
        @li <tt>username</tt>
        @li <tt>magic_phrase</tt>

        The schema version is an integer that helps track the structure
        of the database. It can be used, for example, to upgrade older
        data models to more current ones.

        The time stamp records the creation time (GMT) of the database in
        the format <tt>yyyy-mm-dd HH:MM:SS</tt>.

        The host name records the fully qualified domain name of the
        machine on which this Morg database was created.

        The user name identifies the login name of the user who created
        the database. If this cannot be determined, this field will be
        recorded as <tt>unidentifiable_user</tt>.

        Finally, the magic phrase is the SHA-1 ID of the string produced
        by concatenating the schema version, time stamp, host name, user
        name, and a hard-coded identification string used by Morg.

        '''
        logger.info('creating morg table')
        create =  '''create table if not exists morg(
                         schema_version  integer,
                         timestamp       text,
                         hostname        text,
                         username        text,
                         magic_phrase    text)'''
        self.execute(create)

        logger.info('populating morg table')
        ts = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime())
        hn = socket.getfqdn(socket.gethostname())
        un = get_user_name()
        mp = morg_id(self.SCHEMA_VERSION, ts, hn, un, self.MAGIC_PHRASE)
        columns =  ('schema_version',
                    'timestamp',
                    'hostname',
                    'username',
                    'magic_phrase')
        values  =  (self.SCHEMA_VERSION,
                    ts,
                    hn,
                    un,
                    mp)
        insert  = ('insert into morg ({}) values ({})'.
                   format(','.join(columns), ','.join('?' * len(columns))))
        self.execute(insert, values)

    def _sanity_check(self, tasks_file):
        '''Try and confirm we're dealing with a Morg database.

        @param tasks_file (string) Name of SQLite file.

        Morg creates a table named <tt>morg</tt> in which it stores the







<
<
|
<
<
<
<
<

<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<


|
<
|
<
<
<
|


<
<
<
<
|
<
<
<
<
<
<
<
<
<


|







296
297
298
299
300
301
302


303





304


















305
306
307

308



309
310
311




312









313
314
315
316
317
318
319
320
321
322
        self._init_property_tables(properties)

    def _init_morg_table(self):
        '''Create Morg's identification table.

        This internal method creates and populates the <tt>morg</tt>
        table, which is used to identify SQLite files as "belonging" to
        Morg.

        '''
        logger.info('creating morg table')
        wm = watermark()
        wm.dump()
        self.execute(wm.create())

        logger.info('populating morg table')
        columns = wm.columns()
        insert  = ('insert into morg ({}) values ({})'.
                   format(','.join(columns), ','.join('?' * len(columns))))
        self.execute(insert, tuple(wm))

    def _sanity_check(self, tasks_file):
        '''Try and confirm we're dealing with a Morg database.

        @param tasks_file (string) Name of SQLite file.

        Morg creates a table named <tt>morg</tt> in which it stores the

Old version (lines 398-418):
        logger.debug('morg table has {} rows'.format(n))
        if (n < 1):
            raise sanity_check_error(tasks_file, 'morg table has no data')
        if (n > 1):
            raise sanity_check_error(tasks_file,
                                     'morg table has too much data')

        v,t,h,u,i = morg[0]
        logger.debug('schema version = {}, Morg ID = {}'.format(v, i))

        x = morg_id(v, t, h, u, self.MAGIC_PHRASE)
        if (i != x):
            msg = 'stored Morg ID {} does not match expected ID {}'.format(i, x)
            raise sanity_check_error(tasks_file, msg)
        logger.info('looks like this is a morg database')

    def _init_task_table(self):
        '''Create the task table if necessary.

        This internal method sets up the <tt>task</tt> table, which
        simply records all the tasks stored in the database.

New version (lines 356-373):
        logger.debug('morg table has {} rows'.format(n))
        if (n < 1):
            raise sanity_check_error(tasks_file, 'morg table has no data')
        if (n > 1):
            raise sanity_check_error(tasks_file,
                                     'morg table has too much data')

        wm = watermark(morg[0])
        wm.dump()
        if (not wm):
            raise sanity_check_error(tasks_file, 'magic phrase mismatch')
        logger.info('looks like this is a morg database')

    def _init_task_table(self):
        '''Create the task table if necessary.

        This internal method sets up the <tt>task</tt> table, which
        simply records all the tasks stored in the database.

Old version (lines 586-599):
        try:
            logger.debug('executing query: {}'.format(query))
            cursor = self._db.cursor()
            return list(cursor.execute(query, bindings))
        except apsw.Error, e:
            logger.error('failed query: {}; reason: {}'.format(query, e))
            raise sql_error(query, e)

#------------------------------- HELPERS --------------------------------

def range_constraint(column_name, column_type, range_spec):
    '''Parse range specification and return column constraint clause.

    @param  column_name (string) Name of column that needs a constraint.

New version (lines 541-696):
        try:
            logger.debug('executing query: {}'.format(query))
            cursor = self._db.cursor()
            return list(cursor.execute(query, bindings))
        except apsw.Error, e:
            logger.error('failed query: {}; reason: {}'.format(query, e))
            raise sql_error(query, e)

#----------------------- WATERMARK ENCAPSULATION ------------------------

class watermark:
    '''Helper class for Morg's database watermark.

    This class eases creation and verification of the <tt>morg</tt>
    watermark table so that Morg can identify SQLite files as "belonging"
    to it.

    '''
    def __init__(self, metadata = None):
        '''Construct a watermark object.

        @param metadata (tuple or dict) Watermark table's data.

        The <tt>morg</tt> table contains exactly one row, which has the
        following columns:

        @li <tt>schema_version</tt>
        @li <tt>timestamp</tt>
        @li <tt>hostname</tt>
        @li <tt>username</tt>
        @li <tt>magic_phrase</tt>

        The schema version is an integer that helps track the structure
        of the database. It can be used, for example, to upgrade older
        data models to more current ones.

        The time stamp records the creation time (GMT) of the database in
        the format <tt>yyyy-mm-dd HH:MM:SS</tt>.

        The host name records the fully qualified domain name of the
        machine on which this Morg database was created.

        The user name identifies the login name of the user who created
        the database. If this cannot be determined, this field will be
        recorded as <tt>unidentifiable_user</tt>.

        Finally, the magic phrase is the SHA-1 ID of the string produced
        by concatenating the schema version, time stamp, host name, user
        name, and a hard-coded identification string used by Morg.

        If the metadata parameter is not given, this constructor will use
        default values for the above-mentioned fields of the watermark
        table. If it is given in the form of a dict, we expect the keys
        to be the column names stated above and the values to be as
        described above. If given as a tuple, we expect the tuple to
        contain only the values.

        '''
        self._metadata = {}
        self._metadata['schema_version'] = database.SCHEMA_VERSION
        try:
            self._metadata['timestamp' ] = time.strftime('%Y-%m-%d %H:%M:%S',
                                                        time.gmtime())
        except Exception:
            logger.warning('unable to get current time')
            self._metadata['timestamp' ] = '0000-00-00 00:00:00'
        try:
            self._metadata['hostname'  ] = socket.getfqdn(socket.gethostname())
        except Exception:
            logger.warning('unable to get hostname')
            self._metadata['hostname'  ] = 'unknown.host.name'
        try:
            self._metadata['username'  ] = getpass.getuser()
        except Exception:
            logger.warning('unable to get current login name')
            self._metadata['username'  ] = 'anonymous'
        self._metadata['magic_phrase'  ] = self.magic_phrase()

        if (isinstance(metadata, dict)):
            for key in self._metadata  :
                if (key  in  metadata) :
                    self._metadata[key] = metadata[key]

        if (isinstance(metadata,tuple)):
            for key, val in zip(self.columns(), metadata):
                self._metadata[key] = val

    def __iter__(self):
        '''Iterator over watermark metadata (in correct order).'''
        class _iterator_adaptor:
            def __init__(self, wm):
                self._metadata = []
                for key in wm.columns():
                    self._metadata.append((key, wm._metadata[key]))
                self._iter = iter(self._metadata)

            def next(self):
                column, value = self._iter.next()
                return  value

        return _iterator_adaptor(self)

    def __nonzero__(self):
        '''Confirm that stored and computed magic phrases match.'''
        stored   = self._metadata['magic_phrase']
        computed = self.magic_phrase()
        logger.debug('magic phrase: stored = {}, computed = {}'.
                     format(stored, computed))
        return stored == computed

    def columns(self):
        '''Return column names of <tt>morg</tt> table in correct order.'''
        return ('schema_version',
                'timestamp',
                'hostname' ,
                'username' ,
                'magic_phrase')

    def magic_phrase(self):
        '''Encode metadata into an identity string.'''
        metadata = []
        for key in self.columns()[:-1]:
            metadata.append(str(self._metadata[key]))
        metadata.append(database.MAGIC_PHRASE)
        return hashlib.sha1('\n'.join(metadata)).hexdigest()

    def create(self):
        '''Return SQL statement for creating watermark table.'''
        return '''create table if not exists morg(
                      schema_version  integer,
                      timestamp       text,
                      hostname        text,
                      username        text,
                      magic_phrase    text)'''

    def sql(self):
        '''Return SQL statements for creating and populating watermark table.'''
        insert = ('insert into morg (\n        {})\n    values (\n        {})'.
                  format(',\n        '.join(self.columns()),
                         ',\n        '.join(map(lambda v: ("'{}'".format(v)
                                                           if isinstance(v, str)
                                                           else str(v)),
                                                tuple(self)))))
        return '{};\n{};\n'.format(self.create(), insert)

    def dump(self):
        '''Debug support.'''
        for key, val in self._metadata.iteritems():
            logger.debug('{} = {}'.format(key, val))

#------------------------------- HELPERS --------------------------------

def range_constraint(column_name, column_type, range_spec):
    '''Parse range specification and return column constraint clause.

    @param  column_name (string) Name of column that needs a constraint.

Old version (lines 643-688):
    try:
        return ('check ({} in ({}))'.
                format(column_name,
                       ','.join(map(fmt, range_spec.split(',')))))
    except Exception, e:
        raise property_error(column_name, column_type, range_spec, e)


def get_user_name():
    '''Wrapper around getpass.getuser() to handle any exceptions.'''
    try:
        return getpass.getuser()
    except Exception:
        return 'unidentifiable_user'

def morg_id(schema_version, timestamp, hostname, username, magic_phrase):
    '''Encode parameters into a unique ID for a Morg database.

    @param  schema_version (int) Morg database schema version.
    @param  timestamp (string) Creation time of Morg database.
    @param  hostname  (string) Name of machine on which database was created.
    @param  username  (string) Name of user who created database.
    @param  magic_phrase (string) Fixed string used by Morg for watermarking.
    @return Hex string containing unique ID encoding above parameters.

    '''
    s = '{}\n{}\n{}\n{}\n{}'.format(schema_version,
                                    timestamp,
                                    hostname,
                                    username,
                                    magic_phrase)
    return hashlib.sha1(s).hexdigest()

#------------------------------------------------------------------------

##############################################
# Editor config:                             #
##############################################
# Local Variables:                           #
# indent-tabs-mode: nil                      #
# py-indent-offset: 4                        #
# python-indent: 4                           #
# End:                                       #
##############################################
# vim: set expandtab shiftwidth=4 tabstop=4: #
##############################################

New version (lines 740-759):
    try:
        return ('check ({} in ({}))'.
                format(column_name,
                       ','.join(map(fmt, range_spec.split(',')))))
    except Exception, e:
        raise property_error(column_name, column_type, range_spec, e)

#------------------------------------------------------------------------

##############################################
# Editor config:                             #
##############################################
# Local Variables:                           #
# indent-tabs-mode: nil                      #
# py-indent-offset: 4                        #
# python-indent: 4                           #
# End:                                       #
##############################################
# vim: set expandtab shiftwidth=4 tabstop=4: #
##############################################
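
The watermark check that this check-in moves into the <tt>watermark</tt> class can be illustrated with a short Python 3 sketch. This is not the author's code (the diff above is Python 2); it only mirrors the idea described in the doc strings: hash the newline-joined metadata together with a fixed string, iterate over the values in column order, and treat the object as true only when the stored and recomputed phrases agree. The value of <tt>MAGIC_PHRASE</tt>, the class name <tt>Watermark</tt>, the <tt>stamp</tt> helper, and the sample metadata are all assumptions made for illustration.

```python
import hashlib

# Hypothetical placeholder: the real database.MAGIC_PHRASE is not shown
# in this check-in.
MAGIC_PHRASE = 'hypothetical identification string'

COLUMNS = ('schema_version', 'timestamp', 'hostname', 'username',
           'magic_phrase')


class Watermark:
    """Illustrative Python 3 counterpart of the watermark helper."""

    def __init__(self, row):
        # row holds the five column values in COLUMNS order.
        self._metadata = dict(zip(COLUMNS, row))

    @classmethod
    def stamp(cls, schema_version, timestamp, hostname, username):
        """Build a watermark whose magic phrase matches its metadata."""
        wm = cls((schema_version, timestamp, hostname, username, None))
        wm._metadata['magic_phrase'] = wm.magic_phrase()
        return wm

    def magic_phrase(self):
        """SHA-1 of the newline-joined metadata plus the fixed string."""
        fields = [str(self._metadata[key]) for key in COLUMNS[:-1]]
        fields.append(MAGIC_PHRASE)
        return hashlib.sha1('\n'.join(fields).encode('utf-8')).hexdigest()

    def __iter__(self):
        # Yield values in column order so that tuple(wm) can be bound
        # to the '?' placeholders of the insert statement.
        return (self._metadata[key] for key in COLUMNS)

    def __bool__(self):
        # Python 3 spelling of __nonzero__: true only if the stored and
        # recomputed magic phrases match.
        return self._metadata['magic_phrase'] == self.magic_phrase()


wm = Watermark.stamp(1, '2014-09-19 00:33:12', 'host.example.org', 'mvnathan')
row = tuple(wm)              # what would be stored in the morg table
forged = (2,) + row[1:]      # same row with a tampered schema version
print(bool(Watermark(row)), bool(Watermark(forged)))
```

Because any change to the stored metadata changes the recomputed hash, a row that was edited by hand fails the truth test, which is the condition <tt>_sanity_check</tt> relies on when it raises the "magic phrase mismatch" error.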
Changes to wiki/todo.wiki.
Old version (lines 16-29):
  *  Create <tt>task</tt> table on database init.
  *  Create <tt>property</tt> table on database init.
  *  Add default properties dict.
  *  Create <tt>property_NNN</tt> tables on database init.
  *  Implement sanity check on database initialization.
  *  Update database constructor doc string.
  *  Use the BETWEEN operator in property range constraints.

<h2>PENDING</h2>

  *  Implement a watermark class to ease verification and "Morgification."
  *  The watermark table should be created and populated in a transaction.
  *  Pass database to all commands' <tt>__call__</tt> method.
  *  Implement <tt>new</tt> command.

New version (lines 16-30):
  *  Create <tt>task</tt> table on database init.
  *  Create <tt>property</tt> table on database init.
  *  Add default properties dict.
  *  Create <tt>property_NNN</tt> tables on database init.
  *  Implement sanity check on database initialization.
  *  Update database constructor doc string.
  *  Use the BETWEEN operator in property range constraints.
  *  Implement a watermark class to ease verification and "Morgification."

<h2>PENDING</h2>

  *  The watermark table should be created and populated in a transaction.
  *  Update doc strings to reflect recent changes about the watermark.
  *  Pass database to all commands' <tt>__call__</tt> method.
  *  Implement <tt>new</tt> command.
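
The pending item about creating and populating the watermark table in a transaction could look roughly like the following Python 3 sketch. It uses the standard-library <tt>sqlite3</tt> module instead of <tt>apsw</tt>, which the code in this check-in uses, so it is an illustration of the idea and not the project's implementation. The function name <tt>init_watermark</tt> and the sample values are assumptions.

```python
import sqlite3

COLUMNS = ('schema_version', 'timestamp', 'hostname', 'username',
           'magic_phrase')

CREATE = '''create table if not exists morg(
                schema_version  integer,
                timestamp       text,
                hostname        text,
                username        text,
                magic_phrase    text)'''

INSERT = 'insert into morg ({}) values ({})'.format(
    ','.join(COLUMNS), ','.join('?' * len(COLUMNS)))


def init_watermark(conn, values):
    """Create and populate the morg table as one atomic unit.

    conn must be in autocommit mode (isolation_level=None) so that the
    explicit begin/commit/rollback statements below are the only
    transaction control in effect.
    """
    conn.execute('begin')
    try:
        conn.execute(CREATE)
        conn.execute(INSERT, values)
    except Exception:
        # Undo the table creation too, so no empty morg table is left.
        conn.execute('rollback')
        raise
    else:
        conn.execute('commit')


conn = sqlite3.connect(':memory:', isolation_level=None)
init_watermark(conn, (1, '2014-09-19 00:36:00', 'host.example.org',
                      'mvnathan', 'not-a-real-magic-phrase'))
print(conn.execute('select count(*) from morg').fetchone()[0])
```

Wrapping both statements in one transaction means a failed insert also rolls back the <tt>create table</tt>, so a half-initialized file cannot later fail the sanity check with "morg table has no data".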