Check-in [ea1817df1f]
Not logged in

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Updated documentation about watermarking.
Timelines: family | ancestors | descendants | both | dev
Files: files | file ages | folders
SHA1:ea1817df1fdd85ac9eb89c5b79aa680a0dab0fa1
User & Date: mvnathan 2014-09-19 04:20:20
Context
2014-09-19
04:57
Pass database object to all the commands. check-in: 1a510a86e6 user: mvnathan tags: dev
04:20
Updated documentation about watermarking. check-in: ea1817df1f user: mvnathan tags: dev
00:36
Create and populate watermark table in a transaction. check-in: a23e710131 user: mvnathan tags: dev
Changes

Changes to py/morglib/database.py.

180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204

205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246





247
248
249
250
251
252
253
254
...
291
292
293
294
295
296
297

298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318

319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
...
364
365
366
367
368
369
370

371
372
373
374
375
376
377
378
379
380
381
382

383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
...
433
434
435
436
437
438
439


440
441

442
443
444
445
446
447
448
...
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
...
553
554
555
556
557
558
559






























































560
561
562
563
564
565
566
...
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
...
721
722
723
724
725
726
727

728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
        @param properties (dict) Initial set of task properties.

        This method opens the SQLite file passed to it. If the file
        doesn't already exist, it will be created (which could involve
        creating the entire directory hierarchy specified in the
        tasks_file parameter).

        Failure to open the SQLite file will result in a connection_error
        exception.

        Once the database has been successfully opened, the constructor
        will proceed to check if the database is empty or not. In an
        empty database, Morg will create a watermark table so that it can
        subsequently identify that SQLite file as one that it can use.

        Failure to create the watermark table will usually lead to a
        sql_error. However, to populate the watermark table, Morg uses
        the current time, host name, user name, and an SHA-1 hex string.
        Thus, depending on exactly which of these steps fails, some other
        exception may also be raised. (Though that should be quite rare.)

        If the database is not empty, Morg will perform a sanity check
        using the watermark table to see if the database to which it is
        connected was its creation and, therefore, usable by it. If that
        is not the case, the constructor will raise a sanity_check_error.


        Please note that the watermark system Morg uses is not foolproof.
        You can easily enough insert and populate such a table in an
        arbitrary SQLite file and trick Morg into using that file as its
        database. Alternatively, you can remove the watermark table from
        an actual Morg database and prevent Morg from using that.

        However, it requires a small amount of effort to manually mangle
        SQLite files in this way and the payoff, at least in this case,
        is pretty much non-existent. Thus, the purpose of the watermark
        is simply to record some metadata and to assure Morg that the
        SQLite file it is using is most likely a valid Morg database
        rather than to enforce some strict, tamperproof barrier that
        prevents intentional or unintentional cross-application data
        pollution.

        After the watermark creation or verification phase, the
        constructor proceeds to creating the basic database structure
        required by Morg. At the very least, Morg needs a <tt>task</tt>
        table to record all your tasks as well as several property tables
        that define and record the various properties associated with
        tasks. Preexisting task and property tables will be left
        untouched.

        However, the database constructor allows you to define new
        properties during initialization. These initial properties are
        specified via the properties parameter, which is a dict that maps
        strings to 2-tuples. The key is the name of a property. The
        2-tuple corresponding to the property name contains the property
        type and its range.

        The property type can be one of the following (case-insensitive)
        strings:

        @li <tt>integer</tt>
        @li <tt>real</tt>
        @li <tt>text</tt>

        The property range may be either <tt>None</tt> or a string
        describing the range of values that particular property may take
        on. Commas can be used to separate items in a list of discrete
        values. Double dots can be used for numeric ranges. Here are a





        few illustrative examples:

        @verbatim
            props = {'role'       : ('text',    'President,Glutton,Buffoon'),
                     'priority'   : ('integer', '1,2,3,5,7'),
                     'importance' : ('real',    '-1..+1'),
                     'description': ('text',    None)}
        @endverbatim
................................................................................
        else:
            logger.info('{} has data'.format(tasks_file))
            self._sanity_check(tasks_file)

        self._init_task_table()
        self._init_property_tables(properties)


    def _init_morg_table(self):
        '''Create Morg's identification table.

        This internal method creates and populates the <tt>morg</tt>
        table, which is used to identify SQLite files as "belonging" to
        Morg.

        '''
        with self._db:
            logger.info('creating morg table')
            wm = watermark()
            wm.dump()
            self.execute(wm.create())

            logger.info('populating morg table')
            columns = wm.columns()
            insert  = ('insert into morg ({}) values ({})'.
                       format(','.join(columns),
                              ','.join('?' * len(columns))))
            self.execute(insert, tuple(wm))


    def _sanity_check(self, tasks_file):
        '''Try and confirm we're dealing with a Morg database.

        @param tasks_file (string) Name of SQLite file.

        Morg creates a table named <tt>morg</tt> in which it stores the
        following information:

        @li Database schema version
        @li Database creation timestamp
        @li Name of host on which database was created
        @li Name of user who created the database
        @li A hex string that encodes the above plus a fixed string

        This internal method reads the above-mentioned table and confirms
        that its contents "look good." Any failures in the data retrieval
        and/or verification steps will result in a sanity_check_error
        exception.

        @note The sanity check Morg implements is very basic and is
        easily subverted. The idea is simply to ensure that a given
        SQLite file is <em>likely</em> a Morg database that the
        application can use. You could create a table named <tt>morg</tt>
        in an arbitrary SQLite file and fill it with the things this test
        looks for and, thereby, deceive Morg into using that file as its
        database.

        @par
        Well, if that kind of thing appeals to you, go for it.

        '''
        logger.info('checking morg table')
        try:
            morg = self.execute('select * from morg')
        except sql_error, e:
            raise sanity_check_error(tasks_file,
                                     'unable to get data from morg table')

................................................................................

        wm = watermark(morg[0])
        wm.dump()
        if (not wm):
            raise sanity_check_error(tasks_file, 'magic phrase mismatch')
        logger.info('looks like this is a morg database')


    def _init_task_table(self):
        '''Create the task table if necessary.

        This internal method sets up the <tt>task</tt> table, which
        simply records all the tasks stored in the database.

        '''
        logger.info('creating task table')
        create =  '''create table if not exists task(
                         id integer primary key autoincrement)'''
        self.execute(create)


    def _init_property_tables(self, properties):
        '''Create and/or update the property tables.

        @param properties (dict) Initial set of task properties.

        This internal method sets up the <tt>property</tt> table and the
        corresponding <tt>property_NNN</tt> tables.

        If you specify properties that are already defined in the
        <tt>property</tt> table, we will ignore them. However,
        non-existent properties will be added.

        Extant properties that are not in the dictionary you provide will
        not be deleted. That is, this function will add new properties
        but will not delete or modify preexisting ones.

        @note The properties parameter is described in the constructor's
        documentation.

        '''
        logger.info('creating property table')
        create =  '''create table if not exists property(
                         id    integer primary key autoincrement,
                         name  text not null collate nocase unique,
                         type  text not null collate nocase
                                   check (type in ('integer', 'real', 'text')),
                         range text)'''
................................................................................

        This function inserts the given property name, type, and range
        into Morg's "master" <tt>property</tt> table.

        The name of a property can be pretty much anything you like.
        However, keep in mind that property names are case-insensitive
        and must be unique, non-empty strings. Thus, attempting to


        duplicate a property or using a zero-length string will result in
        an exception.


        The property type can be one of the following (case-insensitive)
        strings:

        @li <tt>integer</tt>
        @li <tt>real</tt>
        @li <tt>text</tt>
................................................................................
        Once a connection to the Morg SQLite database has been
        established, clients can use this function to see if that
        database actually has any data in it or not.

        This function executes a SQL query to determine whether or not
        the database is empty. If that query fails for some reason or
        returns an unexpected result, this function will raise a
        sql_error exception.

        '''
        query = 'select count(*) from sqlite_master where type = "table"'
        rows  = self.execute(query)

        r = len(rows)
        if (r != 1):
................................................................................
class watermark:
    '''Helper class for Morg's database watermark.

    This class eases creation and verification of the <tt>morg</tt>
    watermark table so that Morg can identify SQLite files as "belonging"
    to it.































































    '''
    def __init__(self, metadata = None):
        '''Construct a watermark object.

        @param metadata (tuple or dict) Watermark table's data.

        The <tt>morg</tt> table contains exactly one row, which has the
................................................................................
        The time stamp records the creation time (GMT) of the database in
        the format <tt>yyyy-mm-dd HH:MM:SS</tt>.

        The host name records the fully qualified domain name of the
        machine on which this Morg database was created.

        The user name identifies the login name of the user who created
        the database. If this cannot be determined, this field will be
        recorded as <tt>unidentifiable_user</tt>.

        Finally, the magic phrase is the SHA-1 ID of the string produced
        by concatenating the schema version, time stamp, host name, user
        name, and a hard-coded identification string used by Morg.

        If the metadata parameter is not given, this constructor will use
        default values for the above-mentioned fields of the watermark
        table. If it is given in the form of a dict, we expect the keys
        to be the column names stated above and the values to be as
        described above. If given as a tuple, we expect the tuple to
        contain only the values.

        '''
        self._metadata = {}
        self._metadata['schema_version'] = database.SCHEMA_VERSION
        try:
            self._metadata['timestamp' ] = time.strftime('%Y-%m-%d %H:%M:%S',
                                                        time.gmtime())
................................................................................
                                 "using .. in text column's range spec")
        bounds = range_spec.split('..')
        if (len(bounds) != 2):
            raise property_error(column_name, column_type, range_spec,
                                 "more than one .. in " +
                                 "numeric column's range spec")
        try:

            ctor = int if (column_type == 'integer') else float
            bounds = map(lambda v: ctor(v), bounds)
        except Exception, e:
            raise property_error(column_name, column_type, range_spec, e)
        return ('check ({} between {} and {})'.
                format(column_name, min(bounds), max(bounds)))

    # Assume we're dealing with discrete values
    if (column_type == 'text'):
        fmt = lambda v: "'{}'".format(v)
    elif (column_type == 'integer'):
        fmt = lambda v: '{}'.format(int(v))
    else:
        fmt = lambda v: '{}'.format(float(v))
    try:
        return ('check ({} in ({}))'.
                format(column_name,
                       ','.join(map(fmt, range_spec.split(',')))))
    except Exception, e:
        raise property_error(column_name, column_type, range_spec, e)








|
|






|
|
<
<
<




|
>

<
|
|
<
<
<
<
<
<
<
<
<
<
<












|
|
|











|
>
>
>
>
>
|







 







>

<
<
<
<
<
<
<













>

<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<







 







>

<
<
<
<
<
<





>

<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<







 







>
>
|
<
>







 







|







 







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







 







|
<










|







 







>









|

|

|







180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196



197
198
199
200
201
202
203

204
205











206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
...
282
283
284
285
286
287
288
289
290







291
292
293
294
295
296
297
298
299
300
301
302
303
304
305






























306
307
308
309
310
311
312
...
320
321
322
323
324
325
326
327
328






329
330
331
332
333
334
335



















336
337
338
339
340
341
342
...
366
367
368
369
370
371
372
373
374
375

376
377
378
379
380
381
382
383
...
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
...
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
...
576
577
578
579
580
581
582
583

584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
...
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
        @param properties (dict) Initial set of task properties.

        This method opens the SQLite file passed to it. If the file
        doesn't already exist, it will be created (which could involve
        creating the entire directory hierarchy specified in the
        tasks_file parameter).

        Failure to open the SQLite file will result in a
        <tt>connection_error</tt> exception.

        Once the database has been successfully opened, the constructor
        will proceed to check if the database is empty or not. In an
        empty database, Morg will create a watermark table so that it can
        subsequently identify that SQLite file as one that it can use.

        Failure to create the watermark table will lead to a
        <tt>sql_error</tt>.




        If the database is not empty, Morg will perform a sanity check
        using the watermark table to see if the database to which it is
        connected was its creation and, therefore, usable by it. If that
        is not the case, the constructor will raise a
        <tt>sanity_check_error</tt>.


        Refer to the documentation of the <tt>watermark</tt> class for
        further information about Morg's simple watermarking system.












        After the watermark creation or verification phase, the
        constructor proceeds to creating the basic database structure
        required by Morg. At the very least, Morg needs a <tt>task</tt>
        table to record all your tasks as well as several property tables
        that define and record the various properties associated with
        tasks. Preexisting task and property tables will be left
        untouched.

        However, the database constructor allows you to define new
        properties during initialization. These initial properties are
        specified via the properties parameter, which is a dict that maps
        strings to 2-tuples. The dict keys are the names of properties.
        The 2-tuple corresponding to each key, i.e., property name,
        should contain the property type and its range.

        The property type can be one of the following (case-insensitive)
        strings:

        @li <tt>integer</tt>
        @li <tt>real</tt>
        @li <tt>text</tt>

        The property range may be either <tt>None</tt> or a string
        describing the range of values that particular property may take
        on. Commas can be used to separate items in a list of discrete
        values. Double dots can be used for numeric ranges. Note that
        even if a particular property does not have any restrictions on
        its range, its 2-tuple in the properties dict cannot omit the
        <tt>None</tt> for the second element; i.e., each key's value
        <b>must</b> be a 2-tuple.

        Here are a few illustrative examples:

        @verbatim
            props = {'role'       : ('text',    'President,Glutton,Buffoon'),
                     'priority'   : ('integer', '1,2,3,5,7'),
                     'importance' : ('real',    '-1..+1'),
                     'description': ('text',    None)}
        @endverbatim
................................................................................
        else:
            logger.info('{} has data'.format(tasks_file))
            self._sanity_check(tasks_file)

        self._init_task_table()
        self._init_property_tables(properties)

    # Create and populate watermark table
    def _init_morg_table(self):







        with self._db:
            logger.info('creating morg table')
            wm = watermark()
            wm.dump()
            self.execute(wm.create())

            logger.info('populating morg table')
            columns = wm.columns()
            insert  = ('insert into morg ({}) values ({})'.
                       format(','.join(columns),
                              ','.join('?' * len(columns))))
            self.execute(insert, tuple(wm))

    # Try and confirm we're dealing with a Morg database
    def _sanity_check(self, tasks_file):






























        logger.info('checking morg table')
        try:
            morg = self.execute('select * from morg')
        except sql_error, e:
            raise sanity_check_error(tasks_file,
                                     'unable to get data from morg table')

................................................................................

        wm = watermark(morg[0])
        wm.dump()
        if (not wm):
            raise sanity_check_error(tasks_file, 'magic phrase mismatch')
        logger.info('looks like this is a morg database')

    # Create task table if necessary
    def _init_task_table(self):






        logger.info('creating task table')
        create =  '''create table if not exists task(
                         id integer primary key autoincrement)'''
        self.execute(create)

    # Create and/or update the property tables
    def _init_property_tables(self, properties):



















        logger.info('creating property table')
        create =  '''create table if not exists property(
                         id    integer primary key autoincrement,
                         name  text not null collate nocase unique,
                         type  text not null collate nocase
                                   check (type in ('integer', 'real', 'text')),
                         range text)'''
................................................................................

        This function inserts the given property name, type, and range
        into Morg's "master" <tt>property</tt> table.

        The name of a property can be pretty much anything you like.
        However, keep in mind that property names are case-insensitive
        and must be unique, non-empty strings. Thus, attempting to
        duplicate a property will end up violating the relevant SQL
        column constraint and result in a <tt>sql_error</tt>. Trying to
        use a zero-length string will result in a

        <tt>property_error</tt>.

        The property type can be one of the following (case-insensitive)
        strings:

        @li <tt>integer</tt>
        @li <tt>real</tt>
        @li <tt>text</tt>
................................................................................
        Once a connection to the Morg SQLite database has been
        established, clients can use this function to see if that
        database actually has any data in it or not.

        This function executes a SQL query to determine whether or not
        the database is empty. If that query fails for some reason or
        returns an unexpected result, this function will raise a
        <tt>sql_error</tt> exception.

        '''
        query = 'select count(*) from sqlite_master where type = "table"'
        rows  = self.execute(query)

        r = len(rows)
        if (r != 1):
................................................................................
class watermark:
    '''Helper class for Morg's database watermark.

    This class eases creation and verification of the <tt>morg</tt>
    watermark table so that Morg can identify SQLite files as "belonging"
    to it.

    Please note that the watermark system Morg uses is not foolproof. You
    can easily enough insert and populate such a table in an arbitrary
    SQLite file and "trick" Morg into using that file as its database. As
    a matter of fact, this class is designed specifically to facilitate
    such "Morgification" without requiring you to jump through
    unnecessary hoops.

    For example, assuming the <tt>morglib</tt> module is in your Python
    path, here is a quick Python program to generate a SQL script that
    you can then use to "Morgify" an SQLite file:

    @verbatim
        import morglib
        f = open('morgify.sql', 'w')
        f.write(morglib.database.watermark().sql())
    @endverbatim

    Now, you can fire up the SQLite command-line tool on any SQLite
    database and have it execute <tt>morgify.sql</tt> to create the
    watermark table so that Morg can use that file.

    Alternatively, you could do the whole thing from Python as
    illustrated below:

    @verbatim
        import morglib
        import apsw

        db = apsw.Connection('foo.sqlite')
        with db:
            wm  = morglib.database.watermark()
            cur = db.cursor()
            cur.execute(wm.create())
            cur.execute('insert into morg values (?,?,?,?,?)', tuple(wm))
    @endverbatim

    After you run the above program, <tt>foo.sqlite</tt> will have the
    necessary magic to allow Morg to use it as a tasks database.

    To prevent Morg from using an SQLite file, all you have to do is
    delete the <tt>morg</tt> table from it (or just mangle its contents
    so that the sanity check fails). <tt>morglib</tt>, however, does not
    provide a straightforward means of demorgification of a database. For
    that, you'll have to resort to the SQLite command-line.

    One possible use for Morgification is to be able to share the same
    database between Morg and another application by storing all their
    data in the same SQLite file. As long as the two applications don't
    step on each others' toes (e.g., naming their tables the same), this
    should work just fine.

    Having said that, however, simply because you can do something
    doesn't mean you should.

    @note The purpose of the Morg watermark is only to record some
    metadata and to assure Morg that the SQLite file it is using is most
    likely a valid Morg database. The idea is not to enforce some strict,
    tamperproof barrier that prevents intentional or unintentional
    cross-application data pollution. That may not even be feasible with
    SQLite; after all, you can always fire up the SQLite tool and do
    whatever you like to the data in there.

    '''
    def __init__(self, metadata = None):
        '''Construct a watermark object.

        @param metadata (tuple or dict) Watermark table's data.

        The <tt>morg</tt> table contains exactly one row, which has the
................................................................................
        The time stamp records the creation time (GMT) of the database in
        the format <tt>yyyy-mm-dd HH:MM:SS</tt>.

        The host name records the fully qualified domain name of the
        machine on which this Morg database was created.

        The user name identifies the login name of the user who created
        the database.


        Finally, the magic phrase is the SHA-1 ID of the string produced
        by concatenating the schema version, time stamp, host name, user
        name, and a hard-coded identification string used by Morg.

        If the metadata parameter is not given, this constructor will use
        default values for the above-mentioned fields of the watermark
        table. If it is given in the form of a dict, we expect the keys
        to be the column names stated above and the values to be as
        described above. If given as a tuple, we expect the tuple to
        contain only the values in the order shown above.

        '''
        self._metadata = {}
        self._metadata['schema_version'] = database.SCHEMA_VERSION
        try:
            self._metadata['timestamp' ] = time.strftime('%Y-%m-%d %H:%M:%S',
                                                        time.gmtime())
................................................................................
                                 "using .. in text column's range spec")
        bounds = range_spec.split('..')
        if (len(bounds) != 2):
            raise property_error(column_name, column_type, range_spec,
                                 "more than one .. in " +
                                 "numeric column's range spec")
        try:
            # Convert bounds from string to int or float
            ctor = int if (column_type == 'integer') else float
            bounds = map(lambda v: ctor(v), bounds)
        except Exception, e:
            raise property_error(column_name, column_type, range_spec, e)
        return ('check ({} between {} and {})'.
                format(column_name, min(bounds), max(bounds)))

    # Assume we're dealing with discrete values
    if (column_type == 'text'):
        fmt = lambda v: "'{}'".format(v) # wrap strings in single quotes
    elif (column_type == 'integer'):
        fmt = lambda v: '{}'.format(int(v)) # convert strings to int
    else:
        fmt = lambda v: '{}'.format(float(v)) # convert strings to float
    try:
        return ('check ({} in ({}))'.
                format(column_name,
                       ','.join(map(fmt, range_spec.split(',')))))
    except Exception, e:
        raise property_error(column_name, column_type, range_spec, e)

Changes to wiki/todo.wiki.

18
19
20
21
22
23
24

25
26
27
28
29
30
  *  Add default properties dict.
  *  Create <tt>property_NNN</tt> tables on database init.
  *  Implement sanity check on database initialization.
  *  Update database constructor doc string.
  *  Use the BETWEEN operator in property range constraints.
  *  Implement a watermark class to ease verification and "Morgification."
  *  The watermark table should be created and populated in a transaction.


<h2>PENDING</h2>

  *  Update doc strings to reflect recent changes about the watermark.
  *  Pass database to all commands' <tt>__call__</tt> method.
  *  Implement <tt>new</tt> command.







>



<


18
19
20
21
22
23
24
25
26
27
28

29
30
  *  Add default properties dict.
  *  Create <tt>property_NNN</tt> tables on database init.
  *  Implement sanity check on database initialization.
  *  Update database constructor doc string.
  *  Use the BETWEEN operator in property range constraints.
  *  Implement a watermark class to ease verification and "Morgification."
  *  The watermark table should be created and populated in a transaction.
  *  Update doc strings to reflect recent changes about the watermark.

<h2>PENDING</h2>


  *  Pass database to all commands' <tt>__call__</tt> method.
  *  Implement <tt>new</tt> command.