251_DBCorruption - PUG Challenge Americas
Download
Report
Transcript 251_DBCorruption - PUG Challenge Americas
Database Corruption
Be prepared, not scared.
Richard Banville
Fellow, OpenEdge Development
Progress Software
Dealing With Database Corruption
2
Preparation
• Prepare for the worst, hope for the best
Prevention
• Stopping corruption before it happens
• Avoiding foolish behavior
Detection
• Identifying you have a problem
• Pinpointing the cause
Reaction
• Resolving corruption with least impact
© 2013 Progress Software Corporation. All rights reserved.
Types Of Corruption
Corruption can be small or widespread
• User based corruption
• Internal system based corruption
Physical
• Block level corruption
• Hardware: Bad disk, memory, etc
Logical
• Missing Data
– Relational issues
• Data access
– Index issues
3
© 2013 Progress Software Corporation. All rights reserved.
Be Prepared
Modern Release (all facets of deployment)
Backups – perform regularly
• Backup database AND application
– Perform large backups with split mirrors
– Run online backup with –Bp
• TEST your backups with restore & access or hot stand-by
prorest
<db> -vp
– prorest
– Validates data written successfully (not proper data written)
– prorest
–vf: Compares against original, but who wants to be down that long?
prorest
<db> -vf
• Use offsite storage
Run with AI enabled
• Put AI files on separate disk/separate controller
• AI management tool makes AI management easy
4
© 2013 Progress Software Corporation. All rights reserved.
Be Really Prepared
Keep hot standby
• Continually roll forward AI files
• OpenEdge Replication
Have a comprehensive recovery strategy
• Audit changes
• Plan for natural disasters
• Plan for not so natural disasters
Document and test your recovery strategy
• Educate at all levels of organization
Implement redundancy
• Failover Clusters
• Have a duplicate remote site
5
© 2013 Progress Software Corporation. All rights reserved.
Database Consistency Checking
Seen these messages before?
Index name in customer for recid 16991 could not be deleted.
Wrong key in index 10 for record 2010.
Invalid size of an index entry.
6
© 2013 Progress Software Corporation. All rights reserved.
Database Consistency Checking
Or how about these…
Invalid RM block for area 10
rmdoins: pbk->free went negative dbkey 4096
bkwrite: bktbl dbk 4096 not equal to bkbuf dbk -1234
bkaddr called with negative blkaddr: -1234
7
© 2013 Progress Software Corporation. All rights reserved.
Database Consistency Checking
Stop shared memory problems before they happen
Memory overwrite protection
• -MemCheck
• Ensure block changes written to proper shm location
Buffer 1
Buffer 1
Insert new
key entry
Buffer 2
8
© 2013 Progress Software Corporation. All rights reserved.
Buffer 2
Oops! Miscalculation results in memory
stomp of next block header.
Database Consistency Checking
Stop database corruption from becoming persistent
Physical block consistency checking
• -DbCheck
• Validates record and index blocks after each update operation
– -AreaCheck “area name”
– -Index Check “index name”
– -TableCheck ‘table name”
Typically the result of a bug
Available for OLTP and roll forward
9
© 2013 Progress Software Corporation. All rights reserved.
Enabling Database Consistency Checking
Database startup parameter (-MemCheck, -DbCheck)
Managed via promon R&D, 4. Admin Functions
• 8. Block level consistency check
Current consistency check status:
1. -MemCheck:
enabled
2. -DbCheck:
enabled
3. –AreaCheck:
disabled
4. -IndexCheck:
disabled
5. -TableCheck:
disabled
Enter the option to enable/disable a consistency check:
10
© 2013 Progress Software Corporation. All rights reserved.
Database Consistency Checking Performance Impact
Memory checking: unnoticeable impact
Block level checking: still reasonable
On error, get .lg file to Progress Technical Support
Current consistency check status:
1. -MemCheck:
enabled
2. -DbCheck:
enabled
3. –AreaCheck:
disabled
4. -IndexCheck:
disabled
5. -TableCheck:
disabled
< 1%
~5%
Enter the option to enable/disable a consistency check:
11
© 2013 Progress Software Corporation. All rights reserved.
Identifying Problem Types and Reacting
There are many ways for data to get corrupted
Identifying corruption type
• Key word association can help direct recovery effort
• Understanding process can also help
Quickest way to recovery
• Knowing the tools & which to use is key
• Practice recovery efforts before needed
Let’s examine a few
12
© 2013 Progress Software Corporation. All rights reserved.
Index Issues
Index Messaging
• Index
• (ix, cx, ky)
•
• Root block
• B-tree
• Key entry
• Cursor
Index <i> in <t> for recid <r> could not be deleted. (1422)
– Logical corruption: Missing entries or record not found
•
Index <i>, block <b>, element no. 1: bad compression size. (4423)
– Physical corruption: Storage format of index is incorrect
How to proceed
13
© 2013 Progress Software Corporation. All rights reserved.
Index Validation Tools
proutil <db> -C idxcheck
• Idxcheck online validation levels
– Physical/Block corruption
o Physical consistency
– Logical/key entry corruption
o Keys to records
o Records to keys
o Validate key order
– Lock table option
• New index rebuild may be faster!
14
© 2013 Progress Software Corporation. All rights reserved.
Index Validation & Repair Tools
proutil <db> -C idxfix
Index Fix Utility
1.
2.
3.
4.
5.
6.
7.
Scan records for missing index entries.
Scan indexes for invalid index entries.
Both 1 and 2 above.
Cross-reference check of multiple indexes for a table.
Build indexes from existing indexes.
Delete one record and it's index entries.
Quit.
Select one of the following:
All
(a/A) - Fix all the indexes
Some
(s/S) - Fix only some of the indexes
By Area
(r/R) - Fix indexes in selected areas
By Schema (c/C) - Fix indexes by schema owners
By Table
(t/T) - Fix indexes in selected tables
By Activation (v/V) - Fix selected active or inactive indexes
15
© 2013 Progress Software Corporation. All rights reserved.
Fix indexes on Scan.
Is this correct? (y/n)
Index Validation & Repair Tools
proutil <db> -C idxfix
Index Fix Utility
1.
2.
3.
4.
5.
6.
7.
Scan records for missing index entries.
Scan indexes for invalid index entries.
Both 1 and 2 above.
Cross-reference check of multiple indexes for a table.
Build indexes from existing indexes.
Delete one record and it's index entries.
Quit.
• Online operation
– Transactions are relatively small
– Does not fix physical block corruption
– One concurrent idxfix process per table
16
© 2013 Progress Software Corporation. All rights reserved.
Using Index Fix: Record but no index entry
OLTP (.lg and screen):
Index name in customer for recid 16691 could not be deleted.
proutil <db> -C idxfix
1. Scan records for missing index entries:
Index 12 (customer, name): couldn't find key <RICHB> recid 16691.
16689
16690
16691
10
aaaa
Field2
Field3
Field4
bbbb
Field2
Field3
Field4
richb
Field2
Field3
Field4
11
12
Option #1: Add key entry to index
• 1. Scan records for missing index entries.
• Fix indexes on Scan. Yes
NOTE: 2. Scan indexes for invalid index entries.
• Would NOT report an error!
17
© 2013 Progress Software Corporation. All rights reserved.
Using Index Fix: Record but no index entry
OLTP (.lg and screen):
Index name in customer for recid 16691 could not be deleted.
proutil <db> -C idxfix
1. Scan records for missing index entries:
Index 12 (customer, name): couldn't find key <RICHB> recid 16691.
16689
16690
16691
10
aaaa
Field2
Field3
Field4
bbbb
Field2
Field3
Field4
richb
Field2
Field3
Field4
11
12
Find first cust where recid(cust) = 16691.
display cust
Option #2: Delete record and its key entry in table’s other indexes
6. Delete one record and it's index entries.
• Type the recid to delete: 16691
• Type the area (number) for the recid(s): 8
– Look in the .st file to match area number and area name.
18
© 2013 Progress Software Corporation. All rights reserved.
Using Index Fix: Record but no index entry
Often no runtime error reported.
proutil <db> -C idxfix
2. Scan indexes for invalid index entries:
Index 12 (customer, name): found invalid key <RICHB> recid 16691.
10
11
12
Only option: remove invalid key entry
• 2. Scan indexes for invalid index entries
• Fix indexes on Scan. Yes
NOTE: 1. Scan records for missing index entries.
• Would NOT report an error!
19
© 2013 Progress Software Corporation. All rights reserved.
Fixing Index Corruption (continued)
Missing key entries or record not found (logical corruption)
• Index fix
• Action based on record removal or index entry insert/delete
Index <i>, block <b>, element no. 1: bad compression size
• Physical b-tree corruption
• Must rebuild index to recover
20
© 2013 Progress Software Corporation. All rights reserved.
Index Repair Tools
proutil <db> -C idxbuild
• Offline utility
• Performance improvements since 10.2b06
• Will repair:
– Index block corruption (physical)
– Orphan index blocks
– Adds missing index entries
o Assumes record data is correct
• Flexible options (db, area, table, index)
• Truncates existing BI file
– Does not record idxbuild changes into BI file
21
© 2013 Progress Software Corporation. All rights reserved.
Index Build/Repair Tools
proutil <db> -C idxactivate <i1> useindex <i2>
• Builds and activates index
• Online
– One concurrent idxactivate process per table
– Requires client schema re-cache
– Transaction size based on “recs” parameter
• Deactivate requires exclusive access
• Repair logical and physical index corruption
• Assumes valid record data
• *** Static queries require recompile to consider new index
23
© 2013 Progress Software Corporation. All rights reserved.
Record Issues
Record Messaging
• Record
• (rm, bf, rec)
• recid
• rowid
• field
• bffld: nxtfld: scan past last field. (16)
Looking for field #5 but only 4 fields exist
Field1
Field2
Field3
Field4
• Record continuation not found, fragment recid <r> area <a>. (10831)
Pointer to next record fragment is invalid
Record Fragment 1
How to proceed
24
© 2013 Progress Software Corporation. All rights reserved.
Checking For Inconsistencies Online
proutil <db> -C dbanalys | tabanalys
• Reads record for statistics purposes
dbtool <db>
Physical Validation
5. Read or Validate Database Block(s)
• Validation levels
– 0: Block header info only
– 1: Record header & record size
– 2: Record overlap checking
Logical Validation w/schema
3. Record Validation
4. Record Version Validation
25
© 2013 Progress Software Corporation. All rights reserved.
Record Repair Tools
bffld: nxtfld: scan past last field. (16)
dbtool <db>
• Online and multi-threaded
6. Record Fixup
– Adds missing fields
– Removes invalid “end-rec” indicator
proutil <db> -C idxfix
6. Delete one record and it’s index entries
26
© 2013 Progress Software Corporation. All rights reserved.
Record Repair Tools
Record continuation not found, fragment recid <r> area <a>.
Record Fragment 1
proutil <db> -C dbrpr
3. Remove Bad Record Fragment
14. Display Record Contents
• Exclusive access
• Truncate bi file
27
© 2013 Progress Software Corporation. All rights reserved.
More Record Repair Tools
Record continuation not found, fragment recid <r> area <a>
Record Fragment 1
The use of dbrpr to fix problems in the
database should be done with the assistance
of Progress Technical Support.
28
© 2013 Progress Software Corporation. All rights reserved.
Dbrpr Record Fix-up Example – Last resort
Before you do anything: Validate current backup
proutil <db> -C truncate bi
proutil <db> -C dbrpr
Options:
29
1. Database Scan Menu
2. Test One or More Indexes
3. Remove Bad Record Fragment
4. Dump Block
5. Load Block
6. Copy Bytes Between Files
7. Load RM Dump File
8. Reformat Block to a Free Block
9. Change Current Working Area
© 2013 Progress Software Corporation. All rights reserved.
10. Display the Free Chain
11. Display the RM Chain
12. Display the Index Delete
Chain
13. Display Block Contents
14. Display Record Contents
15. Display Cluster Chain
16. Scan/Fix block checksum
Dbrpr Record Fix-up Example – Last resort
Record continuation not found, fragment recid 16896 area 8 3.
Before you do anything: Validate current backup
proutil <db> -C truncate bi
proutil <db> -C dbrpr
Validate bad record info
1. Database Scan Menu
1. Report Bad Blocks
8. Rebuild RM Chain
3. Fix Bad Blocks
9. Rebuild Index Delete Chain
4. Report Bad Records
10. Change Current Working Area
5. Delete Bad Records
11. Fix Cluster Chains in Type II
Area
6. Dump Records to RM File
7. Rebuild Free Chain
30
© 2013 Progress Software Corporation. All rights reserved.
Dbrpr Record Fix-up Example – Last resort
Record continuation not found, fragment recid 16896 area 8 3.
proutil <db> -C truncate bi
proutil <db> -C dbrpr
Get a view of what you are going to delete:
• 9. Change Current Working Area
Offset
Len
19
1
0x64
d
– 1. Dump Data Block Details
21
5
0x72696368620
richb
– 6. Start Dbkey
30
0
0
“”
35
2
0x6d61
MA
35
3
0x626262
BBB
• 13. Display Block Contents
Delete partial record
• 9. Change Current Working Area
• 3. Remove Bad Record Fragment
Re-validate (see previous screen)
31
© 2013 Progress Software Corporation. All rights reserved.
Hex
Ascii
Other Record Oriented Repair Tools
proutil <db> -C dump <table> . -index <i>
Binary dump
• Online & multi-threaded
• Binary record format
– May not fix individual record corruption
– May fail when encountering physical corruption
– Use selective binary dump to dump in ranges
• -index defaults to primary index
– Use different index if primary cannot be used
– Use –index 0 if no valid index exists (Type II storage area)
32
© 2013 Progress Software Corporation. All rights reserved.
Other Record Repair Mechanisms
Dump records in “PUB” schema by rowid
Manual Ascii dump and load “repair”
Reload w/bulk load or ABL import
Specify index to use or TABLE-SCAN
DEFINE VARIABLE ix AS INTEGER NO-UNDO.
FIND _file "item".
OUTPUT TO item.d.
DO ix = 1 TO 10000:
Make sure Large enough!
FIND item WHERE RECID(item) = ix NO-ERROR.
IF AVAILABLE item AND ix <> INTEGER(_file._template) THEN
EXPORT item.
END.
33
© 2013 Progress Software Corporation. All rights reserved.
Block Issues
Block and shared memory buffer messages
• bkio, bk, bm
• Dbkey
•
• Block
• Buffer
• Area
• Extent
Wrong dbkey in block. Found <x>, should be <y> in area <z>. (1124)
– Read, write, modify, release
– Most often O/S File System issue
– Reboot often fixes this error – but why?
•
bkioWrite:Unknown O/S error during write, errno 2, fd <x>, len <y>, offset
<z>, filename <s> database <t>. (14676)
•
Attempt to read block <n> which does not exist in area <a>. (201)
– Often index rebuild will fix this error. (rebuild on area level)
34
© 2013 Progress Software Corporation. All rights reserved.
Block Repair Tools
Checksum validation of dbkey <d> block type 4 in area <a>
does not match data. Expected: <e> received <r>. (14410)
• Master block: 1
• Record block: 3
• Free block: 4
• Index block: 5
proutil <db> -C dbrpr
1. Report Bad Checksum
2. Fix Bad Checksum
16. Scan/Fix block checksum (Type II Area)
35
•
Ignore for free blocks (block type 4)
•
Validate database by other means prior to “fixing”
•
True corruption will require a database rebuild
–
dump and load
–
restore/roll forward
© 2013 Progress Software Corporation. All rights reserved.
Block Chain Repair Tools
RM chain count inconsistency.
20 Blocks indicated on record free chain (actually 5)
RM
RM
RM
RM
RM
RM block found not on RM chain, but flagged RM chain.
RM block free chain link error
RM
RM
RM
RM
RM
<type> Block <number> with invalid chain type <number> on RM chain
Free block marked on free chain but linked into RM chain
RM
36
FREE
RM
RM
© 2013 Progress Software Corporation. All rights reserved.
RM
Block Chain Repair Tools
RM chain count inconsistency.
RM block found not on RM chain, but flagged RM chain.
<name> Block <number> with invalid chain type <number> on RM chain
proutil <db> -C dbrpr
1. Database Scan Menu
7. Rebuild Free Chain
8. Rebuild RM Chain
9. Rebuild Index Delete Chain
11. Fix Cluster Chains in Type II Area
• Rebuild free chains/rm chains from dbrpr
• Seek help from support
37
© 2013 Progress Software Corporation. All rights reserved.
Recovery Manager Issues
Recovery Messages
• Recovery (rl)
• Redo
• Undo
• Retry
• Before image
• After image
• ai, a<n>
• bi, b<n>
• Transaction (tm)
** The after-image file expected Tue Feb 26 16:47:29 2013. (832)
** Those dates don't match, so you have the wrong copy of one of
them. (833)
Undo failed to reproduce the record in area <a> with rowid <r> and
return code -1. (10566)
Invalid block <x> for file <y>.a3, max is 1024 (2329)
How to proceed
• Restore / roll forward
• Switch to hot standby
38
© 2013 Progress Software Corporation. All rights reserved.
Recovering From Recovery Failures
I’ve got no backup & crash recovery won’t work?
proutil <db> -C truncate bi –G 120
• Looks further back in BI.
• Should no longer be needed but its worth a try!
**** As a very last resort, force truncate
proutil <db> -C truncate bi -F
• What are the side effects of skipping crash recovery?
– -F: How bad could it be?
• Dump and re-load into new database
• Reconcile data contents and relationships after load
• Backup & enable AI
• Maintain hot standby
39
© 2013 Progress Software Corporation. All rights reserved.
Structural Repair
Those dates don't match, so you have the wrong copy of one of them.
• Usually the result of an OS copy or move
• Make sure all right pieces in place & .st file identifies them correctly
prostrct repair <db> <x>.st
• Does NOT repair corrupt database
• Updates path names to those specified in .st file
prostrct unlock <db> <x>.st
• Use “sparingly”
• Does NOT repair corrupt database
• Patches date mismatch & creates dummy extents
• Use to recover what ever data remains when no backup exists
40
© 2013 Progress Software Corporation. All rights reserved.
Structural Repair
rm x.db - Ooops!
prostrct builddb <db> <db>.st
• Rebuild database “control area” (.db file) from .st file
• Changes to control area are not logged
– Cancelling a txn that changes control area may require builddb
• May force re-base for OpenEdge Replication
prostrct list <db>
• Always have an up to date .st file
41
© 2013 Progress Software Corporation. All rights reserved.
Summary
The many faces of corruption
• Corruption shows itself in many different ways
• Hard and soft corruption
• Memory and disk. Record, index, block and db structure
Some repair tools are a loaded gun
• In the wrong hands they can produce havoc
Preparation is your best way to recovery
• Standard disaster recovery preparations
• Knowing options before problems occur
42
© 2013 Progress Software Corporation. All rights reserved.
?
Questions
43
© 2013 Progress Software Corporation. All rights reserved.
October 6–9, 2013 • Boston
#PRGS13
www.progress.com/exchange-pug
Special low rate of $495 for PUG Challenge
attendees with the code PUGAM
And visit the Progress booth to learn more about the
Progress App Dev Challenge!