Transcript Slide 1
Never Lose a SAS Job

Copyright © 2010, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Not Again!!
Unexpected re-boot, system failures
Long running job didn’t complete
Must manually re-start job from step 1

SAS Grid Gets the Stars Aligned...
  SAS checkpoint-restart features
+ LSF requeue capabilities
+ SASGSUB batch submission utility
---------------------------------------------------
  Completion of SAS Jobs in Minimal Time
Ideal for critical long-running SAS jobs

SAS Checkpoint/Restart
Checkpoint mode
• Record info about data/proc steps in checkpoint library
Restart mode
• Global statements and macros re-executed
• SAS reads data in checkpoint library to determine which steps completed
• Program execution resumes with step that was executing when failure occurred
• Data/proc steps that completed successfully will not be re-executed
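The restart behavior on the SAS Checkpoint/Restart slide can be illustrated with a small Python simulation. This is only a sketch of the idea, not how SAS implements it: the `run_program` function, the JSON file standing in for the checkpoint library, and the step names are all hypothetical.

```python
import json
import os
import tempfile


def run_program(steps, checkpoint_path, fail_at=None):
    """Run named steps in order, skipping those already recorded as complete.

    steps: list of (name, callable) pairs standing in for DATA/PROC steps.
    checkpoint_path: hypothetical stand-in for the checkpoint library.
    fail_at: name of a step that should simulate a crash (demo only).
    Returns the names of the steps executed during this run.
    """
    completed = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            completed = json.load(f)

    executed = []
    for name, action in steps:
        if name in completed:
            continue  # finished in an earlier run, so not re-executed
        if name == fail_at:
            raise RuntimeError(f"simulated failure in {name}")
        action()
        executed.append(name)
        completed.append(name)
        with open(checkpoint_path, "w") as f:
            json.dump(completed, f)  # record the step as soon as it completes
    return executed


def demo():
    steps = [
        ("data_a", lambda: None),
        ("proc_sort", lambda: None),
        ("proc_means", lambda: None),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "checkpoint.json")
        try:
            run_program(steps, path, fail_at="proc_sort")
        except RuntimeError:
            pass  # the first run dies partway through
        # The second run resumes with the step that was executing at failure.
        return run_program(steps, path)
```

Calling `demo()` returns the steps executed by the second run only: the step that failed and the one after it. The step that completed before the failure is skipped.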
To Set Up for Checkpoint-Restart
Specify the following options on the batch SAS invocation:
• STEPCHKPT – enables checkpoint mode
• STEPRESTART – causes SAS to use checkpoint-restart data
• NOWORKINIT – does not initialize the WORK library when SAS starts
• NOWORKTERM – saves the WORK library when SAS exits
• ERRORCHECK STRICT – puts SAS in syntax-check mode when an error occurs in a LIBNAME, FILENAME, %INCLUDE, or LOCK statement
• ERRORABEND – causes SAS to terminate for most errors

The WORK Directory
WORK is the default location for the checkpoint library
• Can use STEPCHKPTLIB to point to a permanent library
• Must include the LIBNAME statement as the first statement in the batch program
WORK directory must be on shared storage
Example:
• sas92 -noworkinit -noworkterm -work abc

Use of Both STEPCHKPT and STEPRESTART
Initial invocation
• Results in checkpoint mode only
• No data in checkpoint library
Subsequent invocations
• Use data from checkpoint library
• Continue checkpoint mode for remainder of program

SAS Grid Manager – Queues
[Diagram labels: SAS Application, Normal Queue, SAS Grid Manager, HOST A, HOST B, HOST C]
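The options from the "To Set Up for Checkpoint-Restart" slide can be assembled into a single batch invocation. The Python sketch below only builds the argument list. The `checkpoint_restart_args` helper is hypothetical, and passing the program with `-sysin` is an assumption, since the slides show only `sas92 -noworkinit -noworkterm -work abc`.

```python
import shlex


def checkpoint_restart_args(program, work_dir, sas_cmd="sas92"):
    """Build a batch SAS command line with the checkpoint-restart options.

    program: path to the SAS program (passing it via -sysin is an assumption).
    work_dir: WORK directory, which the slides say must be on shared storage.
    sas_cmd: name of the SAS executable; "sas92" is taken from the slide example.
    """
    return [
        sas_cmd,
        "-sysin", program,
        "-stepchkpt",             # enable checkpoint mode
        "-steprestart",           # use checkpoint data when it exists
        "-noworkinit",            # do not initialize WORK at startup
        "-noworkterm",            # keep WORK when SAS exits
        "-errorcheck", "strict",
        "-errorabend",
        "-work", work_dir,
    ]


if __name__ == "__main__":
    # Print the command as a single shell-quoted string for inspection.
    print(shlex.join(checkpoint_restart_args("nightly.sas", "abc")))
```

The same command line is used for the initial run and for every restart, which matches the "Use of Both STEPCHKPT and STEPRESTART" slide: the first run finds no checkpoint data and simply records it.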
Automatic Job Requeue
Configure a queue to automatically requeue jobs that end with specific exit values
• REQUEUE_EXIT_VALUES=all ~0 ~1
− A job with any exit code other than 0 or 1 (success and warnings) will be requeued
• REQUEUE_EXIT_VALUES=EXCLUDE(all ~0 ~1)
− Run requeued job on a different host
• Jobs requeued 5 times by default
− MAX_JOB_REQUEUE lets you configure the requeue limit; it can be specified globally for all queues or on a per-queue basis

Automatic Job Rerun
A job is automatically rerun when
• Execution host becomes unavailable while a job is running
• System fails while a job is running
Requires RERUNNABLE=yes

LSF Queue Definition
Begin Queue
QUEUE_NAME = sas_rerun
PRIORITY = 40
NICE = 10
RERUNNABLE = YES
REQUEUE_EXIT_VALUES = all ~0 ~1
DESCRIPTION = Jobs submitted to this queue will be requeued automatically and also rerunnable.
End Queue
Callouts:
• RERUNNABLE: jobs dispatched from this queue will be rerun after a system failure
• REQUEUE_EXIT_VALUES: jobs with a fatal exit code will be requeued

SASGSUB Capabilities
Standalone utility that allows the user to
• Submit SAS program to grid for processing
• Display status of user’s jobs on the grid
• Retrieve output from user’s jobs to local directory
• Kill jobs
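The requeue rule from the "Automatic Job Requeue" slide (`REQUEUE_EXIT_VALUES=all ~0 ~1`, with a limit of 5 requeues) can be sketched as a decision function. This is a simplified illustration, not LSF's parser: `requeue_codes` and `should_requeue` are hypothetical helpers, exit codes are assumed to fall in 0 to 255, and the `EXCLUDE(...)` form is not handled.

```python
def requeue_codes(spec):
    """Return the set of exit codes selected by a simplified requeue spec.

    Supports only the tokens shown on the slide: "all", "~N", and plain N.
    """
    included = set()
    excluded = set()
    for token in spec.split():
        if token == "all":
            included.update(range(256))     # assumed exit-code range
        elif token.startswith("~"):
            excluded.add(int(token[1:]))    # "~N" removes code N
        else:
            included.add(int(token))
    return included - excluded


def should_requeue(exit_code, spec, requeues_so_far, max_requeue=5):
    """Decide whether a finished job would be requeued under this model.

    max_requeue defaults to 5, the default stated on the slide.
    """
    if requeues_so_far >= max_requeue:
        return False
    return exit_code in requeue_codes(spec)
```

Under this model a job that exits with code 2 is requeued, while exit codes 0 and 1 (success and warnings) are not, and a job that has already been requeued 5 times is left alone.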
Using SASGSUB
Advantages
• Submit and forget
• View job output while job is running
• Eliminate need for full SAS install on client
• Make use of SAS checkpoint/restart capability
NOTE: requires a shared file system between client and grid

Submitting a Job
Command line interface
• sasgsub -gridsubmitpgm <sas_pgm>
Example output
Job ID: 6772
Job directory: "/CNT/sasgsub/gridwork/sascnn1/SASGSUB-2009-03-17_14.09.52.847_testPgm"
Job log file: "/CNT/sasgsub/gridwork/sascnn1/SASGSUB-2009-03-17_14.09.52.847_testPgm/testPgm.log"

Submitting a Job for Checkpoint-Restart
GRIDRESTARTOK
• Automatically adds the following options to the batch SAS invocation
− STEPCHKPT, STEPRESTART, ERRORCHECK STRICT, ERRORABEND, NOWORKINIT, NOWORKTERM
• Sets the RERUNNABLE parameter on the job
Command line interface
• sasgsub -gridsubmitpgm <sas_pgm> -gridrestartok
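A script that wraps the submission step might build the `sasgsub` command and pull the job ID out of the output shown on the "Submitting a Job" slide. The sketch below is a minimal illustration: `submit_args` and `parse_job_id` are hypothetical helpers, only the two options named on the slides are used, and any other options a real site needs (for example, connection settings) are assumed to come from the utility's own configuration.

```python
import re


def submit_args(program, restart_ok=False, sasgsub_cmd="sasgsub"):
    """Build the sasgsub argument list for submitting a SAS program."""
    args = [sasgsub_cmd, "-gridsubmitpgm", program]
    if restart_ok:
        # Requests checkpoint-restart handling, as described on the slide.
        args.append("-gridrestartok")
    return args


def parse_job_id(output):
    """Extract the integer after "Job ID:" from sasgsub output, or None."""
    match = re.search(r"^Job ID:\s*(\d+)", output, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


# Output format copied from the slide example.
SAMPLE_SUBMIT_OUTPUT = (
    "Job ID: 6772\n"
    'Job directory: "/CNT/sasgsub/gridwork/sascnn1/'
    'SASGSUB-2009-03-17_14.09.52.847_testPgm"\n'
)
```

The argument list can be passed to `subprocess.run` on a machine where `sasgsub` is installed; the parsing step assumes the output keeps the `Job ID:` line format shown on the slide.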
Getting Job Status
Command line interface
• sasgsub -gridgetstatus <job_id | _ALL_>
Example output
Current Job Information
Job 1917 (testPgm) is Finished: Submitted: 08Dec2008:10:28:57, Started: 08Dec2008:10:28:57 on Host d15003, Ended: 08Dec2008:10:28:57
Job 1918 (testPgm) is Finished: Submitted: 08Dec2008:10:28:57, Started: 08Dec2008:10:28:57 on Host d15003, Ended: 08Dec2008:10:28:57
Job 1925 (testPgm) is Submitted: Submitted: 08Dec2008:10:28:57

Retrieving Results
Command line interface
• sasgsub -gridgetresults <job_id | _ALL_>
Example output
Current Job Information
Job 1917 (testPgm) is Finished: Submitted: 08Dec2008:10:53:33, Started: 08Dec2008:10:53:33 on Host d15003, Ended: 08Dec2008:10:53:33
Moved job information to .\SASGSUB-2008-11-21_21.52.57.130_testPgm
Job 1918 (testPgm) is Finished: Submitted: 08Dec2008:10:53:33, Started: 08Dec2008:10:53:33 on Host d15003, Ended: 08Dec2008:10:53:33
Moved job information to .\SASGSUB-2008-11-24_13.13.39.167_testPgm
Job 1925 (testPgm) is Submitted: Submitted: 08Dec2008:10:53:34

Putting It All Together
[Diagram labels: SAS Application, SAS Grid Manager, normal queue, sas_rerun queue, HOST A, HOST B, HOST C]
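The status listing on the "Getting Job Status" slide has a regular enough shape to parse when monitoring jobs from a script. The sketch below is an assumption-laden illustration: `parse_status` is a hypothetical helper, and the regular expression is based only on the three example lines above, so other job states or formats may not match.

```python
import re

# Matches lines such as "Job 1917 (testPgm) is Finished: ...".
STATUS_RE = re.compile(r"Job (?P<id>\d+) \((?P<name>[^)]*)\) is (?P<state>\w+):")


def parse_status(output):
    """Return one dict per job line found in sasgsub status output."""
    return [
        {"id": int(m["id"]), "name": m["name"], "state": m["state"]}
        for m in STATUS_RE.finditer(output)
    ]


# Lines copied from the slide example.
SAMPLE_STATUS_OUTPUT = """Current Job Information
Job 1917 (testPgm) is Finished: Submitted: 08Dec2008:10:28:57, Started: 08Dec2008:10:28:57 on Host d15003, Ended: 08Dec2008:10:28:57
Job 1925 (testPgm) is Submitted: Submitted: 08Dec2008:10:28:57
"""
```

A caller could, for example, filter the result for jobs whose state is `Finished` before retrieving their output.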
A simple solution
• Record a checkpoint number and save it in WORK
• If restarting, skip PROC/DATA steps up to that checkpoint
• Tokenize everything
• Execute all global statements