Importing with cron

New method of importing using ExpressionEngine's CLI

Starting with DataGrab 4.2, this is the preferred and supported way to import data with cron.

If you are not familiar with the ExpressionEngine CLI tool, you should start with its documentation.

php system/ee/eecli.php import:run --id=2

When the command is executed you should see output similar to the following:

/var/www/html$ php system/ee/eecli.php import:run --id=2
Starting: Simple Import Test... 
Queueing...
Consuming...

Worker Stopped

You can see the options and parameters by using the -h flag.

php system/ee/eecli.php import:run -h

If an import gets stuck or stops prematurely, or you need to reset it to a new import status, run the following command. This co

php system/ee/eecli.php import:reset --id=2

Importing large data sets

Version 5 introduces two new arguments to the CLI commands: --producer and --consumer.

php system/ee/eecli.php import:run --id=27 --producer

Running this command only reads entries from your import file and puts them into the queue. If you have a daily import, you can set up a crontab entry to schedule this command.

php system/ee/eecli.php import:run --id=27 --consumer

Running this command creates a single worker to consume entries from the queue. If your import is configured with a "limit" of 50, the worker imports 50 entries and then stops. This is indicated in the DataGrab import log file as "WORKER STOPPED". A consumer can also stop if it reaches the PHP script max execution time.

If a consumer stops because it reached its limit or timed out, you'll need to run the --consumer command again. The best way to do this is to set up a crontab entry that runs the command on a schedule, such as every 1, 3, 5, or 30 minutes (or use supervisord). Choose any interval that works for you.

If the queue is empty, the consumer starts, finds nothing in the queue to act on, and immediately stops. At the next interval, cron starts another consumer, which checks the queue and processes whatever it finds. Rinse and repeat.
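When the cron interval is short, a new consumer may start while the previous one is still working. Whether that matters for your import is not covered here; if you would rather guarantee one consumer at a time, a small wrapper script is one option. The following is a minimal sketch, not part of DataGrab: the variable names (EE_ROOT, IMPORT_ID, LIMIT, LOCK_DIR, PHP_BIN), their default values, and the lock directory location are all assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around the DataGrab consumer command.
# All variable names and defaults below are illustrative, not DataGrab settings.
EE_ROOT="${EE_ROOT:-/var/www/html}"   # site root that contains system/ee/eecli.php
IMPORT_ID="${IMPORT_ID:-27}"          # DataGrab import id
LIMIT="${LIMIT:-50}"                  # entries to import per run
LOCK_DIR="${LOCK_DIR:-/tmp/datagrab-consumer-${IMPORT_ID}.lock}"
PHP_BIN="${PHP_BIN:-php}"

run_consumer() {
    # mkdir is atomic: it fails if the lock directory already exists,
    # which we treat as "another consumer is still running".
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "consumer for import $IMPORT_ID already running, skipping"
        return 0
    fi

    # Run from the site root so the relative path to eecli.php resolves.
    (cd "$EE_ROOT" && "$PHP_BIN" system/ee/eecli.php import:run \
        --id="$IMPORT_ID" --consumer --limit="$LIMIT")
    rc=$?

    # Release the lock and pass the consumer's exit status through.
    rmdir "$LOCK_DIR"
    return "$rc"
}

# Only act when invoked with the "run" argument.
if [ "${1:-}" = "run" ]; then
    run_consumer
fi
```

Saved under a name of your choosing, for example datagrab-consumer.sh, the script would be called from cron with the run argument in place of the bare php command. If the script is killed before it removes the lock directory, the directory has to be deleted by hand before the next consumer can start.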

If you want to run a single consumer that imports all items in the queue, set the limit to 0. Using a limit of 0 on a large import will likely run into server memory or request timeout limits, so it is only recommended for smaller imports. If you set a limit of 0 and find that the import is not finishing, then 0 is not a viable option for your import size and server settings, and you'll have to define a limit value and run the consumer periodically with cron.

php system/ee/eecli.php import:run --id=27 --consumer --limit=0

To set up a consumer to run every 5 minutes, your cron entry will look similar to the following:

*/5 * * * * php system/ee/eecli.php import:run --id=27 --consumer --limit=50

It is perfectly fine to configure the DataGrab consumer to execute every few minutes, even if there is nothing to import. If there is nothing in the queue, it will simply abort and try again at the next interval. To learn more about cron, visit cron.guru.
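Cron usually runs commands from the user's home directory, so the relative path system/ee/eecli.php only resolves if the entry first changes into the site root. The sketch below prints a pair of crontab lines, one for a daily producer and one for a consumer every 5 minutes, with that cd included. The function name print_datagrab_crontab, the 02:00 producer time, and the /var/www/html path in the usage note are assumptions for illustration; adjust them to your server.

```shell
#!/bin/sh
# Hypothetical helper: print crontab lines for one DataGrab import.
# Arguments: site root, import id, consumer limit.
print_datagrab_crontab() {
    ee_root="$1"
    import_id="$2"
    limit="$3"
    cat <<EOF
0 2 * * * cd $ee_root && php system/ee/eecli.php import:run --id=$import_id --producer
*/5 * * * * cd $ee_root && php system/ee/eecli.php import:run --id=$import_id --consumer --limit=$limit
EOF
}
```

Calling print_datagrab_crontab /var/www/html 27 50 prints the two lines so you can review them and paste them into crontab -e.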

Help configuring crontab or supervisord is not included as part of DataGrab's support. Adequate documentation is available, and this generally requires direct access to the server.
