Importing with cron

New method of importing using ExpressionEngine's CLI

Starting with DataGrab 4.2, this is the preferred and supported way to import data with cron.

If you are not familiar with the ExpressionEngine CLI tool, you should start with its documentation.

php system/ee/eecli.php import:run --id=2

When the command is executed, it reads from your import file, puts items into the queue (produce), then processes that queue (consume), all in a single command. This is usually fine for small imports where you've configured your import limit to be equal to or greater than the number of entries in your import file. The limit is set to 50 by default. If this command does not import your entire dataset, you may need to run the producer and consumer commands separately (see "Importing large data sets" below).

An example of the output when running the import:run command:

/var/www/html$ php system/ee/eecli.php import:run --id=2
Starting: Simple Import Test... 
Queueing...
Consuming...

Worker Stopped

You can see the options and parameters by using the -h flag.

php system/ee/eecli.php import:run -h

If an import gets stuck or stops prematurely, or you need to reset it to a new import status, run the following command.

php system/ee/eecli.php import:reset --id=2

Importing large data sets

Version 5 introduces two new arguments to the CLI commands.

php system/ee/eecli.php import:run --id=27 --producer

With the producer flag, the command only reads entries from your import file and puts them into the queue (produce), where they remain until they are processed (consume). If you want to run your import once a day, run the command with the producer flag once a day. Once the producer command is in place, you need something to process the items that the producer puts into the queue. This is what the consumer flag is for.

php system/ee/eecli.php import:run --id=27 --consumer

Running this command creates a single worker to consume entries from the queue. If your import is configured with a "limit" of 50, it imports 50 entries and then stops. This is indicated in the DataGrab import log file as "WORKER STOPPED". A consumer can also stop if it reaches the PHP script max execution time. If a consumer stops because it reached its limit or timed out early, you'll need to run the --consumer command again.

The best way to do this is to set up a crontab that runs the command on a schedule, for example every 1, 3, 5, or 30 minutes (or use supervisord). Choose any interval that works for you. If the queue is empty, the consumer starts, finds nothing to process, and stops immediately. At the next interval, another consumer starts, checks the queue, and processes anything it finds. Rinse and repeat.

If you want to run a single consumer that imports all items in the queue, set the limit to 0. A limit of 0 on a large import will likely run into server memory or request timeout limits, so it is only recommended for smaller imports. If you set a limit of 0 and find that the import is not finishing, then 0 is not a viable option for your import size and server settings, and you'll have to define a limit value and run the consumer periodically with cron.

php system/ee/eecli.php import:run --id=27 --consumer --limit=0

To set up a consumer to run every 5 minutes, your cron entry will look similar to the following:

*/5 * * * *    php system/ee/eecli.php import:run --id=27 --consumer --limit=50

It is perfectly fine to configure the DataGrab consumer to execute every X minutes, even if there is nothing to import. If there is nothing in the queue, the consumer simply exits and tries again at the next interval. To learn more about cron, visit cron.guru.
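Cron typically starts jobs from the user's home directory with a minimal PATH, so a relative path such as system/ee/eecli.php may not resolve there. One way to handle this is a small wrapper script that cron calls instead. The following is a minimal sketch, not part of DataGrab: the names EE_ROOT, PHP_BIN, and run_import are hypothetical, and the site path is an assumed example. Only the import:run command and its --id, --producer, --consumer, and --limit options come from this page.

```shell
#!/bin/sh
# Hypothetical cron wrapper. EE_ROOT, PHP_BIN, and run_import are
# illustrative names, not part of DataGrab or ExpressionEngine.

# Site root that contains system/ee/eecli.php (assumed example path).
EE_ROOT="${EE_ROOT:-/var/www/mysite.com}"
# cron often runs with a minimal PATH, so an absolute PHP path may be needed.
PHP_BIN="${PHP_BIN:-php}"

# run_import ROLE ID [LIMIT]
# ROLE is "producer" or "consumer"; LIMIT is only passed for consumers
# and falls back to 50 when omitted.
run_import() {
    role="$1"
    import_id="$2"
    limit="${3:-50}"
    case "$role" in
        producer)
            "$PHP_BIN" "$EE_ROOT/system/ee/eecli.php" import:run \
                --id="$import_id" --producer
            ;;
        consumer)
            "$PHP_BIN" "$EE_ROOT/system/ee/eecli.php" import:run \
                --id="$import_id" --consumer --limit="$limit"
            ;;
        *)
            echo "usage: run_import producer|consumer ID [LIMIT]" >&2
            return 2
            ;;
    esac
}
```

To use it, add a call such as run_import consumer 27 50 at the end of the script, make the script executable, and point the cron entry at the script's absolute path.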

Putting it all together, your crontab might look like this:

# This will run once every morning at 5am to read your import file and fill up the queue
0 5 * * * php /var/www/mysite.com/system/ee/eecli.php import:run --id=27 --producer

# This will run every 5 minutes and will check and pull from the queue if anything exists
# it will only grab 50 entries to import, then 5 minutes later it will grab the next 50 etc
# If the queue is empty, this will run and do nothing, then run again in 5 minutes etc
*/5 * * * * php /var/www/mysite.com/system/ee/eecli.php import:run --id=27 --consumer --limit=50
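When choosing a limit and interval, it can help to estimate how long the queue will take to drain. The sketch below is only a back-of-the-envelope calculation, assuming every consumer run imports its full limit without timing out. The function name estimate_drain and the entry count of 1200 are made up for illustration.

```shell
#!/bin/sh
# estimate_drain ENTRIES LIMIT INTERVAL_MINUTES
# Prints how many consumer runs are needed to empty a queue of ENTRIES
# items, and roughly how many minutes that takes at one run per interval.
# LIMIT must be greater than 0.
estimate_drain() {
    entries="$1"
    limit="$2"
    interval="$3"
    runs=$(( (entries + limit - 1) / limit ))   # ceiling division
    echo "$runs runs, about $(( runs * interval )) minutes"
}

# Example: 1200 queued entries, limit of 50, consumer every 5 minutes.
estimate_drain 1200 50 5
```

With these example numbers the function reports 24 runs and about 120 minutes. If that is too slow for a daily import, a higher limit or a shorter interval is worth trying, provided each run still finishes within your server's memory and execution time limits.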

Help configuring crontab or supervisord is not included as part of DataGrab's support. Adequate documentation is available, and this generally requires direct access to the server.
