Version 5.0

Version 5.0 is in Beta
Version 5.0 of DataGrab significantly changes how DataGrab works. DataGrab 5 introduces the Laravel Queue package, which means DataGrab now supports the producer/consumer model. Since its initial release in 2010, DataGrab has relied on reading a JSON, XML, or CSV file and iterating over the contents of that file to perform the updates. Users with large imports often ran into server timeouts or PHP memory issues. Simply put, DataGrab was never built to handle large imports.
A lot has changed under the hood. The methods that perform the actual entry importing remain unchanged, but everything leading up to the import process has been overhauled. Overall the code is simpler, and DataGrab no longer has to perform as many gymnastics to read and iterate an import file. When an import file is read, its items are inserted into a queue (this is the "producer"). The items, or entries, remain in the queue until a consumer acts on them and completes the import. If you run imports manually within the control panel, not much has changed for you: initiating an import runs the producer to read the import file and create the queue, then immediately starts consuming the queue.
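The producer/consumer split described above can be illustrated with a toy model. This is only a sketch and not DataGrab's code: a plain text file stands in for the queue, the function names produce and consume are hypothetical, and the batch size stands in for the "limit" setting discussed in the sections below.

```shell
# Toy model only: a plain text file stands in for the queue.
# produce and consume are hypothetical helper names, not DataGrab commands.

# produce QUEUE_FILE ITEM...  -> append each item to the queue
produce() {
    queue="$1"
    shift
    for item in "$@"; do
        printf '%s\n' "$item" >> "$queue"
    done
}

# consume QUEUE_FILE LIMIT  -> take up to LIMIT items off the front of the
# queue and print them; a LIMIT of 0 takes everything that is queued
consume() {
    queue="$1"
    limit="$2"
    # Nothing queued: exit quietly, as an idle consumer would
    [ -s "$queue" ] || return 0
    total=$(( $(wc -l < "$queue") ))
    if [ "$limit" -eq 0 ] || [ "$limit" -gt "$total" ]; then
        limit="$total"
    fi
    head -n "$limit" "$queue"                        # "import" this batch
    tail -n "+$((limit + 1))" "$queue" > "$queue.tmp"
    mv "$queue.tmp" "$queue"                         # keep the remainder queued
}

# Usage:
#   produce /tmp/queue.txt entry-1 entry-2 entry-3
#   consume /tmp/queue.txt 2    # prints entry-1 and entry-2
#   consume /tmp/queue.txt 0    # prints entry-3, leaving the queue empty
```

The point of the split is that the producer only has to read the file once, while each consumer run handles a bounded batch and can stop without losing the items that are still queued.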

Notable Changes

If you had imports configured with a "limit" value below 50, upgrading to DataGrab 5 will change the limit to 50. The queue does a much better job of managing its own resources, so a "limit" of 1 (the previous default) is no longer needed to stay within PHP or server timeout settings. You can still adjust this value when configuring an import, but we recommend starting at 50 and seeing how the imports perform with your server's configuration. You may be able to set it to 0, which means a consumer will import as many entries as possible until it decides to terminate itself and start a new one.
If you import within the control panel and have configured the import to delete non-imported entries, you will see a second, red progress bar. The first, purple progress bar is the consumer that imports the entries, and the red progress bar is the consumer that deletes the other entries. The deletions used to be included in the same request, but since we're now using queues we take advantage of them and split up the work. The red progress bar indicates that the initial entries were imported and that a new consumer was started to delete the entries that should be deleted.

Deletions

A new "Soft delete" option was added. If you checked the "Delete old" option to delete old entries from a channel that were not included in the import, you can optionally soft delete them, which sets their status to Closed instead of removing the entries entirely from the database.

Improved Cartthrob Order Items fieldtype

The Cartthrob Order Items fieldtype support had been horribly neglected and did not work with more recent versions of Cartthrob. It has been updated to support importing variable column values, but it needs to follow a specific format. Your import file must contain an "extra" node that contains a JSON object.
...
<quantity>3</quantity>
<price>$100.00</price>
<extra><![CDATA[
{
"discount": 1,
"price_plus_tax": "$20",
"product_color": "Blue",
"product_code": "WIDGET123"
}
]]></extra>
If your import file is a JSON file, then the "extra" node needs to contain a JSON string:
"quantity": 3,
"price": "$100.00",
"extra": "{\"discount\": 1,\"price_plus_tax\": \"$20\",\"product_color\": \"Blue\",\"product_code\": \"WIDGET123\"}"

CLI Commands

The existing CLI commands will continue to work as they did before. If no additional arguments are given, the command produces and immediately consumes the entries from the queue. E.g.
php system/ee/eecli.php import:run --id=27
Version 5 introduces 2 new arguments to the CLI commands.
php system/ee/eecli.php import:run --id=27 --producer
Running this command only reads entries from your import file and puts them into the queue. If you have a daily import, you can set up a crontab to schedule this command.
php system/ee/eecli.php import:run --id=27 --consumer
Running this command creates a single worker to consume entries from the queue. If your import is configured with a "limit" of 50, it will import only 50 entries and then terminate, and you'll need to run the --consumer command again. The best way to do this is to set up a crontab that runs the command on a schedule, such as every 1, 3, 5, or 30 minutes (or use supervisord). Choose any interval that works for you. If the queue is empty and there is nothing to consume, the consumer just terminates itself and tries again at the next interval.
If you want to run a single consumer that imports all items in the queue, set the limit to 0. Using a limit of 0 on a large import will likely run into server memory or request timeout limits, so a limit of 0 is only recommended for smaller imports. If you set a limit of 0 and find that the import is not finishing, then 0 is not a viable option for your import size and server settings, and you'll have to define a limit value and run the consumer periodically with cron.
php system/ee/eecli.php import:run --id=27 --consumer --limit=0
To set up a consumer to run every 5 minutes, your cron entry will look similar to the following:
*/5 * * * * php system/ee/eecli.php import:run --id=27 --consumer --limit=50
It is perfectly fine to configure the DataGrab consumer to execute every X minutes, even if there is nothing to import. If there is nothing in the queue, then it will simply abort and try again a few minutes later. To learn more about cron visit cron.guru.
Help configuring crontab or supervisord is not included as part of DataGrab's support. Adequate documentation is available, and this generally requires direct access to the server.

Queue Drivers

By default DataGrab uses the database for its queue. No changes are needed to your config files to support this. You can optionally use Redis as the queue driver instead. You'll need to have Redis installed and configured on your server, and add the following to your ExpressionEngine config.php file.
$config['datagrab'] = [
    'driver' => 'redis',
    'redis_config' => [
        'host' => 'redis',
        'port' => '6379',
        'timeout' => '0',
        'password' => null,
    ],
];
When using the Database queue driver, which is the default, it is best to run only 1 consumer at a time. Running multiple consumers at the same time may result in database locking issues, and some items in the queue may not be imported. If you want to run more than 1 consumer at a time, try the Redis queue driver.
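One way to keep scheduled consumers from overlapping on a single server is to wrap the command in a file lock. The sketch below is not part of DataGrab. It assumes the flock utility from util-linux is installed, and the helper name run_exclusive and the lock file path are hypothetical.

```shell
# run_exclusive LOCK_FILE COMMAND [ARGS...]
# Runs COMMAND while holding an exclusive lock on LOCK_FILE. If another
# process already holds the lock, flock -n gives up at once with a non-zero
# exit status instead of starting a second copy of COMMAND.
run_exclusive() {
    lock_file="$1"
    shift
    flock -n "$lock_file" "$@"
}

# Usage from a wrapper script (the lock path is an assumption):
#   run_exclusive /tmp/datagrab-27.lock \
#       php system/ee/eecli.php import:run --id=27 --consumer --limit=50
```

In a crontab the same idea is usually written inline, by prefixing the consumer command with flock -n and a lock file path. A consumer that is still running when the next interval arrives then causes that run to be skipped rather than doubled up. This only guards against overlap on one machine, so it does not help if consumers are started from several servers.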