Version 5.0

Version 5.0 of DataGrab brings a significant change to how DataGrab works: it introduces the Laravel Queue package, which means DataGrab now supports the producer/consumer model. Since its initial release in 2010, DataGrab has relied on reading a JSON, XML, or CSV file and iterating over its contents to perform the updates. Users with large imports often ran into server timeouts or PHP memory issues. Simply put, DataGrab was never built to handle large imports.

A lot has changed under the hood. The methods that perform the actual entry importing remain unchanged, but everything leading up to the import process has received an overhaul. Overall the code is simpler, and DataGrab no longer has to perform as many gymnastics to read and iterate an import file as it used to. When an import file is read, its items are inserted into a queue (this is the "producer"). The items, or entries, remain in the queue until a consumer acts upon them and completes the import. If you run imports manually within the control panel, not much has changed for you: initiating an import runs the producer to read the import file and build the queue, then immediately starts consuming the queue.
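Conceptually, the producer/consumer split described above looks like the following sketch. This is plain Python with an in-memory queue for illustration only, not DataGrab's actual code; the function and item names are hypothetical:

```python
import queue

def produce(items, q):
    """Producer: read the import file once and enqueue each item."""
    for item in items:
        q.put(item)

def consume(q, import_entry):
    """Consumer: pull items off the queue and import them one at a time."""
    imported = []
    while not q.empty():
        imported.append(import_entry(q.get()))
    return imported

q = queue.Queue()
# Stand-in for the parsed rows of a JSON/XML/CSV import file.
produce([{"title": "Entry A"}, {"title": "Entry B"}], q)
results = consume(q, lambda item: item["title"])
print(results)  # ['Entry A', 'Entry B']
```

The key point is that parsing the file (producing) and importing entries (consuming) are now separate steps, so a large file no longer has to be read and imported within a single request.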

Notable Changes

If you had imports configured with a "limit" value below 50, upgrading to DataGrab 5 will change that limit to 50. This is because the queue does a much better job at managing its own resources, and we no longer have to set a "limit" of 1 (the previous default) to stay within any PHP or server-based timeout settings. You can still adjust this value when configuring an import, but we recommend starting at 50 and seeing how the imports perform based on your server's configuration. You may be able to set it to 0, which means each consumer will import as many entries as possible before it self-terminates and a new consumer starts.

When importing within the control panel with an import configured to delete non-imported entries, you will see a second, red progress bar. The first, purple progress bar is the consumer that imports the entries, and the second is the consumer that deletes the other entries. The deletions used to be included in the same request, but since we're using queues we're taking advantage of them and splitting up the work. The second, red progress bar indicates that the initial entries were imported and a new consumer has started to delete the entries that should be deleted.

Deletions

A new "Soft delete" option was added. If you check the "Delete old" option to delete old entries from a channel that were not included in the import, you can optionally soft delete them, which sets their status to Closed instead of removing the entries entirely from the database.

Improved Cartthrob Order Items fieldtype

The Cartthrob Order Items fieldtype support had been neglected and did not work with more recent versions of Cartthrob. It has been updated to support importing variable column values, but the data needs to follow a specific format: your import file must contain an "extra" node that contains a JSON object.

...
<quantity>3</quantity>
<price>$100.00</price>
<extra><![CDATA[
  {
      "discount": 1,
      "price_plus_tax": "$20",
      "product_color": "Blue",
      "product_code": "WIDGET123"
  }
]]></extra>

If your import file is a JSON file, then the "extra" node needs to contain a JSON string:

"quantity": 3,
"price": "$100.00",
"extra": "{\"discount\": 1, \"price_plus_tax\": \"$20\", \"product_color\": \"Blue\", \"product_code\": \"WIDGET123\"}"
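Because the value of "extra" must be a JSON string rather than a nested object, its inner quotes need to be escaped. One way to generate it correctly is to serialize the object for "extra" first, then embed the resulting string in the outer document. A Python sketch (the field names come from the example above):

```python
import json

extra = {
    "discount": 1,
    "price_plus_tax": "$20",
    "product_color": "Blue",
    "product_code": "WIDGET123",
}

item = {
    "quantity": 3,
    "price": "$100.00",
    # Serializing the dict first makes "extra" an escaped JSON string,
    # matching the format DataGrab expects for this node.
    "extra": json.dumps(extra),
}

print(json.dumps(item, indent=2))
```

Any language's JSON encoder will produce the same result; the important part is that "extra" is encoded twice, once as an object and once as part of the surrounding document.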

CLI Commands

The existing CLI commands will continue to work as they did before. If no additional arguments are defined, the command produces the queue and immediately consumes the entries from it.

For more information on the CLI commands please visit the Importing with cron documentation.

Queue Drivers

By default, DataGrab uses the database for its queue; no changes to your config files are needed to support this. You can optionally use Redis as the queue driver instead. You'll need to have Redis installed and configured on your server, and add the following to your ExpressionEngine config.php file:

$config['datagrab'] = [
    'driver' => 'redis',
    'redis_config' => [
        'host' => 'redis',
        'port' => '6379',
        'timeout' => '0',
        'password' => null,
    ],
];

When using the database queue driver, which is the default, it is best to run only 1 consumer at a time. Running multiple consumers simultaneously may result in database locking issues, and some items in the queue may not be imported. If you want to run more than 1 consumer at a time, try the Redis queue driver.
