Efficiently Replicating RETS data to MySQL


A recent project of mine entailed replicating a RETS database on a local MySQL database server. The client had a new real-estate mobile search app in development and wanted to have a local copy of the RETS database for search queries instead of a remote RETS server. This approach has several advantages:

  • Response times for accessing your local database are significantly faster than connecting to a remote RETS server for each request.
  • RETS servers requires that you query using certain combination of fields and also limits the number of fields you can search on. Using a local database would allow you to search on any field with any combination.
  • RETS servers also limit the time of day you’re allowed to access the RETS server and also restricts how much queries you can run per hour. You may also be limited to the number of records you can retrieve per query from a RETS server while a local database wouldn’t have that limit.


Replicating RETS data rather than using the live version is slightly complex, however. In order to replicate the RETS data into your own local database, a series of processes are needed in order to make sure the data you have is both updated and in sync with the server. Those processes typically include the following two major steps: Grabbing the entire database once, and running incremental updates to find changes or new records.

For this project I used the excellent PHRETS PHP library. PHRETS is a versatile library and allows you to communicate with a RETS server in an easy manner.

Grabbing the complete database once.

To kick start the replication we need to get the complete initial database from the RETS server. Once of the important field we will require is the LastModifiedDateTime field. This field instructs the RETS server to only return records that have been modified since the given time. For our purpose here this will require that we pass a very old LastModifiedDateTime for each query. The following for example is a query to get all the records for the city of ‘Anaheim’ since 1 January 1980. Note the ‘LastModifiedDateTime’ value.

(City=|Anaheim), (Status=|A,C,E,H,I,O,P,S,W,X),
(LastModifiedDateTime=1980-01-01T00:00:00+)";

RETS database contains a set of six major metadata tables which we need to replicate on our server: History, Agent, Association, Office, Open House, and Property. Many of these also have corresponding images to go with. Downloading these images requires a separate query and can return multiple images for any single resource.

Your local database schema should always have an index on some main field, for example on ‘ListingRid’ in case of ‘Property’ table. This will ensure that no duplicate records are entered in our database.

Keeping the data in sync

Once we are through downloading the initial database, the next task is to run a regular update to keep the RETS server data in sync with our local server. For this a CRON job that runs on the hour is usually a standard option. The LastModifiedDateTime is again helpful here. Now we can run the same above query but with a ‘LastModifiedDateTime’ that is adjusted with our CRON job. So if you run a CRON job every hour to query the RETS server, we can set the ‘LastModifiedDateTime’ to a value that increments every hour. So for example the following will only fetch the records that have been modified since 3/5/2013 09:00:00am GMT.

(City=|Anaheim), (Status=|A,C,E,H,I,O,P,S,W,X),
(LastModifiedDateTime=2013-03-05T09:00:00+)";

The next query that will run with the next CRON execution will be the following.

(City=|Anaheim), (Status=|A,C,E,H,I,O,P,S,W,X),
(LastModifiedDateTime=2013-03-05T10:00:00+)";

It may happen that a duplicate record is found for some reason, most probably due to incorrect ‘LastModifiedDateTime’ value. We need to take that into account while inserting records into our database as a duplicate record will through an error. I usually check to see if MySQL has encountered a ’1062′ error (Duplicate entry) and then update or skip that particular record, depending on the context.

if(mysql_errno() == 1062)
{
   /* If duplicate record was found */
   $sql = "UPDATE property_table " . $subquery . " \ 
   WHERE ListingRid = '{$record['ListingRid']}'";
}

Ensuring CRON job is executed on regular intervals

Your RETS data will quickly be out of sync if your CRON script fails to run on a regular basis. One of the ways to avoid that is to create a new table in our local database that will have a update timestamp field; let us call it ‘lastupdate’. So every time a CRON script is run on the hour it will add a timestamp to the ‘lastupdate’ field. So now all we have to do to make sure that the CRON script is executed on time is to check for the ‘lastupdate’ field and ensure that the timestamp stored is no more than an hour old.

This site is a digital habitat of Sameer Borate, a freelance web developer working in PHP, MySQL and WordPress. I also provide web scraping services, website design and development and integration of various Open Source API's. Contact me at metapix[at]gmail.com for any new project requirements and price quotes.

3 Responses

1

Shane

May 6th, 2013 at 10:58 am

Interesting approach – I would have thought a simpler way would be to query the RETS data if the data didn’t exist in your system, and then save it locally. But I guess you never know when the data is going to expire or be updated. Thanks for sharing!

2

Jakub Vrana

May 12th, 2013 at 11:30 am

You can avoid the duplicate key error by using one of the following commands:

INSERT IGNORE INTO
INSERT INTO … ON DUPLICATE KEY UPDATE

3

Morgan

April 24th, 2014 at 2:23 pm

Hey I just wanted to say thanks! for the clear idea of querying by LastModified date. It’s clever and intuitive to develop for my circumstances (WordPress).

Your thoughts