New approach for updating spending columns in addresses table.

## Motivation

Now that the [block sync is much faster](https://github.com/decred/dcrdata/pull/751) (between 60-90 minutes, even on a VM/VPS), the slowest part of initial sync is _by far_ the stage where the **spending information is updated in the addresses table (2-4 hours)**.

## Current Solution

The function `(*ChainDB).UpdateSpendingInfoInAllAddresses` performs these updates in the following way:

1. Retrieve all row IDs of the `vins` table.
2. In chunks of 500 vins rows, retrieve funding tx (previous outpoint) info and spending tx (input) info.
<s>3. The spending transaction data is inserted into the addresses table.  See the call to `insertSpendingTxByPrptStmt` by `SetSpendingForVinDbIDs`.</s> insert moved to main sync loop
4. The matching_tx_hash for the funding tx output is updated with the spending transaction hash.

As more blocks are mined, the percentage of rows of the addresses table with spending information (spent outputs / total outputs) approaches a fairly high steady state value (low unspent outputs / total outputs).  Thus, most of the addresses table is updated.

## Proposed Solution

Since most of the addresses table is updated, completely rewriting the entire addresses table and rebuilding the indexes is a fast approach.  These steps were proposed for adding the matching_tx_index column, but can operate on the matching_tx_hash column too.

1. Create a new table `addresses_new` via `SELECT INTO` or `CREATE TABLE AS`.

```
CREATE TABLE addresses_new AS
SELECT addresses.*,
       vins.tx_index AS matching_tx_index
FROM addresses
JOIN vins ON addresses.tx_hash=vins.prev_tx_hash
AND addresses.tx_vin_vout_index=vins.prev_tx_index
AND is_funding=TRUE
AND is_valid=valid_mainchain;
```
About **a minute**.

Now `addresses_new` has the new column, no indexes, and only data with `is_funding=true`.

2. Append the spending addresses rows into `addresses_new` via `INSERT INTO SELECT`.

```
INSERT INTO addresses_new
SELECT addresses.*,
       vins.prev_tx_index AS matching_tx_index
FROM addresses
JOIN vins ON vins.id=tx_vin_vout_row_id
AND is_funding=FALSE;
```
About **40 seconds**.

3. The unspent funding rows, where `matching_tx_hash=''`.  Something like:

    INSERT INTO addresses_new SELECT addresses.*, -1 AS matching_tx_index FROM addresses WHERE is_funding=TRUE AND matching_tx_hash='';

4. Drop `addresses` table.

5. Rename `addresses_new` to `addresses`.

6. Index `addresses`.

The NOT NULL constraints on `id` and `block_time` need to be set, and `id` needs to be made into a SERIAL (not just a primary key) by making it's default value `nextval('addresses_id_seq'::regclass)`. Look into how to make it SERIAL!


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New approach for updating spending columns in addresses table. #752

Motivation

Current Solution

Proposed Solution

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

New approach for updating spending columns in addresses table. #752

Description

Motivation

Current Solution

Proposed Solution

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions