Methodology

How we analyze property ownership patterns

Our methodology is systematic, automated, and reproducible. Every step — from data ingestion through scoring — follows documented rules applied uniformly to all parcels and all owners in a county.

1. Data ingestion

We process the complete parcel dataset from the county assessor. For St. Louis County, the dataset contains 401,458 parcel records, each with 73 data fields, including owner name, mailing address, property address, valuations, deed type, year built, and property classification.

Source data is obtained from publicly available county assessor portals (typically ArcGIS Hub or similar open data platforms). We do not scrape, purchase, or use non-public data.
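The sketch below shows what loading such an export might look like. It assumes the portal offers a CSV download. The column names in `REQUIRED_FIELDS` are hypothetical placeholders, because the actual names of the 73 fields are not listed here, and `load_parcels` is an illustrative helper rather than our production loader.

```python
import csv
import io
from typing import Iterable

# Hypothetical column names; the real assessor export uses its own field names.
REQUIRED_FIELDS = ("PARCEL_ID", "OWNER_NAME", "MAIL_ADDR", "MAIL_CITY", "MAIL_STATE", "MAIL_ZIP")


def load_parcels(lines: Iterable[str]) -> list[dict[str, str]]:
    """Read CSV rows into dicts, failing fast if an expected column is missing."""
    reader = csv.DictReader(lines)
    missing = [f for f in REQUIRED_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"export is missing expected columns: {missing}")
    # Trim whitespace; short rows yield None values, which become "".
    # Extra unnamed columns (stored by DictReader under the key None) are dropped.
    return [
        {k: (v or "").strip() for k, v in row.items() if k is not None}
        for row in reader
    ]


sample = io.StringIO(
    "PARCEL_ID,OWNER_NAME,MAIL_ADDR,MAIL_CITY,MAIL_STATE,MAIL_ZIP\n"
    "1, ABC Properties LLC ,100 Main St,Clayton,MO,63105\n"
)
parcels = load_parcels(sample)
print(parcels[0]["OWNER_NAME"])  # prints "ABC Properties LLC"
```

Failing on a missing column, rather than silently filling blanks, makes a renamed or dropped field in a portal export visible before any matching runs.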

2. Normalization

Raw owner names and addresses contain significant variation that would prevent accurate matching. We apply two normalization functions:

Owner name normalization: Convert to uppercase, strip entity suffixes (LLC, Inc, Corp, Trust, LP, Ltd, and 25+ variants), remove punctuation, collapse whitespace, strip leading "THE". This allows "ABC Properties LLC" and "ABC PROPERTIES, L.L.C." to match as the same entity.
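The following is a minimal Python sketch of the owner-name rules. It takes a whitespace-token approach. `ENTITY_SUFFIXES` holds only the six suffixes named above, not the full list of 25+ variants. Periods are deleted before suffix matching so that "L.L.C." and "LLC" reduce to the same token. That ordering is a choice made for this sketch, and production code may sequence the steps differently.

```python
import re

# Illustrative subset: only the suffixes named in the text, not all 25+ variants.
ENTITY_SUFFIXES = {"LLC", "INC", "CORP", "TRUST", "LP", "LTD"}


def normalize_owner_name(name: str) -> str:
    s = name.upper()
    # Delete periods and apostrophes so "L.L.C." -> "LLC" and "O'NEIL" -> "ONEIL".
    s = re.sub(r"[.']", "", s)
    # Replace any remaining punctuation with a space; split() then collapses whitespace.
    s = re.sub(r"[^\w\s]", " ", s)
    tokens = s.split()
    # Strip trailing entity suffixes, repeatedly ("ABC LLC INC" -> "ABC").
    while tokens and tokens[-1] in ENTITY_SUFFIXES:
        tokens.pop()
    # Drop a leading "THE".
    if tokens and tokens[0] == "THE":
        tokens = tokens[1:]
    return " ".join(tokens)


print(normalize_owner_name("ABC PROPERTIES, L.L.C."))  # prints "ABC PROPERTIES"
```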

Address normalization: Convert to uppercase, strip suite/unit/apt designators, standardize directionals (North → N, South → S), standardize street suffixes (Street → ST, Avenue → AVE, Boulevard → BLVD), remove punctuation, collapse whitespace.
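The address rules can be sketched in the same style. The abbreviation tables here are small illustrative subsets, and a fuller source would be something like the USPS Publication 28 abbreviation lists. `UNIT_PATTERN` is an assumed regular expression for common unit designators. Each token is mapped independently, which is a simplification: a street actually named "North" would also be abbreviated.

```python
import re

# Illustrative subsets only; production tables are much longer.
DIRECTIONALS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
                "NORTHEAST": "NE", "NORTHWEST": "NW", "SOUTHEAST": "SE", "SOUTHWEST": "SW"}
STREET_SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD",
                   "ROAD": "RD", "DRIVE": "DR", "LANE": "LN", "COURT": "CT"}

# Matches "SUITE 200", "STE. #4", "APT 3B", "UNIT B-2", and a bare "#200".
UNIT_PATTERN = re.compile(r"\b(?:SUITE|STE|UNIT|APT|APARTMENT)\b\.?\s*#?\s*[\w-]+|#\s*[\w-]+")


def normalize_address(addr: str) -> str:
    # Remove unit designators first, before "#" is lost as punctuation.
    s = UNIT_PATTERN.sub(" ", addr.upper())
    s = s.replace(".", "")            # "ST." -> "ST"
    s = re.sub(r"[^\w\s]", " ", s)    # other punctuation -> space
    # Standardize directionals and street suffixes token by token.
    tokens = [DIRECTIONALS.get(t, STREET_SUFFIXES.get(t, t)) for t in s.split()]
    return " ".join(tokens)


print(normalize_address("123 Main Street, Suite 200"))  # prints "123 MAIN ST"
```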

3. Clustering

We construct a mailing address key from the owner's mailing address, city, state, and ZIP code. We then group all parcels by this normalized mailing address key and count the number of distinct normalized owner names at each address.

An address forms a cluster when 3 or more distinct normalized owner/entity names share the same normalized mailing address key. This threshold balances sensitivity (catching real clusters) against specificity (avoiding false positives from incidental address sharing).

The question each cluster answers is: "How many different entity names all receive their mail at this same address?"
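The grouping step can be sketched in Python as follows. The sketch assumes owner names and street lines were already normalized as in step 2. The `Parcel` fields and the choice in `mailing_key` to truncate ZIP codes to five digits (so ZIP+4 variants collide) are assumptions made for illustration, not documented rules.

```python
from collections import defaultdict
from typing import NamedTuple

MIN_DISTINCT_OWNERS = 3  # cluster threshold from the methodology


class Parcel(NamedTuple):
    parcel_id: str
    owner_norm: str      # owner name after step-2 normalization
    mail_addr_norm: str  # street address after step-2 normalization
    mail_city: str
    mail_state: str
    mail_zip: str


def mailing_key(p: Parcel) -> tuple[str, str, str, str]:
    # Assumed key layout: ZIP reduced to its first five digits.
    zip5 = "".join(ch for ch in p.mail_zip if ch.isdigit())[:5]
    return (p.mail_addr_norm, p.mail_city.strip().upper(), p.mail_state.strip().upper(), zip5)


def find_clusters(parcels: list[Parcel]) -> dict[tuple[str, str, str, str], dict]:
    owners_by_key: dict = defaultdict(set)
    parcels_by_key: dict = defaultdict(list)
    for p in parcels:
        key = mailing_key(p)
        owners_by_key[key].add(p.owner_norm)
        parcels_by_key[key].append(p.parcel_id)
    # Keep only addresses where enough distinct owner names receive mail.
    return {
        key: {"owners": sorted(owners), "parcels": parcels_by_key[key]}
        for key, owners in owners_by_key.items()
        if len(owners) >= MIN_DISTINCT_OWNERS
    }


parcels = [
    Parcel("1", "ALPHA HOLDINGS", "100 MAIN ST", "Clayton", "MO", "63105"),
    Parcel("2", "BETA PROPERTIES", "100 MAIN ST", "CLAYTON", "MO", "63105-1234"),
    Parcel("3", "GAMMA RENTALS", "100 MAIN ST", "CLAYTON ", "mo", "63105"),
    Parcel("4", "ALPHA HOLDINGS", "100 MAIN ST", "CLAYTON", "MO", "63105"),
    Parcel("5", "DELTA HOMES", "9 ELM AVE", "CLAYTON", "MO", "63105"),
]
clusters = find_clusters(parcels)
print(len(clusters))  # prints 1
```

Counting a set of names rather than parcels means one owner holding many parcels at an address does not by itself form a cluster. Here parcel 4 adds a parcel but no new name.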

4. Virtual office exclusion

Known registered agent addresses, virtual office providers, and commercial mail centers are automatically excluded from clustering. This prevents common business service addresses from generating false positive clusters.

Our exclusion list includes addresses matching: registered agent services (CT Corporation, National Registered Agents, Corporation Service Company, United States Corporation Agents), identified mail-drop addresses, and addresses flagged through manual review of high-entity-count clusters.

The exclusion list is maintained and expanded with each report update. When in doubt, we exclude — a missed cluster is preferable to a false positive.
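The exclusion check might be sketched like this. The agent names come from the list above. Matching them as whole words in the raw mailing line is an assumption about how such addresses appear in assessor data. `EXCLUDED_KEYS` holds a made-up placeholder, not a real mail-drop address. Because the filter runs before clustering, an excluded address never contributes to an entity count.

```python
import re

# Registered agent services named in the methodology.
REGISTERED_AGENT_NAMES = (
    "CT CORPORATION",
    "NATIONAL REGISTERED AGENTS",
    "CORPORATION SERVICE COMPANY",
    "UNITED STATES CORPORATION AGENTS",
)
# Whole-word match, so "DISTRICT CORPORATION" does not trigger "CT CORPORATION".
AGENT_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(n) for n in REGISTERED_AGENT_NAMES) + r")\b"
)

# Hypothetical placeholder standing in for mail-drop and manually reviewed addresses.
EXCLUDED_KEYS = {("1 EXAMPLE PLAZA", "SPRINGFIELD", "MO", "65800")}


def is_excluded(mail_key: tuple[str, str, str, str], raw_mail_line: str) -> bool:
    """Return True if a parcel's mailing address should be left out of clustering."""
    if mail_key in EXCLUDED_KEYS:
        return True
    return AGENT_PATTERN.search(raw_mail_line.upper()) is not None


rows = [
    (("100 MAIN ST", "CLAYTON", "MO", "63105"), "100 Main St"),
    (("1 EXAMPLE PLAZA", "SPRINGFIELD", "MO", "65800"), "1 Example Plaza"),
    (("200 OAK AVE", "CLAYTON", "MO", "63105"), "C/O CT Corporation System, 200 Oak Ave"),
]
kept = [key for key, line in rows if not is_excluded(key, line)]
print(kept)  # prints [('100 MAIN ST', 'CLAYTON', 'MO', '63105')]
```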

5. Concentration index

Each cluster receives a composite concentration index (referred to as "risk score" in the data) based on multiple factors:

Component | Value
Base: number of distinct entities at the address | entity_count
Majority out-of-state owners (>50% non-local) | +3
Each entity with distress score above 40 | +2 each
Total cluster appraised value exceeds $1M | +5
10+ entities at address | +10
5-9 entities at address | +5
Each quitclaim deed in cluster | +2 each

Higher concentration indices indicate greater ownership density and complexity. They do not indicate wrongdoing.
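If the table is read as an additive score, it can be sketched in Python like this. Several details are assumptions that the table does not pin down:

- The out-of-state share is measured over parcels, against a `home_state` of "MO".
- An entity's distress is taken as the highest score among its parcels.
- The 5-9 and 10+ bonuses are mutually exclusive tiers.
- `QUITCLAIM_CODES` is a hypothetical set of deed-type labels.

```python
from dataclasses import dataclass


@dataclass
class ClusterParcel:
    owner_norm: str
    owner_state: str
    appraised_value: float
    deed_type: str
    distress_score: int


QUITCLAIM_CODES = {"QUITCLAIM", "QC", "QCD"}  # hypothetical labels; county codes vary


def concentration_index(parcels: list[ClusterParcel], home_state: str = "MO") -> int:
    if not parcels:
        return 0
    entities = {p.owner_norm for p in parcels}
    n = len(entities)
    score = n  # base: number of distinct entities at the address
    # +3 when more than half of the parcels have an out-of-state owner (assumption).
    out_of_state = sum(1 for p in parcels if p.owner_state.upper() != home_state)
    if out_of_state / len(parcels) > 0.5:
        score += 3
    # +2 per entity whose worst parcel scores above 40 (assumption).
    worst: dict[str, int] = {}
    for p in parcels:
        worst[p.owner_norm] = max(worst.get(p.owner_norm, 0), p.distress_score)
    score += 2 * sum(1 for s in worst.values() if s > 40)
    if sum(p.appraised_value for p in parcels) > 1_000_000:
        score += 5
    # Entity-count tiers, treated as mutually exclusive.
    if n >= 10:
        score += 10
    elif n >= 5:
        score += 5
    # +2 per quitclaim deed in the cluster.
    score += 2 * sum(1 for p in parcels if p.deed_type.upper() in QUITCLAIM_CODES)
    return score


cluster = [
    ClusterParcel("A", "IL", 400_000, "WARRANTY", 45),
    ClusterParcel("B", "IL", 350_000, "QC", 20),
    ClusterParcel("C", "MO", 300_000, "WARRANTY", 50),
    ClusterParcel("A", "IL", 10_000, "QUITCLAIM", 10),
]
print(concentration_index(cluster))  # prints 19
```

In the example, the result is built up as follows:

- 3 distinct entities: 3 points.
- Out-of-state majority (3 of 4 parcels): +3.
- Two entities above 40: +4.
- $1.06M total appraised value: +5.
- Two quitclaim deeds: +4.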

6. Distress scoring

Individual parcels receive a distress score (0-100) based on publicly observable signals that may indicate property neglect, vacancy, or financial stress:

Signal | Points
Out-of-state absentee owner | 20
In-state absentee owner | 10
Vacant land (no structure) | 15
Low improvement ratio (<10% of appraised value) | 10
Long ownership: 20+ years | 10
Long ownership: 10-20 years | 5
Pre-1950 building | 5
Low total appraised value (<$30,000) | 10
Large lot: 1+ acre residential | 5
Multi-family absentee | 5

Distress scores are observational indicators derived from public assessor data. A high distress score does not imply negligence or wrongdoing — it highlights parcels that may exhibit patterns associated with deferred maintenance, vacancy, or long-term holding.
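The signal table can be sketched as a sum of points clamped to the stated 0-100 range. `ParcelFacts` and its fields are hypothetical stand-ins for values derived from the assessor record. Absentee status, for instance, would come from comparing mailing and property addresses. This sketch lets the vacant-land and low-improvement-ratio signals both fire for the same lot, a point the table does not settle either way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParcelFacts:
    absentee: bool             # owner mails somewhere other than the property (hypothetical field)
    owner_state: str
    has_structure: bool
    improvement_value: float
    appraised_value: float
    years_owned: Optional[int]
    year_built: Optional[int]
    acres: float
    residential: bool
    multi_family: bool


def distress_score(p: ParcelFacts, home_state: str = "MO") -> int:
    score = 0
    if p.absentee:
        # Out-of-state absentee: 20; in-state absentee: 10.
        score += 20 if p.owner_state.upper() != home_state else 10
    if not p.has_structure:
        score += 15
    if p.appraised_value > 0 and p.improvement_value / p.appraised_value < 0.10:
        score += 10
    if p.years_owned is not None:
        if p.years_owned >= 20:
            score += 10
        elif p.years_owned >= 10:
            score += 5
    if p.has_structure and p.year_built is not None and p.year_built < 1950:
        score += 5
    if p.appraised_value < 30_000:
        score += 10
    if p.residential and p.acres >= 1.0:
        score += 5
    if p.multi_family and p.absentee:
        score += 5
    return max(0, min(score, 100))


vacant_lot = ParcelFacts(absentee=True, owner_state="IL", has_structure=False,
                         improvement_value=0.0, appraised_value=12_000.0, years_owned=25,
                         year_built=None, acres=0.3, residential=True, multi_family=False)
print(distress_score(vacant_lot))  # prints 65
```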

7. Accuracy and limitations

Our analysis is only as current and accurate as the underlying county assessor data. Known limitations include:

We encourage all users to independently verify findings before taking action.

See the methodology in action

Every report includes a full methodology section with county-specific data currency dates.

Questions? nexus.info@convergence-data-analytics.com