🎯 ULTIMATE SITE PARSER OVERVIEW
The Ultimate Site Import → Edit → Export Pipeline
🌊 The Complete Flow
┌─────────────────────────────────────────────────────────────────┐
│                      ULTIMATE SITE PARSER                       │
│          Import ANY website → Convert to ACCID format           │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│                       ACCID HTML BUILDER                        │
│          Edit visually → Add content → Update metadata          │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│                       STATIC HTML EXPORT                        │
│        ZIP download → FTP upload → Fully portable sites         │
└─────────────────────────────────────────────────────────────────┘
1️⃣ ULTIMATE SITE PARSER – The Importer
What It Does
Imports ANY website and converts it to editable ACCID format
Core Components
A. Content Discovery (core.py:30-130)
Finds ALL pages on a site using multiple methods:
… 16+ more platforms
Supports:
- WordPress builders: Elementor, Divi, WPBakery, Beaver, Bricks, Gutenberg
- SaaS platforms: Wix, Squarespace, Weebly, Shopify, Webflow
- JS frameworks: Next.js, React, Vue, Angular, Gatsby, Astro
- Other CMS: Joomla, Drupal
- Legacy: FrontPage, Flash
- Fallback: Generic HTML
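How a page gets matched to one of these platforms is not shown in this overview. A minimal sketch, assuming detection works by looking for well-known markers in the raw HTML: the `SIGNATURES` table, the `detect_framework` name, and every framework id other than `wordpress-elementor` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical signature table: (framework id, regex that hints at it).
# Order matters: specific WordPress builders are checked before Gutenberg,
# because builder pages often contain Gutenberg block classes as well.
SIGNATURES = [
    ("wordpress-elementor", re.compile(r'class="[^"]*\belementor-', re.I)),
    ("wordpress-divi", re.compile(r'class="[^"]*\bet_pb_', re.I)),
    ("wordpress-gutenberg", re.compile(r'class="[^"]*\bwp-block-', re.I)),
    ("shopify", re.compile(r"cdn\.shopify\.com", re.I)),
    ("wix", re.compile(r"static\.wixstatic\.com", re.I)),
    ("squarespace", re.compile(r"static1\.squarespace\.com", re.I)),
    ("webflow", re.compile(r"data-wf-site=", re.I)),
    ("nextjs", re.compile(r'id="__NEXT_DATA__"', re.I)),
]


def detect_framework(html: str) -> str:
    """Return the first framework whose marker appears, else 'generic'."""
    for name, pattern in SIGNATURES:
        if pattern.search(html):
            return name
    return "generic"


if __name__ == "__main__":
    print(detect_framework('<div class="elementor-section">Hi</div>'))
    print(detect_framework("<p>Plain page</p>"))
```

Returning `"generic"` when nothing matches mirrors the "Fallback: Generic HTML" entry in the list above.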
C. Content Cleaning (core.py:322-401)
Strips all junk before parsing:
- Remove tracking scripts
- Remove ads, social widgets, cookie notices, chat widgets
- Remove nav/header/footer, sidebars, comments, related posts
Result: Pure content only
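The cleaning step can be sketched as a filter that drops whole elements by tag name. This is an illustration only: it swaps in the standard library's `html.parser` so it runs without dependencies, whereas the project's own snippets use BeautifulSoup. The `DROP_TAGS` set, `ContentCleaner`, and `clean_html` are hypothetical names, and class-based removal (ads, cookie notices, chat widgets) is left out to keep the sketch short.

```python
from html.parser import HTMLParser

# Assumed set of tags to drop wholesale, based on the list above.
DROP_TAGS = {"script", "style", "noscript", "iframe",
             "nav", "header", "footer", "aside"}


class ContentCleaner(HTMLParser):
    """Re-emits the HTML it is fed, skipping everything inside DROP_TAGS."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the skip depth.
        if tag not in DROP_TAGS and self.skip_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in DROP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&#{name};")


def clean_html(html: str) -> str:
    cleaner = ContentCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)


if __name__ == "__main__":
    page = ("<header><nav>Menu</nav></header>"
            "<main><p>Hello</p><script>track()</script></main>"
            "<footer>Legal</footer>")
    print(clean_html(page))
```

Counting depth per dropped tag keeps nested cases such as a `nav` inside a `header` balanced, though badly malformed markup could still confuse it.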
D. Parser System (parsers/.py)
Routes to framework-specific parsers:
Registry-based system
    @register_parser("wordpress-elementor")
    def parse_elementor(soup):
        modules = []
        for section in soup.find_all(class_='elementor-section'):
            # Extract content
            # Create ACCID modules
            modules.append({
                'id': 'text-123-abc',
                'type': 'text',
                'content': '...',
                'layoutClass': 'body-text'
            })
        return modules
Parsers available:
- generic.py – Fallback for any HTML
- wordpress_elementor.py – Elementor builder
- wordpress_divi.py – Divi builder
- wordpress_gutenberg.py – Gutenberg blocks
- wix.py – Wix sites
- shopify.py – Shopify stores
- squarespace.py – Squarespace sites
- react_like.py – React/Next.js apps
- more
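The registry idea behind `@register_parser` can be sketched as a dictionary of functions keyed by framework id, with the generic parser as the fallback. This is a guess at the mechanism, not the project's code: `PARSERS` and `parse_page` are hypothetical names, the parsers take a plain HTML string rather than a BeautifulSoup object so the sketch stays dependency-free, and the module ids are placeholders.

```python
from typing import Callable, Dict, List

# Hypothetical registry: framework id -> parser function.
PARSERS: Dict[str, Callable[[str], List[dict]]] = {}


def register_parser(framework: str):
    """Decorator that stores the decorated function under `framework`."""
    def decorator(func):
        PARSERS[framework] = func
        return func
    return decorator


@register_parser("generic")
def parse_generic(html: str) -> List[dict]:
    # Fallback: wrap the whole page in a single text module.
    return [{"id": "text-generic", "type": "text",
             "content": html, "layoutClass": "body-text"}]


@register_parser("wordpress-elementor")
def parse_elementor(html: str) -> List[dict]:
    # A real parser would walk Elementor sections; this one is a stub.
    return [{"id": "text-elementor", "type": "text",
             "content": html, "layoutClass": "body-text"}]


def parse_page(framework: str, html: str) -> List[dict]:
    """Route to the framework's parser, falling back to the generic one."""
    parser = PARSERS.get(framework, PARSERS["generic"])
    return parser(html)


if __name__ == "__main__":
    print(parse_page("wordpress-elementor", "<p>Hi</p>")[0]["id"])
    print(parse_page("some-unknown-cms", "<p>Hi</p>")[0]["id"])
```

With this layout, supporting another platform means adding one decorated function; the routing code does not change.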
E. Metadata Extraction (meta_extractor.py)
Extracts ALL SEO data from original site:
    from bs4 import BeautifulSoup

    def extract_page_meta(url, html, framework):
        soup = BeautifulSoup(html, 'html.parser')

        # extract_meta, extract_title, extract_keywords and extract_from_schema
        # are helpers that are not shown in this overview.
        meta = {
            'seo': {
                'title': extract_meta('og:title') or extract_title(),
                'description': extract_meta('description')
            },
            'author': extract_meta('author'),
            'date': extract_meta('date') or extract_from_schema(),
            'excerpt': extract_meta('description'),
            'tags': extract_keywords(),
            'categories': extract_from_schema(),
            'featuredImage': extract_meta('og:image')
        }
        return meta
Extracts:
- SEO title & description
- Author & publication date
- Keywords → tags
- Categories (from structured data)
- Featured images (Open Graph)
- All existing SEO preserved!
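The helpers used by `extract_page_meta` are not shown, so here is one way the tag-based part of the lookup could work. It is a sketch under assumptions: it uses the standard library's `html.parser` instead of BeautifulSoup, `MetaCollector` and `collect_page_meta` are hypothetical names, and the schema-based fields (date, categories) are omitted.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta name|property=... content=...> pairs and the <title>."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph uses `property`, classic SEO tags use `name`.
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content:
                # First occurrence wins if a tag is repeated.
                self.meta.setdefault(key.lower(), content.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def collect_page_meta(html: str) -> dict:
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    get = collector.meta.get
    keywords = get("keywords", "")
    return {
        "seo": {
            "title": get("og:title") or collector.title.strip(),
            "description": get("description"),
        },
        "author": get("author"),
        "excerpt": get("description"),
        # Keywords → tags: split the comma-separated list.
        "tags": [k.strip() for k in keywords.split(",") if k.strip()],
        "featuredImage": get("og:image"),
    }


if __name__ == "__main__":
    sample = ('<head><title>My Post</title>'
              '<meta name="keywords" content="css, html"></head>')
    print(collect_page_meta(sample))
```

Missing tags come back as `None` (or an empty list for tags), which leaves it to the caller to decide on defaults.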
F. Scam Detection (scam_detection.py)
BONUS: Analyzes sites for red flags:
    class ScamDetector:
        def analyze_site(self, url, html, framework):
            # Check domain age (< 30 days = critical)
            # Check contact info (none = critical)
            # Check SSL certificate (HTTP = high risk)
            # Check payment methods (wire transfer = critical)
            # Check testimonials (stock photos = high risk)
            # Check legal pages (no privacy policy = high)

            return {
                'verdict': ...,     # 'LIKELY SCAM', 'SUSPICIOUS' or 'LEGITIMATE'
                'scam_score': ...,  # integer from 0 to 100
                'flags': [...]      # red flags that were triggered
            }
Protects users from importing scam sites!
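How the individual checks combine into a score and verdict is not spelled out. A minimal sketch, assuming each flag adds points by severity and the total is capped at 100: the point weights, the verdict thresholds, the `signals` keys, and the function names are all invented for illustration; only the six checks and their severities come from the outline above.

```python
# Assumed weights per severity; the real scoring is not documented here.
SEVERITY_POINTS = {"critical": 40, "high": 20}


def collect_flags(signals: dict) -> list:
    """Turn pre-computed facts about a site into a list of red flags."""
    flags = []

    def flag(check, severity):
        flags.append({"check": check, "severity": severity})

    if signals.get("domain_age_days", 9999) < 30:
        flag("domain_age", "critical")
    if not signals.get("has_contact_info", True):
        flag("contact_info", "critical")
    if not signals.get("uses_https", True):
        flag("ssl", "high")
    if signals.get("wire_transfer_payment", False):
        flag("payment_methods", "critical")
    if signals.get("stock_photo_testimonials", False):
        flag("testimonials", "high")
    if not signals.get("has_privacy_policy", True):
        flag("legal_pages", "high")
    return flags


def score_flags(flags: list) -> dict:
    """Sum severity points, cap at 100, and map the score to a verdict."""
    score = min(100, sum(SEVERITY_POINTS[f["severity"]] for f in flags))
    if score >= 70:      # assumed threshold
        verdict = "LIKELY SCAM"
    elif score >= 30:    # assumed threshold
        verdict = "SUSPICIOUS"
    else:
        verdict = "LEGITIMATE"
    return {"verdict": verdict, "scam_score": score, "flags": flags}


if __name__ == "__main__":
    report = score_flags(collect_flags(
        {"domain_age_days": 5, "has_contact_info": False}))
    print(report["verdict"], report["scam_score"])
```

Separating flag collection from scoring keeps the weights and thresholds in one place, so they can be tuned without touching the checks.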
G. Export to ACCID Format (core.py:522-615)