🎯 ULTIMATE SITE PARSER OVERVIEW
The Ultimate Site Import → Edit → Export Pipeline
The Complete Flow
┌─────────────────────────────────────────────────────────────────┐
│                      ULTIMATE SITE PARSER                       │
│          Import ANY website → Convert to ACCID format           │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│                       ACCID HTML BUILDER                        │
│          Edit visually → Add content → Update metadata          │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│                       STATIC HTML EXPORT                        │
│        ZIP download → FTP upload → Fully portable sites         │
└─────────────────────────────────────────────────────────────────┘
1️⃣ ULTIMATE SITE PARSER: The Importer
What It Does
Imports ANY website and converts it to editable ACCID format
Core Components
A. Content Discovery (core.py:30-130)
Finds ALL pages on a site using multiple methods:
… 16+ more platforms
Supports:
- WordPress builders: Elementor, Divi, WPBakery, Beaver, Bricks, Gutenberg
- SaaS platforms: Wix, Squarespace, Weebly, Shopify, Webflow
- JS frameworks: Next.js, React, Vue, Angular, Gatsby, Astro
- Other CMS: Joomla, Drupal
- Legacy: FrontPage, Flash
- Fallback: Generic HTML
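How a site gets matched to one of the platforms above is not shown in this overview. The following is a minimal sketch, assuming detection works by looking for well-known marker strings in the raw HTML. The `SIGNATURES` table, the framework ids, and `detect_framework` are hypothetical names chosen for illustration, not the project's actual rules.

```python
# Hypothetical signature table: (framework id, substrings that suggest it).
# More specific entries come first because the first match wins.
SIGNATURES = [
    ("wordpress-elementor", ("elementor-section", "elementor-widget")),
    ("wordpress-divi", ("et_pb_section", "et_pb_module")),
    ("wix", ("static.wixstatic.com",)),
    ("shopify", ("cdn.shopify.com",)),
    ("nextjs", ("__NEXT_DATA__",)),
]


def detect_framework(html: str) -> str:
    """Return the first framework whose marker appears in the HTML."""
    for framework, markers in SIGNATURES:
        if any(marker in html for marker in markers):
            return framework
    return "generic"  # fallback when nothing matches
```

A table-driven approach like this keeps each new platform to a one-line addition, and the `"generic"` return value mirrors the "Fallback: Generic HTML" entry in the list.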
C. Content Cleaning (core.py:322-401)
Strips all junk before parsing:
- Remove tracking scripts
- Remove ads, social widgets, cookie notices, chat widgets
- Remove nav/header/footer, sidebars, comments, related posts
Result: Pure content only
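The tag-based part of this cleaning step can be sketched with the standard library alone. This is an illustration under assumptions, not the code in core.py: `SKIP_TAGS`, `ContentCleaner`, and `clean_html` are hypothetical names, and the sketch drops elements by tag name only. Class-based removal of ads, cookie notices, and chat widgets is not covered.

```python
from html import escape
from html.parser import HTMLParser

# Tags whose whole subtree is dropped. This set is an assumption for illustration.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "iframe"}


class ContentCleaner(HTMLParser):
    """Re-emits HTML while skipping everything inside SKIP_TAGS.

    Comments and doctype declarations are dropped as a side effect,
    because their handlers are left as the default no-ops.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a skipped subtree

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied once, with no end tag.
        if self.skip_depth == 0 and tag not in SKIP_TAGS:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Text arrives unescaped, so escape it again on the way out.
            self.out.append(escape(data, quote=False))


def clean_html(html: str) -> str:
    cleaner = ContentCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)
```

For example, `clean_html('<nav>menu</nav><p>Hello</p>')` keeps only the paragraph.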
D. Parser System (parsers/.py)
Routes to framework-specific parsers:
Registry-based system

    @register_parser("wordpress-elementor")
    def parse_elementor(soup):
        modules = []
        for section in soup.find_all(class_='elementor-section'):
            # Extract content
            # Create ACCID modules
            modules.append({
                'id': 'text-123-abc',
                'type': 'text',
                'content': '...',
                'layoutClass': 'body-text'
            })
        return modules

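The `@register_parser` decorator itself is not shown above. One common way to build such a registry is sketched here, assuming a module-level dictionary keyed by framework id. `PARSERS`, `parse_generic`, and `parse` are hypothetical names, and the generic parser body is a placeholder.

```python
from typing import Callable, Dict, List

# Maps a framework id to its parser function (hypothetical registry).
PARSERS: Dict[str, Callable[[str], List[dict]]] = {}


def register_parser(framework: str):
    """Decorator that stores the function under the given framework id."""
    def decorator(func):
        PARSERS[framework] = func
        return func
    return decorator


@register_parser("generic")
def parse_generic(html: str) -> List[dict]:
    # Placeholder body: wrap the whole input in a single text module.
    return [{"type": "text", "content": html, "layoutClass": "body-text"}]


def parse(html: str, framework: str) -> List[dict]:
    """Route to the registered parser, falling back to the generic one."""
    parser = PARSERS.get(framework, PARSERS["generic"])
    return parser(html)
```

With this shape, adding a parser for a new platform only requires a decorated function; the routing code does not change.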
Parsers available:
- generic.py: Fallback for any HTML
- wordpress_elementor.py: Elementor builder
- wordpress_divi.py: Divi builder
- wordpress_gutenberg.py: Gutenberg blocks
- wix.py: Wix sites
- shopify.py: Shopify stores
- squarespace.py: Squarespace sites
- react_like.py: React/Next.js apps
- more
E. Metadata Extraction (meta_extractor.py)
Extracts ALL SEO data from original site:

    from bs4 import BeautifulSoup

    def extract_page_meta(url, html, framework):
        soup = BeautifulSoup(html, 'html.parser')

        # extract_meta, extract_title, extract_keywords and
        # extract_from_schema are helpers not shown in this overview.
        meta = {
            'seo': {
                'title': extract_meta('og:title') or extract_title(),
                'description': extract_meta('description')
            },
            'author': extract_meta('author'),
            'date': extract_meta('date') or extract_from_schema(),
            'excerpt': extract_meta('description'),
            'tags': extract_keywords(),
            'categories': extract_from_schema(),
            'featuredImage': extract_meta('og:image')
        }
        return meta

Extracts:
- SEO title & description
- Author & publication date
- Keywords → tags
- Categories (from structured data)
- Featured images (Open Graph)
- All existing SEO preserved!
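A few of the fields listed above can be read from `<meta>` tags with nothing more than the standard library. The sketch below is an illustration, not the code in meta_extractor.py: `MetaCollector` and `extract_basic_meta` are hypothetical names, and dates and categories from structured data are left out.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta> name/property -> content pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content"):
                self.meta[key.lower()] = attrs["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_basic_meta(html: str) -> dict:
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    meta = collector.meta
    keywords = meta.get("keywords", "")
    return {
        "seo": {
            # Prefer the Open Graph title, fall back to <title>.
            "title": meta.get("og:title") or collector.title.strip(),
            "description": meta.get("description"),
        },
        "author": meta.get("author"),
        # Comma-separated keywords become a list of tags.
        "tags": [k.strip() for k in keywords.split(",") if k.strip()],
        "featuredImage": meta.get("og:image"),
    }
```

Fields that are absent from the page come back as `None` (or an empty list for tags), so a caller can tell "missing" apart from "empty string".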
F. Scam Detection (scam_detection.py)
BONUS: Analyzes sites for red flags:

    class ScamDetector:
        def analyze_site(self, url, html, framework):
            # Check domain age (< 30 days = critical)
            # Check contact info (none = critical)
            # Check SSL certificate (HTTP = high risk)
            # Check payment methods (wire transfer = critical)
            # Check testimonials (stock photos = high risk)
            # Check legal pages (no privacy policy = high)

            return {
                'verdict': ...,     # 'LIKELY SCAM', 'SUSPICIOUS' or 'LEGITIMATE'
                'scam_score': ...,  # integer from 0 to 100
                'flags': [...]
            }

Protects users from importing scam sites!
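How the individual checks combine into a score and verdict is not specified here. One possible scheme is sketched below, assuming each flag carries a severity and each severity adds a fixed number of points. The point values, the thresholds, and the `score_flags` name are all assumptions made for illustration.

```python
# Hypothetical weights and thresholds; the overview only names the severities.
SEVERITY_POINTS = {"critical": 40, "high": 20}


def score_flags(flags):
    """flags: list of (description, severity) tuples found during analysis."""
    # Sum the points for every flag and cap the total at 100.
    score = min(100, sum(SEVERITY_POINTS[severity] for _, severity in flags))
    if score >= 60:
        verdict = "LIKELY SCAM"
    elif score >= 20:
        verdict = "SUSPICIOUS"
    else:
        verdict = "LEGITIMATE"
    return {
        "verdict": verdict,
        "scam_score": score,
        "flags": [description for description, _ in flags],
    }
```

Under these assumed weights, a single high-risk flag is enough for "SUSPICIOUS", while two critical flags push a site to "LIKELY SCAM".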
G. Export to ACCID Format (core.py:522-615)
