Train robots in 2 minutes to scrape web data automatically. No coding required. Handles pagination, CAPTCHAs, and layout changes with AI.

Overview:

Maxun is an open-source, no-code web data platform that allows users to turn websites into structured data sources. It provides integrated tools for web scraping, crawling, search, and AI-powered data extraction. Designed for users who need reliable data collection without writing code, Maxun supports creating automated "robots" that navigate websites, handle pagination, and extract information. It offers a record-and-replay interface, LLM-based extraction via natural language prompts, and enables self-hosted deployment for full infrastructure control.
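
To make the idea of a "robot" that handles pagination concrete, here is a minimal conceptual sketch in Python. It is not Maxun's implementation: the in-memory `FAKE_SITE`, the `fetch_page` helper, and the `run_robot` function are hypothetical names invented for illustration, and a real robot would drive a browser instead of reading a dictionary.

```python
from typing import Callable, Dict, List, Optional, Tuple

Page = Tuple[List[dict], Optional[str]]

# Hypothetical stand-in for a website: each URL maps to
# (rows found on that page, URL of the next page or None).
FAKE_SITE: Dict[str, Page] = {
    "/listings?page=1": ([{"title": "Loft", "price": 120}], "/listings?page=2"),
    "/listings?page=2": ([{"title": "Cabin", "price": 95}], "/listings?page=3"),
    "/listings?page=3": ([{"title": "Studio", "price": 70}], None),
}


def fetch_page(url: str) -> Page:
    """Return the rows and next-page link for a URL (assumed helper)."""
    return FAKE_SITE[url]


def run_robot(
    start_url: str,
    fetch: Callable[[str], Page] = fetch_page,
    max_pages: int = 50,
) -> List[dict]:
    """Follow next-page links from start_url and collect every row."""
    rows: List[dict] = []
    seen = set()  # guards against pagination loops
    url: Optional[str] = start_url
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page_rows, url = fetch(url)
        rows.extend(page_rows)
    return rows


if __name__ == "__main__":
    for row in run_robot("/listings?page=1"):
        print(row)
```

The `seen` set and the `max_pages` cap are defensive choices for this sketch: they stop the loop if a site links back to an earlier page or paginates without end.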

Core Features:

  • Recorder Mode: Users can record their browsing actions; Maxun turns them into a reusable extraction robot that emulates real user behavior.

  • AI Mode: Users describe the data they want in natural language, and an LLM extracts it from the page as structured data.

  • Crawl: Crawls entire websites, extracting content from relevant pages with controls over discovery and scope.

  • Developer SDK: A programmatic toolkit for scraping, extraction, scheduling, and end-to-end data automation.

  • RESTful Endpoints: Turns any website into a structured API by generating endpoints for programmatic data access.

  • Self-Hostable: Allows full control over infrastructure and data, as the platform can be hosted on a user's own servers.
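
As a rough illustration of consuming a robot's output over HTTP from a self-hosted instance, the Python sketch below builds a request and parses a JSON response. The route `/api/robots/<id>/runs`, the `x-api-key` header, the `rows` field, and the localhost base URL are all placeholders assumed for illustration, not taken from Maxun's documentation; consult the project's API reference for the actual names.

```python
import json
import urllib.request
from typing import List


def build_robot_request(base_url: str, robot_id: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for a robot's results.

    The route and header name are hypothetical placeholders.
    """
    url = f"{base_url.rstrip('/')}/api/robots/{robot_id}/runs"
    headers = {"x-api-key": api_key, "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers)


def parse_rows(body: bytes) -> List[dict]:
    """Extract extracted rows from a JSON body.

    The top-level "rows" key is an assumed response shape.
    """
    payload = json.loads(body)
    return payload.get("rows", [])


if __name__ == "__main__":
    # Only builds the request; no network call is made here.
    req = build_robot_request("http://localhost:8080", "robot-123", "secret")
    print(req.get_method(), req.full_url)
```

To actually send the request, one would pass `req` to `urllib.request.urlopen` and hand the response bytes to `parse_rows`, after replacing the placeholder route and header with the real ones.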

Use Cases:

  • Creating structured APIs from public websites: Developers can use Maxun to expose data from any website as a RESTful endpoint without manual coding.

  • Extracting listing data (e.g., from Airbnb or IMDb): For market research or competitive analysis, users can record a scraper or use AI mode to pull structured data like property details or movie ratings.

  • Converting webpages into AI-ready Markdown: Data teams can scrape and clean full webpages into Markdown or HTML for use in LLM agents, AI workflows, or document processing.

  • Automated lead generation from search results: Users can run automated web searches with time filters to discover and scrape results, then export the data to tools such as Google Sheets or Airtable.
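
The webpage-to-Markdown use case can be sketched with Python's standard library. This is a simplified stand-in, not how Maxun performs the conversion: the `MarkdownSketch` class and `html_to_markdown` function are hypothetical names, and only a small subset of HTML (h1 to h3, p, ul/ol, li, a) is handled, with script, style, nav, and footer content dropped.

```python
import re
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Convert a small subset of HTML to Markdown (illustrative only)."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    SKIP = {"script", "style", "nav", "footer"}  # treated as page clutter

    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in self.HEADINGS:
            self.out.append("\n\n" + self.HEADINGS[tag])
        elif tag in ("p", "ul", "ol"):
            self.out.append("\n\n")
        elif tag == "li":
            self.out.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a" and not self.skip_depth:
            self.out.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse runs of whitespace, as browsers do when rendering.
            self.out.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    text = "".join(parser.out)
    text = re.sub(r"[ \t]+\n", "\n", text)  # drop trailing spaces
    text = re.sub(r" {2,}", " ", text)      # collapse repeated spaces
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line
    return text.strip()


if __name__ == "__main__":
    page = "<nav>Menu</nav><h1>Title</h1><p>See <a href='https://example.com'>docs</a>.</p>"
    print(html_to_markdown(page))
```

Dropping navigation and script content before conversion is the part that makes the output more useful to an LLM: the model sees the page's text and links without markup or menus.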

Why It Matters:

Maxun reduces the technical barrier to web data collection by offering visual recording and natural language controls, eliminating the need for custom scraping code. As an open-source and self-hostable tool, it gives users direct control over their extraction infrastructure, a notable benefit for those who require transparency and data sovereignty. Its modular robot types—extract, scrape, crawl, and search—also allow users to construct precise, automated workflows without building a full scraping pipeline from scratch.

Project Statistics:

  • Stars: 15,541

  • Forks: 1,278

  • License: AGPL-3.0

Metadata:

  • Alternative to: Octoparse