crawl-search
v1.0.1
CLI and React component for site search
CrawlSearch
CrawlSearch is an npm package that provides both a CLI tool for generating a searchable site index and a React component for integrating search functionality into your web application. The CLI tool uses Puppeteer to crawl your website and generate a JSON index of your site's pages, while the React component provides a search interface to display and highlight matched content.
Features
- CLI Tool: Crawl your website using a custom configuration and generate an index (siteindex.json) of your site's pages.
- React Component: Easily integrate a search interface that highlights matched content.
- Customizable Crawling: Configure the root URL and specific paths to crawl.
- Modern Development: Uses Babel for transpilation, supports ES6+ and JSX.
Installation
Install the package using npm:
npm install crawl-search
Note: This package has peer dependencies on React and ReactDOM. Make sure you install them in your project if you haven't already:
npm install react react-dom
Usage
Generating the Site Index
Initialize the Default Configuration
If you don't already have a crawling configuration file, you can generate a sample file by running:
npx site-search init
This command creates a sample crawler.config.json file in the library folder. You can then copy or move this file to your project directory for editing.
Edit the Crawling Configuration File
Open the generated crawler.config.json file and update its contents as needed. A typical configuration looks like this:
{
  "rootUrl": "https://example.com",
  "paths": ["/page1", "/page2"]
}
- rootUrl: The base URL of the website you want to index.
- paths: An array of URL paths to crawl relative to the root URL.
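To make the relationship between rootUrl and paths concrete, here is a minimal sketch of how a config like the one above could be validated and expanded into the absolute URLs to visit. The helper name resolveCrawlUrls is hypothetical and not part of crawl-search; it assumes paths are joined to rootUrl with standard URL resolution, which the real CLI may or may not do.

```javascript
// Hypothetical helper, NOT part of crawl-search: validates a
// crawler.config.json-style object and turns each relative path
// into an absolute URL using standard WHATWG URL resolution.
function resolveCrawlUrls(config) {
  if (typeof config.rootUrl !== "string" || config.rootUrl.length === 0) {
    throw new Error("rootUrl must be a non-empty string");
  }
  if (!Array.isArray(config.paths) || !config.paths.every((p) => typeof p === "string")) {
    throw new Error("paths must be an array of strings");
  }
  // new URL() throws on a malformed rootUrl, so config typos surface early.
  const base = new URL(config.rootUrl);
  return config.paths.map((path) => new URL(path, base).href);
}

const exampleConfig = { rootUrl: "https://example.com", paths: ["/page1", "/page2"] };
console.log(resolveCrawlUrls(exampleConfig));
// [ 'https://example.com/page1', 'https://example.com/page2' ]
```

One consequence of URL resolution in this sketch: a path starting with "/" replaces any path already on rootUrl, so "https://example.com/docs/" plus "/page1" resolves to "https://example.com/page1". Check how the actual tool joins them if your rootUrl contains a subpath.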
Generate the Site Index
Once your configuration file is ready, run the following command to generate the site index:
npx site-search generate --config crawler.config.json
This command will:
- Crawl the specified pages using Puppeteer.
- Generate a site index and save it as siteindex.json in your current directory.
- Log progress updates and any errors encountered during the process.
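The three steps above can be sketched roughly as follows. This is not the package's crawler.js: the function name crawlToIndex is hypothetical, and the entry shape { id, url, content } is an assumption taken from the fields the customization example later in this README reads. The browser argument is anything with Puppeteer's Browser interface, which keeps the function testable with a stand-in object.

```javascript
// Sketch only, NOT crawl-search's crawler.js. Assumes index entries look like
// { id, url, content }. `browser` is a Puppeteer Browser (or a test double).
async function crawlToIndex(browser, rootUrl, paths, log = console.log) {
  const index = [];
  for (const [i, path] of paths.entries()) {
    const url = new URL(path, rootUrl).href;
    const page = await browser.newPage();
    try {
      log(`Crawling ${url}`);
      // Wait until network activity settles so client-rendered text is present.
      await page.goto(url, { waitUntil: "networkidle2" });
      // This callback runs inside the browser and returns the page's visible text.
      const content = await page.evaluate(() => document.body.innerText);
      index.push({ id: i, url, content });
    } catch (err) {
      // Log the failure and continue, so one bad page does not abort the run.
      log(`Failed to crawl ${url}: ${err.message}`);
    } finally {
      await page.close();
    }
  }
  return index;
}
```

With real Puppeteer, browser would come from await puppeteer.launch(), and the returned array could be saved with fs.writeFileSync("siteindex.json", JSON.stringify(index, null, 2)) before calling browser.close(). In this sketch the id is the page's position in paths, so ids skip over pages that failed to load.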
Using the React Component
Once you have generated the index, you can use the provided React component to implement a search interface:
Import the Component and Index
In your React application, import the component and the generated index:
import React from "react";
import { SearchComponent } from "crawl-search";
import siteIndex from "./siteindex.json";
Implement the Component in Your App
Use the component in your application by passing the index as a prop:
function App() {
  return (
    <div>
      <h1>Site Search</h1>
      <SearchComponent index={siteIndex} />
    </div>
  );
}
export default App;
Security Note: The component uses dangerouslySetInnerHTML to highlight search results. Ensure that the content you are indexing is trusted or properly sanitized.
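Because highlighted results end up in dangerouslySetInnerHTML, one safe pattern is to HTML-escape the indexed text yourself and only then add the highlight markup. The sketch below shows that pattern together with a simple case-insensitive filter. It is an illustration, not SearchComponent's actual implementation: the function names search and highlight, the <mark> element, and the extra html field are all assumptions.

```javascript
// Sketch only, NOT SearchComponent's implementation. Filters entries shaped like
// { id, url, content } and builds highlighted HTML whose page text is escaped.

// Escape characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Escape regex metacharacters so the query is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Split on the query (the capturing group keeps matches at odd indices),
// escape every piece, and wrap only the matches in <mark>.
function highlight(content, query) {
  if (!query) return escapeHtml(content);
  const pattern = new RegExp(`(${escapeRegExp(query)})`, "gi");
  return content
    .split(pattern)
    .map((part, i) => (i % 2 === 1 ? `<mark>${escapeHtml(part)}</mark>` : escapeHtml(part)))
    .join("");
}

// Keep entries whose content contains the query (case-insensitive).
function search(index, query) {
  const needle = query.trim();
  if (!needle) return [];
  return index
    .filter((item) => item.content.toLowerCase().includes(needle.toLowerCase()))
    .map((item) => ({ ...item, html: highlight(item.content, needle) }));
}

console.log(highlight("Fish & <chips>", "chips"));
// Fish &amp; &lt;<mark>chips</mark>&gt;
```

Each result's html could then be rendered with <div dangerouslySetInnerHTML={{ __html: item.html }} />. Since the page text is escaped before the <mark> tags are added, any markup captured during crawling is displayed as text rather than executed.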
Customization
The SearchComponent provides several props for customization:
Class Names:
- containerClassName: Class name for the container.
- inputClassName: Class name for the input element.
- resultsContainerClassName: Class name for the results container.
- resultItemClassName: Class name for individual result items.
Styles:
- containerStyle: Inline style for the container.
- inputStyle: Inline style for the input element.
- resultsContainerStyle: Inline style for the results container.
- resultItemStyle: Inline style for individual result items.
Custom Render Function:
- renderResult: Function that receives a result item and returns the element to render for it, replacing the default result markup.
Example of using customization props:
import React from "react";
import { SearchComponent } from "crawl-search";
import siteIndex from "./siteindex.json";
function App() {
  // Custom renderer for each result: links to the page and renders item.content as raw HTML.
  const customRenderResult = (item) => (
    <div key={item.id} style={{ padding: "10px", border: "1px solid #ccc" }}>
      <a href={item.url} target="_blank" rel="noopener noreferrer">
        <div dangerouslySetInnerHTML={{ __html: item.content }} />
      </a>
    </div>
  );
  return (
    <div>
      <h1>Site Search</h1>
      <SearchComponent
        index={siteIndex}
        containerClassName="search-container"
        inputClassName="search-input"
        resultsContainerClassName="search-results-container"
        resultItemClassName="search-result-item"
        containerStyle={{ backgroundColor: "#f9f9f9" }}
        inputStyle={{ borderColor: "#333" }}
        resultsContainerStyle={{ marginTop: "20px" }}
        resultItemStyle={{ backgroundColor: "#fff" }}
        renderResult={customRenderResult}
        placeholder="Search for content..."
      />
    </div>
  );
}
export default App;
Local Usage
For local development or testing of the React component, follow these steps:
Set Up a Local React App
Use npm create vite@latest to scaffold a React app (choose the React framework when prompted), then install its dependencies:
cd path/to/react-app
npm install
Install the Package via a Relative Path
In your React app directory, run the following command, which installs your local copy of the package as a symlink in node_modules:
npm install ../path-to/crawl-search
Use the Component
Now you can use the SearchComponent in your React project as described in the previous section.
For local development or testing, you can run the CLI tool directly using Node.js:
node ./bin/cli.js generate --config crawler.config.json
Project Structure
crawl-search/
├── bin/
│ └── cli.js # CLI entry point
├── src/
│ ├── crawler.js # Crawling logic using Puppeteer
│ ├── SearchComponent.js # React component for search functionality
│ └── index.js # Exports the React component
├── .babelrc # Babel configuration file
└── package.json
Publishing the Package
Publish to npm
Make sure you are logged in to your npm account (for example, with npm login), then run:
npm publish
License
This project is licensed under the MIT License. See the LICENSE file for more details.
