

Web Crawler patents

      

This page is updated frequently with new Web Crawler-related patent applications.




Release date notification system
An application software for smartphones and tablets works with a cooperating website. The app will compile a calendar and notify the user of the dates of release or availability of selected events, media and/or products, based on a targeted item database compiled by the user and stored locally on the smart device as well as remotely accessible through the website.

Web crawler scheduler that utilizes sitemaps from websites
Systems and methods for scheduling documents for crawling are disclosed in which sitemap information is updated for a first website identified by a sitemap by downloading updated sitemap information for the first website and scheduling documents for crawling in accordance with the updated sitemap information for the first website. The sitemap information includes one or more sitemap indexes, where each respective sitemap index in the one or more sitemap indexes includes a list of URLs corresponding to documents stored at a corresponding website in a plurality of websites, the plurality of websites including the first website, and each sitemap index in the one or more sitemap indexes includes information identifying one or more of: a last modification date of a URL in the list of URLs, a change frequency of a document specified by the URL, a document title, an authority of the document, and a priority of the document.
Google Inc.
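
As a rough illustration of the scheduling idea in the abstract above, the following Python sketch reads the per-URL hints that a sitemap can carry (last modification date, change frequency, priority) and orders URLs for crawling. The rule for deciding that a URL is due, the `CHANGEFREQ_DAYS` table, and all function names are assumptions made for this example; they are not the method claimed in the application.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace used by the public sitemap protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical mapping from a changefreq hint to a re-crawl interval in days.
CHANGEFREQ_DAYS = {"always": 0, "hourly": 0, "daily": 1,
                   "weekly": 7, "monthly": 30, "yearly": 365}

SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc><lastmod>2024-03-01</lastmod>
       <changefreq>daily</changefreq><priority>0.8</priority></url>
  <url><loc>https://example.com/b</loc><lastmod>2024-01-01</lastmod>
       <changefreq>yearly</changefreq><priority>0.9</priority></url>
  <url><loc>https://example.com/c</loc><priority>0.3</priority></url>
</urlset>"""


def parse_sitemap(xml_text):
    """Return one dict per <url> entry, with defaults for missing hints."""
    entries = []
    for url in ET.fromstring(xml_text).findall("sm:url", NS):
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        entries.append({
            "loc": url.findtext("sm:loc", namespaces=NS),
            # Keep only the date part so full timestamps also parse.
            "lastmod": date.fromisoformat(lastmod[:10]) if lastmod else None,
            "changefreq": url.findtext("sm:changefreq", default="monthly",
                                       namespaces=NS),
            "priority": float(url.findtext("sm:priority", default="0.5",
                                           namespaces=NS)),
        })
    return entries


def schedule(entries, last_crawled, today):
    """Return the URLs that are due for a crawl, highest priority first."""
    due = []
    for entry in entries:
        crawled = last_crawled.get(entry["loc"])
        if crawled is None:
            due.append(entry)  # never crawled before
        elif entry["lastmod"] is not None and entry["lastmod"] > crawled:
            due.append(entry)  # sitemap reports a change since the last crawl
        elif (today - crawled).days >= CHANGEFREQ_DAYS.get(entry["changefreq"], 30):
            due.append(entry)  # the re-crawl interval has elapsed
    due.sort(key=lambda entry: entry["priority"], reverse=True)
    return [entry["loc"] for entry in due]
```

With the sample sitemap, a crawler that last fetched `/a` and `/b` on 2024-02-01 would, on 2024-03-05, schedule `/a` (modified since the last crawl) and `/c` (never crawled), and skip `/b`.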


System and method for preventing web crawler access
Preventing web crawler access includes receiving a request for a webpage that includes web content that is to be protected from a web crawler, encrypting the web content to be protected to generate encrypted content, and responding to the request, including sending the encrypted content and a decryption instruction. The decryption instruction is configured to allow a web browser to decrypt the encrypted content.
Alibaba Group Holding Limited


Web crawler optimization system
Techniques for optimizing the performance of a webpage crawler are described. According to various embodiments, historical web crawler performance data is accessed, the data describing a performance of a web crawler during various time periods in one or more prior days.
eBay Inc.


Method for correlating data
A method for correlating data stored in a database implements a web crawler element and an analyzer element to discover data correlations between a first set of data and a second set of data. The web crawler element searches online for a plurality of electronic files, and inspects said electronic files in order to determine a file type for each of the electronic files.

Method, device, and system for acquiring user behavior
Embodiments of the present invention provide a method, a device, and a system for acquiring user behavior. In the embodiments, an acquired URL request is matched against a database, and the database stores URLs actively initiated by a user, as recognized using web crawler technology.
Huawei Technologies Co., Ltd.


Configuring web crawler to extract web page information
Web crawling configuration includes: obtaining a webpage comprising a plurality of nodes; receiving a user selection of a node in the webpage; presenting a set of web crawling configuration options pertaining to a web crawling action to be performed with respect to the node, the set of web crawling configuration options depending at least in part on a type of an element included in the node and comprising: a first option to perform a first web crawling action in the event that the node includes a first type of the element; and a second option to perform a second web crawling action in the event that the node includes a second type of the element; receiving a user input specifying the web crawling configuration option; and storing the user-specified web crawling configuration option, performing the web crawling action on the node according to the user input, or both.
Alibaba Group Holding Limited


Systems and methods for attributing publishers for review-writing users
Methods and systems for tracking end users who submit reviews are provided. In some embodiments, reviews are submitted by end users via a reviewing application that reports review submission to a tracking system.
Tune, Inc.


Systems and methods of web crawling
Methods and systems for dynamically training a web crawler. The web crawler maintains one or more categories each comprising a set of words.
Xerox Corporation


Authentication of IP source addresses
A method and system for authenticating IP source addresses by accessing one or more HTTP requests whose source client identifies itself as a legitimate web crawler. One or more IP addresses are detected from the one or more HTTP requests, and each detected IP address is authenticated via a probability estimation regarding its association with a legitimate web crawler.

Method and computer readable medium for providing, via conventional web browsing, browsing capability for search engine web crawlers between remote/virtual windows and from remote/virtual windows to conventional hypertext documents

A method and computer readable medium are described for directing a search engine web crawler's local web browser to refresh the top-level container that is currently displaying the content presented by a remote computer with the new content that a navigational link, within a remote desktop, remote application window, or remote graphical windowing user session, points to. Links can be modified so as to be recognizable by the remote machine as unique from traditional hyperlinks.

System and method to analyze and rate online advertisement placement quality and potential value

A method, apparatus, system, article of manufacture, and computer program product provide the ability to rate advertisement placement quality. First data is collected from a panel of opt-in users.

Method and aggregating, extracting and presenting review and rating data

In an embodiment, a system is provided. The system includes a ratings database and a website database.

Direct page view measurement tag placement verification

Disclosed herein are strategies for verifying placement of a direct measurement tag useful for measuring internet traffic of a plurality of users at a website. For example, a method may include receiving web page identification data that is derived from user clickstream data, determining a URL associated with a domain based on the web page identification data, and providing a measurement code verification web crawler with the URL and the depth to which to explore the domain for verifying measurement code placement with the web crawler.

SEO results analysis based on first order data

Search query analytic reports may assist a website operator in understanding web traffic patterns in relation to the website. A search query analytic report may be generated by receiving web search data for a website from multiple data sources and assigning the web search data into multiple website-specific categories of the website.

System and method to identify machine-readable codes

A method and a system to identify machine-readable codes using a web crawler are provided. Machine-readable codes include, but are not limited to, universal product codes (UPC), quick response (QR) codes, stock-keeping units (SKUs) and international standard book number (ISBN) codes.
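
A crawler that harvests candidate codes from page text will usually want to reject digit strings that merely look like codes. The Python sketch below checks the standard check digit of a UPC-A code and of an ISBN-13. The token pattern and the function names are assumptions for illustration and are not taken from the application.

```python
import re


def _weighted_sum_ok(code, weights):
    """True if the weighted digit sum, check digit included, is a multiple of 10."""
    return sum(int(d) * w for d, w in zip(code, weights)) % 10 == 0


def is_valid_upc_a(code):
    # UPC-A: 12 digits, weights 3,1,3,1,... from the left.
    return (len(code) == 12 and code.isascii() and code.isdigit()
            and _weighted_sum_ok(code, [3, 1] * 6))


def is_valid_isbn13(code):
    # ISBN-13: 13 digits, prefix 978 or 979, weights 1,3,1,3,... from the left.
    return (len(code) == 13 and code.isascii() and code.isdigit()
            and code.startswith(("978", "979"))
            and _weighted_sum_ok(code, [1, 3] * 6 + [1]))


def find_codes(text):
    """Return (kind, code) pairs for digit runs in text that pass a check."""
    found = []
    for token in re.findall(r"\b\d{12,13}\b", text):
        if is_valid_upc_a(token):
            found.append(("UPC-A", token))
        elif is_valid_isbn13(token):
            found.append(("ISBN-13", token))
    return found
```

For example, `find_codes("UPC 036000291452, ISBN 9780306406157, order 123456789013")` keeps the first two numbers and drops the third, whose check digit does not match.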

Community authoring content generation and navigation

One or more techniques and/or systems are provided for creating socially authored, or community authored, summaries of documents and/or for navigating a forum comprising such summaries. In one embodiment, at least some of the summaries are generated automatically when a document is written and/or discovered (e.g., by a web crawler), for example.

Three-dimensional object browsing in documents

A document that includes a representation of a two-dimensional (2-D) image may be obtained. A selection indicator indicating a selection of at least a portion of the 2-D image may be obtained.

Interactive web crawler

The claimed subject matter provides a system or method for web crawling hidden files. An exemplary method comprises loading a web page with a browser agent, and executing any dynamic elements hosted on the web page using the browser agent to insert pre-determined values.

Adapting content repositories for crawling and serving

A system for searching files stored in a closed file source that is not accessible via a web crawler obtains file identifiers for files stored in the file source and creates a unique URL for each of the identifiers. Each URL may be based on a file identifier and a domain portion of a URL associated with the system.
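
One simple way to derive a URL from a file identifier and a domain is sketched below in Python. The `/files/` path segment, the `https` scheme, and the function name are hypothetical choices for this example; the abstract does not specify a URL layout.

```python
from urllib.parse import quote


def url_for_file(domain, file_id):
    """Build a URL from the system's domain and a file identifier.

    Percent-encoding the whole identifier (including '/') keeps distinct
    identifiers mapped to distinct URLs.
    """
    return f"https://{domain}/files/{quote(file_id, safe='')}"
```

For instance, `url_for_file("search.example.com", "HR/2024 report.pdf")` yields `https://search.example.com/files/HR%2F2024%20report.pdf`.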

Method and system for monitoring and redirecting http requests away from unintended web sites

Embodiments are described for a system and method for redirecting internet traffic away from illegitimate web sites. A redirect process includes a typo identifier engine and a direct navigation engine.

Building of a web corpus with the help of a reference web crawl

Computer-implemented method for building a web corpus (wcd) comprising the steps of: sending by a web crawler (wc) a query to a reference web crawl agent (rwca), this query containing at least one identifier of a resource; receiving by the web crawler (wc) a response from the reference web crawl agent (rwca); if this response does not contain the resource identified by the identifier, downloading by the web crawler (wc) the resource from the website (ws) corresponding to the identifier and adding the resource to the web corpus (wcd); and if this response contains the resource identified by the identifier, adding the resource to the web corpus (wcd).
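
The decision logic in this abstract can be sketched in a few lines of Python. Here the reference web crawl agent is modeled as a plain dictionary and the download step as a caller-supplied function; both, along with every name in the code, are assumptions for illustration only.

```python
def build_corpus(identifiers, reference_crawl, download):
    """Collect one resource per identifier into a corpus dictionary.

    reference_crawl: dict mapping identifier -> resource (stands in for the
                     reference web crawl agent's responses).
    download:        function(identifier) -> resource, used only when the
                     reference crawl does not hold the resource.
    """
    corpus = {}
    for ident in identifiers:
        resource = reference_crawl.get(ident)  # query the reference agent
        if resource is None:
            resource = download(ident)         # fall back to the website
        corpus[ident] = resource
    return corpus
```

The point of the arrangement is that the crawler contacts the original website only for resources the reference crawl cannot supply.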

Look-alike website scoring

Methods and systems for searching and scoring look-alike web sites are provided. A web crawler can harvest text and page layout data from a website.

Web crawler scheduler that utilizes sitemaps from websites

Systems and methods for scheduling documents for crawling are disclosed. In some implementations, a method includes obtaining sitemap information for a plurality of websites; and analyzing the sitemap information to identify a website, in the plurality of websites.

Indexing secure enterprise documents using generic references

A web crawler indexes documents including information about document contents and metadata including information such as a URL. However, some applications rely on URLs that change frequently or are constructed to include user information so that the content retrieved is customized to the user.

Search service administration web service protocol

The embodiments described herein generally relate to a method and system for enabling a client to configure and control the crawling function available through a crawl configuration web service. A client is able to configure and control the crawling function by defining the URL space of the crawl.

Optimizing web crawling with user history

A politeness manager estimates traffic to web sites based on historical log data generated and sent by plug-ins or toolbars on client web browsers. The historical log data details the dates and times at which the web browsers visit different web sites, and is used to understand during which timeframes specific web sites are busy and during which they are not.
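
To make the idea concrete, the Python sketch below counts logged visits per hour of day for one site and picks the least busy hours as crawl windows. The input format (ISO 8601 timestamp strings), the hour-of-day bucketing, and the function names are assumptions for this example rather than details from the application.

```python
from collections import Counter
from datetime import datetime


def hourly_traffic(visits):
    """Count visits per hour of day (0-23) from ISO 8601 timestamp strings."""
    counts = Counter({hour: 0 for hour in range(24)})
    for timestamp in visits:
        counts[datetime.fromisoformat(timestamp).hour] += 1
    return counts


def quiet_hours(visits, n=3):
    """Return the n hours with the fewest logged visits (ties: earlier hour)."""
    counts = hourly_traffic(visits)
    return sorted(range(24), key=lambda hour: (counts[hour], hour))[:n]
```

A crawler could then restrict its requests to a site to the hours returned by `quiet_hours` for that site's log.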

Configuring web crawler to extract web page information

Web crawling configuration includes: obtaining, using one or more computer processors, a webpage comprising a plurality of nodes; presenting the webpage to a user; receiving a user selection of a node in the webpage, the node comprising at least one element; in response to the user selection of the node, presenting a web crawling configuration option pertaining to a web crawling action to be performed with respect to the node, the web crawling configuration option depending at least in part on a type of an element included in the node; receiving a user input specifying the web crawling configuration options pertaining to the web crawling action to be performed with respect to the node; and storing the user-specified web crawling configuration options, performing the web crawling action on the node according to the user input, or both.

Providing a reliable trust indicator for content

A technique is provided for providing a trust indicator for a particular webpage. The trust indicator may indicate whether publishers of web content and/or end-users trust the content of the particular webpage and whether the particular webpage is popular.



Web Crawler topics:
  • Web Crawler
  • Downloading
  • International Standard
  • World Wide Web
  • Scheduling
  • Web Services
  • Authorization
  • Authentication
  • Search Service
  • Application Programming Interface
  • Application Program
  • User Input
  • Advertisement
  • Search Engines
  • Bulletin Board


This listing of patent applications related to Web Crawler is only meant as a recent sample of applications filed, not a comprehensive history. There may be associated servicemarks and trademarks related to these patents. Please check with a patent attorney if you need further assistance or plan to use this information for business purposes. This patent data is also published by the USPTO and available for free on their website. Note that there may be alternative spellings for Web Crawler with additional patents listed. Browse our RSS directory or search for other possible listings.