NAME
    Web::PageMeta - get page open-graph / meta data
SYNOPSIS
        use feature 'say';
        use Web::PageMeta;
        my $page = Web::PageMeta->new(url => "https://www.apa.at/");
        say $page->title;
        say $page->image;
    Fetch previews and images asynchronously:
        use feature 'say';
        use Future;
        use Web::PageMeta;
        my @urls = qw(
            https://www.apa.at/
            http://www.diepresse.at/
            https://metacpan.org/
            https://github.com/
        );
        my @page_views = map { Web::PageMeta->new( url => $_ ) }
                @urls;
        Future->wait_all( map { $_->fetch_image_data_ft, } @page_views )->get;
        foreach my $pv (@page_views) {
            say 'title> '.$pv->title;
            say 'img_size> '.length($pv->image_data);
        }
        # alternatively, instead of Future->wait_all()
        use Future::Utils qw( fmap_void );
        fmap_void(
            sub { return $_[0]->fetch_image_data_ft },
            foreach    => [@page_views],
            concurrent => 3
        )->get;
DESCRIPTION
    Get open-graph (and other) web page meta data. Can be used in both
    normal and async code.
    For any HTTP status code other than 200 during data downloads, an
    HTTP::Exception is thrown.
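    As a rough sketch of how a caller might cope with such failures, the
    snippet below wraps a fetch in eval and turns the thrown value into a
    short message. The helper names describe_fetch_error and safe_title
    are hypothetical, and the sketch assumes the thrown object offers a
    code() method, as HTTP::Exception objects do.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: turn whatever was thrown into a short message.
# Objects with a code() method (assumed: HTTP::Exception) report the
# status code; anything else is stringified.
sub describe_fetch_error {
    my ($err) = @_;
    if ( blessed($err) && $err->can('code') ) {
        return 'HTTP error ' . $err->code;
    }
    return 'other error: ' . $err;
}

# Hypothetical helper: returns ($title, undef) on success or
# (undef, $message) when the fetch died.
sub safe_title {
    my ($url) = @_;
    require Web::PageMeta;    # loaded lazily, only when a fetch is attempted
    my $title;
    my $ok = eval { $title = Web::PageMeta->new( url => $url )->title; 1 };
    return ( $title, undef ) if $ok;
    return ( undef, describe_fetch_error($@) );
}
```

    Called as "my ($title, $error) = safe_title($url);", $error stays
    undefined when the fetch succeeds.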
ACCESSORS
  new
    Constructor; only "url" is required.
  url
    HTTP URL to fetch data from.
  timeout
    In addition to the AnyEvent::HTTP timeout, the elapsed time is also
    checked while the data are being downloaded, and the download dies
    once it exceeds the limit. Default is 5 minutes.
  max_size
    Will die when the document or image size is greater than this limit.
    Default 100MB.
  user_agent
    User-Agent header to use for HTTP requests. Default is the one sent by
    Chrome 89.0.4389.90.
  extra_headers
    HashRef with extra HTTP request headers.
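    A minimal sketch of passing the options above to the constructor. The
    helper names and all values are made up for illustration, and the
    sketch assumes that "timeout" is given in seconds and "max_size" in
    bytes, which the text above does not state.

```perl
use strict;
use warnings;

# Hypothetical helper: collect constructor arguments in one place.
# Assumptions: timeout is given in seconds and max_size in bytes.
sub page_meta_args {
    my ($url) = @_;
    return (
        url           => $url,
        timeout       => 30,                  # assumed unit: seconds
        max_size      => 5 * 1024 * 1024,     # assumed unit: bytes (5 MiB)
        user_agent    => 'ExampleBot/1.0',    # made-up User-Agent string
        extra_headers => { 'Accept-Language' => 'en' },
    );
}

# Hypothetical helper: load the module on demand and pass the
# arguments to new().
sub new_page_meta {
    my ($url) = @_;
    require Web::PageMeta;
    return Web::PageMeta->new( page_meta_args($url) );
}
```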
  cookie_jar
    Accepts an optional HTTP::Cookies-compatible object that must provide
    a "get_cookies()" method. If set, HTTP cookie headers are sent with
    each request.
  title
    Returns title of the page.
  description
    Returns description of the page.
  canonical_url
    Returns the open-graph URL. If not present, returns "url".
  image
    Returns image location of the page.
  image_data
    Returns the binary image data of the "image" link.
    Throws a 404 exception if there is no "image" link.
  page_meta
    Returns hash ref with all open-graph data.
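    To inspect what a page provided, the hash ref can be rendered as
    sorted "key: value" lines. This is only a sketch: format_meta is a
    hypothetical helper, and it assumes the values are plain scalars.

```perl
use strict;
use warnings;

# Hypothetical helper: render a flat hash ref as sorted "key: value" lines.
# Undefined values are shown as empty strings.
sub format_meta {
    my ($meta) = @_;
    return map { sprintf '%s: %s', $_, $meta->{$_} // '' } sort keys %{$meta};
}
```

    With a page object, "print "$_\n" for format_meta( $page->page_meta );"
    would list every key found for that page.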
  extra_scraper
    Web::Scraper::LibXML object used to fetch the image, title or
    description from a location other than the default one.
        use Web::Scraper::LibXML;
        use Web::PageMeta;
        my $escraper = scraper {
            process_first '.slider .camera_wrap div', 'image' => '@data-src';
        };
        my $wmeta = Web::PageMeta->new(
            url => 'https://www.meon.eu/',
            extra_scraper => $escraper,
        );
  page_body_hdr
    Returns array ref with page [$body,$headers]. Can be useful for
    post-processing or special/additional data extractions.
    Only "text/html" content-type is accepted for fetching.
  fetch_page_meta_ft
    Returns a future object for fetching page meta data. See "ASYNC USE".
    On done, the "page_meta" hash is returned.
  fetch_image_data_ft
    Returns a future object for fetching image data. See "ASYNC USE". On
    done, the "image_data" scalar is returned.
  fetch_page_body_hdr_ft
    Returns a future object for fetching page content and headers. See
    "ASYNC USE". On done, the "page_body_hdr" array ref is returned.
ASYNC USE
    To run multiple page meta data or image HTTP requests in parallel, or
    to use the module in async programs, the "fetch_page_meta_ft" and
    "fetch_image_data_ft" methods, which return Future objects, can be
    used. See "SYNOPSIS" or t/02_async.t for sample use.
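    Because Future->wait_all() resolves once every component future is
    ready, whether it succeeded or failed, the outcome of each request can
    be inspected afterwards. The sketch below pairs each URL with its
    result; report_results and fetch_all_meta are hypothetical helpers,
    and it assumes that a failed download fails the future and that the
    meta hash ref carries a "title" key.

```perl
use strict;
use warnings;

# Hypothetical helper: pair each URL with the outcome of its future.
# Relies only on the Future methods is_failed(), failure() and get().
sub report_results {
    my ( $urls, $futures ) = @_;
    my @report;
    for my $i ( 0 .. $#{$urls} ) {
        my $f = $futures->[$i];
        if ( $f->is_failed ) {
            push @report, "$urls->[$i] FAILED: " . scalar $f->failure;
        }
        else {
            # assumed: the "page_meta" hash ref with a "title" key
            my $meta = $f->get;
            push @report,
                "$urls->[$i] OK: " . ( $meta->{title} // '(no title)' );
        }
    }
    return @report;
}

# Hypothetical driver: fetch meta data for all URLs in parallel.
sub fetch_all_meta {
    my (@urls) = @_;
    require Future;
    require Web::PageMeta;
    my @pages   = map { Web::PageMeta->new( url => $_ ) } @urls;
    my @futures = map { $_->fetch_page_meta_ft } @pages;
    Future->wait_all(@futures)->get;    # waits until every future is ready
    return report_results( \@urls, \@futures );
}
```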
AUTHOR
    Jozef Kutej
LICENSE AND COPYRIGHT
    Copyright 2021 jkutej@cpan.org
    This program is free software; you can redistribute it and/or modify it
    under the terms of either: the GNU General Public License as published
    by the Free Software Foundation; or the Artistic License.
    See http://dev.perl.org/licenses/ for more information.