Tuesday, 18 August 2015

How do you parse and process HTML in PHP


The following are different ways to parse HTML in PHP.

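Use DOM: The DOM extension lets you operate on XML and HTML documents through the DOM API (PHP 5 and later).
$fullHTML = file_get_contents('http://www.example.com/scrap.php');
// Real-world HTML is rarely valid; collect parser warnings instead of printing them
libxml_use_internal_errors(true);
$domObj = new DOMDocument();
$domObj->loadHTML($fullHTML);
libxml_clear_errors();
$xpath = new DOMXPath($domObj);
// Select every <div> that is a direct child of <div class="myclass">
$tags = $xpath->query('//div[@class="myclass"]/div');
foreach ($tags as $tag) {
    echo trim($tag->nodeValue), "\n";
}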



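Use SimpleXMLElement: The SimpleXML extension provides a simple toolset that converts XML into an object you can process with normal property selectors and array iterators. It only accepts well-formed XML (for example XHTML); for ordinary HTML, load the page with DOMDocument first and convert it with simplexml_import_dom().
$fullHTML = file_get_contents('http://www.example.com/scrap.php');
// Throws an Exception if $fullHTML is not well-formed XML
$allData = new SimpleXMLElement($fullHTML);
print_r($allData);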



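Regular Expressions: A regular expression is a sequence of symbols and characters expressing a pattern to be searched for within a longer piece of text. Regular expressions are fragile on nested or malformed markup, so they are best kept for small, predictable fragments.
$fullHTML = file_get_contents('http://www.example.com/scrap.php');
// Single-quoted pattern so the " inside it does not end the PHP string; \4 matches the same quote that opened the attribute value
preg_match_all('/<(\w+)(\s+(\w+)\s*=\s*(\'|")(.*?)\4\s*)*\s*(\/>|>)/', $fullHTML, $matches);
print_r($matches);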



3rd Party Libraries
There are many third-party libraries that can parse HTML/XHTML. A few well-known ones follow.
phpQuery: phpQuery is a server-side, chainable, CSS3 selector driven Document Object Model (DOM) API based on the jQuery JavaScript library. It is written in PHP 5 and also provides a Command Line Interface (CLI).
Zend_Dom: Zend_Dom provides tools for working with DOM documents and structures.
QueryPath: QueryPath is a PHP library for manipulating XML and HTML.
FluentDOM: FluentDOM provides a jQuery-like fluent XML interface for the DOMDocument in PHP.



Web Services
Several APIs are available for scraping websites. A few of them follow.
YQL: The YQL Web Service enables applications to query, filter, and combine data from different sources across the Internet. It has an SQL-like syntax, familiar to any developer with database experience.
ScraperWiki: ScraperWiki's external interface allows you to extract data in the form you want for use on the web or in your own applications. You can also extract information about the state of any scraper.



Monday, 17 August 2015

Edit a commit message in a GitHub repository - Commands

Question: How to change the most recent commit message?
git commit --amend -m "This is updated commit message"


Question: How to change the most recent commit message and include all modified tracked files in that commit?
git commit -a --amend -m "This is updated commit message"
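To see what amending does, the following is a minimal sketch that runs in a throwaway repository. The temporary directory, the user name and email, and the misspelled first message are all made up for illustration. It suggests that --amend replaces the message of the last commit rather than adding a second commit.

```shell
#!/bin/sh
set -e

# Hypothetical throwaway repository, so the experiment cannot touch real work
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

# Create one commit with a message we want to fix
echo "hello" > file.txt
git add file.txt
git commit -q -m "Frist commit mesage"

# Rewrite the message of the most recent commit
git commit -q --amend -m "This is updated commit message"

# Inspect the result: subject of the last commit and number of commits
last_message=$(git log -1 --format=%s)
commit_count=$(git rev-list --count HEAD)
echo "$last_message"
echo "$commit_count"
```

Amending creates a new commit object with a new hash, so it is safest on commits that have not been pushed yet.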


Question: How to display the status of files in the index versus the working directory?
git status   # prints, for example: On branch test


Question: How to create a new Git repository in a directory?
git init <directory>


Question: How to clone a repository from a GitHub URL or a local path?
git clone /path/to/repository


Question: How to add files to the index (staging area)?
git add <files>


Question: How to commit the staged files?
git commit


Question: How to push to a repository?
git push origin


Question: How to push a specific branch to origin?
git push origin <branch>


Question: How to transfer all changes to your GitHub repository?
git add file.txt
git commit -m "Add file.txt"
git push
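The add, commit, push sequence can be tried without a network connection. The sketch below assumes a local bare repository standing in for the GitHub remote; the directory names, identity, and commit message are invented for the example.

```shell
#!/bin/sh
set -e

# Hypothetical workspace: a bare repository plays the role of the remote
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git clone -q "$work/origin.git" "$work/clone"   # git may warn that the clone is empty
cd "$work/clone"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

# Stage, commit, and push one file
echo "hello" > file.txt
git add file.txt
git commit -q -m "Add file.txt"
git push -q origin HEAD   # push the current branch under the same name

# Read the newest commit subject directly from the remote repository
pushed=$(git -C "$work/origin.git" log --all -1 --format=%s)
echo "$pushed"
```

Pushing HEAD explicitly avoids depending on whether the default branch is called master or main on a given machine.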


Question: How to add a file and commit it with a message in one line?
git add file.txt && git commit -m "Add file.txt"


Question: How to pull all the changes?
git pull origin