John Vicencio created the SMCoogle (es-em-coo-gle) Search Engine web program in Professor James Geddes Jr's Computer Science 80 (Internet Programming) 2014 class at Santa Monica College.
The SMCoogle Search Engine web application (app) records a URL and the content of the page at that address in a database, then searches the recorded content for the keywords that users submit. SMCoogle demonstrates dynamic web programming (PHP with SQL on the Linux, Apache, MySQL, PHP, or LAMP, stack) with responsive design (CSS3 with HTML5) as the user interface (UI), based on the Model View Controller (MVC) programming model.
SMCoogle's construction (its programming) follows this blueprint. It follows the MVC approach: the scripts (controller) take the request from the user's keyword submission (view), send it to the database (model), and receive the results. The search engine page itself is a dynamic website that changes based on the user's keyword submission; it is a program, or web app, that runs in a browser but uses mostly the same technologies found on many static websites (HTML/HTML5, CSS/CSS3), in addition to programming languages (JavaScript, PHP) and a database on the server.
Folder and file structures - the root directory (smccs80) has seven documents and three folders
(css, image, and include) so that files are organized by their intended purpose. The files are *.php
for the web pages, *.css for styles, and images such as *.jpg. A configuration file called php.ini
enables the Linux-Apache server to use a particular PHP function. You can upload all these files to the root directory
of a host server, which corresponds to a web page address like http://somehost.com.
Document structure (HTML and HTML5) - the pages use HTML5, which starts with <!doctype html> on top of the HTML(4)
elements <html>, <head>, <title>, and <body>. To make the document (web page) meaningful, HTML5's
semantic elements mark where the top section, the body section, and the bottom of the document are.
Style layout (CSS and CSS3) - colors, sizes, and the way blocks are positioned or displayed are
controlled by the stylesheet styles.css, which is located in the css folder and linked inside the <head> element.
Functionality, logic and control (PHP, SQL and an instance of JavaScript) - PHP is the main technology used
to program the SMCoogle web app, with SQL to query the database and a bit of JavaScript to display a dynamic year in the footer.
The document structure is separated so that HTML5 organizes it into distinct parts. Parts that are common
to several documents in the root directory are kept in their own files so that a change is made once rather than
separately on each page. This is done with PHP, which adds nav.php at the top of the HTML5
document and footer.php at the bottom of it. Pages and scripts are compartmentalized for easy reuse
and general web app management.
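The shared-file idea described above can be sketched in PHP as follows. This is a minimal sketch under stated assumptions, not SMCoogle's actual code: the helper names render_partial() and render_page() are hypothetical, and so is the assumption that nav.php and footer.php sit in the same directory as the page that uses them.

```php
<?php
// Hypothetical helper (not from SMCoogle): run a shared file such as
// nav.php or footer.php and return what it prints as a string.
function render_partial(string $path): string
{
    if (!is_file($path)) {
        return ''; // a missing file yields no markup instead of a warning
    }
    ob_start();      // capture whatever the included file prints
    include $path;   // the same include construct the pages rely on
    return (string) ob_get_clean();
}

// Hypothetical page assembly: shared top, page-specific body, shared bottom.
// $dir is assumed to be the folder that holds nav.php and footer.php.
function render_page(string $body, string $dir = __DIR__): string
{
    return render_partial($dir . '/nav.php')
        . $body
        . render_partial($dir . '/footer.php');
}
```

A page would then call echo render_page('<main>...</main>'); once. Writing plain include statements at the top and bottom of each page achieves the same result; the helper only makes the structure explicit.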
Search query on the search page (index.php) - the search section of the web app uses the
GET method on an HTML form element, both because the information provided by the user isn't confidential and
because it creates a useful URL, host.com/index.php?keyword=John&submit=Search, in case users want to
save the query (here the keyword typed is 'John', as an example). The PHP script is in the same page, using
$_SERVER['PHP_SELF'] (see the form's action attribute in the screenshot). For security purposes, the htmlspecialchars() PHP function
converts special characters to HTML entities, so that code injected by an attacker is displayed as text instead of being executed.
The server-side script (PHP) checks whether the form is empty. It reminds users to type a search keyword if they submitted an empty
input box. Using the include("filelocation/here.php") statement, the script on the main page pulls in other scripts
from the include folder.
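The form and the empty-input check described above can be sketched as follows. This is a rough sketch, not the project's code: the function names search_form() and read_keyword() are hypothetical, and the field names keyword and submit are taken from the example URL above. The $self parameter stands in for $_SERVER['PHP_SELF'].

```php
<?php
// Sketch of the search form: GET method, submitting back to the same script.
// $self is escaped with htmlspecialchars() before it is written into the
// action attribute, so markup appended to the URL cannot break out of it.
function search_form(string $self): string
{
    $action = htmlspecialchars($self, ENT_QUOTES, 'UTF-8');
    return '<form method="get" action="' . $action . '">'
        . '<input type="text" name="keyword">'
        . '<input type="submit" name="submit" value="Search">'
        . '</form>';
}

// Hypothetical helper: return the trimmed keyword from the query string
// ($_GET in a real page), or null when the input box was left empty.
function read_keyword(array $query): ?string
{
    $raw = $query['keyword'] ?? '';
    $keyword = is_string($raw) ? trim($raw) : '';
    return $keyword === '' ? null : $keyword;
}
```

In a page, echo search_form($_SERVER['PHP_SELF']); would print the form, and a null result from read_keyword($_GET) would trigger the reminder to type a keyword.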
Finally, once the database connection and set-up are complete, the search script
(include/dbsearchquery.php) begins. The search script makes sure that
the user entered a keyword; if not, it displays a message that the input box is empty.
If a keyword is entered, a search query using a SELECT statement and the wildcard '%'
checks whether the pagecontent column of any row in the database contains
the keyword ("SELECT * FROM $tablename WHERE pagecontent LIKE '%$keyword%'").
For example, if the user typed 'John', the query matches any page content that contains 'John', whatever text comes before or after
it in a sentence or a paragraph. Then a conditional statement, if $queryresult->num_rows > 0,
checks whether the search produced at least one record. If so, the script loops through the records
with while($row = $queryresult->fetch_assoc()), counts the occurrences in each with substr_count($row['pagecontent'], $keyword),
accumulates the counts in $querytotalcount += $queryresultcount,
and displays them.
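The counting step can be illustrated without a database. The sketch below is an assumption-laden stand-in, not the contents of dbsearchquery.php: the function name count_keyword() is hypothetical, and a plain array of rows takes the place of the records returned by $queryresult->fetch_assoc().

```php
<?php
// Hypothetical stand-in for the loop over $queryresult->fetch_assoc():
// each row is an associative array with a 'pagecontent' column.
function count_keyword(array $rows, string $keyword): array
{
    if ($keyword === '') {
        // substr_count() rejects an empty needle, so stop early
        return ['total' => 0, 'per_row' => []];
    }
    $querytotalcount = 0;
    $perRow = [];
    foreach ($rows as $row) {
        // count non-overlapping, case-sensitive occurrences in this row
        $queryresultcount = substr_count($row['pagecontent'], $keyword);
        $perRow[] = $queryresultcount;
        $querytotalcount += $queryresultcount; // running total across rows
    }
    return ['total' => $querytotalcount, 'per_row' => $perRow];
}

// Sample rows invented for illustration only.
$rows = [
    ['pagecontent' => 'John met John Smith.'],
    ['pagecontent' => 'Johnny and John'],
];
$result = count_keyword($rows, 'John');
// $result['total'] is 4: two matches per row ('Johnny' contains 'John')
```

Note that substr_count() is case-sensitive, whereas LIKE is typically case-insensitive under MySQL's default collations, so a row can match the query and still contribute a count of zero.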
Data integrity (inserting or sending clean data to and through the server) - users can add
a URL on the 'Submit' page (addurl.php), which is handled by the script file inserturl.php. The script checks
whether the input box is empty. If a value is entered, the script cleans it to make sure that it contains no special
characters that could inject malicious code. A custom-made PHP function, cleandata(),
does this (cleandata.php inside the include folder). The same function is also used
to make sure that the Contact page (contact.php in the root directory) has clean input data for name, email and message.
Going back to the Submit page, the script uses a function called file_get_contents("some url"). The value typed into the input field on
the Submit page is assigned to a PHP url variable, and the function reads the file at that external URL into a string.
That string is assigned to the pagedata variable, where every single quote is replaced with a space. The cleandata() function takes
care of cleaning the data by converting special characters to HTML entities (using the htmlspecialchars() function
inside the custom-made cleandata() function). Validation is another measure for handling data
inputs, on the Contact page for example, but the focus of the web app is the concept of a search engine, so it shows only
some HTML5 validation using the required, pattern (regex) and type attributes (such as type 'url', where HTML5 accepts only a string of the
form http://url.com). Ideally, validation should be done on both the client side
(where HTML5 and JavaScript checks can be disabled) and the server side (in this case PHP, or another technology such as ASP.NET).
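The cleaning and server-side validation steps can be sketched as follows. This is not the code in cleandata.php: only the htmlspecialchars() step and the single-quote replacement are described in the text, while the trim() step, the function names, and the filter_var() checks are assumptions added for illustration.

```php
<?php
// Sketch of a cleandata()-style helper. The htmlspecialchars() step is the
// one described in the text; trimming whitespace first is an assumption.
function cleandata_sketch(string $data): string
{
    $data = trim($data);
    return htmlspecialchars($data, ENT_QUOTES, 'UTF-8');
}

// Replace every single quote with a space, as described for pagedata.
function strip_single_quotes(string $pagedata): string
{
    return str_replace("'", ' ', $pagedata);
}

// Hypothetical server-side counterpart of the HTML5 type="url" check:
// the value must be a well-formed URL with an http or https scheme.
function is_http_url(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return $scheme === 'http' || $scheme === 'https';
}

// Hypothetical server-side check for the Contact page's email field.
function is_email(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}
```

Running checks like is_http_url() in PHP before calling file_get_contents() means the input is still validated when the browser-side HTML5 attributes are disabled.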
Also, each folder in the directory has an index.php so that users will not be able
to see which files are in these folders.