Dean of Big Data

William Schmarzo

Blog Post

What Tomorrow's Business Leaders Need to Know About Machine Learning

Sometimes I write a blog just to formulate and organize a point of view, and I think it’s time that I pull together the bounty of excellent information about Machine Learning. This is a topic with which business leaders must become comfortable, especially tomorrow’s business leaders (tip for my next semester University of San Francisco business students!). Machine learning is a key capability that will help organizations drive optimization and monetization opportunities, and there have been some recent developments that will place basic machine learning capabilities into the hands of the lines of business.

By the way, there is an absolute wealth of freely-available material on machine learning, so I’ve included a sources section at the end of this blog for folks who want more details on machine learning.

So strap ’em on! Time to dive into the world of machine learning!

Machine Learning Basics
Much of what comprises “Machine Learning” is really not all that new. Many of the algorithms that fall into the Machine Learning category are analytic algorithms that have been around for decades, such as clustering, association rules and decision trees. However, the detailed granularity of the data, the wide variety of data sources and the massive increase in computing power have re-invigorated many of these mature algorithms. Today, machine learning is being used for a variety of purposes, including:

  • Text translation, voice recognition and natural language processing (NLP). Machine Learning is the brains behind the continuously improving “conversations” with Apple Siri, Google Assistant, Microsoft Cortana and Amazon Alexa.

  • Facial, photo and image recognition. For example, the all-important question of “What is a Chihuahua puppy and what is a blueberry muffin?” can be addressed with a well-trained machine learning algorithm (see Figure 1).

Figure 1: Puppy versus blueberry muffin exercise

More applications of machine learning will be coming soon, including:

  • Cyber security
  • Insider trading detection
  • Money laundering detection
  • Personalized medicine
  • Personalized marketing
  • Fraud detection
  • Autonomous vehicles

So what exactly is machine learning? Let’s start with a definition:

Machine learning is a type of applied artificial intelligence (AI) that provides computers with the ability to gain knowledge without being explicitly programmed. Machine learning focuses on the development of computer programs that can change when exposed to new data.

Fundamentally, there are only two things that Machine Learning does:

  • Quantify existing relationships (quantify relationships from historical data and apply those relationships to new data sets).
  • Discover latent relationships (draw inferences buried in the data).

Machine Learning accomplishes these two tasks using either supervised or unsupervised learning algorithms. What’s the difference? In supervised learning, each observation includes the known outcome, that is, its classification or category (e.g., fraudulent transaction, customer attrition, part failure, patient illness, purchase transaction, web click). In unsupervised learning, the observations do not include the outcomes.

Supervised Learning
Supervised learning algorithms make predictions based on a set of examples. For example, historical sales can be used to estimate future prices. With supervised learning, you have an input variable that consists of labeled training data and a desired output variable. You use an algorithm to analyze the training data to learn the function that maps the input to the output. This inferred function maps new, unknown examples by generalizing from the training data to anticipate results in unseen situations.

  • Classification: when the objective field is categorical. For these problems, a Machine Learning algorithm is used to build a model that predicts a category (label or class) for a new example (instance). That is, it “classifies” new instances into a given set of categories (or discrete values). For example, “true or false”, “fraud or not fraud”, “high risk, low risk or medium risk”, etc. There can be hundreds of different categories.
  • Regression: when the objective field is numeric. For these problems, a Machine Learning algorithm is used to build a model that predicts a continuous value. That is, given the fields that define a new instance the model predicts a real number. For example, “the price of a house”, “the number of units sold for a product”, “the potential revenue of a lead”, “the number of hours until next system failure”, etc.

Both classification and regression problems can be solved using supervised Machine Learning techniques. They are called supervised in the sense that the values of the output variable have either been provided by a human expert (e.g., the patient had been diagnosed with diabetes or not) or by a deterministic automated process (e.g., customers who did not pay their fees in the last three months are labeled as “delinquent”). The objective field values along with the input fields need to be collected for each instance in a structured dataset that is used to train the model. The algorithms learn a predictive model that maps your input data to a predicted objective field value.
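To make the two problem types concrete, here is a minimal sketch in plain Python. It is only an illustration, not how any particular product builds its models: the classifier is a simple nearest-neighbor lookup, the regression is a one-variable least-squares line, and every number in the data sets is made up for the example.

```python
# Toy supervised learning: every training example carries its known outcome.
# All data below is invented for illustration only.

def predict_class(train, new_point):
    """1-nearest-neighbor classification: return the label of the closest example."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda row: sq_dist(row[0], new_point))
    return label

def fit_line(xs, ys):
    """Ordinary least squares for a single input field: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Classification: (transaction amount, hour of day) -> categorical objective field
labeled_transactions = [
    ((20.0, 14), "not fraud"),
    ((35.0, 10), "not fraud"),
    ((900.0, 3), "fraud"),
    ((1200.0, 2), "fraud"),
]
predicted_label = predict_class(labeled_transactions, (1000.0, 4))
print(predicted_label)

# Regression: house size in square feet -> price (numeric objective field)
sizes = [1000, 1500, 2000, 2500]
prices = [200000, 300000, 400000, 500000]
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 1800 + intercept
print(predicted_price)
```

In both cases the model is "trained" on instances where the objective field is already known, and is then asked about a new instance where it is not.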

Unsupervised Learning
When performing unsupervised learning, the machine is presented with totally unlabeled data. It is asked to discover the intrinsic patterns that underlie the data, such as a clustering structure, a low-dimensional manifold, or a sparse tree and graph.

  • Clustering: Grouping a set of data examples so that examples in one group (or one cluster) are more similar to each other (according to some criteria) than to those in other groups. This is often used to segment the whole dataset into several groups. Analysis can then be performed on each group to help users find intrinsic patterns.


  • Association: If-then statements that uncover relationships within the data. An example of an association rule would be “If a customer buys a dozen eggs, he is 80% likely to also purchase milk.”


  • Neural Networks: Modeled after the human brain, a neural network consists of a large number of processors operating in parallel and arranged in tiers (feedforward). The first tier receives the raw input information, and each successive tier receives the output from the preceding tier and performs further analysis. The last tier produces the output of the system. Neural networks are adaptive, which means they modify themselves as they learn from initial training, and subsequent runs provide more information about the world.


  • Recurrent Neural Networks: A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle. This creates an internal state of the network that allows it to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. This makes them applicable to tasks such as unsegmented connected handwriting recognition or speech recognition.
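For readers who like to see the mechanics, the first two items above can be sketched in a few lines of plain Python. This is a rough illustration under stated assumptions: the clustering uses a basic k-means loop on one-dimensional data with hand-picked starting centers, the association rule is scored by its confidence (how often the "then" part holds when the "if" part does), and the spend figures and shopping baskets are invented for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Basic k-means on one-dimensional data, starting from the given centers."""
    for _ in range(iterations):
        # Assignment step: put each value in the group of its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group (keep it if empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

def confidence(baskets, if_item, then_item):
    """Share of baskets containing if_item that also contain then_item."""
    with_if = [b for b in baskets if if_item in b]
    return sum(1 for b in with_if if then_item in b) / len(with_if)

# Clustering: no labels, just made-up monthly spend per customer.
spend = [10, 12, 11, 95, 100, 105]
centers, groups = kmeans_1d(spend, centers=[0, 50])
print(centers, groups)

# Association: made-up shopping baskets.
baskets = [
    {"eggs", "milk"},
    {"eggs", "milk", "bread"},
    {"eggs", "bread"},
    {"eggs", "milk"},
    {"eggs", "milk", "butter"},
    {"bread"},
]
rule_confidence = confidence(baskets, "eggs", "milk")
print(rule_confidence)
```

Note that nobody told the clustering code which customers were "low spenders" or "high spenders"; the two groups fall out of the data itself.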


Figure 2 provides a more detailed inventory of the different types of supervised and unsupervised machine learning algorithms.

Figure 2: Types of Machine Learning Algorithms

Putting Machine Learning to Work
In a recent University of San Francisco project that we conducted with a local data science company, I was introduced to a product called BigML. I was truly blown away by the relative simplicity of the tools (think “Tableau for Machine Learning”). I have no financial interest in BigML and suspect that as soon as this blog gets published, I will hear from other startups that are building something similar. But until I get those calls, I’m going to use BigML to showcase some Machine Learning basics.

BigML is free for the first 16 gigabytes of data and comes with some pre-loaded data sets and an extensive library of documentation, some of which I used for this blog. For this exercise, we’re going to use a data set that comes bundled with the BigML product: Titanic Survivors Data Set (see Figure 3).

Figure 3: Titanic Survivors Data Set

BigML provides a nice feature to allow the data scientist to explore and understand the data sets, and provides some basic statistical information (minimum, median, mean, maximum, standard deviation, kurtosis, skewness) about each of the variables in the data set.
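If you want to see what sits behind those summary numbers, here is a small sketch in plain Python. The formulas are assumptions on my part (population standard deviation, moment-based skewness, and excess kurtosis); the product may well use slightly different definitions. The `summarize` helper and the sample ages are hypothetical, not taken from the actual data set.

```python
import statistics

def summarize(values):
    """Basic descriptive statistics for one numeric field.

    Assumes at least two distinct values, so the variance is not zero.
    """
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments (population versions, dividing by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "minimum": min(values),
        "median": statistics.median(values),
        "mean": mean,
        "maximum": max(values),
        "standard deviation": m2 ** 0.5,
        "skewness": m3 / m2 ** 1.5,      # 0 for a symmetric distribution
        "kurtosis": m4 / m2 ** 2 - 3,    # excess kurtosis: 0 for a normal distribution
    }

# Made-up passenger ages, for illustration only.
ages = [22, 38, 26, 35, 54, 2, 27, 14]
for name, value in summarize(ages).items():
    print(f"{name}: {value:.2f}")
```

A quick look at skewness and kurtosis tells you whether a field is lopsided or has heavy tails before you feed it to a model.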

BigML allows you to select from a variety of supervised and unsupervised models. I selected the supervised option (because I knew the classification of the passenger as survived or not survived) and got the decision tree in Figure 4 that predicts the likelihood of a Titanic passenger surviving given a wide variety of different variables (e.g., passenger age, class of travel, fare paid, in what city the passenger boarded).

Figure 4: Titanic Survivors Decision Tree

The resulting Decision Tree provides a series of “If-then” statements; each branch “yields a story” about the chances of survival.
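To show where one of those “If-then” statements comes from, here is a minimal sketch of how a tree might pick its first split. It assumes Gini impurity as the splitting criterion, which is a common choice but not necessarily the one BigML uses, and the passenger rows are invented for illustration rather than taken from the real Titanic data set.

```python
def gini(outcomes):
    """Gini impurity of a list of 0/1 outcomes (0 means the group is pure)."""
    if not outcomes:
        return 0.0
    p = sum(outcomes) / len(outcomes)
    return 2 * p * (1 - p)

def best_split(rows, feature_names):
    """Try every feature/threshold pair; return the split with the lowest weighted Gini."""
    best = None
    for f, name in enumerate(feature_names):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            if not left or not right:
                continue  # a split must send rows down both branches
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, name, threshold, left, right)
    return best

# Invented rows: ((age, fare paid), survived as 1 or 0).
rows = [
    ((6, 85.0), 1), ((29, 110.0), 1), ((41, 95.0), 1),
    ((24, 8.0), 0), ((52, 12.0), 0), ((35, 7.5), 0), ((9, 15.0), 0),
]
score, name, threshold, left, right = best_split(rows, ["age", "fare"])
print(f"IF {name} <= {threshold} THEN survival rate = {sum(left) / len(left):.0%}")
print(f"ELSE survival rate = {sum(right) / len(right):.0%}")
```

A full decision tree simply repeats this search inside each branch, which is how the single rule grows into a series of nested “If-then” statements.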

Hint: you want to be young and you want to be rich to improve your odds of surviving the Titanic. That’s something that might be very useful if you ever find yourself on the Titanic.

To learn more about the “Predicting Titanic Survival Outcome” exercise, check out YouTube.

BigML provides a wide variety of machine learning algorithms with which one can play. Plus, their documentation on each of the different machine learning algorithms is very impressive. I think these folks would make a fortune if they created an accompanying textbook (and I sent them a note telling them so).

Machine Learning Summary
Both supervised and unsupervised learning algorithms will find relationships and occurrences in the data that might be relevant. The data scientist and the business stakeholder must still apply domain knowledge to ensure that the uncovered relationships and insights are “Strategic, Actionable and Material,” and they must apply common sense to avoid making statements of fact that just don’t make sense.

No amount of machine learning is going to replace good old common sense.

The post What tomorrow’s business leaders need to know about Machine Learning? appeared first on InFocus Blog | Dell EMC Services.


Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business” and “Big Data MBA: Driving Business Strategies with Data Science”, is responsible for setting strategy and defining the Big Data service offerings for Dell EMC’s Big Data Practice.

As a CTO within Dell EMC’s 2,000+ person consulting organization, he works with organizations to identify where and how to start their big data journeys. He has written white papers, is an avid blogger and is a frequent speaker on the use of Big Data and data science to power an organization’s key business initiatives. He is a University of San Francisco School of Management (SOM) Executive Fellow, where he teaches the “Big Data MBA” course. Bill also just completed a research paper on “Determining The Economic Value of Data”. Onalytica recently ranked Bill as the #4 Big Data Influencer worldwide.

Bill has over three decades of experience in data warehousing, BI and analytics. Bill authored the Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements. Bill serves on the City of San Jose’s Technology Innovation Board, and on the faculties of The Data Warehouse Institute and Strata.

Previously, Bill was vice president of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of “actionable insights” through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-defining analytic applications.

Bill holds a Master of Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science and Business Administration from Coe College.