Welcome to my site. Check the links at the bottom for where else you can find me.

I’m currently looking for a job as a Site Reliability Engineer or in a similar role. Please see my resume.

Job Search

End of an era and looking for what’s next

Last month, I bid farewell to my coworkers at Instacart after an amazing six-year journey. I was fortunate to witness the company’s most significant period of growth and was consistently impressed by everyone’s ability to adapt swiftly to a changing market. After taking a few weeks off, I am now ready to start the search for my next big adventure. Over the coming weeks, I will be actively seeking Site Reliability Engineering or Distributed Engineering roles. [Read More]

Hive UDFs in Ruby and Other Languages

Apache Hive is a powerful tool for processing data stored in Apache Hadoop. Structured and unstructured data can be accessed, processed, and manipulated using a SQL-like query language. This architecture allows anyone with reasonable SQL knowledge to write complex jobs with little to no knowledge of Hadoop, HDFS, or Hive.

[Read More]
hive  sql 

Complex Counts in Hive

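This came up on the Hive mailing list and I’m putting it here as a reminder to try it out. Here’s how to compute several conditional and distinct counts in a single statement instead of running one query per count. It works because count(expr) skips NULLs, so a CASE expression that returns NULL for non-matching rows counts only the rows that match.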

SELECT
    type
  , count(*)                                                           -- all rows in the group
  , count(DISTINCT u)                                                  -- distinct values of u
  , count(CASE WHEN plat=1 THEN u ELSE NULL END)                       -- rows where plat=1
  , count(DISTINCT CASE WHEN plat=1 THEN u ELSE NULL END)              -- distinct u where plat=1
  , count(CASE WHEN (type=2 OR type=6) THEN u ELSE NULL END)           -- rows where type is 2 or 6
  , count(DISTINCT CASE WHEN (type=2 OR type=6) THEN u ELSE NULL END)  -- distinct u where type is 2 or 6
FROM
    t
WHERE
    dt IN ("2012-1-12-02", "2012-1-12-03")
GROUP BY
    type
ORDER BY
    type
;
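
To try the pattern without a Hive cluster, here is a minimal sketch that runs a trimmed version of the query against an in-memory SQLite database from Python. It is not Hive: SQLite stands in only because it treats count and CASE the same way for this purpose. The rows, the column types, and the `run_counts` helper are all made up for illustration, and the string literals use single quotes because that is what SQLite expects.

```python
import sqlite3

# Toy stand-in for the table `t`; these rows are invented for illustration.
ROWS = [
    # (type, u, plat, dt)
    (1, "alice", 1, "2012-1-12-02"),
    (1, "alice", 1, "2012-1-12-03"),
    (1, "bob",   2, "2012-1-12-02"),
    (2, "carol", 1, "2012-1-12-02"),
    (2, "carol", 2, "2012-1-12-03"),
    (2, "dave",  1, "2012-1-12-04"),  # excluded by the WHERE clause
]

# Trimmed version of the query above: count() skips NULLs, so each CASE
# expression counts only the rows that satisfy its condition.
QUERY = """
SELECT
    type
  , count(*)
  , count(DISTINCT u)
  , count(CASE WHEN plat=1 THEN u ELSE NULL END)
  , count(DISTINCT CASE WHEN plat=1 THEN u ELSE NULL END)
FROM t
WHERE dt IN ('2012-1-12-02', '2012-1-12-03')
GROUP BY type
ORDER BY type
"""


def run_counts(rows):
    """Load rows into an in-memory table (hypothetical schema) and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (type INTEGER, u TEXT, plat INTEGER, dt TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run_counts(ROWS):
        print(row)
    # prints:
    # (1, 3, 2, 2, 1)
    # (2, 2, 1, 1, 1)
```

For type 1, the two plat=1 rows both belong to the same user, so the plain conditional count is 2 while the DISTINCT version is 1. That gap is the reason to ask for both columns.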
hive  sql