Learn to read the source Luke... really! - Mon, Oct 23, 2023
Yet another example of how in-depth debugging and being able to access the source code of a library can help you solve a problem
Some months ago I wrote a short post about this quote and the benefits of being able to read the source code of a library you are using in your program: speeding up troubleshooting, gaining more insight into the inner workings of the library, and generally getting to know how other developers write their code.
In this post I describe an example of how having, and being able to read, the source code of a library helped me find the cause of a really weird problem.
The problem
While coding some examples on how to use OpenAI’s Completion API, I ran into authentication problems when running the code, although the setup, setting the environment variable OPENAI_API_KEY to my access token, seemed to be correct:
openai.error.AuthenticationError: Incorrect API key provided: ''. You can find your API key at https://platform.openai.com/account/api-keys.
After several attempts to solve the problem, like creating a new token and setting the token directly in the code, did not work, the next step was to get more debug information.
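For context, here is a minimal sketch of the setup that was expected to work. The script itself is not part of the original post; the parameters simply mirror the request that shows up in the debug log further below (openai Python bindings 0.28.x):
import os
import openai  # openai Python bindings 0.28.x, the version visible in the debug log

# The bindings read OPENAI_API_KEY from the environment automatically;
# setting it explicitly was one of the trial-and-error attempts mentioned above.
openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt="To be or not to be",
    temperature=0.1,
    max_tokens=256,
)
print(response.choices[0].text)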
Enabling debug logging
Under the hood, the Completion API uses the requests library to send the requests. To see the HTTP request being sent, debug logging for the urllib3 library needs to be enabled. In addition, the debug level of http.client's HTTPConnection needs to be set to 1, as described here:
import logging
import http.client as http_client

# Root logger at DEBUG so the urllib3 messages are actually emitted
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)

# Enable debug logging for the urllib3 library used by requests
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True

# Make http.client print the raw request and response headers
http_client.HTTPConnection.debuglevel = 1
Caution: Running a program with these settings might output confidential information like access tokens to the console or log files.
Besides other useful information, I got the full HTTP request:
send: b'CONNECT api.openai.com:443 HTTP/1.0\r\n\r\n'
send: b'POST /v1/completions HTTP/1.1\r\nHost: api.openai.com\r\nUser-Agent: OpenAI/v1 PythonBindings/0.28.1\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\nX-OpenAI-Client-User-Agent: {"bindings_version": "0.28.1", "httplib": "requests", "lang": "python", "lang_version": "3.11.5", "platform": "Linux-6.5.6-200.fc38.x86_64-x86_64-with-glibc2.37", "publisher": "openai", "uname": "Linux 6.5.6-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Oct 6 19:02:35 UTC 2023 x86_64 "}\r\nAuthorization: Basic Z2l0bGFiLWNpLXRva2VuOg==\r\nOpenAI-Debug: true\r\nContent-Type: application/json\r\nContent-Length: 106\r\n\r\n'
send: b'{"model": "gpt-3.5-turbo-instruct", "prompt": "To be or not to be", "temperature": 0.1, "max_tokens": 256}'
DEBUG:urllib3.connectionpool:https://api.openai.com:443 "POST /v1/completions HTTP/1.1" 401 252
The first obvious odd thing was that Basic authorization was used instead of a Bearer token, as the documentation suggests. Decoding the Basic authorization token showed that it was completely different from the API key, and that changing the API key did not change the token. As I came to the conclusion that more trial and error would not help, I decided to start debugging.
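Decoding the token from the Authorization header is a one-liner with the standard library; this step is not shown in the original session, but it is easy to reproduce:
import base64

# Token copied from the Authorization header in the debug output above
print(base64.b64decode("Z2l0bGFiLWNpLXRva2VuOg=="))  # -> b'gitlab-ci-token:', clearly not an OpenAI API key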
Finding the right spot to start debugging
Of course, the authentication error was accompanied by a lengthy stack trace covering the above-mentioned modules. So I started right at the bottom, which was here:
File ".venv/lib64/python3.11/site-packages/openai/api_requestor.py", line 775, in _interpret_response_line
raise self.handle_error_response(
After setting a breakpoint here, I was able to take a look at the stack trace and the state of the objects on the stack. On the line above, I could see that the header was already set to the wrong type and token. My strategy was to go up the stack trace to see where the header was still correct, set an additional breakpoint there, and then debug from that point to find the spot where the header was changed.
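The original post does not say which debugger was used; as one way to reproduce this workflow, the same breakpoint can be set with the standard library's pdb (the script name is a placeholder):
$ python -m pdb completion_example.py
(Pdb) b .venv/lib64/python3.11/site-packages/openai/api_requestor.py:775
(Pdb) c            # run until the breakpoint in _interpret_response_line is hit
(Pdb) w            # show the full stack trace
(Pdb) u            # move one frame up the stack
(Pdb) p locals()   # inspect the objects in the current frame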
Tracing the objects on the stack
Being able to see all objects and how they are manipulated as the program runs is a really powerful mechanism. After some time (which was much shorter than the trial and error I talked about above) I was able to find the spot where the header was changed:
# Set environment's basic authentication if not explicitly set.
auth = request.auth
if self.trust_env and not auth and not self.auth:
auth = get_netrc_auth(request.url)
p = PreparedRequest()
p.prepare(
method=request.method.upper(),
...
auth=merge_setting(auth, self.auth),
...
)
return p
I omitted some lines for brevity. The complete code of the Session class can be found here. While the header was still correct for the request object, it was switched to the Basic authentication from the environment's .netrc file in the prepare method of the PreparedRequest class.
After some more debugging I found that the .netrc's auth information, merged in by the merge_setting method, is what ends up overriding the Authorization header.
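Here is a minimal sketch that shows this behavior in isolation, assuming a .netrc with a matching or default entry exists; the Bearer token is a placeholder:
from requests import PreparedRequest
from requests.utils import get_netrc_auth

url = "https://api.openai.com/v1/completions"

p = PreparedRequest()
p.prepare(
    method="POST",
    url=url,
    headers={"Authorization": "Bearer sk-placeholder"},  # the intended header
    auth=get_netrc_auth(url),  # (user, password) from ~/.netrc, or None
)

# If get_netrc_auth found credentials, the Bearer header set above has been
# replaced by Basic authentication built from those credentials.
print(p.headers["Authorization"])
Without a matching .netrc entry, get_netrc_auth returns None and the header stays untouched, which is why the problem only shows up on machines that have such a file.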
There was still one mystery though: why was there authentication information in the .netrc file for the OpenAI API? After looking at my .netrc file I found the answer:
default login git-ci-token
...
I had added a default login to facilitate authentication when using GitLab's glab CLI. Since a default entry in .netrc applies to every host that has no entry of its own, it was also used for api.openai.com. Mystery solved!
Conclusion
Once again the quote “The Source is what gives a Developer his power.” proved to be correct. Although “Learn to read the documentation Luke” might have been more appropriate, since the behavior was well documented :-) Well, I strive to be a better Padawan in the future.